=Start=
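Background:
A few days ago I saw the article "Google's new book: Building Secure and Reliable Systems" in my WeChat Moments and learned that Google has handed out another freebie: Google's SREs share best practices for designing, implementing, and maintaining secure systems, and free PDF, EPUB, and MOBI versions are available for download.
The PDF runs to 557 pages, which is a lot of material to read, study, and digest slowly. For now I have sketched the overall structure of the book from its table of contents, so that I can keep adding to it and coming back to it later.
Main text:
Reference notes:
Building Secure and Reliable Systems: Best Practices for Designing, Implementing, and Maintaining Systems
The book has 5 parts and 21 chapters in total.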
Part I: Introductory Material (2 chapters)
- The Intersection of Security and Reliability
- Understanding Adversaries
Part II: Designing Systems (8 chapters)
- Case Study: Safe Proxies
- Design Tradeoffs
- Design for Least Privilege
- Design for Understandability
- Design for a Changing Landscape
- Design for Resilience
- Design for Recovery
- Mitigating Denial-of-Service Attacks
Part III: Implementing Systems (5 chapters)
- Case Study: Designing, Implementing, and Maintaining a Publicly Trusted CA
- Writing Code
- Testing Code
- Deploying Code
- Investigating Systems
Part IV: Maintaining Systems (3 chapters)
- Disaster Planning
- Crisis Management
- Recovery and Aftermath
Part V: Organization and Culture (3 chapters)
- Case Study: Chrome Security Team
- Understanding Roles and Responsibilities
- Building a Culture of Security and Reliability
Part I. Introductory Material
1. The Intersection of Security and Reliability
On Passwords and Power Drills
Reliability Versus Security: Design Considerations
Confidentiality, Integrity, Availability
Confidentiality
Integrity
Availability
Reliability and Security: Commonalities
Invisibility
Assessment
Simplicity
Evolution
Resilience
From Design to Production
Investigating Systems and Logging
Crisis Response
Recovery
Conclusion
2. Understanding Adversaries
Attacker Motivations
Attacker Profiles
Hobbyists
Vulnerability Researchers
Governments and Law Enforcement
Activists
Criminal Actors
Automation and Artificial Intelligence
Insiders
Attacker Methods
Threat Intelligence
Cyber Kill Chains
Tactics, Techniques, and Procedures (TTPs)
Risk Assessment Considerations
Conclusion
Part II. Designing Systems
3. Case Study: Safe Proxies
Safe Proxies in Production Environments
Google Tool Proxy
Conclusion
4. Design Tradeoffs
Design Objectives and Requirements
Feature Requirements
Nonfunctional Requirements
Features Versus Emergent Properties
Example: Google Design Document
Balancing Requirements
Example: Payment Processing
Managing Tensions and Aligning Goals
Example: Microservices and the Google Web Application Framework
Aligning Emergent-Property Requirements
Initial Velocity Versus Sustained Velocity
Conclusion
5. Design for Least Privilege
Concepts and Terminology
Least Privilege
Zero Trust Networking
Zero Touch
Classifying Access Based on Risk
Best Practices
Small Functional APIs
Breakglass
Auditing
Testing and Least Privilege
Diagnosing Access Denials
Graceful Failure and Breakglass Mechanisms
Worked Example: Configuration Distribution
POSIX API via OpenSSH
Software Update API
Custom OpenSSH ForceCommand
Custom HTTP Receiver (Sidecar)
Custom HTTP Receiver (In-Process)
Tradeoffs
A Policy Framework for Authentication and Authorization Decisions
Using Advanced Authorization Controls
Investing in a Widely Used Authorization Framework
Avoiding Potential Pitfalls
Advanced Controls
Multi-Party Authorization (MPA)
Three-Factor Authorization (3FA)
Business Justifications
Temporary Access
Proxies
Tradeoffs and Tensions
Increased Security Complexity
Impact on Collaboration and Company Culture
Quality Data and Systems That Impact Security
Impact on User Productivity
Impact on Developer Complexity
Conclusion
6. Design for Understandability
Why Is Understandability Important?
System Invariants
Analyzing Invariants
Mental Models
Designing Understandable Systems
Complexity Versus Understandability
Breaking Down Complexity
Centralized Responsibility for Security and Reliability Requirements
System Architecture
Understandable Interface Specifications
Understandable Identities, Authentication, and Access Control
Security Boundaries
Software Design
Using Application Frameworks for Service-Wide Requirements
Understanding Complex Data Flows
Considering API Usability
Conclusion
7. Design for a Changing Landscape
Types of Security Changes
Designing Your Change
Architecture Decisions to Make Changes Easier
Keep Dependencies Up to Date and Rebuild Frequently
Release Frequently Using Automated Testing
Use Containers
Use Microservices
Different Changes: Different Speeds, Different Timelines
Short-Term Change: Zero-Day Vulnerability
Medium-Term Change: Improvement to Security Posture
Long-Term Change: External Demand
Complications: When Plans Change
Example: Growing Scope - Heartbleed
Conclusion
8. Design for Resilience
Design Principles for Resilience
Defense in Depth
The Trojan Horse
Google App Engine Analysis
Controlling Degradation
Differentiate Costs of Failures
Deploy Response Mechanisms
Automate Responsibly
Controlling the Blast Radius
Role Separation
Location Separation
Time Separation
Failure Domains and Redundancies
Failure Domains
Component Types
Controlling Redundancies
Continuous Validation
Validation Focus Areas
Validation in Practice
Practical Advice: Where to Begin
Conclusion
9. Design for Recovery
What Are We Recovering From?
Random Errors
Accidental Errors
Software Errors
Malicious Actions
Design Principles for Recovery
Design to Go as Quickly as Possible (Guarded by Policy)
Limit Your Dependencies on External Notions of Time
Rollbacks Represent a Tradeoff Between Security and Reliability
Use an Explicit Revocation Mechanism
Know Your Intended State, Down to the Bytes
Design for Testing and Continuous Validation
Emergency Access
Access Controls
Communications
Responder Habits
Unexpected Benefits
Conclusion
10. Mitigating Denial-of-Service Attacks
Strategies for Attack and Defense
Attacker’s Strategy
Defender’s Strategy
Designing for Defense
Defendable Architecture
Defendable Services
Mitigating Attacks
Monitoring and Alerting
Graceful Degradation
A DoS Mitigation System
Strategic Response
Dealing with Self-Inflicted Attacks
User Behavior
Client Retry Behavior
Conclusion
Part III. Implementing Systems
11. Case Study: Designing, Implementing, and Maintaining a Publicly Trusted CA
Background on Publicly Trusted Certificate Authorities
Why Did We Need a Publicly Trusted CA?
The Build or Buy Decision
Design, Implementation, and Maintenance Considerations
Programming Language Choice
Complexity Versus Understandability
Securing Third-Party and Open Source Components
Testing
Resiliency for the CA Key Material
Data Validation
Conclusion
12. Writing Code
Frameworks to Enforce Security and Reliability
Benefits of Using Frameworks
Example: Framework for RPC Backends
Common Security Vulnerabilities
SQL Injection Vulnerabilities: TrustedSqlString
Preventing XSS: SafeHtml
Lessons for Evaluating and Building Frameworks
Simple, Safe, Reliable Libraries for Common Tasks
Rollout Strategy
Simplicity Leads to Secure and Reliable Code
Avoid Multilevel Nesting
Eliminate YAGNI Smells
Repay Technical Debt
Refactoring
Security and Reliability by Default
Choose the Right Tools
Use Strong Types
Sanitize Your Code
Conclusion
13. Testing Code
Unit Testing
Writing Effective Unit Tests
When to Write Unit Tests
How Unit Testing Affects Code
Integration Testing
Writing Effective Integration Tests
Dynamic Program Analysis
Fuzz Testing
How Fuzz Engines Work
Writing Effective Fuzz Drivers
An Example Fuzzer
Continuous Fuzzing
Static Program Analysis
Automated Code Inspection Tools
Integration of Static Analysis in the Developer Workflow
Abstract Interpretation
Formal Methods
Conclusion
14. Deploying Code
Concepts and Terminology
Threat Model
Best Practices
Require Code Reviews
Rely on Automation
Verify Artifacts, Not Just People
Treat Configuration as Code
Securing Against the Threat Model
Advanced Mitigation Strategies
Binary Provenance
Provenance-Based Deployment Policies
Verifiable Builds
Deployment Choke Points
Post-Deployment Verification
Practical Advice
Take It One Step at a Time
Provide Actionable Error Messages
Ensure Unambiguous Provenance
Create Unambiguous Policies
Include a Deployment Breakglass
Securing Against the Threat Model, Revisited
Conclusion
15. Investigating Systems
From Debugging to Investigation
Example: Temporary Files
Debugging Techniques
What to Do When You’re Stuck
Collaborative Debugging: A Way to Teach
How Security Investigations and Debugging Differ
Collect Appropriate and Useful Logs
Design Your Logging to Be Immutable
Take Privacy into Consideration
Determine Which Security Logs to Retain
Budget for Logging
Robust, Secure Debugging Access
Reliability
Security
Conclusion
Part IV. Maintaining Systems
16. Disaster Planning
Defining “Disaster”
Dynamic Disaster Response Strategies
Disaster Risk Analysis
Setting Up an Incident Response Team
Identify Team Members and Roles
Establish a Team Charter
Establish Severity and Priority Models
Define Operating Parameters for Engaging the IR Team
Develop Response Plans
Create Detailed Playbooks
Ensure Access and Update Mechanisms Are in Place
Prestaging Systems and People Before an Incident
Configuring Systems
Training
Processes and Procedures
Testing Systems and Response Plans
Auditing Automated Systems
Conducting Nonintrusive Tabletops
Testing Response in Production Environments
Red Team Testing
Evaluating Responses
Google Examples
Test with Global Impact
DiRT Exercise Testing Emergency Access
Industry-Wide Vulnerabilities
Conclusion
17. Crisis Management
Is It a Crisis or Not?
Triaging the Incident
Compromises Versus Bugs
Taking Command of Your Incident
The First Step: Don’t Panic!
Beginning Your Response
Establishing Your Incident Team
Operational Security
Trading Good OpSec for the Greater Good
The Investigative Process
Keeping Control of the Incident
Parallelizing the Incident
Handovers
Morale
Communications
Misunderstandings
Hedging
Meetings
Keeping the Right People Informed with the Right Levels of Detail
Putting It All Together
Triage
Declaring an Incident
Communications and Operational Security
Beginning the Incident
Handover
Handing Back the Incident
Preparing Communications and Remediation
Closure
Conclusion
18. Recovery and Aftermath
Recovery Logistics
Recovery Timeline
Planning the Recovery
Scoping the Recovery
Recovery Considerations
Recovery Checklists
Initiating the Recovery
Isolating Assets (Quarantine)
System Rebuilds and Software Upgrades
Data Sanitization
Recovery Data
Credential and Secret Rotation
After the Recovery
Postmortems
Examples
Compromised Cloud Instances
Large-Scale Phishing Attack
Targeted Attack Requiring Complex Recovery
Conclusion
Part V. Organization and Culture
19. Case Study: Chrome Security Team
Background and Team Evolution
Security Is a Team Responsibility
Help Users Safely Navigate the Web
Speed Matters
Design for Defense in Depth
Be Transparent and Engage the Community
Conclusion
20. Understanding Roles and Responsibilities
Who Is Responsible for Security and Reliability?
The Roles of Specialists
Understanding Security Expertise
Certifications and Academia
Integrating Security into the Organization
Embedding Security Specialists and Security Teams
Example: Embedding Security at Google
Special Teams: Blue and Red Teams
External Researchers
Conclusion
21. Building a Culture of Security and Reliability
Defining a Healthy Security and Reliability Culture
Culture of Security and Reliability by Default
Culture of Review
Culture of Awareness
Culture of Yes
Culture of Inevitably
Culture of Sustainability
Changing Culture Through Good Practice
Align Project Goals and Participant Incentives
Reduce Fear with Risk-Reduction Mechanisms
Make Safety Nets the Norm
Increase Productivity and Usability
Overcommunicate and Be Transparent
Build Empathy
Convincing Leadership
Understand the Decision-Making Process
Build a Case for Change
Pick Your Battles
Escalations and Problem Resolution
Conclusion
Appendix. A Disaster Risk Assessment Matrix
Reference links:
https://security.googleblog.com/2020/04/introducing-our-new-book-building.html
Google's new book: Building Secure and Reliable Systems
https://mp.weixin.qq.com/s/HztqUAeAfuobvXzOfZ6CFA
=END=
6 comments on "[read] Building Secure & Reliable Systems - outline"
Software Solutions: Breakglass
https://rsmpartners.com/Mainframe-Security.Software-Solutions.Breakglass.html
`
Breakglass provides temporary emergency access control in a fully secured and audited manner. Different user groups can request temporary additional security permissions in order to complete a specific task.
Breakglass software enables fast and easy emergency access control for authorized users in a secure and flexible manner, supporting multiple user groups with different access and privilege levels. User groups, permitted requesters and authorized managers are fully controlled by RACF profiles, and all requests and approvals are fully audited via SMF records and console messages.
`
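The breakglass idea quoted above (temporary extra permissions, approved by an authorized manager, with every request audited) can be sketched in a few lines of Python. This is only an illustrative in-memory model under stated assumptions: the names `Breakglass`, `BreakglassGrant`, `request`, and `is_allowed` are hypothetical and do not correspond to the RSM Partners product, RACF, or SMF.

```python
import time
from dataclasses import dataclass, field


@dataclass
class BreakglassGrant:
    """One temporary permission, valid until expires_at (epoch seconds)."""
    user: str
    permission: str
    approver: str
    expires_at: float


@dataclass
class Breakglass:
    """Hypothetical in-memory breakglass controller (illustration only)."""
    requesters: set   # users allowed to ask for emergency access
    approvers: set    # managers allowed to approve a request
    grants: list = field(default_factory=list)
    audit_log: list = field(default_factory=list)

    def request(self, user, permission, approver, ttl_seconds, now=None):
        """Grant a time-limited permission if requester and approver are authorized."""
        now = time.time() if now is None else now
        allowed = (
            user in self.requesters
            and approver in self.approvers
            and user != approver  # nobody approves their own request
        )
        # Every request is recorded, whether or not it is granted.
        self.audit_log.append(("request", user, permission, approver, allowed))
        if not allowed:
            return None
        grant = BreakglassGrant(user, permission, approver, now + ttl_seconds)
        self.grants.append(grant)
        return grant

    def is_allowed(self, user, permission, now=None):
        """Check whether the user currently holds an unexpired grant."""
        now = time.time() if now is None else now
        ok = any(
            g.user == user and g.permission == permission and g.expires_at > now
            for g in self.grants
        )
        self.audit_log.append(("check", user, permission, ok))
        return ok
```

A caller would create `Breakglass(requesters={"alice"}, approvers={"bob"})`, request a grant with a short `ttl_seconds`, and rely on `is_allowed` returning False once the grant has expired. A real system would persist the audit log somewhere the requester cannot modify.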
The Fuzzing Wars: From Swords, Bows, and Axes to Star Wars
https://mp.weixin.qq.com/s/nREiT1Uj25igCMWu1kta9g
`
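Fuzzing can be traced back to around 1950, when computers still read punched cards as input. Engineers of that era would pick discarded cards out of the trash at random, or punch holes in cards at random, and feed them to their programs as test input. In 1988, Barton Miller formally coined the term "fuzzing" in a class, opening three decades of remarkable development.
In the broad sense, fuzzing is not exclusive to vulnerability discovery. It is an indispensable part of quality assurance in DevSecOps and Continuous Integration, and it can even be stretched to the beautiful dream of a Turing-complete automaton. In the early days, people usually used the term monkey testing for the most primitive form of fuzzing, after the famous infinite monkey theorem: let a monkey hit typewriter keys at random, and as the typing time approaches infinity it will almost surely produce any given text, such as the complete works of Shakespeare, and possibly an Nginx RCE as well.
Obviously, although randomized input would eventually cover the whole input space, doing so is close to fantasy at any level of computing power humanity can foresee. Liu Cixin tells a grand story in "The Poetry Cloud" (《诗云》): a god-level civilization, in order to write the most elegant poem, uses all the matter of the solar system as storage and enumerates every permutation of characters, generating and storing them all. In the end the god finds that, even though every possible poem is in theory stored there, it cannot pick out the target in acceptable time, because "beautiful" is itself subjective and cannot be defined quantitatively. If a god's computing power is not enough, human computing power certainly is not. With the explosive growth of software complexity, the random generation of dumb fuzzing is like dropping a nail from the Oriental Pearl Tower and hoping it lands in a wine glass held by someone below.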
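# From Swords, Bows, and Axes to the Age of Firearms
To address these problems, symbolic execution combined with dynamic execution became mainstream in academia for a time, and KLEE[1] is the landmark work of that line. Backed by the Z3 solver and the LLVM bitcode infrastructure, KLEE can automatically analyze all kinds of conditional statements and generate inputs for them, which in turn assists fuzzing.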
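# From the Vietnam War to the Gulf War
Until the late 1980s, the form of modern war that people imagined was still a head-on collision of steel torrents fought by attrition, and the outcome of the Iran-Iraq War only added a footnote to this theory. As the clouds gathered over the Gulf, Saddam remained full of confidence, counting on his million-strong army to make the coalition forces bleed.
Yet beneath the calm, undercurrents were surging. In response to these problems, the development, maturation, and application of electronics were driving a new, information-based revolution in military affairs, which made its full debut in the Gulf War. The coalition defeated the Iraqi army with overwhelming force, completely overturning the previous form of warfare and shocking the world:
* Precision-guided weapons greatly improved the efficiency of finding and striking targets: the precision principle
* The C4ISR data-link command system delivered the one-sided battlefield transparency that soldiers throughout history had dreamed of: the agility principle
A long night heralds the dawn. A more precise, more agile, epoch-making fuzzer was about to take the stage.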
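# Sweeping Away Thousands of Troops Like Rolling Up a Mat
As the LLVM toolchain matured and computing resources grew, coverage-feedback-driven fuzzers based on compiler instrumentation gradually became one of the answers. Just as the US military showcased a new revolution in military affairs in the Gulf War, the birth of AFL announced a high-scoring answer to this perennial problem.
AFL (American Fuzzy Lop, whose logo is the cute long-haired rabbit breed native to America) was released and open-sourced by Michał Zalewski in 2014. It quickly became the de facto standard among fuzzing tools, much as the AK-47 dominated the firearms market, and claimed more than ten thousand CVEs within a few years.
Once the pioneer had charted the path, successors sprang up like mushrooms after rain. LLVM introduced its own implementation, libFuzzer, which replaced fork mode with in-process mode. Built on AFL and libFuzzer and combined with protobuf definitions, template-based fuzzing returned to the stage under the name structure-aware fuzzing. Its counterpart in the kernel world, syzkaller, has also been a great success. These methods are collectively called coverage-based greybox fuzzing (CGF). They swept through major systems software like an autumn wind through fallen leaves, and in doing so won the attention of academia.
At CCS'16, Marcel Böhme et al.[2] formally put this industrial revolution on a theoretical footing using a Markov chain: fuzzing is in effect a continually changing Markov chain whose nodes represent program states, and the seed selection and sample mutation strategies directly affect the transition probabilities between nodes. The natural measures of a CGF fuzzer's efficiency are therefore whether the probability of discovering new nodes is high enough, and whether the transition probability times the number of attempts between nodes is balanced enough. The paper calls this quantity energy. Note that energy here has nothing to do with saving electricity or reducing server load; it refers to the fuzzing effort invested per sample and path and the number of inputs that need to be generated. The author of this article thinks FI, fuzzing investment, would be a more fitting name.
After Zalewski announced his retirement in 2018, the community took over the development of AFL and continues to release it as AFL++. AFL++ incorporates many optimizations, including the research above, and keeps charging into battle on servers everywhere. A prosperous new era has arrived.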
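# Call of Duty: Future Warfare
Against the backdrop of history, the weaponization of vulnerabilities is no longer a topic to be hushed up, and is gradually showing its true face as open confrontation between nations. Defending Internet infrastructure and building and deploying reciprocal deterrence capabilities, including foundational platforms such as ClusterFuzz and OSS-Fuzz, are challenges the times have put before us.
In follow-up articles in this series, the author will build on the outlook above and try to expand on some of these points. Stay tuned.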
`
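The coverage-feedback loop described in the quoted article can be illustrated with a toy fuzzer. The sketch below is not AFL or libFuzzer: the target function, its hand-written branch IDs, and the names `target`, `mutate`, and `fuzz` are all hypothetical, and seeds are picked uniformly instead of through an energy-based schedule. It only shows the core idea that an input reaching new coverage is kept in the corpus and mutated further.

```python
import random


def target(data: bytes) -> set:
    """Toy target with hand-written 'instrumentation'.

    Returns the set of branch IDs it executed and raises ValueError
    when the simulated bug (input starting with b"FUZZ") is reached.
    """
    cov = {0}
    if len(data) > 0 and data[0] == ord("F"):
        cov.add(1)
        if len(data) > 1 and data[1] == ord("U"):
            cov.add(2)
            if len(data) > 2 and data[2] == ord("Z"):
                cov.add(3)
                if len(data) > 3 and data[3] == ord("Z"):
                    raise ValueError("bug reached")
    return cov


def mutate(seed: bytes, rng: random.Random) -> bytes:
    """Apply one random byte-level mutation: insert, replace, or delete."""
    buf = bytearray(seed)
    choice = rng.randrange(3)
    if choice == 0 or not buf:
        buf.insert(rng.randrange(len(buf) + 1), rng.randrange(256))
    elif choice == 1:
        buf[rng.randrange(len(buf))] = rng.randrange(256)
    else:
        del buf[rng.randrange(len(buf))]
    return bytes(buf)


def fuzz(max_iters: int = 500_000, seed: int = 0):
    """Coverage-guided loop: keep any input that reaches a new branch."""
    rng = random.Random(seed)
    corpus = [b""]
    seen = set(target(b""))
    for i in range(max_iters):
        candidate = mutate(rng.choice(corpus), rng)
        try:
            cov = target(candidate)
        except ValueError:
            return candidate, i + 1  # crashing input and attempts used
        if not cov <= seen:  # new coverage: promote the input to a seed
            seen |= cov
            corpus.append(candidate)
    return None, max_iters


if __name__ == "__main__":
    crash, attempts = fuzz()
    print(crash, attempts)
```

Because each partially correct prefix is saved as a seed, the loop only has to guess one byte at a time. A purely random generator would have to guess all four bytes at once, about one chance in 256^4 per four-byte attempt, which is the "nail into a wine glass" problem the article describes.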
Awesome Cybersecurity Blue Team (an excellent collection of blue team learning resources)
https://github.com/fabacab/awesome-cybersecurity-blueteam
`
Automation
* Code libraries and bindings
* Security Orchestration, Automation, and Response (SOAR)
Cloud platform security
Communications security (COMSEC)
DevSecOps
* Application or Binary Hardening
* Compliance testing and reporting
* Fuzzing
* Policy enforcement
Honeypots
* Tarpits
Host-based tools
* Sandboxes
Incident Response tools
* IR management consoles
* Evidence collection
Network perimeter defenses
* Firewall appliances or distributions
Operating System distributions
Phishing awareness and reporting
Preparedness training and wargaming
Security monitoring
* Endpoint Detection and Response (EDR)
* Network Security Monitoring (NSM)
* Security Information and Event Management (SIEM)
* Service and performance monitoring
* Threat hunting
Threat intelligence
Tor Onion service defenses
Transport-layer defenses
macOS-based defenses
Windows-based defenses
`
A collection of awesome penetration testing and offensive cybersecurity resources.
https://github.com/enaqx/awesome-pentest
My Understanding of SRE
https://mp.weixin.qq.com/s/8hRvMaZCD38GEsrbZ8eVvA
`
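The earliest discussion of SRE comes from Google's book "Site Reliability Engineering: How Google Runs Production Systems". In it, key members of Google SRE describe how they pay holistic attention to the whole software lifecycle, and why doing so helps Google successfully build, deploy, monitor, and operate some of the largest software systems in existence.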
Site reliability engineering (SRE) is a discipline that incorporates aspects of software engineering and applies them to infrastructure and operations problems. The main goals are to create scalable and highly reliable software systems.
One line describes SRE work vividly:
SRE is “what happens when a software engineer is tasked with what used to be called operations.”
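That is, the goal of SRE is to build scalable and highly available software systems, using software engineering methods to solve infrastructure and operations problems.
The Google SRE book gives a precise description of the day-to-day work of an SRE: at most 50% of time and effort goes to operations work, and 50% or more goes to using software engineering to ensure the stability and scalability of the infrastructure.
Based on the descriptions above, my understanding of SRE is:
Responsibility: ensure the stability and scalability of the infrastructure
Core: solving problems
Method: accumulate problem-solving experience through operations work, and improve problem-solving efficiency through coding and similar means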
`
Zuo Er Duo Hao Zi (左耳朵耗子): Some of My Principles for System Architecture
https://mp.weixin.qq.com/s/DDcdJPynOzsfJdlFP3aBBg
https://coolshell.cn/articles/21672.html
`
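I have been working for more than 20 years. Over that time I have seen many system architectures at many companies, and many problems as well. In exchanges and discussions with these companies, including implementation work and comparisons between solutions, my experience kept accumulating, and I gradually formed my own logic and methodology. Today I want to write this article to sum up my personal experience and ideas, in the hope that more people can use them as a reference and build better architectures. These ways of thinking and principles are all aimed at the many unreasonable architectures and solutions currently on the market, so they are also a kind of "correction". (Note that the architectural principles in this article generally apply to relatively complex businesses. For simple applications with little traffic, you may reach the opposite conclusions.)
Contents
* Principle 1: Focus on real benefits rather than the technology itself
* Principle 2: Take the perspective of application services and APIs, not of resources and technology
* Principle 3: Choose the most mainstream and mature technologies
* Principle 4: Completeness matters more than performance
* Principle 5: Define and follow standards, specifications, and best practices
* Principle 6: Value architectural extensibility and operability
* Principle 7: Centralize all control logic
* Principle 8: Do not accommodate the technical debt of legacy systems
* Principle 9: Do not rely on your own experience; rely on data and learning
* Principle 10: Be very careful about the X-Y problem
* Principle 11: Aggressive beats conservative; innovation and practicality do not conflict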
`
Microsoft Security Best Practices
https://docs.microsoft.com/en-us/security/compass/compass
`
Introduction
Governance, risk, and compliance
Security operations
Identity and access management
Network security and containment
Information protection and storage
Applications and services
`