Engineering Decisions
Documentation
Link | Notes |
---|---|
Gergely Orosz Scaling Engineering Teams via RFCs: Writing Things Down | The power of writing things down, and spreading knowledge across the organization |
Gergely Orosz Engineering Planning with RFCs, Design Documents and ADRs | What are some successful planning approaches engineering teams use as they grow? |
Design Docs at Google | Anatomy of a good design doc |
Architecture decision record (ADR) | An architectural decision record (ADR) is a document that captures an important architectural decision made along with its context and consequences |
What is the best way to write a PRD? | |
S.P.A.D.E. Toolkit: How to implement Square's famous decision-making framework | A decision-making framework built on accountability and clarity as an alternative to consensus: the person responsible for executing the decision is the one who makes it |
Technical Writing Courses for Engineers from Google | |
- Design Docs (or RFCs) are a great way to get higher-level feedback on an approach before starting the work
    - Written to share:
        - context
        - suggested approach
        - bird's-eye view of requirements and the general architecture of the project
        - tradeoffs
    - and to invite feedback
- Architecture Decision Records (ADRs) document implementation decisions
    - Written to record decisions rather than to gather feedback on them
    - Usually live in the same repo as the code (living docs)
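Because ADRs usually sit in the repo as small Markdown files, a tiny helper can keep them consistent. The sketch below assumes the widely used layout from Michael Nygard's original ADR template (Title, Status, Context, Decision, Consequences) and a hypothetical `docs/adr/NNNN-title.md` naming convention; the `ADR` class and its methods are illustrative only and are not part of any ADR tool.

```python
import re
from dataclasses import dataclass, field
from datetime import date
from pathlib import Path


@dataclass
class ADR:
    """One architecture decision record, using Nygard-style sections (illustrative)."""

    number: int
    title: str
    status: str  # e.g. "Proposed", "Accepted", "Superseded by ADR 12"
    context: str  # forces at play when the decision was made
    decision: str  # what was decided
    consequences: str  # resulting trade-offs, good and bad
    decided_on: date = field(default_factory=date.today)

    def filename(self) -> str:
        """Hypothetical convention: zero-padded number plus a slug of the title."""
        slug = re.sub(r"[^a-z0-9]+", "-", self.title.lower()).strip("-")
        return f"{self.number:04d}-{slug}.md"

    def to_markdown(self) -> str:
        """Render the record as a Markdown document."""
        sections = [
            ("Status", self.status),
            ("Context", self.context),
            ("Decision", self.decision),
            ("Consequences", self.consequences),
        ]
        lines = [f"# {self.number}. {self.title}", "", f"Date: {self.decided_on.isoformat()}", ""]
        for heading, body in sections:
            lines += [f"## {heading}", "", body, ""]
        return "\n".join(lines)

    def write_to(self, directory: Path) -> Path:
        """Write the record into `directory` (created if missing) and return its path."""
        directory.mkdir(parents=True, exist_ok=True)
        path = directory / self.filename()
        path.write_text(self.to_markdown(), encoding="utf-8")
        return path
```

For example, `ADR(7, "Use Postgres for billing", "Accepted", ctx, dec, cons).write_to(Path("docs/adr"))` would create `docs/adr/0007-use-postgres-for-billing.md`. In keeping with the "living docs" idea, a reversed decision is usually recorded as a new ADR, with the old one kept and its status changed to superseded rather than deleted.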
Cloud
Link | Notes |
---|---|
Cloud design patterns | Design patterns for building reliable, scalable, and secure cloud applications, illustrated with examples based on Microsoft Azure |
AWS App-Layer Encryption in AWS | |
AWS Network access for private clusters | Very interesting article on the problem of providing network connectivity between private Kubernetes clusters and other internal tools (such as deployment pipelines) |
AWS Square Adopting AWS VPC Endpoints at Square | Secure communication between data centers and the cloud |
AWS Square Providing mTLS Identities to Lambdas | Write-up on how Square added support for mutual TLS calls from AWS Lambda into its data center |
AWS Square Expanding Secrets Infrastructure to AWS Lambda | How Square extended its data-center-based secrets infrastructure to support Lambda as part of a cloud migration |
AWS Square Connecting Block Business Units with AWS API Gateway | How Block enables backend services to securely connect across business unit boundaries using AWS API Gateway |
AWS Cloud Encryption is worthless! Click here to see why... | When evaluating your cloud security posture priorities, encryption should be at the bottom of your list. First, get your IAM house in order |
AWS Building the Next Evolution of Cloud Networks at Slack | How Slack's AWS infrastructure evolved from a few hand-built EC2 instances to thousands of instances provisioned across multiple AWS regions |
Multicloud failover is almost always a terrible idea | Multicloud failover is so complex and costly that it is almost always impractical, and it is not an especially effective way to address cloud resilience risks |
Infrastructure
Link | Notes |
---|---|
Uber Why We Leverage Multi-tenancy in Uber's Microservice Architecture | |
Uber Introducing Domain-Oriented Microservice Architecture | This piece explains DOMA, the concerns that led to the adoption of this architecture for Uber, its benefits for platform and product teams, and, finally, some advice for teams who want to adopt this architecture |
Uber Crane: Uber’s Next-Gen Infrastructure Stack | Examines the original motivation and key features behind Uber's multi-year journey to reimagine its infrastructure stack for a hybrid, multi-cloud world |
Container technologies at Coinbase: Why Kubernetes is not part of our stack | Container technologies also create a large set of challenges that must be overcome to prevent failures |
Decentralized GitOps over multiple environments | How SAP Artificial Intelligence implements GitOps in their large-scale project spanning multiple environments |
How we use HashiCorp Nomad | Reliability model of services running in Cloudflare's more than 200 edge cities worldwide |
Design Considerations at the Edge of the ServiceMesh | A set of design patterns for inbound and outbound traffic to and from a service mesh |
A Kubernetes engineer's guide to mTLS | What mTLS is, how it relates to ordinary TLS, and why it's relevant to Kubernetes |
Lyft Scaling productivity on microservices at Lyft | History of Lyft's development and test environments |
monday.com’s Multi-Regional Architecture: A Deep Dive | When making a decision to go multi-region, one needs to understand the primary motivation, as the work will vary greatly between performance-first, resilience-first and privacy-first designs |
Inside Figma: securing internal web apps | A deep-dive into how Figma built a system for securing internal web applications that lets them require SSO authentication, enforce fine-grained authorization (via Okta groups), and support CLI tools, all using ALBs, AWS Cognito, and Okta |
Inside Figma: getting out of the (secure) shell | A simple solution for zero-trust shell access on AWS, by leveraging AWS SSO and Systems Manager |
Building ClickHouse Cloud From Scratch in a Year | |
Development Environments & CI
Link | Notes |
---|---|
AWS Setup | |
Automating Our Infrastructure to Empower Engineers | |
Devpod: Improving Developer Productivity at Uber with Remote Development | How Uber improved the daily edit-build-run developer experience using DevPods |
Balancing Safety and Velocity in CI/CD at Slack | |
Various
Link | Notes |
---|---|
Why is it so hard to decide to buy? | |
Software Development Waste | A taxonomy for any team that's trying to figure out how to be more efficient |
The top 10 fallacies in platform engineering | |