Architecture Best Practices


| Concept | Description |
| --- | --- |
| Availability | Percentage of time the system is up, usually expressed in 9s (e.g., 99.9% = three 9s) |
| High Availability | The system will continue to function despite the complete failure of any component of the architecture |
| Fault Tolerance | The system will continue to function without degradation in performance despite the complete failure of any component of the architecture |
| Redundant | Multiple resources dedicated to performing the same task |
| Scalability | The ability of a system to increase resources to accommodate increased demand |
| Elasticity | The ability of a system to increase and decrease allocated resources (usually horizontally) to match demand. In general, an elastic resource is also scalable, but the reverse isn't always true |
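
To make "percentage uptime, in 9s" concrete, the sketch below converts a whole number of nines into an availability fraction and into allowed downtime per year. The function names are illustrative, and the year is assumed to have 365 days. Shorthand such as "3.5 9s" conventionally means 99.95%, so for those values pass the fraction directly rather than using the converter.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # assumes a non-leap year


def availability_from_nines(nines: int) -> float:
    """Convert a whole number of nines to a fraction (3 -> 0.999)."""
    return 1.0 - 10.0 ** (-nines)


def downtime_minutes_per_year(availability: float) -> float:
    """Minutes per year the system may be down at the given availability."""
    return (1.0 - availability) * MINUTES_PER_YEAR


for nines in (2, 3, 4, 5):
    a = availability_from_nines(nines)
    print(f"{nines} nines -> {downtime_minutes_per_year(a):.1f} min/year of downtime")

# "3.5 nines" (99.95%) is passed as a fraction, not through the converter.
print(f"99.95% -> {downtime_minutes_per_year(0.9995):.1f} min/year of downtime")
```

Each additional nine cuts the downtime budget by a factor of ten, which is why moving from three to four nines usually requires redundancy rather than just better hardware.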

Design for Failure

  • The architecture can withstand the failure of one or more components
  • Goal is to ensure the app survives when the underlying physical hardware of one of its servers fails

Redundancy

  • Remove single points of failure by having multiple resources for the same task
  • Types

| Type | Description |
| --- | --- |
| Standby redundancy | When a resource fails, functionality is restored on a secondary resource through FAILOVER. Failover takes time to complete, and during that period the resource is UNAVAILABLE. Used for STATEFUL components |
| Active redundancy | Requests are distributed to multiple redundant resources. When one fails → the rest absorb the workload |

  • Redundant webapp
    • Add another web instance in ANOTHER AZ
    • Achieve ACTIVE redundancy by replacing the EIP with an ELB
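
The effect of redundancy can be estimated with a little arithmetic. This sketch assumes that replicas fail independently (for example, instances in different AZs), that a single surviving replica is enough to serve traffic, and that the load balancer itself is not counted. The function names and sample figures are hypothetical.

```python
def redundant_availability(single: float, copies: int) -> float:
    """Availability of `copies` independent replicas when one survivor suffices."""
    if copies < 1:
        raise ValueError("need at least one copy")
    # The group is down only if every replica is down at the same time.
    return 1.0 - (1.0 - single) ** copies


def utilisation_after_failure(instances: int, utilisation: float, failed: int = 1) -> float:
    """Per-instance utilisation once the survivors absorb the failed instances' load."""
    survivors = instances - failed
    if survivors < 1:
        raise ValueError("no surviving instances")
    return utilisation * instances / survivors


# Two 99% instances behind a load balancer, under the independence assumption.
print(f"{redundant_availability(0.99, 2):.4%}")
# Two instances at 40% load each: the survivor runs at 80% after one fails.
print(f"{utilisation_after_failure(2, 0.40):.0%}")
```

The second function shows why active redundancy needs headroom: if each of two instances already runs above 50%, the survivor cannot absorb the full workload.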


Scalability

  • Ability of a system to grow to handle increased load, whether gradually over time or in response to a sudden change in business needs
  • Architecture should scale linearly
  • Types

| Type | Description |
| --- | --- |
| SCALING VERTICALLY | Through an increase in the specs of an individual resource (e.g., EC2 instance type) |
| SCALING HORIZONTALLY | Through an increase in the number of resources (e.g., number of instances) |

  • Horizontal scaling and state
    • Stateless apps
      • Need no knowledge of previous interactions & store no session info
      • Can scale horizontally because each request can be served by any available resource
      • For webapps → AUTOSCALING (add an Auto Scaling group)
    • Stateful apps
      • Most apps need to maintain some kind of state info
      • You can make a portion of the architecture stateless by not storing state locally on a horizontally scaling resource (as it can appear/disappear), e.g., by keeping it in HTTP cookies
      • Solution → store user session info in a DB (e.g., DynamoDB)
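
The idea of moving session state off the web instances can be sketched as follows. `SessionStore` is an in-memory stand-in for an external store such as a DynamoDB table, and `WebInstance` is a hypothetical request handler. Neither reflects a real AWS API.

```python
import uuid


class SessionStore:
    """In-memory stand-in for an external session store (e.g., a DynamoDB table)."""

    def __init__(self):
        self._items = {}

    def get(self, session_id):
        return dict(self._items.get(session_id, {}))

    def put(self, session_id, data):
        self._items[session_id] = dict(data)


class WebInstance:
    """Stateless handler: keeps nothing between requests except a store reference."""

    def __init__(self, name, store):
        self.name = name
        self.store = store

    def handle(self, session_id, item):
        session = self.store.get(session_id)
        cart = session.get("cart", []) + [item]
        self.store.put(session_id, {"cart": cart})
        return f"{self.name}: cart={cart}"


store = SessionStore()
web_a, web_b = WebInstance("web-a", store), WebInstance("web-b", store)
sid = str(uuid.uuid4())

web_a.handle(sid, "book")          # first request lands on web-a
reply = web_b.handle(sid, "pen")   # next request lands on web-b and still sees the cart
print(reply)
```

Because either instance can serve any request, an Auto Scaling group can add or terminate instances without users losing their sessions.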

Storage Options

| Option | Description |
| --- | --- |
| S3 | Webapp needs large-scale storage capacity and performance, or storage with high data durability to support backup & active archives for disaster recovery |
| CloudFront | CDN |
| DynamoDB | NoSQL |
| EBS | Reliable block storage for mission-critical apps such as Oracle/SAP/OWA |
| RDS | Highly available, scalable, secure MySQL DB |
| Redshift | Data warehouse to support business analytics |
| ElastiCache | Redis cluster to store session info |
| EFS (Elastic File System) | Common file system for an app, shared between one or more EC2 instances |

  • Move static assets to S3 → then serve via CloudFront
    • Less load on instances
    • Smaller footprint for the web tier
  • Move session info to DynamoDB/ElastiCache
    • Web instances do not lose session info when autoscaling happens
  • Use ElastiCache to store common DB query results
    • Less load on the DB tier
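
Caching common query results is often done with a cache-aside pattern: check the cache first, fall back to the database on a miss, then populate the cache. The sketch below uses an in-memory `TTLCache` as a stand-in for ElastiCache and a `FakeDatabase` that counts calls. Both classes are hypothetical and only illustrate the flow.

```python
import time


class TTLCache:
    """Tiny in-memory cache with per-entry expiry (stand-in for ElastiCache)."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._data = {}

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._data[key]  # expired entries count as a miss
            return None
        return value

    def set(self, key, value):
        self._data[key] = (value, self._clock() + self._ttl)


class FakeDatabase:
    """Counts queries so the saving from caching is visible."""

    def __init__(self):
        self.calls = 0

    def query(self, sql):
        self.calls += 1
        return [("row for", sql)]


def cached_query(cache, db, sql):
    rows = cache.get(sql)
    if rows is None:            # cache miss: hit the DB tier, then remember the result
        rows = db.query(sql)
        cache.set(sql, rows)
    return rows


db = FakeDatabase()
cache = TTLCache(ttl_seconds=60)
cached_query(cache, db, "SELECT * FROM products")
cached_query(cache, db, "SELECT * FROM products")
print(db.calls)  # only the first call reached the database
```

The TTL bounds how stale a cached result can get. Choosing it is a trade-off between DB load and freshness.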

Layered Security

Defense in Depth
  • Network level: VPC that isolates part of the infra through subnets/SGs
  • App level: AWS WAF
  • Access control: IAM (least privilege)
  • Data: encryption in transit/at rest
Offload Responsibility to AWS
  • Shared responsibility model
  • Reduce scope of responsibility
Security as Code
  • Scan IaC
  • CloudFormation templates can be imported as "products" into Service Catalog
Realtime Auditing
  • Continuous monitoring & automation of controls → Config Rules / Inspector / Trusted Advisor
  • Logging → CloudWatch Logs / CloudTrail
  • Scan Logs → Lambda / EMR / Elasticsearch
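
As an illustration of scanning logs, the sketch below counts denied API calls per principal in a CloudTrail-style JSON document, the kind of logic one might run in a Lambda function. The field names (`Records`, `errorCode`, `userIdentity.arn`) follow the CloudTrail record format as I understand it. The sample records, the account ID, and the set of error codes treated as "denied" are assumptions made up for this example.

```python
import json
from collections import Counter

# Assumption: these error codes indicate a denied call.
DENIED_CODES = {"AccessDenied", "Client.UnauthorizedOperation"}


def denied_calls_by_principal(log_text: str) -> Counter:
    """Count denied API calls per caller ARN in a CloudTrail-style log document."""
    records = json.loads(log_text).get("Records", [])
    counts = Counter()
    for record in records:
        if record.get("errorCode") in DENIED_CODES:
            arn = record.get("userIdentity", {}).get("arn", "unknown")
            counts[arn] += 1
    return counts


# Fabricated sample data, for illustration only.
SAMPLE = json.dumps({
    "Records": [
        {"eventName": "DeleteBucket", "errorCode": "AccessDenied",
         "userIdentity": {"arn": "arn:aws:iam::123456789012:user/alice"}},
        {"eventName": "RunInstances", "errorCode": "Client.UnauthorizedOperation",
         "userIdentity": {"arn": "arn:aws:iam::123456789012:user/alice"}},
        {"eventName": "ListBuckets",
         "userIdentity": {"arn": "arn:aws:iam::123456789012:user/bob"}},
    ]
})

for arn, count in denied_calls_by_principal(SAMPLE).most_common():
    print(f"{arn}: {count} denied call(s)")
```

A repeated pattern of denied calls from one principal is a typical signal worth alerting on.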

Loose Coupling

Loosely Coupled Components
  • Fewer interdependencies
  • Interact with each other through interfaces (e.g., API Gateway)
Async Integration
  • Pattern for implementing loose coupling between services
  • Generator → SQS → Consumer
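
The generator → queue → consumer pattern can be sketched locally with Python's standard `queue.Queue` standing in for SQS. This is only an analogy: unlike this in-process queue, SQS is a network service, and its standard queues do not guarantee ordering or exactly-once delivery. The function names are illustrative.

```python
import queue
import threading

STOP = None  # sentinel telling the consumer there is no more work


def producer(q, jobs):
    """Enqueue work without knowing who will process it, or when."""
    for job in jobs:
        q.put(job)
    q.put(STOP)


def consumer(q, results):
    """Pull work at its own pace; a slow consumer never blocks the producer's logic."""
    while True:
        job = q.get()
        if job is STOP:
            break
        results.append(job.upper())  # placeholder for real processing


def run_pipeline(jobs):
    q = queue.Queue()
    results = []
    worker = threading.Thread(target=consumer, args=(q, results))
    worker.start()
    producer(q, jobs)
    worker.join()
    return results


print(run_pipeline(["resize-image", "send-email"]))
```

The producer and consumer share only the queue and the message format, so either side can be replaced or scaled without changing the other.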
Service Discovery
  • Manages how processes & services in an environment can find & talk to one another
  • Components
    • Directory of services
    • Registering services in the directory
    • Lookup and connect to services
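
The three components listed above (directory, registration, lookup) can be sketched as a small in-memory registry. `ServiceRegistry` and the endpoint addresses are hypothetical. A production setup would use a managed or replicated directory rather than a single in-process object.

```python
class ServiceRegistry:
    """Minimal directory of services with round-robin lookup."""

    def __init__(self):
        self._endpoints = {}   # service name -> list of endpoints
        self._next = {}        # service name -> next round-robin position

    def register(self, service, endpoint):
        self._endpoints.setdefault(service, []).append(endpoint)

    def deregister(self, service, endpoint):
        endpoints = self._endpoints.get(service, [])
        if endpoint in endpoints:
            endpoints.remove(endpoint)

    def lookup(self, service):
        endpoints = self._endpoints.get(service)
        if not endpoints:
            raise LookupError(f"no endpoints registered for {service!r}")
        position = self._next.get(service, 0) % len(endpoints)
        self._next[service] = position + 1
        return endpoints[position]


registry = ServiceRegistry()
registry.register("orders", "10.0.2.15:8080")   # made-up addresses
registry.register("orders", "10.0.4.27:8080")

print(registry.lookup("orders"))
print(registry.lookup("orders"))
registry.deregister("orders", "10.0.2.15:8080")
print(registry.lookup("orders"))
```

Callers depend only on the service name, so instances can come and go (for example, during autoscaling) without clients being reconfigured.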

Elastic Architecture

  1. Create VPC (

  2. Create IGW, and attach to VPC

  3. Update Main RT → add default route (0.0.0.0/0) → IGW
  4. Create Public Subnets

    • vpc_subnet_1_public = (US-EAST-1A)
    • vpc_subnet_3_public = (US-EAST-1B)
  5. Create NAT GW, attach it to vpc_subnet_1_public

  6. Create Private RT → add default route (0.0.0.0/0) → NAT GW
  7. Create Private Subnets

    • vpc_subnet_2_private = (US-EAST-1A)
    • vpc_subnet_4_private = (US-EAST-1B)
    • Associate both private subnets with the Private RT
  8. Create SG for each tier

    • ELB = vpc_sg_elb / HTTP : TCP : 80 :
    • Web Server = vpc_sg_web / HTTP : TCP : 80 : vpc_sg_elb
    • RDS = vpc_sg_rds / MYSQL : TCP : 3306 : vpc_sg_web
  9. Create Multi-AZ RDS
    • Create DB Subnet Group in private subnets (2/4)
    • Launch MySQL
      • Multi-AZ: yes
      • VPC SG: vpc_sg_rds
  10. Create ELB
    • Name: vpc_elb
    • Listener: HTTP : 80 : HTTP : 80
    • SG: vpc_sg_elb
    • Health Check: HTTP : 80 : /index.html
  11. Create AutoScaling Group

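    • Launch config (vpc_web_lc): SG vpc_sg_web + user data
    #!/bin/bash
    # Patch the instance, then install Apache, PHP, and the MySQL client
    yum update -y
    yum install -y php php-mysql mysql httpd
    # Write the page to Apache's default document root so the ELB health check on /index.html passes
    echo "<html>Hello!</html>" > /var/www/html/index.html
    service httpd start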
    • ASG → vpc_web_lc in public subnets (1/3)
    • Associate ELB
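
The security groups in step 8 form a chain in which each tier accepts traffic only from the tier in front of it. The sketch below models those rules as plain data and checks which hops are allowed. The ELB rule's source is not given in the steps above, so `0.0.0.0/0` is an assumption here, and the `allows` helper is hypothetical rather than an AWS API.

```python
# Inbound rules per security group: (protocol, port, allowed source)
RULES = {
    "vpc_sg_elb": [("tcp", 80, "0.0.0.0/0")],     # ASSUMPTION: open to the internet
    "vpc_sg_web": [("tcp", 80, "vpc_sg_elb")],    # only the ELB may reach web servers
    "vpc_sg_rds": [("tcp", 3306, "vpc_sg_web")],  # only web servers may reach MySQL
}


def allows(rules, target_sg, port, source):
    """True if `target_sg` has an inbound rule for `port` from `source`."""
    return any(
        rule_port == port and rule_source == source
        for _protocol, rule_port, rule_source in rules.get(target_sg, [])
    )


checks = [
    ("vpc_sg_web", 80, "vpc_sg_elb"),    # ELB -> web: expected to pass
    ("vpc_sg_rds", 3306, "vpc_sg_web"),  # web -> RDS: expected to pass
    ("vpc_sg_rds", 3306, "vpc_sg_elb"),  # ELB -> RDS: should be blocked
    ("vpc_sg_web", 80, "0.0.0.0/0"),     # internet -> web directly: should be blocked
]
for target, port, source in checks:
    verdict = "allowed" if allows(RULES, target, port, source) else "blocked"
    print(f"{source} -> {target}:{port} {verdict}")
```

Referencing a security group as the source, instead of a CIDR range, keeps the rules valid as instances are added or replaced by the Auto Scaling group.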


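| Kind | Service | Availability SLA | Durability | Scope |
| --- | --- | --- | --- | --- |
| DBs | DynamoDB | 4 9s | | Region |
| | Redshift | 3 9s | | |
| | RDS Multi-AZ | 3.5 9s (99.95%) | | |
| | Aurora | 3.5 9s | | Multi-AZ |
| | Aurora Multi-Master | 4 9s | | |
| Storage | S3 | 4 9s | | Region |
| | EBS volume | 5 9s | AFR of 0.1% to 0.2% | AZ |
| | EBS snapshot | 4 9s | 11 9s | Region |
| | EFS | 4 9s | 11 9s | Region |
| Compute | EC2 instance | 1 9 | | |
| | Lambda | 3.5 9s | | Region |
| Distribution | Route53 | 100% | | Global |
| | CloudFront | 4 9s | | Global |
| | ELB | 4 9s | | Multi-AZ |
| | API GW | 4 9s | | Region |
| | Cognito | 3 9s | | Region |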
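
When a request passes through several services in series, a rough upper bound on end-to-end availability is the product of the individual figures. The sketch below applies that to a hypothetical request path using SLA values from the table above ("3.5 9s" taken as 99.95%). It assumes independent failures, and SLA numbers are contractual commitments rather than measured uptime, so treat the result as an estimate only.

```python
from math import prod


def serial_availability(parts):
    """Availability of a chain in which every component must be up."""
    return prod(parts)


# Hypothetical request path; values come from the SLA table.
request_path = {
    "Route53": 1.0,
    "CloudFront": 0.9999,
    "ELB": 0.9999,
    "RDS Multi-AZ": 0.9995,
}

composite = serial_availability(request_path.values())
print(f"composite availability: {composite:.4%}")
```

The composite figure is always lower than the weakest link, which is why each added dependency in the request path should be justified.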