Skip to content

Databases

Comparison

Database Transport encryption Encryption at rest Audit Authentication Authorization
RDS Rooted at global RDS certs, configuration is per-engine KMS
or
TDE w/ SQL Server and Oracle - RDS managed key
per-engine log files per engine user accounts per engine
DynamoDB Standard AWS HTTPS endpoint KMS CloudTrail IAM only
Cognito possible
IAM identity policies - resources & condition keys
Redshift ACM managed certificate, redshift specific root KMS
CloudHSM
S3 DB user accounts SQL
Neptune Publicly trusted Amazon root KMS Console User accounts; or a limited IAM identity policy mechanism + request signing Engine-specific; or broad access if using IAM
Aurora Rooted at global RDS certs KMS mysql -> CloudWatch Logs User accounts; or an IAM authenticated API to obtain short lived passwords to connect SQL
DocumentDB Rooted at global RDS certs KMS CloudWatch Logs MongoDB user accounts MongoDB standard

OLAP vs OLTP

OLAP: Redshift

OLTP: relational DBs


RDS

General Info

Description
  • Relational DB Service (RDS): expose a DB instance but does not provide shell access
  • Operational Benefits
    • AWS in charge of: scaling, HA, backups, DB engine patches
    • No admin access provided
  • Pay
    • Provisioned compute resources
    • Provisioned storage resources

Use Cases

Suitable Not Suitable
  • Complex Transactions/Queries
  • Medium-to-High query/write rate
  • No more than a single worker node/shard
  • High durability
  • Massive read/write rates (e.g., 150K write/s)
  • Sharding
  • Simple GET/PUT requests and queries
  • RDBMS customization
DB Instance
  • Isolated DB environment deployed in a private network
  • Each instance runs & manages a DB ENGINE
  • Configuration
DB Parameter Group container for engine configuration values that can be applied to 1+ DB instances
DB Option Group container for enginer features (empty by default)

DB Engines

MySQL
  • OSS Community Edition with InnoDB
  • Supports multi-AZ & read replicas
PostgreSQL
  • Supports multi-AZ & read replicas
Oracle
EditionMulti-AZEncr.License
Standard OneYESKMSincluded, BYOL
StandardYESKMSBYOL
EnterpriseYESKMS & TDEBYOL
MS SQL
EditionMulti-AZEncr.License
ExpressNOKMSincluded
WebNOKMSincluded
StandardYESKMSincluded, BYOL
EnterpriseYESKMS & TDEBYOL
Aurora
  • Built using EBS
  • Scale up to 4-6TB & 30k IOPS
  • Storage types
    • Magnetic
    • General Purpose (SSD)
    • Provisioned IOPS (SSD)

Operations

Backup & Recovery
  • Backup Mechanisms
AUTOMATED BACKUPS
  • Continuously tracks changes & backs up the DB
  • Automated backups are turned on by default
  • RDS creates a storage volume snapshot of the complete DB instance (not just individual DBs)
  • Retention Period = 1day by default, max 35 days
  • When DB instance deleted → all automated backup snapshots are deleted
  • Backups occurr daily in a 30min maintenance window
MANUAL
  • At any time
  • Kept until explicitly disabled
  • Recovery
    • You cannot restore from a DB snapshot to an existing DB instance, a new DB instance is created
    • Only default DB params & SGs are associated with restored instance

HA

Multi-AZ
  • Synchronous replication to minimize RPO
  • Fast failover to minimize RTO
  • Allows to place a 2ndary copy of DB in another AZ for DISASTER RECOVERY purposes
  • DNS name that AWS resolves tp IP address: my-app-db.<id>.us-west-2.rds.amazonaws.com
Replication
  • RDS automatically replicates data from MASTER DB in PRIMARY instance to SLAVE DB/2ndary instance
  • Read replicas use asynchronous replication, pushing data to the read replicas whenever possible, for improved read performance
  • AWS provides up to five read replicas for a single database instance, configurable via the AWS console
  • RDS automatically performs a failover in the event
    • loss of availability in primary AZ
    • loss of network connectivity to primary DB
    • compute unit failure on primary DB
    • storage failure on primary DB
  • Custom failover to standby instance without user intervention

Scaling

Vertical Scalability (UP)
  • Each DB instance can scale from 5GB to 6TB in provisioned storage
  • Storage Expansion is supported for all engines except SQL Server
Horizontal Scalability (OUT)
  • Use cases
    • scale beyond capacity of single DB instance
    • handle read traffic when source DB is unavailable
    • offload reporting or data warehousing
  • With PARTITIONING (sharding)
    • partition DB into multiple instances (shards)
    • handle more users/requests but requires additional logic in the app layer
    • designed for NoSQL (DynamoDB, Cassandra)
  • With READ REPLICAS
    • offload read transactions from primary DB & increase overall number of transactions
    • updates made to the source DB are async copied to the read replica
    • supported by: MySQL, PostgreSQL, MariaDB, Aurora
    • replicas can span across regions

Security

Access Control
  • When create a DB instance you create a MASTER USER ACCOUNT, a native DB user account that allows to login with ALL DB privileges
  • Can create additional user accounts
  • No resource based policies

Network Isolation

Runs in a VPC
  • DB instance within a private subnet
  • Connect to on-premises with VPN
  • Publicly accessible option controls whether there is a publicly resolvable DNS name for the instance
DB Subnet Group
  • Collections of subnets designated for DB instances in a VPC
  • Each subnet group should have at least 1 subnet for each AZ in a region
DB SG
  • ~ EC2 SG
  • Default to DENY ALL access
  • Must explicitly authorize network ingress (IP range, EC2 SG)
  • Only allows access to the DB server port
Network ACL Allow/deny traffic IN/OUT subnet

Encryption

TLS
  • Encrypt connections App ↔ DB instance
  • Single root for all RDS database TLS certs
  • EngineDescription
    MySQL & SQL ServerRDS creates a SSL cert & installs it on DB instance when provisioned
    MySQLLaunch client with SSL CA pointer to public key
    SQL ServerDownload public key & import cert within OS
    OracleUses Oracle native network encryption
RDS Encryption
  • Encryption at rest, set during creation, uses KMS
  • Covers database, backups, replicas, snapshots
Transparent data encryption (TDE)
  • For SQL Server and Oracle with CloudHSM
  • Automatically encrypts data before it is written to storage
  • If you need data at rest with MySQL → app must manage it

DynamoDB

General Info

Description
  • NoSQL DB
  • Provide consistent performance levels by automatically distributing data&traffic for a table over multiple partitions
  • All data stored on SSD & automatically replicated across multiple AZ → > resilience than RDS, Aurora, Redshift
  • Supports cross-region replication

Components

TABLE collection of items
ITEM collection of 1+ attributes, with PRIMARY KEY
ATTRIBUTE name/value pair

Data Types

SCALAR single value (string, number, binary, boolean, null)
SET
  • single list of 1+ SCALAR
  • each value is unique
  • all must be of same type (string, number, binary)
DOCUMENT nested attributes (list, map)

Primary Key

PARTITION KEY 1 attribute, a partition (hash) key
PARTITION & SORT KEY
  • 2 attributes
  • possible to have same PARTITION KEY, but those 2 items must have different SORT KEY
Secondary Indexes
  • Creation

    • Define 1+ secondary indexes when creating a table with PARTITION & SORT KEY
  • Kinds

GLOBAL
  • index with PARTITION & SORT KEY != from table
  • any number of GLOBAL secondary index, created whenever
LOCAL
  • index with = PARTITION KEY as primary
  • index with != SORT KEY from primary
  • 1! LOCAL secondary index & must be created when creating table
  • Usage
    • Updated when an item is modified → consume W capacity units
    • GLOBAL → from own provisioned throughput
    • LOCAL → from main table

Operations

Provisioned Capacity
  • At creation, provide R/W capacity (R/W capacity units)
  • Each operation consume capacity units, proportional to size(item)
Example
  • Table w/o secondary index
  • Consumes:
    • 1 unit for R item <= 4KB
    • 1 unit for W item <= 1KB
  • R item of 110KB → 28 units (ceil(110/4) = ceil(27.5) = 28)
  • Plus, if strongly consistent28 * 2 = 56
Eventual Consistency
  • DynamoDB stores multiple copies of an item across a region
  • Eventually Consistent = R request immediately after a W might not show the latest change
  • Reads
    • Eventually Consistent R = might include stale data
    • Strongly Consistent R = most up-to-date data

Search

QUERY
  • Primary search operation to find items in a table or secondary index using only PRIMARY KEY attribute values
  • Each query requires a PARTITION KEY ATTRIBUTE NAME + value to search
  • Results sorted by PRIMARY KEY & limited to 1MB
SCAN
  • Read each item in a table or secondary index
  • Returns ALL data attributes for each item in the table/index
  • Each request returns <= 1MB data

Managing Data

Atomic Counters Increment/decrement value & guaranteed to be consistent across multiple concurrent requests
Reading
  • By default EVENTUALLY CONSISTENT READ
  • PARTITION KEY → must be specified
  • PARTITION & SORT KEY → must be specified
Scaling
  • Scale horizontally using PARTITIONS (compute & storage units)
  • Table items stored across multiple partitions
  • Which partition based on the PARTITION KEY → items with same PARTITION KEY are stored in the same partition
Partition
  • Provisioned throughput divided evenly across partitions
  • No sharing of throughput across partitions
  • 1 partition
    • <= 10GB data
    • <= 3KB R capacity units OR <= 1K W capacity units
  • After a partition is split, it cannot be merged back together
Streams
  • List of item modifications for last 24h period
  • STREAM = stream records
  • STREAM RECORD = single data modification in the table to which the stream belongs & organised into groups (shards) for max 24h
Backups
  • Full or incremental to a table, in same or different region
  • Copy used for
    • Disaster recovery
    • To federate data across regions to support a multi-region application

Security

Encryption
  • Optional encryption at rest integrated with KMS
Access Control
  • RESOURCE-level = IAM
    • No resource based policies
    • Full access to a table requires access to not just the table/<name> resource, but also table/<name>/*
  • DB-level = fine-grained access controls
    • Permissions that allow/deny access to items (rows) and attributes (columns)
    • Create them using an IAM policy
    • Several condition keys for fine-grained access including: dynamodb:LeadingKeysdynamodb:Selectdynamodb:Attributes
  • Each request must contain a valid signature (STS)
Integrations
  • Get and Put API calls are not logged to CloudTrail
  • Has a VPC endpoint you can use
  • Integration with Cognito:
    • identity pool with roles configured
    • roles have appropriate policy to (a) allow cognito to assume them and (b) perform desired DynamoDB actions

Redshift

General Info

Description
  • Data Warehouse service
  • Relational DB designed for OLAP & optimised for analysis & reporting of large datasets
  • Based on PostgreSQL
  • Fast querying capabilities over structured data using standard SQL commands
  • Data typically spans multiple data repo & at least 2TB
  • Pay
    • Provisioned compute resources
    • Storage (charged according to computer resources)
Clusters and Nodes
  • CLUSTER = LEADER NODE + 1+ COMPUTE NODES
  • Each cluster contains 1+ DBs
  • Node Types
    • 6 supported
    • grouped into 2 categories
      • DENSE COMPUTE = clusters up to 326TB using fast SSD
      • DENSE STORAGE = up to 2PB using large magnetic disks
  • Slices
    • disk storage for a COMPUTE node is divided into a number of slices (>2 & <16)
    • nodes participate in parallel query execution

Operations

Sort Keys
  • Enables efficient handling of range-restricted predicates
  • Types
    • COMPOUND = more efficient when query predicates use a PREFIX, subset of sort key column in order
    • INTERLEAVED = equal weight to each column in sort key. Query can use any subset of the columns of sort key
Distribution
  • Distributes & executes query in parallel across all of a cluster's compute nodes
  • Spreads table data across all compute nodes based on a distribution strategy
  • All data written to a node is continuously backed up to S3 using snapshots
Distribution Strategy
  • How to distribute the records across the nodes & slices in a cluster
  • When run a query, the optimiser shifts the rows to the compute nodes so to perform joins
  • Goal = minimise impact of redistribution step
  • 3 styles
EVEN Data distributed across slices uniformly (standard)
KEY Rows distributed according to values in 1 column
ALL Full copy of entire table is distributed to each node (lookup tables)
Snapshots
  • Automated & manual (1-35days & until explicitly deleted)
  • Used to restore a copy or create a clone

Security

Cluster Access

General
  • By default clusters are CLOSED to everyone
  • AWS account that creates the cluster has FULL ACCESS
  • IAM to create user accounts (RESOURCE-level)
  • No resource based policies
DB-level
  • DB users are named user accounts that can connect to a DB and are authn when they login to Redshift
  • You grant DB user permissions on a PER-CLUSTER basis (instead of per-table)
  • Users can only see data in the table rows that were GENERATED by them
Owner of DB object
  • User who creates it
  • By default only a super user of the owner can query/modify/grant permissions
  • Only the owner can modify/delete it
Data Encryption
  • All data in user-created tables encrypted with AES-256 (all data on disk & any backup)
  • Uses a 4-tier key-based architecture for encryption
Tier Description
DATA ENC KEYS encrypt data blocks in the cluster Each block has random AES-256 key
DB KEY encrypt DATA ENC KEYS
  • Random AES-256 key
  • Stored on disk in separate network from cluster
  • Stored encrypted by a MASTER KEY
  • Kept in memory in the cluster
CLUSTER KEY encrypt the DB KEY Stored by AWS or HSM
MASTER KEY encrypt the CLUSTER KEY if stored in AWS If not → encrypt the CLUSTER-KEY-ENCRYPTED DB KEY
Network Security
  • SSL
    • To comms with S3 or DynamoDB for: COPY, UNLOAD, backup, restore
    • Can encrypt client ↔ cluster
  • Clusters are associated with 1+ security groups
  • Audit logs
    • Disabled by default
    • stored to S3 (as well as the standard CloudTrail logs)
    • bucket policy has to allow putobject and getacl to a specific user from a redshift AWS account that varies by region: arn:aws:iam::<redshift regional account id>:user/logs

Neptune

Encryption
  • Encryption at rest with KMS
  • HTTPS access
IAM
  • Permissions are a subset of RDS permissions, and all the actions are rds
    • Can constrain to just neptune with a condition of rds:DatabaseEngine = graphdb
  • Hybrid model where you can authenticate with IAM, and define identity policies that allow access
    • No condition keys, no fine grained access (only a single neptune-db:* action)
    • HTTP requests then need to be signed with standard AWS v4 signatures that you construct yourself
Integrations
  • Interfaces appears in at least two subnets spanning two AZs in a VPC, interfaces have security groups
  • Optional audit logs
  • CloudTrail events appear as though they are from the RDS service not Neptune

Aurora

Features
  • Engine type for RDS
  • Stores 6 copies of the data, across 3 AZ, to ensure failover and disaster recovery → > resilience than RDS
  • Performance
    • Is ~5 times as fast as MySQL and ~3 times as fast as PostgreSQL
    • Serverless capability → horizontal scaling
  • Pay
    • Provisioned or actual compute resources
    • Actual storage
IAM
  • Supports IAM database authentication, similar to Neptune
    • Attach identity policy to IAM principals that allow rds-db:connect for a particular database user
    • Manage user permissions within the DB as per normal (IAM is just for authentication)
    • You get a token from the RDS API by specifying the db and user, then use the token in place of the user's password when connecting normally
Integrations
  • Uses normal VPC security groups to control access within a VPC

DocumentDB

Encryption
  • TLS from the RDS root
  • KMS encryption at rest
IAM
  • Master user + mongodb user management
  • IAM identity policies for management
Integrations
  • Auditing can be enabled to send events to CloudWatch Logs