Databases
Comparison¶
Database | Transport encryption | Encryption at rest | Audit | Authentication | Authorization |
---|---|---|---|---|---|
RDS | Rooted at global RDS certs, configuration is per-engine | KMS or TDE w/ SQL Server and Oracle - RDS managed key |
per-engine log files | per engine user accounts | per engine |
DynamoDB | Standard AWS HTTPS endpoint | KMS | CloudTrail | IAM only Cognito possible |
IAM identity policies - resources & condition keys |
Redshift | ACM managed certificate, redshift specific root | KMS CloudHSM |
S3 | DB user accounts | SQL |
Neptune | Publicly trusted Amazon root | KMS | Console | User accounts; or a limited IAM identity policy mechanism + request signing | Engine-specific; or broad access if using IAM |
Aurora | Rooted at global RDS certs | KMS | mysql -> CloudWatch Logs | User accounts; or an IAM authenticated API to obtain short lived passwords to connect | SQL |
DocumentDB | Rooted at global RDS certs | KMS | CloudWatch Logs | MongoDB user accounts | MongoDB standard |
OLAP vs OLTP
OLAP: Redshift
OLTP: relational DBs
RDS¶
General Info¶
- Description
-
- Relational DB Service (RDS): expose a DB instance but does not provide shell access
- Operational Benefits
- AWS in charge of: scaling, HA, backups, DB engine patches
- No admin access provided
- Pay
- Provisioned compute resources
- Provisioned storage resources
Use Cases
Suitable | Not Suitable |
---|---|
|
|
- DB Instance
-
- Isolated DB environment deployed in a private network
- Each instance runs & manages a DB ENGINE
- Configuration
DB Parameter Group container for engine configuration values that can be applied to 1+ DB instances DB Option Group container for engineer features (empty by default)
DB Engines
MySQL |
|
||||||||||||||||||||
PostgreSQL |
|
||||||||||||||||||||
Oracle |
|
||||||||||||||||||||
MS SQL |
|
||||||||||||||||||||
Aurora |
|
Operations¶
- Backup & Recovery
-
- Backup Mechanisms
AUTOMATED BACKUPS - Continuously tracks changes & backs up the DB
- Automated backups are turned on by default
- RDS creates a storage volume snapshot of the complete DB instance (not just individual DBs)
- Retention Period = 1day by default, max 35 days
- When DB instance deleted → all automated backup snapshots are deleted
- Backups occur daily in a 30min maintenance window
MANUAL - At any time
- Kept until explicitly disabled
- Recovery
- You cannot restore from a DB snapshot to an existing DB instance, a new DB instance is created
- Only default DB params & SGs are associated with restored instance
HA
Multi-AZ |
|
Replication |
|
Scaling
Vertical Scalability (UP) |
|
Horizontal Scalability (OUT) |
|
Security¶
- Access Control
-
- When create a DB instance you create a
MASTER
USER ACCOUNT, a native DB user account that allows to login with ALL DB privileges - Can create additional user accounts
- No resource based policies
- The Many Ways to Access RDS
- When create a DB instance you create a
Network Isolation
Runs in a VPC |
|
DB Subnet Group |
|
DB SG |
|
Network ACL | Allow/deny traffic IN/OUT subnet |
Encryption
TLS |
|
||||||||||
RDS Encryption |
|
||||||||||
Transparent data encryption (TDE) |
|
DynamoDB¶
General Info¶
- Description
-
- NoSQL DB
- Provide consistent performance levels by automatically distributing data&traffic for a table over multiple partitions
- All data stored on SSD & automatically replicated across multiple AZ → > resilience than RDS, Aurora, Redshift
- Supports cross-region replication
Components
TABLE |
collection of items |
ITEM |
collection of 1+ attributes, with PRIMARY KEY |
ATTRIBUTE |
name/value pair |
Data Types
SCALAR |
single value (string, number, binary, boolean, null) |
SET |
|
DOCUMENT |
nested attributes (list , map ) |
Primary Key
PARTITION KEY |
1 attribute, a partition (hash) key |
PARTITION & SORT KEY |
|
- Secondary Indexes
-
-
Creation
- Define 1+ secondary indexes when creating a table with
PARTITION & SORT KEY
- Define 1+ secondary indexes when creating a table with
-
Kinds
GLOBAL
- index with
PARTITION & SORT KEY
!= from table - any number of
GLOBAL
secondary index, created whenever
LOCAL
- index with =
PARTITION KEY
as primary - index with !=
SORT KEY
from primary - 1!
LOCAL
secondary index & must be created when creating table
- Usage
- Updated when an item is modified → consume W capacity units
GLOBAL
→ from own provisioned throughputLOCAL
→ from main table
-
Operations¶
- Provisioned Capacity
-
- At creation, provide R/W capacity (R/W capacity units)
- Each operation consume capacity units, proportional to
size(item)
Example
- Table w/o secondary index
- Consumes:
- 1 unit for R item <= 4KB
- 1 unit for W item <= 1KB
- R item of 110KB → 28 units (
ceil(110/4) = ceil(27.5) = 28
) - Plus, if strongly consistent →
28 * 2 = 56
- Eventual Consistency
-
- DynamoDB stores multiple copies of an item across a region
- Eventually Consistent = R request immediately after a W might not show the latest change
- Reads
- Eventually Consistent R = might include stale data
- Strongly Consistent R = most up-to-date data
Search
QUERY |
|
SCAN |
|
Managing Data
Atomic Counters | Increment/decrement value & guaranteed to be consistent across multiple concurrent requests |
Reading |
|
Scaling |
|
Partition |
|
Streams |
|
Backups |
|
Security¶
- Encryption
-
- Optional encryption at rest integrated with KMS
- Access Control
-
RESOURCE
-level = IAM- No resource based policies
- Full access to a table requires access to not just the
table/<name>
resource, but alsotable/<name>/*
DB
-level = fine-grained access controls- Permissions that allow/deny access to items (rows) and attributes (columns)
- Create them using an IAM policy
- Several condition keys for fine-grained access including:
dynamodb:LeadingKeys
,dynamodb:Select
,dynamodb:Attributes
- Each request must contain a valid signature (STS)
- The Many Ways to Access DynamoDB
- Integrations
-
Get
andPut
API calls are not logged to CloudTrail- Has a VPC endpoint you can use
- Integration with Cognito:
- identity pool with roles configured
- roles have appropriate policy to (a) allow cognito to assume them and (b) perform desired DynamoDB actions
Redshift¶
General Info¶
- Description
-
- Data Warehouse service
- Relational DB designed for OLAP & optimised for analysis & reporting of large datasets
- Based on PostgreSQL
- Fast querying capabilities over structured data using standard SQL commands
- Data typically spans multiple data repo & at least 2TB
- Pay
- Provisioned compute resources
- Storage (charged according to computer resources)
- Clusters and Nodes
-
CLUSTER = LEADER NODE + 1+ COMPUTE NODES
- Each cluster contains 1+ DBs
- Node Types
- 6 supported
- grouped into 2 categories
- DENSE COMPUTE = clusters up to 326TB using fast SSD
- DENSE STORAGE = up to 2PB using large magnetic disks
- Slices
- disk storage for a COMPUTE node is divided into a number of slices (>2 & <16)
- nodes participate in parallel query execution
Operations¶
- Sort Keys
-
- Enables efficient handling of range-restricted predicates
- Types
COMPOUND
= more efficient when query predicates use aPREFIX
, subset of sort key column in orderINTERLEAVED
= equal weight to each column in sort key. Query can use any subset of the columns of sort key
- Distribution
-
- Distributes & executes query in parallel across all of a cluster's compute nodes
- Spreads table data across all compute nodes based on a distribution strategy
- All data written to a node is continuously backed up to S3 using snapshots
- Distribution Strategy
-
- How to distribute the records across the nodes & slices in a cluster
- When run a query, the optimiser shifts the rows to the compute nodes so to perform joins
- Goal = minimise impact of redistribution step
- 3 styles
EVEN
Data distributed across slices uniformly (standard) KEY
Rows distributed according to values in 1 column ALL
Full copy of entire table is distributed to each node (lookup tables) - Snapshots
-
- Automated & manual (1-35days & until explicitly deleted)
- Used to restore a copy or create a clone
Security¶
Additional Resources
Cluster Access
General |
|
DB-level |
|
Owner of DB object |
|
- Data Encryption
-
- All data in user-created tables encrypted with AES-256 (all data on disk & any backup)
- Uses a 4-tier key-based architecture for encryption
Tier | Description |
---|---|
DATA ENC KEYS encrypt data blocks in the cluster |
Each block has random AES-256 key |
DB KEY encrypt DATA ENC KEYS |
|
CLUSTER KEY encrypt the DB KEY |
Stored by AWS or HSM |
MASTER KEY encrypt the CLUSTER KEY if stored in AWS |
If not → encrypt the CLUSTER-KEY-ENCRYPTED DB KEY |
- Network Security
-
- SSL
- To comms with S3 or DynamoDB for:
COPY
,UNLOAD
, backup, restore - Can encrypt client ↔ cluster
- To comms with S3 or DynamoDB for:
- Clusters are associated with 1+ security groups
- Audit logs
- Disabled by default
- stored to S3 (as well as the standard CloudTrail logs)
- bucket policy has to allow
putobject
andgetacl
to a specific user from a redshift AWS account that varies by region:arn:aws:iam::<redshift regional account id>:user/logs
- SSL
Neptune¶
- Encryption
-
- Encryption at rest with KMS
- HTTPS access
- IAM
-
- Permissions are a subset of RDS permissions, and all the actions are
rds
- Can constrain to just neptune with a condition of
rds:DatabaseEngine = graphdb
- Can constrain to just neptune with a condition of
- Hybrid model where you can authenticate with IAM, and define identity policies that allow access
- No condition keys, no fine grained access (only a single
neptune-db:* action
) - HTTP requests then need to be signed with standard AWS v4 signatures that you construct yourself
- No condition keys, no fine grained access (only a single
- Permissions are a subset of RDS permissions, and all the actions are
- Integrations
-
- Interfaces appears in at least two subnets spanning two AZs in a VPC, interfaces have security groups
- Optional audit logs
- CloudTrail events appear as though they are from the RDS service not Neptune
Aurora¶
- Features
-
- Engine type for RDS
- Stores 6 copies of the data, across 3 AZ, to ensure failover and disaster recovery → > resilience than RDS
- Performance
- Is ~5 times as fast as MySQL and ~3 times as fast as PostgreSQL
- Serverless capability → horizontal scaling
- Pay
- Provisioned or actual compute resources
- Actual storage
- IAM
-
- Supports IAM database authentication, similar to Neptune
- Attach identity policy to IAM principals that allow
rds-db:connect
for a particular database user - Manage user permissions within the DB as per normal (IAM is just for authentication)
- You get a
token
from the RDS API by specifying the db and user, then use the token in place of the user's password when connecting normally
- Attach identity policy to IAM principals that allow
- Supports IAM database authentication, similar to Neptune
- Integrations
-
- Uses normal VPC security groups to control access within a VPC
DocumentDB¶
- Encryption
-
- TLS from the RDS root
- KMS encryption at rest
- IAM
-
- Master user + mongodb user management
- IAM identity policies for management
- Integrations
-
- Auditing can be enabled to send events to CloudWatch Logs