Managed Databases: RDS, Aurora, DynamoDB, Cloud SQL
What This Concept Is
A managed database is one where the provider runs the installation, patching, backup, replication, and failover, while you still own the schema, queries, and data lifecycle. Two main shapes:
- Managed relational - Postgres, MySQL, MariaDB, SQL Server, Oracle. Offerings: Amazon RDS (generic engines), Amazon Aurora (AWS-rebuilt Postgres/MySQL with decoupled storage), Google Cloud SQL, Azure Database for PostgreSQL / MySQL.
- Managed NoSQL - key-value or document stores with horizontal scale. Offerings: Amazon DynamoDB, Google Firestore / Bigtable, Azure Cosmos DB.
RDS is "Postgres, but we run the machine." Aurora is "Postgres protocol, but storage and replication are redesigned for the cloud" - auto-growing storage, 6 copies across 3 AZs, much faster failover. DynamoDB is "give up SQL, get single-digit-ms latency at any scale."
They all provide:
- automated backups and point-in-time recovery
- instance-level HA (Multi-AZ standby) or storage-level HA (Aurora)
- encryption at rest and in transit
- IAM-based control plane; database-native auth for data plane (with IAM auth as an optional add-on for RDS/Cloud SQL)
- VPC integration; the DB lives in a private subnet and is reachable only through its security group (SG)
They do not provide:
- schema design, indexing, query tuning
- application-level retry logic for transient failures
- cost management; provisioned instances are billed per hour regardless of your traffic, and on-demand or serverless modes still bill for every request or capacity unit you consume
- disaster-recovery runbooks (automated backup is not a rehearsed restore)
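The retry logic in the list above is the item teams most often leave out, so here is a minimal sketch of it, assuming exponential backoff with full jitter. `TransientDBError` and `with_retries` are hypothetical names, not part of any driver; in a real service you would catch whatever exception your driver raises for a dropped connection or an in-progress failover.

```python
import random
import time


class TransientDBError(Exception):
    """Hypothetical stand-in for a driver's connection-reset or failover error."""


def with_retries(operation, attempts=5, base_delay=0.1, max_delay=5.0, sleep=time.sleep):
    """Call operation(); on TransientDBError, retry with exponential backoff and full jitter."""
    for attempt in range(attempts):
        try:
            return operation()
        except TransientDBError:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the error to the caller
            # Cap the exponential delay, then sleep a random amount below the cap
            # so many clients do not reconnect in lockstep after a failover.
            delay = min(max_delay, base_delay * (2 ** attempt))
            sleep(random.uniform(0, delay))
```

Only wrap operations that are safe to repeat (reads, or writes made idempotent by a key); retrying a non-idempotent write can apply it twice.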
Why It Matters Here
Databases are usually the most expensive, least replaceable component in a production system. Understanding the managed-DB tradeoffs early prevents:
- "we picked DynamoDB but our access patterns are relational" (painful schema contortions)
- "we picked Aurora Serverless and the cold-start latency breaks our SLA" (scale-to-zero does not suit a workload that must answer instantly after idle periods)
- "RDS Multi-AZ is our disaster plan" (it is an AZ plan, not a region plan)
- "we self-host Postgres on EC2 for flexibility" (you now own patching, backups, HA, at the cost of three engineers' time)
- "our DR restore has never been tested and now pg_restore is missing an extension our prod uses"
Concrete Example
A mid-size SaaS needs:
- a transactional OLTP database for orders, users, and invoices
- a session store with sub-10-ms reads at high QPS
- an analytics warehouse (out of scope for Module 1)
Choices:
- RDS PostgreSQL (Multi-AZ) for OLTP: db.r6g.xlarge instance with 1 TB of storage, Multi-AZ standby in another AZ. Automated daily backups, 7-day PITR window. Failover is ~60-120 s on the AWS side.
- Aurora PostgreSQL as an alternative: same SQL, better failover (~30 s), storage auto-scales, up to 15 read replicas with shared storage. More expensive per hour but cheaper operationally when you scale read traffic.
- DynamoDB for session store: on-demand capacity, single-item GETs in ~5 ms, global tables for multi-region replication. No schema migration headaches; bill scales with reads/writes, not with idle time.
Connection pattern from an app in private subnet:
# DB endpoint resolves to a private IP inside the VPC
psql "host=prod-db.cluster-abc.us-east-1.rds.amazonaws.com sslmode=require \
user=app_writer password=$(aws secretsmanager get-secret-value --secret-id prod/db --query SecretString --output text | jq -r .password)"
Use connection pooling (RDS Proxy, PgBouncer) so a 300-container service doesn't exhaust max_connections.
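The arithmetic behind that advice is short enough to sketch. The pool size of 10 per container and the `max_connections` value of 500 are assumptions chosen for illustration; the real limit depends on the instance class and parameter group.

```python
def total_client_connections(containers, pool_size_per_container):
    """Connections opened if every container fills its local pool."""
    return containers * pool_size_per_container


def fits_without_pooler(containers, pool_size_per_container, max_connections, reserved=0):
    """True if the database can hold every container's full pool at once.

    `reserved` is the number of slots kept free for admin and monitoring sessions.
    """
    demand = total_client_connections(containers, pool_size_per_container)
    return demand <= max_connections - reserved


# 300 containers, each with an assumed local pool of 10 connections,
# against an assumed max_connections of 500.
print(total_client_connections(300, 10))          # 3000
print(fits_without_pooler(300, 10, 500, reserved=10))  # False
```

A pooler such as RDS Proxy or PgBouncer sits between the two numbers: containers connect to the pooler, and the pooler multiplexes them onto a much smaller set of database connections.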
Gotchas worth naming:
- RDS Multi-AZ is a synchronous standby in another AZ of the same region. It does not protect you from a regional failure. For that you need cross-region read replicas or Aurora Global Database.
- DynamoDB's consistency default is eventual. If you need read-your-writes, request ConsistentRead=true, at double the read-capacity cost.
- RDS minor-version upgrades can be applied automatically in a maintenance window. If your app is sensitive to specific Postgres extensions or patch behavior, disable auto-upgrades and manage them in your change windows.
- Cloud SQL on GCP gets a private IP only when you configure private services access or Private Service Connect; the default instance settings still assign a public IP if you click through. Check the networking settings before you create an internet-facing database by accident.
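The doubling in the ConsistentRead gotcha above can be made concrete. This is a rough sketch of read-capacity accounting for a single-item read, assuming the documented 4 KB read unit is 4096 bytes; `read_capacity_units` is a hypothetical helper, not an AWS API, and it ignores transactional reads.

```python
import math


def read_capacity_units(item_size_bytes, consistent_read=False):
    """Estimate read capacity consumed by one single-item read.

    Item size is rounded up to 4 KB blocks. A strongly consistent read
    costs one unit per block; an eventually consistent read costs half.
    """
    blocks = max(1, math.ceil(item_size_bytes / 4096))  # assumes 4 KB = 4096 bytes
    return blocks * (1.0 if consistent_read else 0.5)


print(read_capacity_units(1_000))                        # 0.5
print(read_capacity_units(1_000, consistent_read=True))  # 1.0
print(read_capacity_units(9_000, consistent_read=True))  # 3.0
```

Because size rounds up per item, keeping session items small matters as much as the consistency setting.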
Common Confusion / Misconception
"Managed means zero operations." You still design schemas, indexes, and queries. You still monitor slow queries, tune connection pooling, and size instances. The provider does the installation; you do the engineering.
"Aurora is Postgres." It speaks the Postgres wire protocol and most SQL, but the storage engine and replication are different. Some extensions, features, and behaviors differ; some pg_* utilities are unavailable. Read the compat notes.
"DynamoDB is infinitely scalable with no tradeoffs." Its scalability depends on your partition key distribution. A hot key means a single partition bottleneck, no matter how much capacity you have. Design keys up front.
"Backups == restores." An untested restore is not a recovery plan. Rehearse a full restore quarterly against a non-prod account. Time it. Record whether it matches your RTO.
Gotcha: On RDS, the admin user is not a true superuser; it holds the rds_superuser role instead. Certain operations (installing extensions, ALTER SYSTEM) require going through parameter groups or the provider's blessed rds_* roles. On Aurora, even more is locked down. This matters the first time you try to install pg_cron or pg_stat_statements.
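Returning to the hot-key misconception above: the effect is easy to see in a toy model. The sketch below assumes requests are routed by hashing the partition key into a fixed number of partitions. DynamoDB's real hashing and partition management are internal, so treat the SHA-256 routing and the eight partitions as illustrative assumptions only.

```python
import hashlib
from collections import Counter


def partition_for(key, partitions):
    """Map a partition key to a partition index with a stable hash (toy model)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % partitions


def busiest_partition_share(request_keys, partitions):
    """Fraction of all requests that land on the most loaded partition."""
    load = Counter(partition_for(k, partitions) for k in request_keys)
    return max(load.values()) / len(request_keys)


# 10,000 requests spread over many distinct keys.
even = [f"user-{i}" for i in range(10_000)]
# 10,000 requests where one hypothetical tenant key receives 90% of traffic.
skewed = ["tenant-big"] * 9_000 + [f"user-{i}" for i in range(1_000)]

print(busiest_partition_share(even, 8))    # close to 1/8
print(busiest_partition_share(skewed, 8))  # at least 0.9
```

Adding partitions does not help the skewed case: every request for the hot key still hashes to the same partition, which is why key design has to come first.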
How To Use It
For any new database:
- Describe the access pattern: OLTP with complex joins? Key-value with high fan-out? Time-series? Analytics?
- Start with managed; only self-host if you can defend it in writing.
- Relational: prefer Aurora for scale and failover, RDS for cost at small size.
- NoSQL: DynamoDB if access patterns are key/index-based; think of partition keys first. For document-heavy work on GCP, Firestore; on Azure, Cosmos DB with the right API (SQL, Mongo, Cassandra).
- Always configure automated backups, PITR, encryption at rest, and IAM authentication where supported.
- Put the database in a private subnet with a security group that only the app tier can reach.
- Store credentials in Secrets Manager / Secret Manager / Key Vault, not in env vars.
- Rehearse failover (for Multi-AZ) and restore (for PITR/snapshots). A DB that has never failed over will fail differently than you expect.
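Several items in the checklist above can be checked mechanically before an instance is created. The sketch below is one possible pre-flight check, not a prescribed tool: the dictionary keys follow the parameter names of boto3's `rds.create_db_instance`, while the values, the instance name, and the `baseline_violations` helper are hypothetical.

```python
# Illustrative parameters only; nothing here calls AWS.
db_params = {
    "DBInstanceIdentifier": "example-oltp",   # hypothetical name
    "Engine": "postgres",
    "DBInstanceClass": "db.r6g.xlarge",
    "MultiAZ": True,
    "BackupRetentionPeriod": 7,               # days; 0 would disable automated backups
    "StorageEncrypted": True,
    "EnableIAMDatabaseAuthentication": True,
    "PubliclyAccessible": False,
}


def baseline_violations(params):
    """Return a list of checklist items the given parameters fail."""
    problems = []
    if params.get("BackupRetentionPeriod", 0) < 1:
        problems.append("automated backups and PITR are disabled")
    if not params.get("StorageEncrypted", False):
        problems.append("storage is not encrypted at rest")
    if not params.get("EnableIAMDatabaseAuthentication", False):
        problems.append("IAM authentication is disabled")
    # Treat a missing setting as a violation so the choice is always explicit.
    if params.get("PubliclyAccessible", True):
        problems.append("instance may be publicly accessible")
    return problems


print(baseline_violations(db_params))  # []
```

A check like this covers configuration only; it says nothing about whether failover and restore have actually been rehearsed.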
Check Yourself
- RDS Multi-AZ protects you from which failure? Which failure does it not protect you from?
- Why is DynamoDB a poor choice if your access pattern is "list all orders for a user filtered by status, sorted by date, and join with products"?
- What is the one-sentence operational difference between Aurora and RDS Postgres?
- Your Aurora cluster has a 90-second failover in tests. Your app holds connections 300 seconds. What happens during a failover, and what do you change?
- A team wants to skip PITR "to save money." Write the counter-argument in under 60 words.
Mini Drill or Application
For a hypothetical e-commerce site, in fifteen minutes pick a managed database for each of: user accounts, product catalog, session store, order history. Justify each choice, size the instance or capacity, name the backup and DR plan, and flag one risk per choice.
Extension: take your DynamoDB choice and sketch the partition key, sort key, and one GSI for the "list recent orders by user" query. If you cannot, the data shape is telling you DynamoDB is not the right tool.
Read This Only If Stuck
- Amazon RDS overview - engines, Multi-AZ, backup model
- Amazon RDS: Multi-AZ deployments - synchronous standby semantics
- Amazon Aurora overview - storage architecture and compatibility framing
- Amazon DynamoDB: Introduction - partition-key model, capacity modes
- Google Cloud: Cloud SQL overview - the GCP managed-relational story
- Azure: Azure Database for PostgreSQL - Flexible Server - Azure's managed Postgres, HA model
- Linux Command Line: Processes and top - diagnosing resource pressure on self-managed DB hosts (and understanding what RDS hides)