VPCs, Subnets, Route Tables, NAT
What This Concept Is
A VPC (Virtual Private Cloud) is your private, logically isolated network inside a cloud region. You choose a private IPv4 CIDR range (for example 10.0.0.0/16) and carve it into smaller subnets, each tied to a single AZ.
Core pieces:
- CIDR block - the IP address range for the VPC (10.0.0.0/16 gives ~65k addresses). Secondary CIDRs can be added later; overlapping with peered networks is a forever-problem.
- Subnet - a slice of that CIDR bound to one AZ (10.0.1.0/24 in us-east-1a). Each subnet is public or private, distinguished only by whether its route table sends 0.0.0.0/0 to an internet gateway.
- Route table - a list of rules mapping destination CIDRs to a next hop (internet gateway, NAT gateway, VPC endpoint, transit gateway, peering). Every subnet is associated with exactly one route table.
- Internet Gateway (IGW) - the VPC's connection point to the public internet; needed for public subnets.
- NAT Gateway - a managed outbound-only gateway. Instances in private subnets send outbound traffic through NAT to reach the internet; inbound from the internet is not allowed.
- Security groups (stateful, attached to ENIs) and network ACLs (stateless, attached to subnets) sit on top as the packet filters.
GCP and Azure use similar primitives with different names: GCP's VPC is global (subnets are regional) and uses "Firewall Rules" instead of SGs; Azure's VNet is regional like AWS, uses "NSGs," and puts NAT behind "NAT Gateway" or user-defined routes. The concept graph is the same: CIDR -> subnet -> route -> filter -> gateway.
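The CIDR -> subnet step can be checked with a few lines of Python. This is a minimal sketch using only the standard-library ipaddress module; the sample address 10.0.1.37 is an arbitrary illustration, not something from a real deployment.

```python
import ipaddress

# The VPC range used in this section.
vpc = ipaddress.ip_network("10.0.0.0/16")
print(vpc.num_addresses)  # 65536

# Carve the /16 into /24 slices; each could become one subnet in one AZ.
subnets = list(vpc.subnets(new_prefix=24))
print(len(subnets))  # 256
print(subnets[1])    # 10.0.1.0/24

# Any address in the VPC falls into exactly one of those slices.
addr = ipaddress.ip_address("10.0.1.37")
owner = next(s for s in subnets if addr in s)
print(owner)  # 10.0.1.0/24
```

The same module is handy for sanity-checking an IP plan before any infrastructure exists.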
Why It Matters Here
Every resource with an IP address lives in a subnet. If you do not design the VPC right, no amount of good application code saves you:
- a database in a "private" subnet whose route table sends 0.0.0.0/0 to an IGW is not actually private
- a Lambda or Fargate task in a subnet with no NAT cannot reach external APIs
- a poorly sized subnet (e.g., /27) runs out of IPs as soon as you scale an ASG or add ENIs
- a VPC that uses the same CIDR as your on-premises network cannot be peered without renumbering
- NAT charges per hour and per GB processed - an unexpected cross-AZ NAT hairpin can 3x a month's networking bill
Later modules (ECS, RDS, Kubernetes, load balancers, private endpoints) all anchor into this primitive.
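The overlapping-CIDR failure mode listed above is cheap to catch before anything is deployed. The sketch below assumes a hypothetical IP plan held in a dict (the names on_prem, vpc_prod, and vpc_dev and their ranges are made up for illustration) and reports every pair of ranges that collide.

```python
import ipaddress

def find_overlaps(plan):
    """Return (name_a, name_b) pairs whose CIDR ranges overlap."""
    nets = {name: ipaddress.ip_network(cidr) for name, cidr in plan.items()}
    names = sorted(nets)
    return [
        (a, b)
        for i, a in enumerate(names)
        for b in names[i + 1:]
        if nets[a].overlaps(nets[b])
    ]

# Hypothetical organization-wide IP plan.
plan = {
    "on_prem": "10.0.0.0/8",
    "vpc_prod": "10.0.0.0/16",
    "vpc_dev": "172.16.0.0/16",
}
conflicts = find_overlaps(plan)
print(conflicts)  # [('on_prem', 'vpc_prod')]
```

Here vpc_prod sits inside the on-premises range, so connecting the two would require renumbering or NAT.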
Concrete Example
A classic 3-tier layout across three AZs in us-east-1:
VPC: 10.0.0.0/16
  AZ us-east-1a:
    public subnet   10.0.0.0/24   (ALB, NAT)
    private app     10.0.10.0/24  (app servers, Fargate tasks)
    private data    10.0.20.0/24  (RDS)
  AZ us-east-1b:
    public subnet   10.0.1.0/24
    private app     10.0.11.0/24
    private data    10.0.21.0/24
  AZ us-east-1c:
    public subnet   10.0.2.0/24
    private app     10.0.12.0/24
    private data    10.0.22.0/24
Route tables:
- Public subnets: 0.0.0.0/0 -> IGW, 10.0.0.0/16 -> local
- Private app subnets: 0.0.0.0/0 -> NAT Gateway (in same AZ), 10.0.0.0/16 -> local
- Private data subnets: 10.0.0.0/16 -> local (no 0.0.0.0/0 route at all)
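To see how these three route tables behave, here is a small model of route selection. It assumes the most specific (longest-prefix) matching route wins, which is how VPC route tables choose between routes; the dict layout, the next_hop helper, and the target labels "igw" and "nat" are illustrative names, not an AWS API.

```python
import ipaddress

def next_hop(route_table, dest_ip):
    """Return the target of the most specific matching route, or None."""
    ip = ipaddress.ip_address(dest_ip)
    matches = [
        (ipaddress.ip_network(cidr), target)
        for cidr, target in route_table.items()
        if ip in ipaddress.ip_network(cidr)
    ]
    if not matches:
        return None  # no route: the packet is dropped
    return max(matches, key=lambda m: m[0].prefixlen)[1]

# The three subnet classes from the layout above.
PUBLIC = {"10.0.0.0/16": "local", "0.0.0.0/0": "igw"}
PRIVATE_APP = {"10.0.0.0/16": "local", "0.0.0.0/0": "nat"}
PRIVATE_DATA = {"10.0.0.0/16": "local"}

print(next_hop(PUBLIC, "10.0.20.5"))            # local
print(next_hop(PRIVATE_APP, "93.184.216.34"))   # nat
print(next_hop(PRIVATE_DATA, "93.184.216.34"))  # None
```

The data tier can still reach every address inside the VPC through the local route, but it has no path to anything outside it.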
Minimal Terraform skeleton:
resource "aws_vpc" "main" { cidr_block = "10.0.0.0/16" }
resource "aws_subnet" "public_a" {
  vpc_id                  = aws_vpc.main.id
  cidr_block              = "10.0.0.0/24"
  availability_zone       = "us-east-1a"
  map_public_ip_on_launch = true
}
# ... similar for app_a/data_a and b/c AZs ...
# aws_eip.nat_a (the NAT's Elastic IP) is assumed to be defined elsewhere
resource "aws_nat_gateway" "a" {
  subnet_id     = aws_subnet.public_a.id
  allocation_id = aws_eip.nat_a.id
}
Notice the discipline: only the ALB is reachable from the internet; app traffic is north-south through the ALB; the database has no internet route at all; outbound API calls from the app go through NAT.
Sanity checks from a running instance (shell):
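ip addr     # confirm private IP
ip route    # default route via the VPC router
# IMDS - shows subnet/AZ; instances that enforce IMDSv2 need a session token first
TOKEN=$(curl -s -X PUT http://169.254.169.254/latest/api/token -H "X-aws-ec2-metadata-token-ttl-seconds: 60")
curl -s -H "X-aws-ec2-metadata-token: $TOKEN" http://169.254.169.254/latest/meta-data/
curl -I https://example.com    # test egress through NAT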
If the last call resolves DNS but then hangs on connect, you likely have no NAT path or a missing 0.0.0.0/0 route.
Common Confusion / Misconception
"A subnet is private because I named it private." A subnet is private if and only if its route table has no path to an internet gateway. The label means nothing.
"One NAT Gateway is enough." NAT gateways are AZ-scoped. If you put a single NAT in us-east-1a and the AZ fails, every private subnet in 1b and 1c loses outbound internet access as well. Place one NAT per AZ for AZ-resilient designs. One-NAT designs also silently 2-3x cross-AZ transfer bills.
"0.0.0.0/0 in a security group is fine if I also use a WAF." 0.0.0.0/0 on a non-public-facing port is almost always a mistake. The least-privilege default is "only the security groups that need to reach me, can reach me."
"Security groups block egress by default." They allow all egress by default. Only inbound is denied by default. If you want egress control, you must write the rules explicitly.
"NACLs and security groups do the same thing." NACLs are stateless subnet-level filters; security groups are stateful ENI-level filters. NACLs are easier to misconfigure (you must allow ephemeral return ports) and are rarely needed except for subnet-wide deny rules.
Gotchas:
- Overlapping CIDRs are impossible to peer or connect via Transit Gateway without NAT. Pick VPC CIDRs against an organization-wide IP plan, not per project.
- VPC endpoints (S3 Gateway endpoint, DynamoDB Gateway endpoint) skip NAT entirely and are free. Forgetting them means paying NAT data-processing on every S3 byte.
- AWS reserves 5 IP addresses per subnet: the first four and the last (.0, .1, .2, .3, and .255 in a /24). A /28 gives 11 usable, not 16.
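The reserved-address arithmetic can be verified directly. This is a minimal sketch with the standard-library ipaddress module; the helper names and the example subnet 10.0.0.16/28 are chosen only for illustration.

```python
import ipaddress

AWS_RESERVED_PER_SUBNET = 5  # first four addresses plus the last one

def usable_ips(cidr):
    """Addresses left for instances and ENIs after the reserved ones."""
    return ipaddress.ip_network(cidr).num_addresses - AWS_RESERVED_PER_SUBNET

def reserved_addresses(cidr):
    """The five addresses that cannot be assigned in this subnet."""
    net = ipaddress.ip_network(cidr)
    return [str(net[i]) for i in (0, 1, 2, 3, -1)]

print(usable_ips("10.0.0.0/28"))  # 11
print(usable_ips("10.0.0.0/27"))  # 27
print(usable_ips("10.0.0.0/24"))  # 251
print(reserved_addresses("10.0.0.16/28"))
# ['10.0.0.16', '10.0.0.17', '10.0.0.18', '10.0.0.19', '10.0.0.31']
```

Note that the last reserved address is the final address of the subnet, which is .255 only when the subnet happens to end there.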
How To Use It
For any new workload:
- Decide the VPC CIDR against your org's IP plan; use /16 unless you have a reason not to. Reserve additional space for peering, DR regions, and VPC growth.
- Design at least three AZs' worth of subnets: public, private-app, private-data (or similar).
- Size each subnet for the worst-case instance/ENI count multiplied by a 3-5x safety factor. For Fargate/EKS, size much larger than you think.
- Create one NAT per AZ for outbound; public ALB in public subnets; app tier in private-app; data tier in private-data.
- Keep route tables minimal; audit every 0.0.0.0/0 route.
- Use security groups by role (sg-alb, sg-app, sg-db) and reference SGs from SGs, not CIDR lists.
- Add VPC Gateway Endpoints for S3 and DynamoDB on day one; add Interface Endpoints for other high-volume services (SSM, STS, Secrets Manager).
- Enable VPC Flow Logs to a cheap S3 storage class; they are priceless during a security incident.
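The sizing step in the list above can be turned into a quick calculation. The sketch below assumes the 5 reserved addresses per subnet and the AWS IPv4 subnet size limits of /16 to /28; the function name and the default safety factor of 3 are illustrative choices.

```python
AWS_RESERVED_PER_SUBNET = 5

def smallest_prefix(peak_ips, safety_factor=3):
    """Smallest subnet (largest prefix length) that fits peak_ips * safety_factor."""
    needed = peak_ips * safety_factor + AWS_RESERVED_PER_SUBNET
    host_bits = (needed - 1).bit_length()  # ceil(log2(needed)) for needed >= 1
    prefix = 32 - host_bits
    if prefix < 16:
        raise ValueError("requirement does not fit in a single /16 subnet")
    return min(prefix, 28)  # AWS does not allow subnets smaller than /28

print(smallest_prefix(40))                    # 25
print(smallest_prefix(40, safety_factor=5))   # 24
print(smallest_prefix(2))                     # 28
```

With a worst case of 40 ENIs, a 3x factor needs 125 addresses and lands on a /25, while a 5x factor pushes the subnet to a /24.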
Check Yourself
- What single line of configuration determines whether a subnet is public or private?
- Why is "one NAT Gateway" an anti-pattern in a production VPC?
- What happens if two VPCs you need to peer have overlapping CIDRs?
- An instance in a private subnet can resolve DNS for google.com but cannot connect. What are the top three causes?
- Name the exact number of usable IPs in a /28 subnet and explain why it is not 16.
Mini Drill or Application
In fifteen minutes, sketch a VPC for a 3-tier app: web, app, database. Include VPC CIDR, subnet CIDRs across three AZs, route-table contents for each subnet class, NAT placement, and one security-group per tier with inbound rules written out.
Extension: on an existing VPC you can access, run aws ec2 describe-route-tables and confirm the default route of every "private" subnet does not point to an IGW. If it does, you have just found a real bug.
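One way to run that audit is to save the command's JSON output and scan it offline. The sketch below assumes the usual output shape (a RouteTables list whose entries carry Routes and Associations) and uses made-up IDs in an inline sample; it only looks at explicit subnet associations, so subnets that fall back to the VPC's main route table would need separate handling.

```python
import json

def subnets_routed_to_igw(describe_output):
    """Subnet IDs whose associated route table sends 0.0.0.0/0 to an IGW."""
    exposed = []
    for table in describe_output.get("RouteTables", []):
        has_igw_default = any(
            route.get("DestinationCidrBlock") == "0.0.0.0/0"
            and route.get("GatewayId", "").startswith("igw-")
            for route in table.get("Routes", [])
        )
        if has_igw_default:
            exposed += [
                assoc["SubnetId"]
                for assoc in table.get("Associations", [])
                if "SubnetId" in assoc
            ]
    return sorted(exposed)

# Hypothetical sample; in practice, load the saved CLI output instead.
sample = json.loads("""
{"RouteTables": [
  {"RouteTableId": "rtb-public",
   "Routes": [{"DestinationCidrBlock": "10.0.0.0/16", "GatewayId": "local"},
              {"DestinationCidrBlock": "0.0.0.0/0", "GatewayId": "igw-0abc"}],
   "Associations": [{"SubnetId": "subnet-public-a"},
                    {"SubnetId": "subnet-private-data-a"}]},
  {"RouteTableId": "rtb-app",
   "Routes": [{"DestinationCidrBlock": "10.0.0.0/16", "GatewayId": "local"},
              {"DestinationCidrBlock": "0.0.0.0/0", "NatGatewayId": "nat-0def"}],
   "Associations": [{"SubnetId": "subnet-private-app-a"}]}
]}
""")
exposed = subnets_routed_to_igw(sample)
print(exposed)  # ['subnet-private-data-a', 'subnet-public-a']
```

In this sample the "private data" subnet is attached to the public route table, which is exactly the kind of bug the extension exercise is meant to surface.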
Read This Only If Stuck
- Amazon VPC: How it works - the canonical model
- Amazon VPC: Subnets for your VPC - subnet and route-table behavior
- Amazon VPC: NAT gateways - NAT placement, pricing, AZ failure semantics
- Google Cloud: VPC network overview - how GCP's global VPCs differ from AWS regional VPCs
- Azure: Virtual networks overview - VNet, NSG, and route-table concepts
- Linux Command Line: Examining and monitoring a network (ping, traceroute, ip) - shell tools to verify connectivity from inside a VM
- Linux Command Line: netstat and secure communication with remote hosts - socket-level diagnostics when a subnet "should be reachable but isn't"