Security Groups, NACLs, and VPC Endpoints: The Network Moat

What This Concept Is

Even in an identity-first world, you still want the network layer to narrow what is reachable. In cloud networks, three controls do most of the heavy lifting:

  • Security groups (SGs) -- stateful firewalls attached to instances or interfaces. They support allow rules only, and return traffic is automatically allowed. They are the primary per-workload control.
  • Network ACLs (NACLs) -- stateless firewalls at the subnet level. They evaluate numbered allow and deny rules in order, and return traffic needs its own rule. They are coarser and are usually used as a last-ditch subnet-wide guardrail.
  • VPC endpoints -- private routes between your VPC and a cloud provider's managed service (object storage, KMS, queue service, etc.), so that calls to that service never traverse the public internet.

Vocabulary varies by provider (the names above are AWS's; GCP has firewall rules + VPC Service Controls + Private Google Access; Azure has NSGs + Service Endpoints / Private Endpoints), but the shape of the moat is the same.

Why It Matters Here

The network layer is no longer a perimeter, but it is still a filter. A well-designed network moat:

  • reduces the attack surface reachable from the internet
  • contains lateral movement if a workload is compromised
  • keeps sensitive traffic (KMS, databases, internal APIs) off the public internet
  • makes data exfiltration harder by cutting off unnecessary egress

Identity-based controls defend "who can call this"; network controls defend "who can reach this to try". Both matter.

Concrete Example

A web app with three tiers: a load balancer, an application service, and a database. The company also uses a managed object store for user uploads and a managed KMS for envelope encryption.

A badly scoped network design:

  • SG on app servers allows all inbound from 0.0.0.0/0 on port 443
  • SG on the DB allows inbound from the entire VPC CIDR
  • NACLs are default-open
  • Calls to object storage and KMS go out the internet gateway

A well-scoped network moat:

  • LB security group: inbound 443 from 0.0.0.0/0, outbound only to the app-tier SG on the app port
  • App-tier SG: inbound from the LB SG on the app port; outbound to the DB SG on 5432, to the object store endpoint, and to the KMS endpoint
  • DB SG: inbound from the app-tier SG on 5432 only; outbound constrained to managed service endpoints for backups
  • NACL on the DB subnet: deny inbound from any CIDR other than the app subnet, as a belt-and-suspenders subnet-wide guardrail
  • VPC endpoints: private endpoints for the object store, KMS, and any queue services so that traffic never leaves the VPC. The app-tier SG does not need internet egress at all

Now a compromised app instance can still reach the database on 5432 (the application needs that), but a compromised workload in any other subnet cannot reach the DB at all. The app instance cannot reach KMS except via the endpoint, and it cannot exfiltrate to an arbitrary internet host.
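The SG-to-SG chaining in the well-scoped design can be sketched in Terraform roughly as follows. This is a minimal sketch, not a complete module: the three variables are hypothetical inputs standing in for security groups created elsewhere, and 8080 is an assumed app port.

```hcl
# Hypothetical inputs: IDs of security groups assumed to exist already.
variable "lb_sg_id" {
  type = string
}

variable "app_sg_id" {
  type = string
}

variable "db_sg_id" {
  type = string
}

# App tier accepts traffic only from the load balancer's SG.
resource "aws_security_group_rule" "lb_to_app" {
  type                     = "ingress"
  security_group_id        = var.app_sg_id
  source_security_group_id = var.lb_sg_id
  from_port                = 8080 # assumed app port
  to_port                  = 8080
  protocol                 = "tcp"
  description              = "App port from the LB SG only"
}

# DB tier accepts traffic only from the app tier's SG.
resource "aws_security_group_rule" "app_to_db_ingress" {
  type                     = "ingress"
  security_group_id        = var.db_sg_id
  source_security_group_id = var.app_sg_id
  from_port                = 5432
  to_port                  = 5432
  protocol                 = "tcp"
  description              = "Postgres from the app-tier SG only"
}
```

Referencing a source SG rather than a CIDR means the rule follows the workload: instances that join or leave the app tier are covered without editing the DB rule.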

Common Confusion / Misconception

"Why both SGs and NACLs?" SGs are stateful and attached to workloads (the right tool for per-service rules: "app tier may reach DB on 5432"). NACLs are stateless and subnet-scoped (the right tool for coarse, broad-sweep guardrails: "nothing on this subnet ever talks to the internet"). Use SGs as the primary rule source; use NACLs sparingly for sweeping rules. Stateless means return traffic needs its own rule, usually one covering the ephemeral port range -- that is the footgun.
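To make the stateless footgun concrete, here is a hedged Terraform sketch of a NACL for a DB subnet. The variable names are hypothetical, and the 1024-65535 ephemeral range is a deliberately broad assumption; a real DB subnet would likely need further rules (for example, for backups or replication).

```hcl
# Hypothetical inputs for this sketch.
variable "vpc_id" {
  type = string
}

variable "db_subnet_id" {
  type = string
}

variable "app_subnet_cidr" {
  type = string
}

# A custom NACL denies all traffic until rules are added.
resource "aws_network_acl" "db" {
  vpc_id     = var.vpc_id
  subnet_ids = [var.db_subnet_id]
}

# Inbound: Postgres from the app subnet only.
resource "aws_network_acl_rule" "db_in_postgres" {
  network_acl_id = aws_network_acl.db.id
  rule_number    = 100
  egress         = false
  protocol       = "tcp"
  rule_action    = "allow"
  cidr_block     = var.app_subnet_cidr
  from_port      = 5432
  to_port        = 5432
}

# Outbound: replies go to the client's ephemeral port. Without this rule
# the inbound allow above is useless, because NACLs keep no connection state.
resource "aws_network_acl_rule" "db_out_replies" {
  network_acl_id = aws_network_acl.db.id
  rule_number    = 100
  egress         = true
  protocol       = "tcp"
  rule_action    = "allow"
  cidr_block     = var.app_subnet_cidr
  from_port      = 1024 # assumed ephemeral range
  to_port        = 65535
}
```

The equivalent SG rule needs only the inbound half, which is why SGs are the better home for per-service rules.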

"VPC endpoints are a performance feature." Performance is a side effect. Their real security value is cutting off a major exfiltration path: traffic to the managed service no longer crosses the public internet, and the endpoint policy can be scoped per bucket or per key. On an S3 VPC endpoint, a policy condition such as aws:ResourceOrgID equals <your-org> prevents compromised credentials from writing to an attacker-owned bucket through that endpoint, and aws:PrincipalOrgID equals <your-org> stops an attacker from bringing their own credentials through it.
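A scoped endpoint policy might look like the sketch below. It is illustrative only: the variables are hypothetical, the action list is an assumed minimum for an upload workload, and it combines two organization condition keys, aws:PrincipalOrgID (who may use the endpoint) and aws:ResourceOrgID (which buckets may be reached through it).

```hcl
# Hypothetical inputs for this sketch.
variable "vpc_id" {
  type = string
}

variable "region" {
  type = string
}

variable "route_table_ids" {
  type = list(string)
}

variable "org_id" {
  type        = string
  description = "AWS Organizations ID, in the form o-..."
}

resource "aws_vpc_endpoint" "s3" {
  vpc_id            = var.vpc_id
  service_name      = "com.amazonaws.${var.region}.s3"
  vpc_endpoint_type = "Gateway"
  route_table_ids   = var.route_table_ids

  # Only principals from this org may use the endpoint, and only to reach
  # buckets owned by this org.
  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Sid       = "OrgPrincipalsToOrgBucketsOnly"
      Effect    = "Allow"
      Principal = "*"
      Action    = ["s3:GetObject", "s3:PutObject"] # assumed action set
      Resource  = "*"
      Condition = {
        StringEquals = {
          "aws:PrincipalOrgID" = var.org_id
          "aws:ResourceOrgID"  = var.org_id
        }
      }
    }]
  })
}
```

One caveat worth testing before rollout: some AWS-owned buckets (package repositories, for instance) sit outside your organization, so a policy this strict may need explicit exceptions.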

"The default VPC is production-ready." The default VPC in most cloud accounts has friendly defaults (on AWS, for example: public subnets, an internet gateway route, allow-all NACLs, allow-all egress on the default SG) that are not production-safe. New accounts should get a hardened baseline before any real workload lands -- delete the default VPC, use account factory / landing zone patterns.

"Security groups alone are enough." SGs defend north/south and east/west reachability at L4. They do not inspect the application layer, they do not enforce mTLS, and they do not rate-limit. Pair them with an L7 control (service mesh, WAF, API gateway) when request-level enforcement matters (Concept 3).

"Egress rules are unnecessary because the workload is trusted." The workload is only trusted until it is compromised. Default-deny egress with a narrow allowlist (DNS, the managed services you actually use, a specific API partner) is what turns a credential leak from "data exfil" into "failed connection". The 2019 Capital One breach, in which role credentials stolen through SSRF were used to copy S3 data out of the account, is a commonly cited example of what weak network restrictions around data access enable.

"Cilium / eBPF replaces security groups." Cilium and similar eBPF-based tools add L3-L7 NetworkPolicy enforcement inside the cluster and can replace kube-proxy. They do not replace cloud-level SGs at the VPC edge; both layers still matter.

How To Use It

For each workload, fill out a small table:

Direction   Source   Destination    Port   Rationale
Inbound     LB SG    App SG         8080   HTTP from the LB
Outbound    App SG   DB SG          5432   DB access
Outbound    App SG   S3 endpoint    443    Uploads
Outbound    App SG   KMS endpoint   443    Envelope unwrap
Outbound    App SG   DNS resolver   53     Service discovery
...         ...      ...            ...    ...

If a row has rationale "because it worked", remove it and see what breaks. Default egress should be deny; every allowed destination is a conscious choice.

A sample Terraform snippet for an app-tier SG with default-deny egress:

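resource "aws_security_group" "app" {
  name   = "app-tier"
  vpc_id = var.vpc_id

  # No inline egress blocks: Terraform removes AWS's default allow-all egress
  # rule when it creates the group, so egress starts as deny-by-default.
  # Allowed destinations are added as separate aws_security_group_rule
  # resources; inline rules and standalone rules should not be mixed on one group.
}

# Egress from the app tier to the DB tier on the Postgres port.
resource "aws_security_group_rule" "app_to_db" {
  type              = "egress"
  security_group_id = aws_security_group.app.id
  # For egress rules, this argument names the destination security group.
  source_security_group_id = aws_security_group.db.id
  from_port                = 5432
  to_port                  = 5432
  protocol                 = "tcp"
  description              = "Postgres"
}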
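The S3 and KMS rows of the table could be added in the same style. This is a sketch under assumptions: it reuses aws_security_group.app from the snippet above, and the two variables are hypothetical inputs for an S3 gateway endpoint's prefix list and for the security group attached to a KMS interface endpoint.

```hcl
# Hypothetical inputs: IDs of resources assumed to be created elsewhere.
variable "s3_prefix_list_id" {
  type        = string
  description = "Prefix list ID of the S3 gateway endpoint"
}

variable "kms_endpoint_sg_id" {
  type        = string
  description = "Security group attached to the KMS interface endpoint"
}

# Gateway endpoints are addressed by prefix list, not by security group.
resource "aws_security_group_rule" "app_to_s3" {
  type              = "egress"
  security_group_id = aws_security_group.app.id
  prefix_list_ids   = [var.s3_prefix_list_id]
  from_port         = 443
  to_port           = 443
  protocol          = "tcp"
  description       = "Uploads via the S3 gateway endpoint"
}

# Interface endpoints are network interfaces with their own security group.
resource "aws_security_group_rule" "app_to_kms" {
  type                     = "egress"
  security_group_id        = aws_security_group.app.id
  source_security_group_id = var.kms_endpoint_sg_id
  from_port                = 443
  to_port                  = 443
  protocol                 = "tcp"
  description              = "Envelope unwrap via the KMS interface endpoint"
}
```

The KMS endpoint's own security group also needs a matching inbound rule on 443 from the app-tier SG. On AWS, traffic to the Amazon-provided DNS resolver is not filtered by security groups, so the DNS row needs an explicit rule mainly when a custom resolver is in use.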

Check Yourself

  1. Why is a security group better than a NACL for per-service rules?
  2. When is a NACL actually the right tool instead of a security group?
  3. What exfiltration path does a VPC endpoint close that a SG alone does not?

Mini Drill or Application

For a 3-tier app you know, draw the network moat. Mark each edge with port and rationale. Then remove any edge you cannot defend. If the system still works, the moat is tighter than it was.

Source Backbone

Security and observability require official docs, but these books provide the systems and reliability backbone behind the practices.