Providers, Resources, Data Sources

What This Concept Is

Terraform has three block types that appear in nearly every config. Knowing which one does what gets you most of the way to reading HCL.

Provider: a plugin that knows how to talk to one API (AWS, Azure, GCP, GitHub, Datadog, Cloudflare, Kubernetes, etc.). You configure it once, typically with credentials and provider-specific settings such as a region.

Resource: something Terraform manages -- creates, updates, and destroys. aws_s3_bucket, google_compute_instance, kubernetes_deployment. The full lifecycle of the thing is Terraform's responsibility.

Data source: something Terraform reads but does not manage. Looking up an AMI ID, the current AWS account, the caller's IAM identity, an existing VPC. Querying, not owning.

The distinction matters because terraform destroy deletes resources; it ignores data sources.
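The same cloud object often exists in both forms. As a minimal sketch, assuming the AWS provider and a hypothetical vpc_id input (neither block belongs to the example module later in this concept), a VPC can be owned or merely looked up:

```hcl
# Hypothetical input: the ID of a VPC that someone else manages.
variable "vpc_id" {
  type = string
}

# Managed: Terraform creates this VPC and terraform destroy deletes it.
resource "aws_vpc" "owned" {
  cidr_block = "10.0.0.0/16" # illustrative CIDR, not from the original text
}

# Read-only: Terraform looks up a VPC that already exists.
# terraform destroy leaves the real VPC untouched.
data "aws_vpc" "existing" {
  id = var.vpc_id
}
```

Both blocks expose attributes such as cidr_block to the rest of the config; only the resource block gives Terraform the right to change or delete the VPC.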

Why It Matters Here

These three blocks are the vocabulary of every Terraform root module. A senior engineer can glance at someone's .tf file and immediately separate "infrastructure we own" from "information we read from the world" from "SDK configuration." You want to develop that eye early.

In review, the split also tells you where to worry:

resources: blast radius, state impact, ordering
data sources: cost of API calls at plan time, stale reads
providers: version pins, regional surprises, credential scope

Concrete Example

A real (if tiny) AWS root module using all three:

terraform {
  required_version = ">= 1.6"
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.40"
    }
  }
}

provider "aws" {
  region = var.region
}

data "aws_caller_identity" "current" {}

data "aws_ami" "ubuntu" {
  most_recent = true
  owners      = ["099720109477"]

  filter {
    name   = "name"
    values = ["ubuntu/images/hvm-ssd/ubuntu-jammy-22.04-amd64-server-*"]
  }
}

resource "aws_instance" "web" {
  ami           = data.aws_ami.ubuntu.id
  instance_type = "t3.small"
  subnet_id     = var.subnet_id

  tags = {
    Name      = "web-${var.env}"
    ManagedBy = "terraform"
    Account   = data.aws_caller_identity.current.account_id
  }
}

Line-by-line:

terraform { required_providers ... } declares the provider and a version constraint. ~> 5.40 means ">= 5.40.0 and < 6.0.0": any 5.40.x or later 5.x release, never 6.
provider "aws" { region = ... } sets the region. Credentials are not written in the block; the provider picks them up implicitly from the environment (env vars, a shared profile).
data "aws_caller_identity" "current" {} reads the account ID at plan time. It is not a resource; destroy will not affect it.
data "aws_ami" "ubuntu" { ... } queries for the latest Ubuntu AMI owned by Canonical.
resource "aws_instance" "web" { ... } creates and manages an EC2 instance. ami and account_id are pulled from the data sources.
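The example references var.region, var.subnet_id, and var.env without declaring them, so it will not plan as shown. A minimal sketch of the missing declarations follows; the types, descriptions, and the default region are assumptions, not part of the original module:

```hcl
variable "region" {
  description = "AWS region the provider operates in"
  type        = string
  default     = "us-east-1" # assumed default; choose your own
}

variable "subnet_id" {
  description = "ID of an existing subnet to launch the instance into"
  type        = string
}

variable "env" {
  description = "Environment name used in the Name tag, for example dev or prod"
  type        = string
}
```

With these in place, terraform plan prompts for subnet_id and env unless they are supplied through -var flags or a .tfvars file.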

Common Confusion / Misconception

"data is like a variable." No. A data block typically makes an API call on every plan and can stall the plan if the query is slow. A variable is just input. Variables go in variable blocks; see Concept 05.

"I should use a data source to 'look up' resources managed elsewhere." This is how people accidentally create two Terraform stacks that both think they own the same resource. Use data sources for things genuinely not under Terraform management, or use the terraform_remote_state data source to read another stack's outputs.
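Reading another stack's outputs might look like the sketch below. It assumes the other stack keeps its state in an S3 backend and declares an output named subnet_id; the bucket, key, region, and output name are all hypothetical:

```hcl
# Read the outputs another stack has published in its state.
data "terraform_remote_state" "network" {
  backend = "s3"

  config = {
    bucket = "example-tfstate"           # hypothetical bucket
    key    = "network/terraform.tfstate" # hypothetical state path
    region = "us-east-1"                 # hypothetical region
  }
}

locals {
  # Only values the other stack declares as outputs are visible here.
  subnet_id = data.terraform_remote_state.network.outputs.subnet_id
}
```

The owning stack stays the single place where the subnet is managed; this stack only consumes what that stack chooses to export.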

"Providers are interchangeable." They are not. hashicorp/aws and oracle/oci have different block structures even for analogous resources. Provider docs are the source of truth; see the Terraform Registry.
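To make the difference concrete, here is a rough sketch of a small VM under hashicorp/google, shown instead of oracle/oci because google_compute_instance is already named earlier in this concept. The instance name, machine type, zone, and image are illustrative assumptions:

```hcl
# Roughly analogous to aws_instance, but with a different block structure:
# the image goes in a nested boot_disk block rather than a top-level ami
# argument, and networking is a nested network_interface block rather than
# a subnet_id argument.
resource "google_compute_instance" "web" {
  name         = "web"
  machine_type = "e2-small"
  zone         = "us-central1-a"

  boot_disk {
    initialize_params {
      image = "ubuntu-os-cloud/ubuntu-2204-lts"
    }
  }

  network_interface {
    network = "default"
  }
}
```

None of the argument names carry over from aws_instance, which is why the provider's own documentation has to be the reference.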

"Version-pinning providers is optional." Leaving version unspecified means a terraform init six months from now may pull a breaking release, especially if .terraform.lock.hcl is not committed or someone runs terraform init -upgrade. Always pin with ~> or an explicit range.
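The two pinning styles can be sketched side by side. The hashicorp/random entry and its version numbers are assumptions added for illustration:

```hcl
terraform {
  required_providers {
    aws = {
      source = "hashicorp/aws"
      # Pessimistic constraint: allows >= 5.40.0 and < 6.0.0.
      version = "~> 5.40"
    }
    random = {
      source = "hashicorp/random"
      # Explicit range: the same idea, spelled out.
      version = ">= 3.6.0, < 4.0.0"
    }
  }
}
```

Either form keeps a future major release from being selected automatically. Committing .terraform.lock.hcl additionally records the exact version chosen within the allowed range.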

How To Use It

Every root module starts with a terraform { required_providers ... } block. No exceptions.
Put each provider's configuration in a single place (providers.tf). Do not scatter provider {} blocks across files.
When you catch yourself writing a data source, ask "is this really not managed by someone?" If a sibling team owns it, use their stack's remote state output instead.
Read provider docs on the Terraform Registry, not blog posts. Resource argument names drift between major versions.

Check Yourself

Which block type does terraform destroy ignore, and why?
What breaks if you remove the required_providers block from a config that was working yesterday?
Name one resource type and one data source type for the same underlying cloud object, and explain when you would use each.

Mini Drill or Application

In 15 minutes, write a config that uses all three block types, either for any cloud or with the null / random providers if you have no cloud handy:

declare a provider with a version constraint
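read one data source (the random provider has none, and null's null_data_source is deprecated; try local_file from hashicorp/local, http from hashicorp/http, or a GitHub provider with no auth)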
create one resource that references the data source

Run terraform plan and point at which attribute value came from the data source, which from the config, and which from the provider's defaults.
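If you have no cloud account, one possible starting point is the hashicorp/local provider, which needs no credentials. This is only a sketch: the file names are hypothetical, and it assumes you have created input.txt next to the config before running plan:

```hcl
terraform {
  required_providers {
    local = {
      source  = "hashicorp/local"
      version = "~> 2.4"
    }
  }
}

# The local provider takes no settings; the empty block is explicit on purpose.
provider "local" {}

# Data source: reads an existing file without managing it.
data "local_file" "input" {
  filename = "${path.module}/input.txt" # hypothetical file you create first
}

# Resource: Terraform creates, updates, and deletes this file.
resource "local_file" "copy" {
  filename = "${path.module}/copy.txt" # hypothetical output file
  content  = upper(data.local_file.input.content)
}
```

In the plan output, content is derived from the data source, filename comes from the config, and attributes you did not set, such as file_permission, are filled in by the provider.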

Source Backbone

Infrastructure-as-code details are tool-specific, but these local books provide the operational backbone for shell, Git, and change discipline.

Pro Git - versioned infrastructure changes, branching, review, and rollback habits.
Git from the Bottom Up - mental model for stateful change history.
The Linux Command Line - shell and automation grounding for infrastructure work.
