Providers, Resources, Data Sources
What This Concept Is
Terraform has three block types that appear in nearly every config. Knowing which one does what is 80% of reading HCL.
Provider: a plugin that knows how to talk to one API (AWS, Azure, GCP, GitHub, Datadog, Cloudflare, Kubernetes, etc.). You configure it once with credentials and a region.
Resource: something Terraform manages -- creates, updates, and destroys. aws_s3_bucket, google_compute_instance, kubernetes_deployment. The full lifecycle of the thing is Terraform's responsibility.
Data source: something Terraform reads but does not manage. Looking up an AMI ID, the current AWS account, the caller's IAM identity, an existing VPC. Querying, not owning.
The distinction matters because terraform destroy deletes every resource it manages; it never deletes the real objects that data sources only read.
Why It Matters Here
These three blocks are the vocabulary of every Terraform root module. A senior engineer can glance at someone's .tf file and immediately separate "infrastructure we own" from "information we read from the world" from "SDK configuration." You want to develop that eye early.
In review, the split also tells you where to worry:
- resources: blast radius, state impact, ordering
- data sources: cost of API calls at plan time, stale reads
- providers: version pins, regional surprises, credential scope
Concrete Example
A real (if tiny) AWS root module using all three:
# var.region, var.subnet_id, and var.env are assumed to be declared
# in variable blocks elsewhere in the module.
terraform {
  required_version = ">= 1.6"

  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.40"
    }
  }
}

provider "aws" {
  region = var.region
}

# Read-only lookups: nothing here is created or destroyed.
data "aws_caller_identity" "current" {}

data "aws_ami" "ubuntu" {
  most_recent = true
  owners      = ["099720109477"] # Canonical

  filter {
    name   = "name"
    values = ["ubuntu/images/hvm-ssd/ubuntu-jammy-22.04-amd64-server-*"]
  }
}

# Managed object: Terraform owns its full lifecycle.
resource "aws_instance" "web" {
  ami           = data.aws_ami.ubuntu.id
  instance_type = "t3.small"
  subnet_id     = var.subnet_id

  tags = {
    Name      = "web-${var.env}"
    ManagedBy = "terraform"
    Account   = data.aws_caller_identity.current.account_id
  }
}
Line-by-line:
- terraform { required_providers ... } declares the provider and a version constraint. ~> 5.40 means "any 5.40.x or later 5.x," not 6.
- provider "aws" { region = ... } configures credentials implicitly (env vars, profile) and the region.
- data "aws_caller_identity" "current" {} reads the account ID at plan time. It is not a resource; destroy will not affect it.
- data "aws_ami" "ubuntu" { ... } queries for the latest Ubuntu AMI owned by Canonical.
- resource "aws_instance" "web" { ... } creates and manages an EC2 instance. ami and account_id are pulled from the data sources.
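The example references var.region, var.subnet_id, and var.env without declaring them. A minimal sketch of matching declarations follows; the types, descriptions, and the default region are assumptions made for illustration, not part of the original config.

```hcl
# Assumed declarations for the variables the example references.
# Types, descriptions, and the default are illustrative only.
variable "region" {
  type        = string
  description = "AWS region the provider operates in."
  default     = "us-east-1"
}

variable "subnet_id" {
  type        = string
  description = "ID of an existing subnet to launch the instance into."
}

variable "env" {
  type        = string
  description = "Environment name used in the Name tag, for example dev or prod."
}
```

With these in place, terraform plan prompts for subnet_id and env unless they are supplied through -var flags or a .tfvars file.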
Common Confusion / Misconception
"data is like a variable." No. A data block makes an API call every plan and can block if the query is slow. A variable is just input. Variables go in variable blocks; see Concept 05.
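To make the contrast concrete, here is a small sketch that gets an availability zone two ways. The variable name, its default, and the output names are hypothetical; the aws_availability_zones data source is part of the hashicorp/aws provider.

```hcl
# Input: the value comes from the caller (or the default).
# Resolving it involves no API call.
variable "availability_zone" {
  type    = string
  default = "us-east-1a" # hypothetical default
}

# Lookup: the provider queries the AWS API during plan to resolve this.
data "aws_availability_zones" "available" {
  state = "available"
}

output "az_from_input" {
  value = var.availability_zone
}

output "az_from_api" {
  # First zone name returned by the API for the configured region.
  value = data.aws_availability_zones.available.names[0]
}
```

The first output is known from configuration alone, while the second depends on a successful API read with valid credentials.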
"I should use a data source to 'look up' resources managed elsewhere." This is how people accidentally create two Terraform stacks that both think they own the same resource. Use data sources for things genuinely not under Terraform management, or use the terraform_remote_state data source to read another stack's outputs.
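A sketch of the remote-state alternative, assuming a sibling stack keeps its state in an S3 backend and exports an output named subnet_id. The bucket, key, region, and output name are all hypothetical placeholders.

```hcl
# Read another stack's outputs instead of looking up its resources directly.
# Bucket, key, and region below are hypothetical; use the producing stack's
# real backend settings.
data "terraform_remote_state" "network" {
  backend = "s3"

  config = {
    bucket = "example-tfstate"
    key    = "network/terraform.tfstate"
    region = "us-east-1"
  }
}

resource "aws_instance" "web" {
  ami           = data.aws_ami.ubuntu.id
  instance_type = "t3.small"

  # Assumes the network stack declares: output "subnet_id" { ... }
  subnet_id = data.terraform_remote_state.network.outputs.subnet_id
}
```

Only values the producing stack exposes as outputs are readable this way, which keeps ownership of the underlying resources with that stack.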
"Providers are interchangeable." They are not. hashicorp/aws and oracle/oci have different block structures even for analogous resources. Provider docs are the source of truth; see the Terraform Registry.
"Version-pinning providers is optional." Leaving version unspecified means a terraform init six months from now, on a checkout without a committed .terraform.lock.hcl, may pull a breaking release. Always pin with ~> or an explicit range.
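For reference, a sketch of how the common constraint forms differ, using the same hashicorp/aws provider as the example above. Only one version line can be active at a time, so the alternatives are shown as comments.

```hcl
terraform {
  required_providers {
    aws = {
      source = "hashicorp/aws"

      # Pessimistic constraint on the minor version:
      # allows >= 5.40.0 and < 6.0.0.
      version = "~> 5.40"

      # Alternatives (use one at a time):
      # version = "~> 5.40.0"       # patch releases only: >= 5.40.0, < 5.41.0
      # version = ">= 5.40, < 6.0"  # explicit range, same effect as "~> 5.40"
      # version = "5.40.0"          # exact pin
    }
  }
}
```

The tighter the constraint, the more often it needs a deliberate bump; the looser it is, the more a fresh init can drift.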
How To Use It
- Every root module starts with a terraform { required_providers ... } block. No exceptions.
- Put each provider's configuration in a single place (providers.tf). Do not scatter provider {} blocks across files.
- When you catch yourself writing a data source, ask "is this really not managed by someone?" If a sibling team owns it, use their stack's remote state output instead.
- Read provider docs on the Terraform Registry, not blog posts. Resource argument names drift between major versions.
Check Yourself
- Which block type does terraform destroy ignore, and why?
- What breaks if you remove the required_providers block from a config that was working yesterday?
- Name one resource type and one data source type for the same underlying cloud object, and explain when you would use each.
Mini Drill or Application
In 15 minutes, write a config that uses all three block types for any cloud or for the null / random providers if you have no cloud handy:
- declare a provider with a version constraint
- read one data source (random has none, and null offers only the deprecated null_data_source; try the local_file data source from hashicorp/local, or use a GitHub provider with no auth)
- create one resource that references the data source
Run terraform plan and point at which attribute value came from the data source, which from the config, and which from the provider's defaults.
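If you have no cloud account, one possible starting shape for the drill uses the hashicorp/local provider, which offers both a local_file data source and a local_file resource. This is a sketch under assumptions: the file names are hypothetical, and input.txt must already exist next to the config before you run terraform plan.

```hcl
terraform {
  required_providers {
    local = {
      source  = "hashicorp/local"
      version = "~> 2.4"
    }
  }
}

# The local provider needs no credentials or region.
provider "local" {}

# Data source: reads an existing file; Terraform does not manage it.
# input.txt is a hypothetical file you create by hand first.
data "local_file" "input" {
  filename = "${path.module}/input.txt"
}

# Resource: Terraform creates, updates, and destroys this file.
resource "local_file" "copy" {
  filename = "${path.module}/copy.txt"
  content  = data.local_file.input.content
}
```

In the plan output, content comes from the data source, filename comes from the config, and attributes you did not set, such as file_permission, come from the provider's defaults.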
See also (external)
- Terraform Language: Providers -- how providers are declared, installed, and versioned.
- Terraform Language: Query data from external sources -- data block semantics, count/for_each, and apply-time vs plan-time reads.
Source Backbone
Infrastructure-as-code details are tool-specific, but these local books provide the operational backbone for shell, Git, and change discipline.
- Pro Git - versioned infrastructure changes, branching, review, and rollback habits.
- Git from the Bottom Up - mental model for stateful change history.
- The Linux Command Line - shell and automation grounding for infrastructure work.