IAM: Principals, Policies, Roles vs Users

What This Concept Is

IAM (Identity and Access Management) is the cloud's access-control backbone. Every API call hits IAM first. If IAM says no, nothing else matters.

Core vocabulary (AWS, with parallels in GCP and Azure):

Principal - who or what is making the request. An IAM user, an IAM role (assumed by a workload or federated identity), an AWS service, or a cross-account principal.
Policy - a JSON document that grants or denies permissions. Contains one or more Statement objects, each with Effect (Allow/Deny), Action, Resource, optional Principal, and optional Condition.
IAM user - a long-lived identity with static credentials (password, access keys). Usually represents one specific human, or a legacy application that cannot assume a role.
IAM role - an identity that principals assume temporarily. Has no static credentials; issues short-lived session tokens. Used by EC2, Lambda, ECS tasks, federated SSO users, and cross-account access.
Policy attachment - identity-based policies live on users, groups, or roles; resource-based policies live on resources (S3 buckets, KMS keys, SQS queues).

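Authorization resolves in a fixed order: any matching explicit Deny wins; otherwise a matching explicit Allow grants access; otherwise the request falls through to the default (implicit) Deny.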
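
The precedence rule above can be sketched in a few lines of Python. This is a minimal model, not AWS's evaluation engine: it ignores Principal, Condition, NotAction and NotResource, and it approximates IAM wildcard matching with the standard library's fnmatchcase. The function names and sample statements are made up for illustration; the decision strings mirror the values the IAM policy simulator reports.

```python
from fnmatch import fnmatchcase


def as_list(value):
    # IAM accepts either a single string or a list for Action and Resource.
    return value if isinstance(value, list) else [value]


def statement_applies(stmt, action, resource):
    # Action names are compared case-insensitively; ARNs case-sensitively.
    action_hit = any(
        fnmatchcase(action.lower(), pattern.lower())
        for pattern in as_list(stmt["Action"])
    )
    resource_hit = any(
        fnmatchcase(resource, pattern) for pattern in as_list(stmt["Resource"])
    )
    return action_hit and resource_hit


def evaluate(statements, action, resource):
    decision = "implicitDeny"       # default: nothing is allowed
    for stmt in statements:
        if not statement_applies(stmt, action, resource):
            continue
        if stmt["Effect"] == "Deny":
            return "explicitDeny"   # a matching Deny ends evaluation
        decision = "allowed"        # remember the Allow, keep looking for a Deny
    return decision


# Hypothetical statements, for illustration only.
statements = [
    {"Effect": "Allow", "Action": "s3:*", "Resource": "arn:aws:s3:::acme-uploads-prod/*"},
    {"Effect": "Deny", "Action": "s3:DeleteObject", "Resource": "*"},
]

obj = "arn:aws:s3:::acme-uploads-prod/a.jpg"
print(evaluate(statements, "s3:GetObject", obj))     # allowed
print(evaluate(statements, "s3:DeleteObject", obj))  # explicitDeny
print(evaluate(statements, "sqs:SendMessage", "arn:aws:sqs:us-east-1:123456789012:q"))  # implicitDeny
```

The order of statements does not matter in this sketch: a Deny listed last still overrides an Allow listed first.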

Cross-provider analogy. GCP IAM grants roles (bundles of permissions) to principals (users, groups, service accounts) on resources arranged in a project/folder/organization hierarchy, with allow policies and deny policies. Azure RBAC assigns role definitions to principals at a scope (subscription, resource group, resource) with optional conditions. The structure - who does what on what, under what conditions - is universal; the JSON shapes differ.

Why It Matters Here

IAM always sits on the customer side of the shared-responsibility model. It is also one of the most common sources of real-world cloud breaches (leaked keys, over-broad roles, wildcards on production buckets).

Every later module assumes you can:

write a tight policy for a specific action on a specific resource
prefer roles over users for workloads
distinguish identity-based from resource-based policies
use conditions to scope access by source, time, tag, or MFA
understand policy evaluation order across identity, resource, SCP, permissions boundaries, and session policies

Without those habits, every other safeguard (VPC rules, encryption, audit logs) only slows an attacker down; it does not stop them. The Linux chmod/chown/sudo mental model is a useful first scaffold: principals, permissions, resources, explicit denies - scaled up, with JSON and a cloud-scale blast radius.

Concrete Example

A Lambda function needs to read from one S3 bucket and write logs.

Bad policy (common):

{
  "Version": "2012-10-17",
  "Statement": [
    { "Effect": "Allow", "Action": "*", "Resource": "*" }
  ]
}

This grants "do anything, anywhere." Many breach post-mortems feature a version of this.

Better, narrow policy:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "ReadUploads",
      "Effect": "Allow",
      "Action": ["s3:GetObject"],
      "Resource": "arn:aws:s3:::acme-uploads-prod/*",
      "Condition": {
        "StringEquals": { "s3:ExistingObjectTag/owner": "team-payments" }
      }
    },
    {
      "Sid": "WriteLogs",
      "Effect": "Allow",
      "Action": [
        "logs:CreateLogStream",
        "logs:PutLogEvents"
      ],
      "Resource": "arn:aws:logs:us-east-1:123456789012:log-group:/aws/lambda/resize-images:*"
    }
  ]
}

Notice:

one bucket, not *
only GetObject, not s3:*
a Condition on an object tag, so even inside that bucket the function can read only objects tagged as belonging to your team
the log group is the function's own, not all groups
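
Part of the checklist above can be automated. A rough sketch, assuming a hypothetical lint_policy helper that flags only the two loudest problems (a global or service-wide wildcard action, and a bare * resource); a real review would also look at conditions and at prefix wildcards such as s3:Get*.

```python
def as_list(value):
    # IAM accepts either a single value or a list in these fields.
    return value if isinstance(value, list) else [value]


def lint_policy(policy):
    """Return findings for Allow statements that use broad wildcards."""
    findings = []
    for index, stmt in enumerate(as_list(policy["Statement"])):
        if stmt.get("Effect") != "Allow":
            continue
        label = stmt.get("Sid", f"statement {index}")
        for action in as_list(stmt.get("Action", [])):
            if action == "*" or action.endswith(":*"):
                findings.append(f"{label}: wildcard action {action!r}")
        for resource in as_list(stmt.get("Resource", [])):
            if resource == "*":
                findings.append(f"{label}: wildcard resource '*'")
    return findings


bad = {
    "Version": "2012-10-17",
    "Statement": [{"Effect": "Allow", "Action": "*", "Resource": "*"}],
}

better = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "ReadUploads",
            "Effect": "Allow",
            "Action": ["s3:GetObject"],
            "Resource": "arn:aws:s3:::acme-uploads-prod/*",
        },
        {
            "Sid": "WriteLogs",
            "Effect": "Allow",
            "Action": ["logs:CreateLogStream", "logs:PutLogEvents"],
            "Resource": "arn:aws:logs:us-east-1:123456789012:log-group:/aws/lambda/resize-images:*",
        },
    ],
}

for finding in lint_policy(bad):
    print(finding)
print(lint_policy(better))  # no findings for the narrow policy
```

A check like this fits naturally in CI, so a wildcard has to be argued for in review rather than slipping in unnoticed.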

The role trust policy specifies who can assume this role (in this case, the Lambda service):

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": { "Service": "lambda.amazonaws.com" },
      "Action": "sts:AssumeRole"
    }
  ]
}

When the function runs, the Lambda service assumes the role and hands the function temporary credentials. No static keys exist anywhere.

Verify from the shell (on an instance or local dev with a role chain):

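# Who am I right now? An assumed role shows up as an sts assumed-role ARN.
aws sts get-caller-identity
# { "Account": "...", "Arn": "arn:aws:sts::...:assumed-role/role-name/session-name" }

# Would the role be allowed to read this object? The policy's tag Condition
# needs the tag supplied as context; without it the Condition cannot match.
aws iam simulate-principal-policy \
  --policy-source-arn arn:aws:iam::...:role/resize-images \
  --action-names s3:GetObject \
  --resource-arns arn:aws:s3:::acme-uploads-prod/tenant42/foo.jpg \
  --context-entries "ContextKeyName=s3:ExistingObjectTag/owner,ContextKeyValues=team-payments,ContextKeyType=string"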

simulate-principal-policy is the fastest way to answer "would this call be allowed?" without actually making it.

Common Confusion / Misconception

"I will use an IAM user for my EC2 instance because it's simpler." Don't. EC2 has instance profiles that attach a role. The instance gets short-lived credentials automatically rotated. Using a user means embedding long-lived keys on disk, which is how leaks happen.

"Effect: Allow and Principal: * is the same as a public bucket." A wildcard principal in a resource-based policy, with no Condition narrowing it, grants access to any caller, including anonymous ones on the internet. On an S3 bucket, that is a public bucket. Always scope the Principal.

"Conditions are optional." Conditions are the main tool for least privilege: scope by aws:SourceIp, aws:MultiFactorAuthPresent, aws:ResourceTag/*, aws:PrincipalTag/*, aws:SourceVpc, request time, and more. A policy without conditions is often too broad.

"Managed policies are safer than inline." They are easier to audit centrally, but AWS-managed policies like AdministratorAccess or PowerUserAccess are routinely over-broad. A narrow inline policy is often safer than a wide managed one.

"GCP and Azure don't have the Deny wins rule." GCP added explicit deny policies (they win over allow). Azure has deny assignments (limited cases) and Azure Policy deny effects. The default on all three is "no explicit allow -> deny," but the engines now have explicit-deny primitives.

Gotcha: An explicit Deny anywhere in the evaluated policies wins over any Allow. This is how Service Control Policies (SCPs) at the organization level can block actions even if an account-level policy would allow them. Also, permissions boundaries (AWS) and principal access boundaries (GCP) cap what a principal can effectively do: a role with a permissions boundary cannot exceed it, no matter how many additional policies are attached.
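
The layering described above can be modelled as an intersection. The sketch below is a simplification under stated assumptions: it covers only a same-account request governed by an identity-based policy, a permissions boundary, and an SCP; it leaves out resource-based policies, session policies, and Condition blocks; and it approximates wildcard matching with fnmatchcase. All names and sample statements are hypothetical.

```python
from fnmatch import fnmatchcase


def as_list(value):
    return value if isinstance(value, list) else [value]


def decide(statements, action, resource):
    """Evaluate one policy layer on its own."""
    decision = "implicitDeny"
    for stmt in statements:
        action_hit = any(
            fnmatchcase(action.lower(), a.lower()) for a in as_list(stmt["Action"])
        )
        resource_hit = any(fnmatchcase(resource, r) for r in as_list(stmt["Resource"]))
        if not (action_hit and resource_hit):
            continue
        if stmt["Effect"] == "Deny":
            return "explicitDeny"
        decision = "allowed"
    return decision


def effective_decision(identity, boundary, scp, action, resource):
    """Every layer must allow; a Deny in any layer wins outright."""
    results = [decide(layer, action, resource) for layer in (identity, boundary, scp)]
    if "explicitDeny" in results:
        return "explicitDeny"
    if all(result == "allowed" for result in results):
        return "allowed"
    return "implicitDeny"


# Hypothetical layers: a very broad identity policy, capped from two sides.
identity = [{"Effect": "Allow", "Action": "*", "Resource": "*"}]
boundary = [{"Effect": "Allow", "Action": ["s3:GetObject", "s3:PutObject"], "Resource": "*"}]
scp = [
    {"Effect": "Allow", "Action": "*", "Resource": "*"},
    {"Effect": "Deny", "Action": "iam:*", "Resource": "*"},
]

obj = "arn:aws:s3:::acme-uploads-prod/a.jpg"
print(effective_decision(identity, boundary, scp, "s3:GetObject", obj))     # allowed
print(effective_decision(identity, boundary, scp, "s3:DeleteObject", obj))  # implicitDeny
print(effective_decision(identity, boundary, scp, "iam:CreateUser", "*"))   # explicitDeny
```

Even though the identity policy allows everything, the boundary silently removes s3:DeleteObject and the SCP actively denies iam:CreateUser.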

How To Use It

For every workload:

Define a role per workload, never shared.
Start from zero permissions, add only what is needed, and name every Action explicitly (avoid trailing wildcards such as s3:Get* when you can).
Scope Resource to specific ARNs, not *.
Add Condition blocks for source (VPC, IP), tags, and MFA where relevant.
Use resource-based policies (S3 bucket policy, KMS key policy) as a second layer for cross-account access.
Review access quarterly: list roles, their attached policies, and last-used timestamps. Delete what is unused.
Enable IAM Access Analyzer (or GCP Policy Analyzer, Azure PIM reviews) and treat findings as real tickets.
Move human access to federated SSO (IAM Identity Center / Workforce Identity Federation / Microsoft Entra ID) - no long-lived IAM users for humans, ever.
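
The Condition blocks recommended in the list above follow a simple matching rule: every operator and every key must match (AND), while any one of a key's listed values is enough (OR), and a key missing from the request fails the test. A minimal sketch of that rule for three operators, assuming a hypothetical condition_matches helper and a hand-built request context; it ignores the IfExists and ForAllValues/ForAnyValue variants and treats key names as case-sensitive.

```python
import ipaddress


def as_list(value):
    return value if isinstance(value, list) else [value]


def string_equals(actual, expected):
    return str(actual) in expected  # exact, case-sensitive match


def bool_equals(actual, expected):
    return str(actual).lower() in [value.lower() for value in expected]


def ip_in_ranges(actual, expected):
    address = ipaddress.ip_address(actual)
    return any(address in ipaddress.ip_network(cidr) for cidr in expected)


OPERATORS = {
    "StringEquals": string_equals,
    "Bool": bool_equals,
    "IpAddress": ip_in_ranges,
}


def condition_matches(condition, context):
    for operator, tests in condition.items():
        check = OPERATORS[operator]
        for key, expected in tests.items():
            if key not in context:
                return False  # missing key: the test cannot succeed
            if not check(context[key], [str(value) for value in as_list(expected)]):
                return False
    return True


# Hypothetical Condition block: office network and MFA both required.
condition = {
    "IpAddress": {"aws:SourceIp": ["203.0.113.0/24"]},
    "Bool": {"aws:MultiFactorAuthPresent": "true"},
}

print(condition_matches(condition, {"aws:SourceIp": "203.0.113.7", "aws:MultiFactorAuthPresent": "true"}))   # True
print(condition_matches(condition, {"aws:SourceIp": "198.51.100.9", "aws:MultiFactorAuthPresent": "true"}))  # False
print(condition_matches(condition, {"aws:SourceIp": "203.0.113.7"}))                                         # False
```

The third call shows why conditions tighten a policy: a request that simply omits the MFA key does not get the benefit of the doubt.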

Check Yourself

Why is a role preferable to a user for anything that is not a human?
In { Effect: Allow, Action: "s3:*", Resource: "*" }, what are the three separate ways you should tighten this?
Explain in one sentence why explicit Deny winning is a design feature, not a bug.
A developer's policy Allows s3:* on a bucket, but the org's SCP denies s3:PutObjectAcl in the prod OU. Can the developer set an object ACL? Why or why not?
On Linux, chmod 777 is universally wrong. Write the one-line IAM equivalent and explain what makes it equally wrong.

Mini Drill or Application

In twenty minutes, write an IAM policy for a batch job that (a) reads objects from s3://acme-inbox-prod/*, (b) writes objects to s3://acme-processed-prod/*, (c) publishes a message to one SNS topic, and (d) logs to its own CloudWatch log group. Include at least one Condition. Critique your own policy for wildcards.

Extension: run aws iam simulate-principal-policy (or GCP's policy-troubleshooter / Azure's Check access) against the role for each of the four actions and a fifth "forbidden" action. Confirm the first four allow and the fifth denies. Save the output as a regression test for the next time someone edits the policy.
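
One way to turn the saved output into a regression test is sketched below. It assumes the JSON shape the AWS CLI prints for simulate-principal-policy (an EvaluationResults list whose entries carry EvalActionName and EvalDecision). The sample document is hand-written for illustration, not captured output; the topic and log-group ARNs, the EXPECTED table, and the find_regressions helper are hypothetical. Keying by action name is a simplification that breaks down if the same action is simulated against several resources.

```python
import json

# Hand-written sample in the shape of the CLI output (illustrative only).
SAMPLE_OUTPUT = """
{
  "EvaluationResults": [
    {"EvalActionName": "s3:GetObject", "EvalResourceName": "arn:aws:s3:::acme-inbox-prod/a.csv", "EvalDecision": "allowed"},
    {"EvalActionName": "s3:PutObject", "EvalResourceName": "arn:aws:s3:::acme-processed-prod/a.csv", "EvalDecision": "allowed"},
    {"EvalActionName": "sns:Publish", "EvalResourceName": "arn:aws:sns:us-east-1:123456789012:batch-done", "EvalDecision": "allowed"},
    {"EvalActionName": "logs:PutLogEvents", "EvalResourceName": "arn:aws:logs:us-east-1:123456789012:log-group:/batch/job:*", "EvalDecision": "allowed"},
    {"EvalActionName": "s3:DeleteObject", "EvalResourceName": "arn:aws:s3:::acme-inbox-prod/a.csv", "EvalDecision": "implicitDeny"}
  ]
}
"""

# What the policy is supposed to do: four allows and one forbidden action.
EXPECTED = {
    "s3:GetObject": "allowed",
    "s3:PutObject": "allowed",
    "sns:Publish": "allowed",
    "logs:PutLogEvents": "allowed",
    "s3:DeleteObject": "implicitDeny",
}


def find_regressions(output_json, expected):
    """Map each mismatching action to (expected decision, actual decision)."""
    results = json.loads(output_json)["EvaluationResults"]
    actual = {entry["EvalActionName"]: entry["EvalDecision"] for entry in results}
    return {
        action: (want, actual.get(action))
        for action, want in expected.items()
        if actual.get(action) != want
    }


print(find_regressions(SAMPLE_OUTPUT, EXPECTED))  # {} means the policy still behaves as intended
```

Re-running the simulation after each policy edit and feeding the fresh output to the same check makes an accidental widening show up as a non-empty result.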

Read This Only If Stuck

IAM JSON policy element reference - every field with examples
IAM identities: users, groups, roles - canonical explanation of role vs user
IAM best practices - the up-to-date opinionated guidance (least privilege, temporary credentials, MFA, rotation)
Google Cloud IAM overview - principals, roles, allow/deny policies, policy inheritance
Azure: What is Azure role-based access control (RBAC)? - Azure's RBAC model and scope hierarchy
Linux Command Line: Owners, group members, and everybody else - the filesystem-level mental model that IAM generalizes
Linux Command Line: sudo and chgrp - privilege escalation on a single host; IAM roles are the cloud-scale version of "run as a different identity"
