Threat Modeling (STRIDE) for Cloud Services

What This Concept Is

Threat modeling is the disciplined version of the question "what can go wrong here". You pick a system, draw its trust boundaries, list its assets, and then walk a checklist of attacker moves against it.

STRIDE is Microsoft's attacker-move checklist. It gives you one category per letter:

Spoofing -- pretending to be someone else
Tampering -- modifying data in flight or at rest
Repudiation -- denying an action with no way to prove it
Information disclosure -- reading what you should not
Denial of service -- preventing legitimate use
Elevation of privilege -- gaining rights you should not have
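Each category is usually described as the violation of one security property. That pairing is a quick way to check that a finding is filed under the right letter. Below is a minimal Python sketch of the mapping. The `Stride` enum and its attribute names are illustrative and do not come from any library:

```python
from enum import Enum


class Stride(Enum):
    """STRIDE categories, each paired with the security property it violates."""

    S = ("Spoofing", "authentication")
    T = ("Tampering", "integrity")
    R = ("Repudiation", "non-repudiation")
    I = ("Information disclosure", "confidentiality")
    D = ("Denial of service", "availability")
    E = ("Elevation of privilege", "authorization")

    def __init__(self, title: str, violated_property: str) -> None:
        # Enum passes each member's tuple value in as positional arguments.
        self.title = title
        self.violated_property = violated_property


for category in Stride:
    print(f"{category.name}: {category.title} breaks {category.violated_property}")
```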

STRIDE is not the only framework. Alternatives include PASTA (Process for Attack Simulation and Threat Analysis, more process-heavy), LINDDUN (privacy-focused: Linkability, Identifiability, Non-repudiation, Detectability, Disclosure, Unawareness, Non-compliance), and attack trees. STRIDE fits cloud services well because each letter maps cleanly to a control surface a cloud engineer actually owns: IAM, queue policies, object-store policies, rate limits, encryption, and audit logs.

The OWASP Foundation frames threat modeling around four questions that pair naturally with STRIDE: What are we working on? What can go wrong? What are we going to do about it? Did we do a good job? Use STRIDE to answer the second question. Use a control catalog (a cloud provider's Well-Architected framework, NIST SP 800-53) to answer the third. Use follow-up reviews and incident retrospectives to answer the fourth.

Why It Matters Here

In cloud systems, many of the most damaging incidents trace back to something that would have been obvious in a threat model. A leaked IAM key is an S finding, because whoever holds it can act as its owner. An IAM role that grants far more than its job needs is an E finding. A log file with PII in plain text is an I finding. A webhook that does not verify its signature is an S finding. An S3 bucket readable by the world is an I finding that a one-minute STRIDE pass would have caught. If you never sit down and enumerate these, you are shipping them.
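The webhook case is one of the easiest to close in code. The sketch below is minimal and assumes one particular scheme: the sender signs the raw request body with HMAC-SHA256 under a shared secret and sends the hex digest in a header. That scheme and the `verify_webhook` name are assumptions, so check your provider's documented signature format. `hmac.new` and `hmac.compare_digest` are Python standard-library functions.

```python
import hashlib
import hmac


def verify_webhook(secret: bytes, body: bytes, signature_hex: str) -> bool:
    """Return True only if signature_hex is the HMAC-SHA256 of body under secret.

    Assumed scheme: sender sends hex(HMAC-SHA256(secret, raw_body)) in a header.
    """
    expected = hmac.new(secret, body, hashlib.sha256).hexdigest()
    # compare_digest avoids short-circuiting on the first mismatching byte,
    # so response timing does not reveal how much of a forgery was correct.
    return hmac.compare_digest(expected.encode(), signature_hex.encode())


# Hypothetical values for illustration; load real secrets from a secret store.
secret = b"example-shared-secret"
body = b'{"event": "upload.completed"}'
good_signature = hmac.new(secret, body, hashlib.sha256).hexdigest()

print(verify_webhook(secret, body, good_signature))
print(verify_webhook(secret, body, "0" * 64))
```

Real schemes often also sign a timestamp so that a captured request cannot be replayed later. That part is left out here.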

Threat modeling is among the cheapest security work you can do. A first pass costs roughly one engineer-day per service. An incremental update for a new feature takes about two hours. An incident costs engineering time, customer trust, legal notification, and sometimes a regulatory fine, so the trade is heavily asymmetric.

It also scales across services. A good model is reusable: services with the same shape (API -> queue -> worker -> store) get the same shape of mitigations. Building Secure and Reliable Systems makes this point repeatedly. Security has to be designed in early, because the cost of adding it later climbs steeply and the failure modes compound.

Concrete Example: One Finding per STRIDE Letter

Take a simple "uploads service". Users upload files through an API, and the files land in object storage. The API enqueues a processing job, and a worker pulls jobs from the queue, processes the files, and writes results to a database. Assume everything runs in one cloud account, with the API behind a load balancer.

The output of a threat-modeling session is typically a table. For this service, one page looks like:

#	Asset	Letter	Finding	Likelihood	Impact	Mitigation	Owner
1	Queue	S	Worker does not verify signer of queue messages; any internal service can inject jobs	Med	High	Message signatures + scoped SQS/PubSub IAM	Worker team
2	Objects	T	Files uploaded but not integrity-checked; bytes can be silently replaced	Low	High	Store SHA-256 at upload; verify on read; object versioning with MFA-delete	Platform
3	API	R	`DELETE /data` returns 200 but writes no audit log	High	Med	Append-only audit log keyed (user, action, time), shipped off-host	API team
4	Objects	I	Result objects use predictable filenames in a public-read bucket	Med	High	Default-private bucket, signed URLs, block-public-access at account	Platform
5	API	D	Upload endpoint has no body-size limit; one big client fills the queue	High	Med	Body-size cap, per-identity rate limits, dead-letter queue	API team
6	Worker	E	Worker IAM role grants `s3:*` instead of `s3:GetObject` on one prefix	High	High	Resource-scoped actions + condition keys (`aws:SourceArn`, `aws:PrincipalOrgID`)	Platform
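Row 6 is the kind of finding a small CI check can keep closed. Below is a minimal sketch that flags `Allow` statements with wildcard actions or an all-resources `*`. The policy documents follow the AWS policy grammar (`Version`, `Statement`, `Effect`, `Action`, `Resource`). The bucket name `uploads-bucket`, the `incoming/` prefix, and the `wildcard_findings` helper are hypothetical.

```python
import json


def _as_list(value):
    # IAM accepts either a single value or a list for Statement, Action, Resource.
    return value if isinstance(value, list) else [value]


def wildcard_findings(policy_json: str) -> list:
    """Flag Allow statements that use wildcard actions or an all-resources '*'."""
    findings = []
    for stmt in _as_list(json.loads(policy_json)["Statement"]):
        if stmt.get("Effect") != "Allow":
            continue
        for action in _as_list(stmt.get("Action", [])):
            if "*" in action:
                findings.append(f"wildcard action: {action}")
        for resource in _as_list(stmt.get("Resource", [])):
            if resource == "*":
                findings.append(f"wildcard resource: {resource}")
    return findings


too_broad = json.dumps({
    "Version": "2012-10-17",
    "Statement": [{"Effect": "Allow", "Action": "s3:*", "Resource": "*"}],
})
scoped = json.dumps({
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Action": "s3:GetObject",
        "Resource": "arn:aws:s3:::uploads-bucket/incoming/*",
    }],
})

print(wildcard_findings(too_broad))  # ['wildcard action: s3:*', 'wildcard resource: *']
print(wildcard_findings(scoped))     # []
```

The prefix wildcard in the scoped ARN is deliberately not flagged, because limiting access to one prefix is the mitigation itself. This sketch also ignores `NotAction`, `NotResource`, and conditions. A managed policy validator covers far more cases.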

Each mitigation is boring, specific, and cheap -- which is what good threat-modeling output looks like. A mitigation that reads "improve security" is not a mitigation; it is a TODO.

A useful shortcut: most STRIDE letters map to a dominant control family. S -> authn / signatures. T -> integrity hashes, signed commits, WORM storage. R -> audit logs off-host. I -> encryption + access policy. D -> rate limits + quotas + backpressure. E -> least-privilege IAM. When you cannot find a mitigation, you usually just have not looked in the right family.
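Here is one family made concrete. The T row of the table calls for storing a SHA-256 digest at upload and verifying it on read. The sketch below is minimal and in-memory. `ObjectStore` and `IntegrityError` are hypothetical names, and both dictionaries sit in one class only for brevity. A real deployment would keep the digests somewhere the object writer cannot modify, such as a database row. If whoever can overwrite an object can also overwrite its digest, the check proves nothing.

```python
import hashlib


class IntegrityError(Exception):
    """Stored bytes no longer match the digest recorded at upload."""


class ObjectStore:
    """In-memory stand-in for object storage plus a digest table."""

    def __init__(self) -> None:
        self.objects: dict = {}
        self.digests: dict = {}  # in production: a store the uploader cannot rewrite

    def put(self, key: str, data: bytes) -> None:
        self.objects[key] = data
        self.digests[key] = hashlib.sha256(data).hexdigest()  # record at upload

    def get(self, key: str) -> bytes:
        data = self.objects[key]
        if hashlib.sha256(data).hexdigest() != self.digests[key]:  # verify on read
            raise IntegrityError(f"{key}: content changed since upload")
        return data


store = ObjectStore()
store.put("incoming/report.csv", b"a,b\n1,2\n")
store.objects["incoming/report.csv"] = b"a,b\n1,999\n"  # simulate a silent replace
try:
    store.get("incoming/report.csv")
except IntegrityError as err:
    print("rejected:", err)
```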

Common Confusion / Misconception

"Threat modeling is a vulnerability scan." No. Vulnerability scanning looks for known CVEs in deployed code. Threat modeling looks for design-time weaknesses in the system as drawn. Both are useful; they catch different things. A service with zero CVEs and an R-class missing audit log is still broken.

"STRIDE should list every bad thing that could happen." The opposite. Threat modeling is structured enumeration against a specific system: one diagram, one trust-boundary list, one pass per letter, with one named mitigation each. A good first model is one page. A bad first model is ten pages of "maybe" that never turn into action.

"STRIDE is the control list." STRIDE finds threats; it does not deploy controls. The output is a list of decisions, not a security posture. Pair STRIDE with a control catalog (NIST SP 800-53, CIS Benchmarks, cloud Well-Architected) so every finding lands on a known, pre-approved control.

"Trust boundaries are network boundaries." Not only. A trust boundary is anywhere your assumptions about the caller change: between two teams' accounts, between your code and a third-party library call, between a row in your DB and the same row after a partner integration writes to it. Draw the boundary at each assumption change; STRIDE each one.

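The "draw the boundary, STRIDE each one" habit can be sketched as a tiny data-flow model. The example assigns each component of the uploads service to a trust zone and lists every flow whose two ends sit in different zones. The zone names, component names, and `boundary_crossings` helper are hypothetical, and the zone assignments are judgment calls.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Flow:
    source: str
    target: str
    data: str


# Hypothetical zone assignments for the uploads service. A zone groups
# components that make the same assumptions about who is calling them.
ZONES = {
    "client": "internet",
    "api": "app",
    "queue": "app",
    "worker": "app",
    "object-store": "data",
    "database": "data",
}

FLOWS = [
    Flow("client", "api", "upload request"),
    Flow("api", "object-store", "file bytes"),
    Flow("api", "queue", "job message"),
    Flow("queue", "worker", "job message"),
    Flow("worker", "object-store", "file bytes"),
    Flow("worker", "database", "results"),
]


def boundary_crossings(flows, zones):
    """Return the flows whose endpoints sit in different trust zones."""
    return [f for f in flows if zones[f.source] != zones[f.target]]


for f in boundary_crossings(FLOWS, ZONES):
    print(f"STRIDE this: {f.source} -> {f.target} ({f.data}), "
          f"{ZONES[f.source]} -> {ZONES[f.target]}")
```

Suppose other internal services can also publish to the queue, as row 1 of the uploads-service table describes. Then the queue belongs in its own zone, and both queue flows become crossings. That is exactly where the S finding lives.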
How To Use It

Run this cycle for any service more important than a toy:

Draw the system. Boxes are components, lines are data flows, double lines are trust boundaries (between the internet and your VPC, between two teams' accounts, between app and data tier).
List assets. What is valuable? Usually customer data, credentials, computing resources, and reputation.
Walk STRIDE. For each letter, ask "How would an attacker achieve this against this system?" Write at least one plausible finding per letter.
Score and triage. Rate each finding on a 3x3 likelihood x impact grid. Mitigate the top-right (high likelihood, high impact); document and accept the bottom-left; the middle is where design effort lands.
Assign a mitigation per finding. Concrete, owned, and ideally already a control you know how to configure (IAM policy, bucket policy, queue signature, audit log sink).
Store the model next to the code. A threat-model.md in the service repo, reviewed like any other design artifact, beats a wiki page nobody edits.
Schedule revisits. Every new feature, every architectural change, every new external integration is a reason to rerun the relevant rows.
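The score-and-triage step can be sketched in a few lines. The example maps Low/Med/High to 1/2/3 and multiplies the two ratings. The cut-offs are an assumed convention, not part of STRIDE: a score of 6 or more means mitigate, 2 or less means accept and document, and everything in between is design work. The rows are copied from the uploads-service table.

```python
LEVELS = {"Low": 1, "Med": 2, "High": 3}


def triage(likelihood: str, impact: str) -> str:
    """Bucket a finding on a 3x3 grid; the cut-offs are an assumed convention."""
    score = LEVELS[likelihood] * LEVELS[impact]  # one of 1, 2, 3, 4, 6, 9
    if score >= 6:
        return "mitigate"             # top-right: Med/High and above
    if score <= 2:
        return "accept and document"  # bottom-left: Low/Low, Low/Med
    return "design work"              # middle: Low/High, Med/Med


# (row, letter, likelihood, impact) from the uploads-service table.
ROWS = [
    (1, "S", "Med", "High"),
    (2, "T", "Low", "High"),
    (3, "R", "High", "Med"),
    (4, "I", "Med", "High"),
    (5, "D", "High", "Med"),
    (6, "E", "High", "High"),
]

# Highest risk first; sorted() is stable, so ties keep table order.
by_risk = sorted(ROWS, key=lambda r: LEVELS[r[2]] * LEVELS[r[3]], reverse=True)
for row, letter, likelihood, impact in by_risk:
    print(f"row {row} ({letter}): {triage(likelihood, impact)}")
```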

Check Yourself

Give one real finding for each STRIDE letter on a service you have worked on.
What is the difference between a threat and a vulnerability, and which one does STRIDE enumerate?
Why does STRIDE belong at design time rather than only at pentest time?
Which two STRIDE letters are most often under-modeled by engineers who rely on "the cloud handles it"? (Hint: repudiation and elevation of privilege.)
A finding reads: "any internal service can read from the secrets bucket." Which letter is that, and what is the smallest change that closes it?

Mini Drill or Application

Pick a system you know well (a personal project counts). Spend 30 minutes:

draw three boxes max and the data flows between them
list three assets
write one finding per STRIDE letter, score it likelihood x impact, name one mitigation
flag any finding whose mitigation is "TODO" -- those are the decisions you have not made yet
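Two checks on the drill page can be automated: whether every letter has a finding, and which mitigations are still undecided. Below is a minimal sketch. The `review` helper, the list of "undecided" phrases, and the sample drill rows are all hypothetical.

```python
import re

# Mitigations that name no concrete control are decisions not yet made.
UNDECIDED = re.compile(r"\b(todo|tbd|improve security)\b", re.IGNORECASE)


def review(findings):
    """findings: list of (letter, finding, mitigation) tuples from the drill page.

    Returns (letters with no finding, findings whose mitigation is undecided).
    """
    missing = sorted(set("STRIDE") - {letter for letter, _, _ in findings})
    undecided = [
        finding
        for _, finding, mitigation in findings
        if not mitigation.strip() or UNDECIDED.search(mitigation)
    ]
    return missing, undecided


# Hypothetical drill output for a small personal web app.
drill = [
    ("S", "Admin page trusts an unsigned cookie", "Signed session cookies"),
    ("T", "Config file writable by the web process", "TODO"),
    ("I", "Nightly backups stored unencrypted", "Encrypt backups at rest"),
    ("E", "App process runs as root", "improve security"),
]

missing, undecided = review(drill)
print("letters with no finding:", missing)  # ['D', 'R']
print("still undecided:", undecided)
```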

Keep the output to one page. That page is a real threat model. If you repeat this drill monthly, the same findings will stop surprising you and you will start catching new ones at design time rather than during an incident.

Depth Path

Cluster 1 next: Identity-Centric Security
Reference and Selective Reading -- only open if this page plus one external link did not land

Source Backbone

Security and observability practices should be checked against official vendor documentation. These books supply the systems and reliability backbone behind those practices.

Building Secure and Reliable Systems - primary book backbone for security/reliability tradeoffs.
Software Engineering at Google - support for operational engineering and process.
The Linux Command Line - support for operational investigation and automation.
