SARIF and the TLCTC — from a pile of findings to a picture of cause

Run Semgrep, CodeQL, Trivy, Grype, Bandit and gosec across one codebase and you get six reports, six vocabularies, and a few thousand findings. What you don't get is an answer to the only question a security lead actually has to defend in a meeting: where are we structurally exposed, and why?

That gap is the reason the integrations/sarif pack exists. It takes the lingua franca of static analysis output — SARIF — and projects every finding onto the ten Top Level Cyber Threat Clusters. The result is not "1,247 findings." It's "your exposure concentrates in #1 Abuse of Functions and #4 Identity Theft, and here's the evidence trail for each." This post walks through how that translation works, and — because it's the more interesting design decision — where the tool deliberately refuses to go.

01 Two languages

SARIF (Static Analysis Results Interchange Format, OASIS 2.1.0) is a tool-neutral JSON container. Every scanner that emits it agrees on the same skeleton: runs[].results[], each result carrying a ruleId, a message, one or more locations and, if you're lucky, a CWE or CVE somewhere in its taxa, its property bag, or the relationships of the rule that raised it. SARIF's job is to answer "what did this tool find, and where?" It is, by design, oriented toward the outcome of a scan.
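As a rough illustration of where those identifiers hide, the sketch below mines CWE and CVE ids from one SARIF result and, optionally, its rule. The function names, the regexes and the choice of fields are assumptions based on common scanner habits (CodeQL-style `external/cwe/cwe-089` tags, Semgrep-style `CWE-89: ...` tags, CVE ids used as rule ids by dependency scanners); this is not the pack's own extraction code.

```python
import re
from typing import Any, Iterator, Optional

# Hypothetical patterns: where scanners tend to put identifiers, not a spec.
CWE_RE = re.compile(r"\bcwe[-_/:]?0*(\d+)", re.IGNORECASE)  # CWE-89, cwe-089
CVE_RE = re.compile(r"\bCVE-\d{4}-\d{4,}\b", re.IGNORECASE)  # CVE-2021-44228


def _strings(obj: Any) -> Iterator[str]:
    """Yield every string nested anywhere inside a JSON value."""
    if isinstance(obj, str):
        yield obj
    elif isinstance(obj, dict):
        for value in obj.values():
            yield from _strings(value)
    elif isinstance(obj, list):
        for item in obj:
            yield from _strings(item)


def _taxon_cwe(ref: dict) -> Optional[str]:
    """Turn a SARIF reference into the CWE taxonomy ({"id": "79", ...}) into 'CWE-79'."""
    component = ref.get("toolComponent", {}).get("name", "")
    ref_id = str(ref.get("id", "")).upper().removeprefix("CWE-")
    if component.upper() == "CWE" and ref_id.isdigit():
        return f"CWE-{int(ref_id)}"
    return None


def mine_identifiers(result: dict, rule: Optional[dict] = None) -> tuple[set[str], set[str]]:
    """Return (CWE ids, CVE ids) mentioned by a SARIF result or its rule."""
    cwes: set[str] = set()
    cves: set[str] = set()
    # Structured references: result.taxa[] and the rule's relationships[].target.
    refs = list(result.get("taxa", []))
    if rule is not None:
        refs += [rel.get("target", {}) for rel in rule.get("relationships", [])]
    for ref in refs:
        cwe = _taxon_cwe(ref)
        if cwe:
            cwes.add(cwe)
    # Free text: ruleId, message and property bags (tags usually live here).
    sources = [result.get("ruleId", ""), result.get("message", {}),
               result.get("properties", {})]
    if rule is not None:
        sources.append(rule.get("properties", {}))
    for text in _strings(sources):
        cwes.update(f"CWE-{int(n)}" for n in CWE_RE.findall(text))
        cves.update(m.upper() for m in CVE_RE.findall(text))
    return cwes, cves
```

Normalising every hit to `CWE-<n>` with leading zeros stripped matters later: a lookup table and a numeric ordering both assume one canonical spelling per weakness.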

The TLCTC is the opposite axis. It is a cause-oriented taxonomy: ten non-overlapping clusters that classify threats by why a compromise becomes possible (the generic vulnerability being exploited), not by what the scanner happened to name. A SQL injection weakness and a path-traversal weakness are different CWEs, raised under different rule IDs by different tools, but if both are flaws in a server's request-handling code, the framework wants you to see one shape of risk, not two unrelated tickets.

The core move

SARIF tells you what each tool found. The TLCTC tells you why it matters. The classifier is just the dictionary between the two — and the dictionary already exists.

02 The bridge already exists — CWE

The translation hinges on a join most scanners already give you for free: the CWE. The TLCTC project maintains a canonical, audited CWE → TLCTC mapping of 987 weaknesses. The SARIF pack does not fork or reinterpret it — it loads that single source of truth and looks each finding's CWE up against it.

For findings that carry only a CVE and no CWE — typical of dependency and container scanners like Trivy — there's an offline fallback to the KEV → TLCTC table (1,568 known-exploited CVEs, each pre-resolved to a cluster). No network calls, no NVD round-trips: the pack is stdlib-only and reads from snapshots that ship in the repository.
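Loading such a snapshot needs nothing beyond `json` and `pathlib`. A minimal sketch, assuming a hypothetical layout of one JSON object per identifier (the actual files under mappings/mitre-cwe/ and mappings/cisa-kev/ may be shaped differently, and `load_snapshot` is an illustrative name):

```python
import json
from pathlib import Path


def load_snapshot(path: Path, key_field: str, cluster_field: str) -> dict[str, str]:
    """Load a local JSON snapshot into an identifier -> cluster lookup table.

    Hypothetical input layout: [{"cve": "CVE-...", "cluster": "#2"}, ...].
    Reads from disk only; no network access.
    """
    records = json.loads(path.read_text(encoding="utf-8"))
    table: dict[str, str] = {}
    for record in records:
        key = record[key_field].strip().upper()  # "cve-..." and "CVE-..." are one id
        cluster = record[cluster_field]
        # One identifier mapped to two clusters is a data error: fail loudly
        # instead of letting whichever row comes last silently win.
        if key in table and table[key] != cluster:
            raise ValueError(f"conflicting clusters for {key}: {table[key]} vs {cluster}")
        table[key] = cluster
    return table
```

Raising on a conflicting duplicate is a design choice of this sketch: a loader that let the last row win would make classification depend on file order.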

So the resolution ladder for any one finding is simply:

# resolution ladder (first match wins)
finding ──► CWE present?  ──► canonical CWE→TLCTC lookup   # 987 entries
        └─► CVE only?     ──► offline KEV→TLCTC fallback    # 1,568 CVEs
        └─► neither       ──► unmapped (a blind spot, logged not hidden)
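The ladder translates almost line for line into code. This sketch uses toy in-memory tables in place of the shipped snapshots (CWE-89 → #2 and CWE-79 → #2|#3 follow this post; the KEV row is a placeholder CVE id), and it assumes that a CWE with no table entry falls through to the KEV rung rather than ending the ladder.

```python
import logging
from dataclasses import dataclass, field

log = logging.getLogger("tlctc.sarif")  # hypothetical logger name

# Toy stand-ins for the offline snapshots; the KEV entry is a placeholder.
CWE_TO_TLCTC = {"CWE-89": "#2", "CWE-79": "#2|#3"}
KEV_TO_TLCTC = {"CVE-0000-0001": "#2"}


@dataclass
class Resolution:
    source: str                               # "cwe", "kev" or "unmapped"
    hits: dict = field(default_factory=dict)  # identifier -> cluster label


def resolve(cwes: set, cves: set) -> Resolution:
    """Resolution ladder: the first rung that yields any hit wins."""
    cwe_hits = {c: CWE_TO_TLCTC[c] for c in cwes if c in CWE_TO_TLCTC}
    if cwe_hits:
        return Resolution("cwe", cwe_hits)
    kev_hits = {c: KEV_TO_TLCTC[c] for c in cves if c in KEV_TO_TLCTC}
    if kev_hits:
        return Resolution("kev", kev_hits)
    # A blind spot: record it instead of dropping the finding.
    log.warning("unmapped finding: cwes=%s cves=%s", sorted(cwes), sorted(cves))
    return Resolution("unmapped")
```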

03 Not every CWE is a clean cause

Here's where a naive mapper would go wrong. CWE is a sprawling, uneven catalogue. Some entries are crisp, exploitable causes (CWE-89, SQL injection). Others are abstract category nodes or quality smells that have no business asserting a threat cluster. So the canonical mapping carries a verdict on every entry, and the classifier honours it:

Verdict	Behaviour	Where it lands
Allowed / Allowed-with-Review	Confident cause	Classified into its cluster
Discouraged	Defensible but weak	A low-confidence section
Prohibited / N/A	Not a cause	Silently skipped (visible with `--verbose`)

This matters because a finding can carry several CWEs at once. A Prohibited or N/A CWE on a finding must not short-circuit a perfectly good Allowed one sitting next to it — so the classifier scans every CWE on the finding before it decides, rather than bailing on the first it sees.
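One way to get that behaviour is to bucket every CWE by verdict before deciding anything. In this sketch the verdict names come from the table above, but `triage`, the bucket names and the placeholder rows (CWE-9001, CWE-9002) are hypothetical; only CWE-89 → #2 (Allowed) is taken from this post.

```python
# Hypothetical toy table: CWE id -> (cluster, verdict). Rows other than
# CWE-89 are placeholders that exist only to exercise each verdict.
MAPPING = {
    "CWE-89": ("#2", "Allowed"),
    "CWE-9001": ("#2", "Discouraged"),
    "CWE-9002": (None, "Prohibited"),
}
CONFIDENT = {"Allowed", "Allowed-with-Review"}
LOW_CONFIDENCE = {"Discouraged"}


def triage(cwes: list) -> dict:
    """Bucket every CWE on a finding by verdict; never stop at the first one."""
    buckets = {"confident": [], "low_confidence": [], "skipped": [], "unmapped": []}
    for cwe in cwes:
        if cwe not in MAPPING:
            buckets["unmapped"].append(cwe)
            continue
        cluster, verdict = MAPPING[cwe]
        if verdict in CONFIDENT:
            buckets["confident"].append((cwe, cluster))
        elif verdict in LOW_CONFIDENCE:
            buckets["low_confidence"].append((cwe, cluster))
        else:
            # Prohibited / N/A: not a cause. Skip it, but keep scanning.
            buckets["skipped"].append((cwe, verdict))
    return buckets
```

Because the loop never returns early, a Prohibited CWE listed first cannot hide an Allowed one listed after it.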

When several CWEs disagree

And when a finding genuinely resolves to different clusters across its CWEs? The classifier gathers every usable cluster, unions them, and picks the primary by the framework's lowest-numbered convention, independent of the order in which the CWEs happened to appear. (An earlier version returned the cluster of whichever CWE sorted first as a string, which meant CWE-100 quietly beat CWE-89 because "1" sorts before "8". That ordering bug is fixed, and the chosen primary now carries a contributing_cwes trail so the call is auditable.)
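The fix amounts to ordering by numbers, not strings. A minimal sketch, assuming each CWE has already been resolved to a single cluster label such as "#2" (splits like #2|#3 are left to R-ROLE); `pick_primary` and its output keys other than `contributing_cwes` are illustrative names:

```python
def cluster_number(label: str) -> int:
    """'#4' -> 4."""
    return int(label.lstrip("#"))


def cwe_number(cwe: str) -> int:
    """'CWE-100' -> 100, so CWE-89 sorts before CWE-100."""
    return int(cwe.split("-", 1)[1])


def pick_primary(hits: dict) -> dict:
    """hits maps a CWE id to a single cluster label, e.g. {"CWE-89": "#2"}."""
    clusters = sorted(set(hits.values()), key=cluster_number)  # union, ordered
    primary = clusters[0]                                      # lowest-numbered wins
    contributing = sorted((c for c, k in hits.items() if k == primary), key=cwe_number)
    return {"primary": primary, "clusters": clusters, "contributing_cwes": contributing}
```

Because `pick_primary` looks only at the set of clusters and sorts CWEs numerically, reordering the input cannot change its answer. For contrast, a plain `sorted(["CWE-89", "CWE-100"])` returns `["CWE-100", "CWE-89"]`, which is exactly the string-order trap described above.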

Server or client? R-ROLE decides

Some weaknesses are inherently two-faced. Cross-site scripting maps to #2 | #3 — Exploiting Server or Exploiting Client — and the CWE alone can't tell you which. The TLCTC's rule R-ROLE resolves this by the role of the flawed component, and the classifier implements it with file-path globs you configure per project:

# project config: which paths are server-role vs client-role
source_globs = {
  "server": ["src/api/**", "backend/**"],   # → #2
  "client": ["web/ui/**", "frontend/**"],   # → #3
}

Every finding records why its cluster was chosen — the matched glob, the verdict, the source table — so a reviewer can always reconstruct the decision instead of trusting it.
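Using the same `source_globs` shape as the config above, R-ROLE can be sketched with the standard library's `fnmatch`. The function name, the server-before-client precedence and the choice to leave an unmatched file as #2|#3 for human review are assumptions of this sketch, not documented behaviour of the pack:

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# Same shape as the project config above.
SOURCE_GLOBS = {
    "server": ["src/api/**", "backend/**"],
    "client": ["web/ui/**", "frontend/**"],
}
ROLE_TO_CLUSTER = {"server": "#2", "client": "#3"}


def apply_r_role(cluster: str, uri: str, globs: dict = SOURCE_GLOBS) -> tuple:
    """Resolve a '#2|#3' split by the role of the file; return (cluster, why)."""
    if cluster != "#2|#3":
        return cluster, {"rule": "R-ROLE not needed"}
    path = PurePosixPath(uri.replace("\\", "/")).as_posix()  # './a/b' -> 'a/b'
    for role in ("server", "client"):  # hypothetical precedence: server first
        for pattern in globs.get(role, []):
            if fnmatchcase(path, pattern):  # case-sensitive on every OS
                why = {"rule": "R-ROLE", "role": role, "glob": pattern}
                return ROLE_TO_CLUSTER[role], why
    # No glob matched: keep the split visible for a reviewer instead of guessing.
    return cluster, {"rule": "R-ROLE", "role": None, "glob": None}
```

Note that `fnmatch` gives `**` no special meaning: its `*` already matches `/`, so `src/api/**` covers any file below src/api/. The returned dictionary is the kind of provenance record the paragraph above describes.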

04 The part that matters most: where it stops

It would be easy — and wrong — to make this tool do more. A static finding looks tantalisingly like the first step of an attack. Why not emit a full TLCTC attack path from a scan? Because of Axiom III: threats are causes, not outcomes, and an attack path is a record of steps that actually executed. A SAST finding is a latent cause sitting in source code. Nothing has run. No credential was used, no payload detonated, no boundary was crossed.

Axiom III · causes, not outcomes

Emitting a Layer 3 attack path from a static scan would manufacture steps that never happened. The pack classifies weaknesses — the cause side — and leaves attack-path construction to incident analysis, where there is real evidence of execution.

This is the discipline that keeps the integration honest. A weakness is mapped to the cluster whose generic vulnerability it represents — full stop. It is not dressed up with velocity annotations, Δt timings, or DRE outcomes it has no evidence for. The notation you'd see in a real incident analysis —

#9 ||[human][@External→@Org]|| →[Δt=24h] #7 →[Δt=5m] #4 →[Δt=15m] (#1 + #7) + [DRE: Ac]

— describes events that occurred. A scanner's SARIF file describes events that could. Conflating the two would corrupt the statistics the whole framework depends on. So the pack stays on the cause side of the line, and says so explicitly in its own documentation.

05 What you actually get out

Point it at any conformant .sarif file and it emits any combination of three artefacts:

JSON · cluster summary + per-finding + low-confidence + unmapped
Markdown · a PR-comment body
TLCTC-SARIF · a standalone enriched report

The JSON gives you the strategic rollup — how many findings land in each cluster — plus the full per-finding detail with provenance. The Markdown is built to paste straight into a pull-request review. The third format is a fresh SARIF 2.1.0 document tagged with properties.tlctc on every result; it preserves each finding's source region (so line anchors survive in code-scanning UIs) and records which tool produced it.
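A compact sketch of the enriched-SARIF and summary paths follows. It deep-copies each run so that locations and regions (and with them the line anchors) are preserved, then attaches a `properties.tlctc` bag to every result. The `classify` callback, the `sourceTool` field name and the Markdown table layout are hypothetical; the pack's own property schema may differ.

```python
import copy
from collections import Counter
from typing import Callable


def enrich(sarif: dict, classify: Callable[[dict], dict]) -> dict:
    """Return a new SARIF 2.1.0 log with properties.tlctc on every result."""
    out = {"version": "2.1.0", "runs": []}
    for run in sarif.get("runs", []):
        new_run = copy.deepcopy(run)  # source regions survive untouched
        tool = run.get("tool", {}).get("driver", {}).get("name", "unknown")
        for result in new_run.get("results", []):
            tag = dict(classify(result), sourceTool=tool)  # hypothetical field name
            result.setdefault("properties", {})["tlctc"] = tag
        out["runs"].append(new_run)
    return out


def rollup(enriched: dict) -> Counter:
    """Count results per cluster: the strategic summary."""
    return Counter(
        result["properties"]["tlctc"]["cluster"]
        for run in enriched["runs"]
        for result in run.get("results", [])
    )


def _order(label: str) -> tuple:
    """Sort '#1'..'#10' numerically; put labels such as 'unmapped' last."""
    head = label.lstrip("#").split("|")[0]
    return (0, int(head)) if head.isdigit() else (1, 0)


def markdown_summary(counts: Counter) -> str:
    """Render the rollup as a Markdown table for a PR comment."""
    lines = ["| Cluster | Findings |", "|---|---|"]
    for cluster in sorted(counts, key=_order):
        lines.append(f"| {cluster} | {counts[cluster]} |")
    return "\n".join(lines)
```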

And because exposure should be enforceable, there's a CI gate:

# fail the build if anything lands in #2 or #4
python -m cli classify scan.sarif --fail-on-cluster "#2,#4"
# → exits 2 on a hit; no separate policy engine required
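The gate logic itself is small. Below is a hypothetical sketch that takes already-classified cluster labels (rather than a .sarif file, as the real command does) and returns exit status 2 on a hit. Treating an unresolved #2|#3 split as a hit for both clusters is an assumption of this sketch:

```python
import argparse
import sys


def parse_clusters(spec: str) -> set:
    """'#2,#4' -> {'#2', '#4'}; tolerates spaces and a missing '#'."""
    wanted = set()
    for part in spec.split(","):
        part = part.strip()
        if part:
            wanted.add(part if part.startswith("#") else f"#{part}")
    return wanted


def gate(found_labels: list, fail_on: set) -> int:
    """Return an exit code: 2 if any finding lands in a blocked cluster, else 0."""
    # Assumption: an unresolved split such as '#2|#3' counts against both clusters.
    found = {part for label in found_labels for part in label.split("|")}
    hits = sorted(found & fail_on, key=lambda c: int(c.lstrip("#")))
    for cluster in hits:
        print(f"TLCTC gate: findings in {cluster}", file=sys.stderr)
    return 2 if hits else 0


def main(argv=None) -> int:
    """Hypothetical wrapper: cluster labels in, exit code out."""
    parser = argparse.ArgumentParser(description="Sketch of a TLCTC cluster gate")
    parser.add_argument("--fail-on-cluster", default="", help='e.g. "#2,#4"')
    parser.add_argument("labels", nargs="*", help="cluster label of each finding")
    args = parser.parse_args(argv)
    return gate(args.labels, parse_clusters(args.fail_on_cluster))
```

Wired into a script as `sys.exit(main())`, a non-zero return fails the CI job without any separate policy engine.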

You stop arguing about 1,200 individual findings and start governing ten clusters of cause.

06 A worked moment

Take one Semgrep finding: a SQL-injection rule, tagged CWE-89, located at src/api/users.py. The pipeline runs:

# 1 mine identifiers   ──► CWE-89
# 2 canonical lookup   ──► #2  (verdict: Allowed)
# 3 R-ROLE not needed  ──► single cluster, no #2|#3 split
# 4 emit               ──► cluster #2, source=cwe, provenance=CWE-89

Multiply that across every scanner in your pipeline and the cluster summary writes itself. The finding Semgrep called "tainted SQL string", which another tool might have named something else entirely, becomes one line in a picture of #2 Exploiting Server exposure: comparable, countable, and tied back to the same ten clusters your threat models, your ATT&CK mapping, and your incident write-ups already use.

07 What it is not

Honesty about scope is part of the design:

Not in scope	Why
Live tool invocation	It consumes a `.sarif` file that a scanner has already produced; it doesn't run Semgrep or Trivy.
NVD / CVE→CWE enrichment	The CVE path uses the offline KEV snapshot only — broader coverage would add a network dependency.
Layer 3 attack paths	Axiom III. Weaknesses aren't realised paths.
Third-party dependencies	Stdlib only — `json`, `argparse`, `fnmatch`, `pathlib`. Drops into any CI unchanged.

The win isn't clever classification. It's a shared coordinate system. Once your static-analysis output, your detection rules, your ATT&CK coverage and your incident analyses all speak in the same ten clusters, you can finally ask cross-cutting questions — does our SAST exposure in #1 line up with where our detections are thin? — and get an answer instead of a vocabulary mismatch. SARIF is one more dialect folded into that single language of cause.

That folding-together is not a side effect of the tooling — it is the framework. A companion piece, the Sigma coverage map, takes the same ten clusters from the opposite direction: not where you're weak, but where you're watching. Read the two halves side by side and the point announces itself. The TLCTC earns its keep precisely by bringing things together — pulling scanners, detections and incidents that never shared a vocabulary into one model of cause, where they can finally be compared.

Top Level Cyber Threat Clusters (TLCTC) v2.1 — a cause-oriented, axiomatic cyber-threat taxonomy. Licensed CC BY 4.0.

The SARIF classifier lives at integrations/sarif/. The canonical mappings it relies on are mappings/mitre-cwe/ and mappings/cisa-kev/.

Bernhard Kreinz · tlctc.net
