A modern detection library is a sprawling thing. SigmaHQ alone ships thousands of community rules across hundreds of log sources. You can count them, you can tag them with ATT&CK, you can group them by product — but try answering a board-level question with that pile: across the whole spectrum of threats, what are we actually watching for, and what are we blind to?
The companion to the SARIF classifier answers exactly that, from the opposite direction. Where SARIF projects static weaknesses onto the ten clusters — the cause side — the mappings/sigma artifact projects your detection rules onto them. It's not a classifier you run on incidents. It's a coverage audit: a cluster-level map of the detections you already have.
01 A different question
TLCTC's existing mappings — ATT&CK, CWE — answer a static, cause-side question: which cluster does this technique or weakness belong to? Useful, but abstract. The Sigma mapping asks something operational:
Given the detection rules your SOC runs today, which TLCTC clusters are you actually detecting — and how many rules cover each one?
That reframes a detection inventory as a coverage surface. Rules that resolve to #1 tell you how deeply you watch Abuse of Functions; rules that resolve to nothing tell you where the corpus simply isn't looking. The second category is the interesting one — but to trust either, you have to understand the chain that produced them.
02 A two-hop derivation — and why honesty about it matters
This artifact is not authored. It is derived, mechanically, through two joins:
# the derivation chain, left to right
Sigma rule ──► attack.t* tags ──► parent technique IDs
│
technique ──► tlctc-enterprise-attack.json ──► cluster expression
│
clusterSet + primaryCluster + derivationStatus
Every cluster a rule lands in traces back to its author's attack.t* tags, run through the project's ATT&CK→TLCTC mapping of 698 techniques. There is no manual per-rule classification: three thousand rules make that impractical, and the rule authors already encoded the signal in their tags.
The flip side, stated plainly in the mapping's own README, is that quality is bounded by that chain. The ATT&CK→TLCTC mapping it builds on is itself AI-generated and marked experimental. So the cluster assignments inherit that uncertainty — a derived view built on a derived view. The artifact doesn't hide this; it leads with it.
Mapping quality depends on (a) how consistently each rule's author tagged it, and (b) the quality of the upstream ATT&CK→TLCTC mapping. The pipeline surfaces that uncertainty as first-class status values rather than papering over it with a clean-looking answer.
03 The mechanics: fold, resolve, derive
For each rule, the generator strips everything that isn't a technique tag, folds sub-techniques to their parent, and deduplicates:
tags: [attack.execution, attack.t1059.001, attack.t1059.003]
              │                 │                  │
              └─ dropped        └──────┬───────────┘
            (tactic label)             ▼
                                fold to parent ──► techniques = [T1059]
Each technique is then looked up in the ATT&CK index, its cluster expression parsed, and the results unioned into a clusterSet. The primaryCluster is always the lowest-numbered member of that set: an auditable default, never a confident assertion when more than one cluster is in play. The whole thing is deterministic and idempotent: two runs against the same rules clone produce byte-identical output, and the SigmaHQ commit SHA is pinned in the metadata so the snapshot is reproducible.
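The fold-and-resolve steps can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the project's generator: the tag regex, the `ATTACK_INDEX` dictionary with its sample entries, the `#N` expression syntax, and the function names are all invented for the example. The real index lives in tlctc-enterprise-attack.json, whose schema is not shown in this article.

```python
import re

# Assumed tag shape: attack.t1059 or attack.t1059.001 (case-insensitive).
TECHNIQUE_TAG = re.compile(r"^attack\.(t\d{4})(?:\.\d{3})?$", re.IGNORECASE)

# Hypothetical stand-in for the ATT&CK->TLCTC index; entries are illustrative only.
ATTACK_INDEX = {
    "T1059": "#1",
    "T9999": "#2 | #3",  # made-up technique ID showing an alternation
}

def fold_to_parents(tags):
    """Keep technique tags only, fold sub-techniques to parents, deduplicate."""
    parents = set()
    for tag in tags:
        match = TECHNIQUE_TAG.match(tag.strip())
        if match:
            parents.add(match.group(1).upper())
    return sorted(parents)

def resolve_clusters(techniques, index=ATTACK_INDEX):
    """Union the cluster numbers of every technique found in the index."""
    cluster_set = set()
    for technique in techniques:
        expression = index.get(technique, "")
        cluster_set.update(int(n) for n in re.findall(r"#(\d+)", expression))
    clusters = sorted(cluster_set)
    primary = clusters[0] if clusters else None  # lowest-numbered member
    return clusters, primary

techniques = fold_to_parents(
    ["attack.execution", "attack.t1059.001", "attack.t1059.003"]
)
print(techniques)                    # ['T1059']
print(resolve_clusters(techniques))  # ([1], 1)
```

Sorting both the folded techniques and the cluster set keeps the output order independent of tag order, which is one simple way to get the run-to-run stability the text describes.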
04 Three honest verdicts
Rather than force every rule into a clean cluster, the derivation labels each with a derivationStatus that says exactly how much confidence the chain earned:
| Status | Meaning |
|---|---|
| ok | Every tagged technique resolved to a single concrete cluster — a clean read. |
| ambiguous | Techniques resolve to multiple clusters, one maps to an alternation such as #2 \| #3, or resolution is partial: some techniques resolved while others were absent from the mapping. |
| unmapped | The rule carries no attack.t* tags, or every tagged technique is missing from the ATT&CK mapping. |
The crucial reframing is on that last row. An unmapped rule is not a classification failure — it's a detection that can't be placed on the threat-cluster map, usually because its author tagged it only with tactic labels or nothing at all. From a TLCTC perspective, those are blind spots in the audit: detections you run but can't reason about strategically. Surfacing them is the point, not an embarrassment to be swept up.
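One way to read the three statuses as a decision procedure is sketched below. The function name, the shape of the index (parent technique ID to cluster expression string) and the made-up technique IDs are assumptions for illustration; the project's actual traversal is documented in its own decision-tree.md and may order the checks differently.

```python
import re

def derivation_status(techniques, index):
    """Classify a rule as 'ok', 'ambiguous' or 'unmapped' from its techniques."""
    resolved = [index[t] for t in techniques if t in index]
    if not resolved:
        return "unmapped"   # no technique tags, or none present in the mapping
    if len(resolved) < len(techniques):
        return "ambiguous"  # partial resolution
    if any("|" in expression for expression in resolved):
        return "ambiguous"  # alternation such as "#2 | #3"
    clusters = {n for e in resolved for n in re.findall(r"#(\d+)", e)}
    return "ok" if len(clusters) == 1 else "ambiguous"

# Hypothetical index with made-up technique IDs.
index = {"T0001": "#1", "T0002": "#4", "T0003": "#2 | #3"}

print(derivation_status(["T0001"], index))           # ok
print(derivation_status(["T0001", "T0002"], index))  # ambiguous
print(derivation_status([], index))                  # unmapped
```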
05 What the numbers say about the whole ecosystem
Run against the pinned SigmaHQ snapshot (commit 994da166…, May 2026), the mapping covers 3,132 rules:
| Status | Rules |
|---|---|
| ok: single concrete cluster | 739 |
| ambiguous: multiple / alternation / partial | 1,834 |
| unmapped: no resolvable technique | 559 |
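As a quick arithmetic check, the three counts in the table can be summed and turned into shares of the corpus. The percentages are computed here from the table's figures and are not quoted from the mapping itself.

```python
status_counts = {"ok": 739, "ambiguous": 1834, "unmapped": 559}

total = sum(status_counts.values())
shares = {
    status: round(100 * count / total, 1)
    for status, count in status_counts.items()
}

print(total)   # 3132
print(shares)  # {'ok': 23.6, 'ambiguous': 58.6, 'unmapped': 17.8}
```

Fewer than a quarter of the rules resolve cleanly, which is why the ambiguous and unmapped rows deserve as much attention as the ok row.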
And among the cleanly-resolved rules, the primary-cluster distribution leans hard one way:
| Cluster | OK rules |
|---|---|
| #1 Abuse of Functions | 520 |
| #4 Identity Theft | 100 |
| #7 Malware | 53 |
| #2 Exploiting Server | 36 |
| #9 Social Engineering | 15 |
| #5 · #6 · #8 · #10 | 15 |
The dominance of #1 isn't a quirk of the mapping — it's a portrait of the detection ecosystem itself. SigmaHQ's centre of gravity is post-exploitation behaviour: LOLBIN execution, administrative-tool abuse, configuration changes — exactly the Abuse of Functions shape, reached through techniques like T1059, T1204, T1218 and T1569. The corpus is overwhelmingly good at spotting attackers misusing legitimate capability, and comparatively thin on, say, Man in the Middle or Flooding. That's a strategic statement about community detection you simply cannot read off a flat rule count.
Three thousand rules become one sentence: this is where we watch closely, and this is where we don't.
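A distribution like the one above can be produced by counting cleanly resolved records per primary cluster and listing the clusters that receive none. The sketch below assumes each record is a dictionary with `derivationStatus` and `primaryCluster` keys and that clusters are stored as integers; only the two field names come from the article, and the sample records are hypothetical.

```python
from collections import Counter

CLUSTERS = range(1, 11)  # the ten TLCTC clusters, #1..#10

def ok_coverage(records):
    """Count 'ok' rules per primary cluster and list clusters with none."""
    counts = Counter(
        r["primaryCluster"] for r in records if r["derivationStatus"] == "ok"
    )
    uncovered = [c for c in CLUSTERS if counts[c] == 0]
    return counts, uncovered

# Hypothetical records for illustration.
records = [
    {"derivationStatus": "ok", "primaryCluster": 1},
    {"derivationStatus": "ok", "primaryCluster": 1},
    {"derivationStatus": "ok", "primaryCluster": 4},
    {"derivationStatus": "ambiguous", "primaryCluster": 1},
    {"derivationStatus": "unmapped", "primaryCluster": None},
]

counts, uncovered = ok_coverage(records)
print(dict(counts))  # {1: 2, 4: 1}
print(uncovered)     # [2, 3, 5, 6, 7, 8, 9, 10]
```

Ambiguous rules are deliberately left out of the count here, so that a labelled default does not inflate the apparent depth of coverage for any one cluster.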
06 What it deliberately doesn't do
Like its SARIF sibling, the discipline is in the restraint:
| Not done | Why |
|---|---|
| Copy detection logic | Only id, title, logsource, derived techniques and cluster fields are committed. No detection: bodies — the artifact is license-safe to redistribute. |
| Manually classify rules | The author's tags carry the signal; hand-classifying 3,000+ rules would be unrepeatable and unauditable. |
| Preserve sub-technique granularity | The stable surface in the ATT&CK mapping is parent-level; t1059.001 and t1059.003 both fold to T1059. |
| Claim freshness | It's a pinned, point-in-time snapshot. Regenerate against a new clone to pick up changed rules — the generator needs only PyYAML. |
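The first row of the table amounts to an allow-list projection: copy a fixed set of metadata fields and attach the derived values, so the rule's detection body never reaches the output. The sketch below assumes a rule already parsed into a dictionary; the function name, the sample rule and the exact output key names are hypothetical.

```python
COMMITTED_FIELDS = ("id", "title", "logsource")

def to_record(rule, techniques, cluster_set, primary, status):
    """Build the redistributable record; the detection body is never copied."""
    record = {field: rule.get(field) for field in COMMITTED_FIELDS}
    record.update(
        techniques=techniques,
        clusterSet=cluster_set,
        primaryCluster=primary,
        derivationStatus=status,
    )
    return record

# Hypothetical parsed rule, for illustration only.
rule = {
    "id": "00000000-0000-0000-0000-000000000000",
    "title": "Example rule (hypothetical)",
    "logsource": {"product": "windows", "category": "process_creation"},
    "detection": {"selection": {"Image|endswith": "\\example.exe"},
                  "condition": "selection"},
}

record = to_record(rule, ["T1059"], [1], 1, "ok")
print(sorted(record))
# ['clusterSet', 'derivationStatus', 'id', 'logsource', 'primaryCluster', 'techniques', 'title']
```

An allow-list is the safer direction for this kind of filter: a new field added upstream is excluded by default rather than leaked by default.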
Why surface ambiguity instead of resolving it
It would be trivial to collapse every ambiguous rule to its lowest-numbered cluster and report a tidier set of numbers. The mapping refuses, because a partial or alternating result is information. A rule that touches #1 | #4 is telling you the detection straddles two clusters; flattening that to "#1" would manufacture a precision the evidence doesn't support — the same instinct that keeps the SARIF pack from inventing attack paths. The full clusterSet stays authoritative; primaryCluster is only ever a labelled default.
07 One coordinate system
The payoff is the same as everywhere else in the framework: a shared set of ten clusters. Once your static-analysis exposure (SARIF → CWE), your incident write-ups (Layer 3 attack paths), and now your detection coverage (Sigma → ATT&CK) all speak in those same clusters, the cross-cutting questions finally line up.
The question most worth the effort is where exposure and detection diverge. If your SAST exposure piles up in a cluster your detection corpus barely covers, you've found a gap no single tool would ever have shown you, because the tools don't share a language. The Sigma mapping contributes the operational layer, what you are watching, to a model the rest of the TLCTC builds from cause; the same model its companion, the SARIF classifier, populates from static weakness.
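Once both sides are keyed by cluster, the gap check becomes a simple join. The sketch below is hypothetical throughout: the function name, the `min_rules` threshold and the per-cluster counts are invented, and a real comparison would need to decide how ambiguous rules count toward coverage.

```python
def coverage_gaps(exposure, detections, min_rules=1):
    """Clusters with static findings but fewer than min_rules detection rules."""
    return sorted(
        cluster
        for cluster, findings in exposure.items()
        if findings > 0 and detections.get(cluster, 0) < min_rules
    )

# Hypothetical per-cluster counts, keyed by cluster number.
sast_findings = {1: 4, 2: 31, 3: 12}
detection_rules = {1: 200, 2: 20}

print(coverage_gaps(sast_findings, detection_rules))                # [3]
print(coverage_gaps(sast_findings, detection_rules, min_rules=50))  # [2, 3]
```

Raising `min_rules` turns the check from "are we watching at all" into "are we watching in proportion", which is a policy decision rather than something the mapping decides.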
Neither half is the point on its own. Bringing them together is. That act of convergence — disparate tools, built by different people for different jobs, resolving onto one shared model of cause — is not a feature of the mappings. It is what the TLCTC is fundamentally for.
Top Level Cyber Threat Clusters (TLCTC) v2.1 — a cause-oriented, axiomatic cyber-threat taxonomy. Licensed CC BY 4.0.
The Sigma mapping lives at mappings/sigma/; its full traversal algorithm is documented in decision-tree.md. It is derived from mappings/mitre-attack-enterprise/ and the SigmaHQ rules repository.
Bernhard Kreinz · tlctc.net