Security teams worldwide maintain blocklists of "dangerous file extensions." GovCERT-CH, Microsoft, and countless organizations publish lists of file types to block at email gateways. But these lists conflate two fundamentally different threat mechanisms—and that confusion leads to both false confidence and misallocated resources.
The TLCTC framework provides a principled way to think about this. The question isn't "is this file type dangerous?" but rather: "Does this file format have a designed execution capability, or does weaponization require exploiting an implementation flaw?"
This distinction determines not just how we classify threats, but how we design controls, allocate patching priorities, and communicate risk.
The Core Distinction: FEC vs. Data
TLCTC V2.0 defines Foreign Executable Content (FEC) as:
"Attacker-controlled program text or bytes that are interpreted, loaded, or executed by a general-purpose execution engine in the target environment. This includes binaries, scripts, macros, modules, and attacker-controlled commands fed into interpreters."
The critical phrase is "designed execution capability." FEC-capable file formats have code execution as an intended feature of their specification. When a .exe runs or a .ps1 executes, that's the format working as designed.
Contrast this with a JPEG image. The JPEG specification defines how to store compressed pixel data. It has no execution capability. If a malicious JPEG achieves code execution, it's because the attacker exploited a bug in the image decoder—a buffer overflow, integer overflow, or heap corruption in the parsing code.
This binary distinction is a useful starting point, but the reality is more nuanced: there are three tiers, not two. Native executables are directly loaded by the OS. Application-mediated formats (like macro-enabled Office documents) use designed application functionality to enable embedded code. Data formats require parser bugs to achieve execution. In TLCTC terms, these map to pure #7, #1→#7, and #2/#3→#7 respectively:
Tier 1: Native/Direct FEC (pure #7)
The OS loader or runtime directly executes the file. The file is the foreign executable content. No intermediate application processing required.
Tier 2: Application-Mediated FEC (#1 → #7)
An application processes the file through its designed functionality (#1 Abuse of Functions), which then enables embedded code execution (#7). File type association and MIME type handling are themselves the first step.
Tier 3: Data Formats Requiring Parser Exploits (#2/#3 → #7)
No designed execution capability. Code execution requires exploiting an implementation flaw in the parser, codec, or renderer.
| Chain | Tier | Characteristics | Primary Control |
|---|---|---|---|
| #7 | Tier 1: Native FEC | Direct execution by OS/runtime; the file IS the executable; fastest attack velocity | Block execution |
| #1 → #7 | Tier 2: Mediated FEC | App processes the file via designed function; the file ENABLES execution; intermediate velocity | Disable features (macros) |
| #2/#3 → #7 | Tier 3: Data | No designed execution; requires parser bugs; patched = safe; exhaustive blocklist impossible | Patch parsers, sandbox |
The R-ROLE Rule: Context Determines Classification
Here's where it gets interesting. When a data file exploits a parser vulnerability, the classification depends on which component is parsing it—not the file type itself.
TLCTC's R-ROLE rule states:
- Server role (accepts and handles inbound requests) → #2 Exploiting Server
- Client role (consumes external content/responses) → #3 Exploiting Client
The same malicious file can be classified differently depending on context:
| Scenario | Component Role | Classification |
|---|---|---|
| User opens malicious.jpg in image viewer | Client consuming content | #3 → #7 |
| Server thumbnail generator processes uploaded malicious.jpg | Server processing request | #2 → #7 |
| CDN image optimizer transforms malicious.jpg | Server role | #2 → #7 |
| Email client renders inline malicious.jpg | Client rendering content | #3 → #7 |
Notice the → #7 in each sequence. When exploitation of a parser bug leads to code execution, we record both steps: the exploitation (#2 or #3) and the FEC execution (#7). This is TLCTC's R-EXEC rule—FEC execution must always be recorded as its own step.
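The R-ROLE and R-EXEC rules can be sketched as a tiny classification helper (illustrative only; the function name is my own, not part of TLCTC):

```python
def classify_parser_exploit(component_role: str) -> str:
    """Classify a parser-exploit scenario per R-ROLE and R-EXEC.

    R-ROLE: the role of the parsing component picks #2 (server) or #3 (client).
    R-EXEC: the resulting code execution is recorded as its own #7 step.
    """
    roles = {"server": "#2", "client": "#3"}
    if component_role not in roles:
        raise ValueError(f"unknown role: {component_role}")
    return roles[component_role] + " → #7"

# The same malicious.jpg classifies differently depending on context:
print(classify_parser_exploit("client"))  # image viewer renders it
print(classify_parser_exploit("server"))  # thumbnail service parses the upload
```

The file type never appears in the function: only the role of the component doing the parsing matters, which is exactly the point of R-ROLE.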
Why Extension Blocklists Can Never Be Complete
This brings us to the fundamental problem with extension-based blocklists.
For FEC-capable formats, an exhaustive list is theoretically possible: there are a finite number of file formats with designed execution capability, and they can be enumerated from their specifications (native executables, scripts, macro-enabled documents, and the like).
For data formats, exhaustive enumeration is impossible. Any file that gets parsed can potentially exploit a parser vulnerability. This includes every image, audio, video, font, document, archive, and serialization format—plus every proprietary format you've never heard of.
Your organization probably handles file types that appear on no blocklist anywhere. Proprietary CAD formats, industry-specific data files, legacy document types, custom configuration files. Every one of these has a parser. Every parser can have bugs. No blocklist will ever include .xyz-proprietary-format until after it's been exploited in the wild.
The MIME Type Dimension
File extensions are unreliable indicators anyway. What matters is how systems actually process files, which is often determined by MIME types (Content-Type headers) rather than extensions.
| MIME Type Pattern | Nature | TLCTC Implication |
|---|---|---|
| application/x-executable | Native executable | FEC by design → #7 |
| text/x-script.* | Script files | FEC by design → #7 |
| application/vnd.ms-* (macros) | Office with macros | FEC by design → #1 → #7 |
| image/* | Image data | Data → #2/#3 if exploited |
| application/json | Structured data | Data → #2/#3 if exploited |
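The table's logic could be sketched as a lookup using shell-style pattern matching (illustrative; the mapping and helper name are assumptions, not part of TLCTC):

```python
import fnmatch

# Patterns taken from the table above; the mapping itself is illustrative.
MIME_TIERS = [
    ("application/x-executable", "FEC by design → #7"),
    ("text/x-script.*",          "FEC by design → #7"),
    ("image/*",                  "Data → #2/#3 if parser exploited"),
    ("application/json",         "Data → #2/#3 if parser exploited"),
]

def tlctc_implication(mime_type: str) -> str:
    for pattern, implication in MIME_TIERS:
        if fnmatch.fnmatch(mime_type, pattern):
            return implication
    # Unknown types still get parsed by something, so assume at least Tier 3
    return "Unknown format → treat as Data, #2/#3 if parser exploited"

print(tlctc_implication("image/webp"))            # data format, parser-exploit risk
print(tlctc_implication("text/x-script.python"))  # FEC by design
```

Note the default branch: an unrecognized MIME type is not "safe", it is simply a data format whose parser you have not thought about yet.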
Practical Control Implications
For FEC-Capable Formats: Block or Sandbox Execution
These formats are dangerous by design. Controls focus on preventing or containing execution (Email Gateways, App Allowlisting, Script Policies, Protected View). The control question is: "Should this code be allowed to run?"
For Data Formats: Patch Parsers, Sandbox Processing
These formats are safe when processed by correct implementations. Controls focus on patching parsers, sandboxing, and memory safety. The control question is: "Is my parser implementation safe?"
Why GovCERT-CH Still Blocks Data Formats
Blocking JPEGs or MP3s is valid defense-in-depth to reduce exposure to unpatched parser vulnerabilities. However, it is crucial to understand why. Blocking .exe prevents designed malware. Blocking .jpg reduces parser exploit surface.
If you block .jpg at your email gateway but don't patch your image processing libraries, you've protected one delivery channel while leaving the vulnerability intact. The same user might encounter a malicious image on a website, in a downloaded document, or from a USB drive.
Historical Examples
| CVE/Campaign | File | Classification |
|---|---|---|
| CVE-2017-0199 | .rtf / .docx | #1 → #7 |
| CVE-2023-4863 | .webp | #3 → #7 |
| CVE-2004-0200 | .jpg | #3 → #7 |
| Emotet | .docm | #1 → #7 → #1 → #7 |
Dissecting the Emotet Chain
The Emotet example perfectly illustrates the difference between Tier 1 (native) and Tier 2 (application-mediated) FEC:
Step 1 — #1 Abuse of Functions: Word processes the .docm file.
Step 2 — #7 Malware: The VBA macro executes via the macro engine.
Step 3 — #1 Abuse of Functions: The macro invokes powershell.exe (LOLBAS).
Step 4 — #7 Malware: PowerShell executes the payload.
If the victim had simply double-clicked malware.exe, the chain would be just #7—direct execution by the OS loader. No application processing step, no function abuse. The .exe is the foreign executable content; it doesn't need an application to enable its execution.
This distinction matters for control design. For the .exe case, your control is blocking execution (application allowlisting, SmartScreen, etc.). For the .docm case, you have additional intervention points: disable macros by default (#1→#7 breaks), block PowerShell invocation from Office (#7→#1 breaks), constrain PowerShell execution (#1→#7 breaks).
Summary: The Three-Tier Classification
The TLCTC framework provides clear criteria for classifying file-based threats:
- Tier 1 — Native FEC: Is the file directly executed by the OS loader or runtime? If yes → pure #7 (the file IS the executable content).
- Tier 2 — Application-Mediated FEC: Does an application process the file (designed functionality) and then enable embedded code execution? If yes → #1 → #7 (function abuse enables FEC).
- Tier 3 — Data requiring exploits: Does the format have no designed execution capability? If yes → #2/#3 → #7 (parser exploit enables unintended execution).
For Tier 3, apply R-ROLE: Is the vulnerable parser in a server role (processing inbound requests) or client role (consuming external content)? This determines #2 vs #3.
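These criteria can be expressed as a short decision function (a minimal sketch; the function and parameter names are my own, not TLCTC notation):

```python
def classify_file_threat(directly_executed: bool,
                         designed_embedded_code: bool,
                         parser_role: str = "client") -> str:
    """Apply the three tier criteria in order, then R-ROLE for Tier 3."""
    if directly_executed:           # Tier 1: OS loader/runtime runs the file
        return "#7"
    if designed_embedded_code:      # Tier 2: designed feature enables embedded code
        return "#1 → #7"
    # Tier 3: no designed execution; R-ROLE decides #2 (server) vs #3 (client)
    return "#2 → #7" if parser_role == "server" else "#3 → #7"

print(classify_file_threat(True, False))            # malware.exe
print(classify_file_threat(False, True))            # macro-enabled .docm
print(classify_file_threat(False, False, "server")) # uploaded malicious.jpg
```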
Extension blocklists conflate three different threat mechanisms. Native FEC formats (Tier 1) are directly executable. Application-mediated FEC formats (Tier 2) enable execution through designed application functionality. Data formats (Tier 3) require parser exploits—and any format with a parser is a potential attack surface, including proprietary formats and any MIME type your systems process. Effective security requires understanding which tier you're defending against and applying the appropriate controls at the right points in the chain.
Don't block .jpg and think you've addressed "image-based attacks." You've reduced one delivery channel. The real control is ensuring your image parsers are patched, sandboxed, and memory-safe. The file extension is just a hint—the threat model is what matters.
Go Offensive: The Whitelist Imperative
The analysis above leads to an uncomfortable conclusion: blacklists are structurally inadequate. They are reactive, incomplete, and provide false confidence. The alternative is a whitelist approach—and yes, it requires more work. That's the point.
Why Blacklists Fail
- Infinite regress: For Tier 3 formats, the list can never be complete. Every proprietary format, every MIME type your systems process, every parser is a potential attack surface. You're always one step behind.
- False confidence: "We block dangerous extensions" becomes security theater. You blocked .docm, but what about .slk? What about the custom format your CAD system processes?
- Context blindness: A .exe in your CI/CD pipeline is expected; a .exe arriving via email is suspicious. Blacklists cannot express this distinction.
- No provenance tracking: Even "safe" files participate in attack chains. A legitimate library.png with a #10 prefix (compromised at source) passes every blacklist.
- Reactive posture: You add extensions to the blacklist after they're exploited in the wild. The attacker has already won that round.
The Whitelist Model
At each system boundary, define:
- Expected file types: What SHOULD flow through this boundary?
- Expected sources: What is the provenance/trust context?
- Expected processing: Which parser/handler will process this content?
Anything that deviates from these expectations becomes a detection signal—not "is this extension on a bad list?" but "is this file type expected from this source at this boundary?"
This aligns with TLCTC's boundary notation: ||[context][@Source→@Target]||. At each crossing, you specify what's expected, making deviations visible.
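One way to sketch such a boundary policy in code (the class and field names are illustrative assumptions, not TLCTC notation):

```python
from dataclasses import dataclass

@dataclass
class BoundaryPolicy:
    """Whitelist for one boundary crossing, e.g. inbound email."""
    name: str
    expected_types: set      # file types that SHOULD cross this boundary
    expected_sources: set    # provenance/trust contexts

    def check(self, file_type: str, source: str) -> str:
        # Deviations are detection signals, not just silent blocks
        if file_type not in self.expected_types:
            return f"FLAG: unexpected type {file_type} at {self.name}"
        if source not in self.expected_sources:
            return f"FLAG: unexpected source {source} at {self.name}"
        return "expected"

email_inbound = BoundaryPolicy("email-inbound",
                               expected_types={".pdf", ".docx", ".xlsx"},
                               expected_sources={"partner", "internal"})
print(email_inbound.check(".exe", "internal"))  # flagged: type not expected here
print(email_inbound.check(".pdf", "unknown"))   # flagged: source not expected
```

The key design choice is that `check` returns a signal rather than a boolean: the point of the whitelist model is that every deviation is a finding worth investigating, not merely something to drop.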
Objection Handling
Organizations resist whitelisting with predictable objections. Each objection, properly understood, is actually an argument for the approach:
| Objection | Reality |
|---|---|
| "We don't know all the file types we need" | Then you don't understand your attack surface. Discovery is the first step to defense. |
| "Too much operational overhead" | The "overhead" is actually understanding your environment. That's not overhead—it's the job. |
| "False positives will disrupt business" | False positives reveal undocumented workflows, shadow IT, and unexpected data flows. These are security findings, not annoyances. |
| "We can't maintain it" | If you can't maintain a whitelist, you can't defend the boundary. The complexity exists whether you acknowledge it or not. |
SDLC Context: Where Blacklists Are Useless
Consider the software development lifecycle. A developer downloads a dependency from a package registry. A blacklist doesn't help here: .js, .py, and .jar are all "expected" file types in development. What matters is:
- Is this source in our trust registry?
- Does this artifact match expected hashes/signatures?
- Is this file type expected from THIS source at THIS boundary?
Blacklists answer none of these questions. Whitelists answer all of them.
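A minimal sketch of the first two checks, assuming a hypothetical trust registry keyed by source and artifact name (all names here are invented for illustration):

```python
import hashlib

# Hypothetical trust registry: expected SHA-256 per (source, artifact).
TRUST_REGISTRY = {
    ("registry.example.internal", "left-pad-1.3.0.tgz"):
        hashlib.sha256(b"known-good-artifact-bytes").hexdigest(),
}

def verify_dependency(source: str, artifact: str, content: bytes) -> bool:
    """Is the source in our trust registry, and does the content match?"""
    expected = TRUST_REGISTRY.get((source, artifact))
    if expected is None:
        return False  # unknown source/artifact: fails the whitelist, no blacklist needed
    return hashlib.sha256(content).hexdigest() == expected

print(verify_dependency("registry.example.internal",
                        "left-pad-1.3.0.tgz",
                        b"known-good-artifact-bytes"))  # trusted source, matching hash
print(verify_dependency("random-mirror.example",
                        "left-pad-1.3.0.tgz",
                        b"known-good-artifact-bytes"))  # unknown source: rejected
```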
Stop playing defense against an infinite list of "bad" extensions. Go offensive: define what's expected, flag what isn't, and treat every boundary as a control point with explicit trust decisions. The complexity is real. Hiding from it doesn't make it go away.
A Final Note: File Type Control Is Not Enough
Everything discussed in this article addresses one control dimension: controlling what file types may cross boundaries. This is necessary but not sufficient for the PROTECT objective against #7 Malware.
A more complete FEC protection strategy requires additional complementary controls:
| Control | Function | Where in Chain | Catches |
|---|---|---|---|
| File Type Control | What MAY enter | Boundary crossing | Unexpected formats from unexpected sources |
| Malware Scanner | What IS malicious | Content inspection | Known malware signatures/behaviors |
| Application Control | What MAY execute | Runtime | Unapproved code execution attempts |
Each control alone is insufficient, and umbrella controls should also be considered:
- File Type Control alone fails against malware in allowed types, parser zero-days, and anything arriving through expected channels.
- Malware Scanner alone fails against zero-day malware, fileless attacks, encrypted payloads, and novel techniques.
- Application Control alone fails against LOLBAS abuse (PowerShell is on the allowlist), macro execution in approved Office, and interpreter abuse.
For Tier 3 formats specifically, add a fourth imperative: patch your parsers. You cannot allowlist or scan your way out of a heap overflow in libwebp. When the threat requires exploiting implementation flaws, fixing those flaws is the primary defense.
File type control—done properly with whitelists, not blacklists—is one essential layer. But it's only one layer. Design your defenses accordingly.