Unicode Steganography Explained

Published: April 26, 2026

Steganography is the practice of hiding information inside something that looks ordinary. Unicode steganography hides executable code or data inside text by using characters that are valid Unicode but render as invisible — zero pixels wide, no glyph, completely undetectable to the human eye.

Unlike traditional steganography (hiding data in images or audio), Unicode steganography embeds payloads directly in source code, configuration files, or text that developers copy and paste every day.

Why Unicode Makes This Possible

As of version 15, the Unicode standard defines over 149,000 characters across 161 scripts. Among them are dozens of characters that are invisible by design:

Zero-width characters: characters with no visual width, used to control joining and line breaking in scripts such as Arabic and Thai
Directional marks: invisible characters that control text direction (left-to-right vs. right-to-left)
Tag characters: a hidden alphabet in the Tags block (U+E0000–U+E007F), where U+E0020–U+E007E mirror printable ASCII but render invisibly
Variation selectors: modifiers that select an alternate glyph for the preceding character, and have no visible effect when no such variant is defined

These characters exist for legitimate typographic reasons. But attackers exploit the gap between what the Unicode standard allows and what code editors show.
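To make that gap concrete, here is a minimal Python sketch that replaces invisible characters with visible markers. The range list is an illustrative subset chosen for this example, not an exhaustive or authoritative inventory, and the names is_invisible and reveal are hypothetical.

```python
# Illustrative subset of invisible code point ranges (an assumption for
# this sketch, not a complete list).
INVISIBLE_RANGES = [
    (0x200B, 0x200F),    # zero-width space, joiners, LRM, RLM
    (0x202A, 0x202E),    # bidirectional embedding and override controls
    (0x2060, 0x2064),    # word joiner and invisible math operators
    (0x2066, 0x2069),    # bidirectional isolate controls
    (0xFE00, 0xFE0F),    # variation selectors
    (0xFEFF, 0xFEFF),    # zero-width no-break space (byte order mark)
    (0xE0000, 0xE007F),  # tag characters
    (0xE0100, 0xE01EF),  # variation selectors supplement
]


def is_invisible(ch: str) -> bool:
    """Return True if the character falls in one of the listed ranges."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in INVISIBLE_RANGES)


def reveal(text: str) -> str:
    """Replace each invisible character with a visible <U+XXXX> marker."""
    return "".join(
        f"<U+{ord(ch):04X}>" if is_invisible(ch) else ch for ch in text
    )


sample = "let total\u200b = price\u200d * qty;"
print(reveal(sample))  # let total<U+200B> = price<U+200D> * qty;
```

Printed normally, the sample line looks like ordinary code. The markers show what an editor without invisible-character highlighting would leave out.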

Three Attack Techniques

1. Binary Encoding with Zero-Width Characters

The simplest technique. Each byte of malicious code is converted to binary, then each bit is represented by one of two invisible characters:

U+200B (zero-width space) = 0
U+200C (zero-width non-joiner) = 1

Eight invisible characters encode one byte. A 500-byte payload requires 4,000 invisible characters — all hidden between visible lines of code.

Example: The letter "A" (0x41 = 01000001) would be encoded as: U+200B U+200C U+200B U+200B U+200B U+200B U+200B U+200C, which is invisible in any editor that does not highlight such characters.
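The bit mapping can be sketched in a few lines of Python. This is a minimal illustration that assumes the two-character alphabet given above. The function names encode and decode are hypothetical, and the payload is a harmless single byte.

```python
ZERO = "\u200b"  # zero-width space represents bit 0
ONE = "\u200c"   # zero-width non-joiner represents bit 1


def encode(payload: bytes) -> str:
    """Turn each byte into eight invisible characters, most significant bit first."""
    bits = "".join(f"{byte:08b}" for byte in payload)
    return "".join(ONE if bit == "1" else ZERO for bit in bits)


def decode(hidden: str) -> bytes:
    """Recover bytes from the two marker characters, ignoring all other text."""
    bits = "".join("1" if ch == ONE else "0" for ch in hidden if ch in (ZERO, ONE))
    usable = len(bits) - len(bits) % 8  # drop any incomplete trailing byte
    return bytes(int(bits[i:i + 8], 2) for i in range(0, usable, 8))


hidden = encode(b"A")
print([f"U+{ord(ch):04X}" for ch in hidden])
print(decode(hidden))  # b'A'

# Size arithmetic for a 500-byte payload: 4,000 characters, and 12,000 bytes
# on disk because each of these characters takes three bytes in UTF-8.
big = encode(bytes(500))
print(len(big), len(big.encode("utf-8")))  # 4000 12000
```

Because decode skips every character outside the two-symbol alphabet, the hidden bits can be scattered through visible text and still be recovered.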

2. Private Use Area Mapping

Used by Glassworm. Each ASCII character is mapped to a Unicode Private Use Area codepoint or Variation Selector Supplement character. The mapping is arbitrary but consistent — a small decoder function reverses it at runtime.

This technique is harder to detect because the invisible characters don't follow a simple binary pattern.
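As a rough illustration of a one-character-per-byte mapping, the Python sketch below maps byte values onto the 256 variation selectors. The specific mapping is an assumption for this example (byte values 0–15 to U+FE00–U+FE0F, the rest to U+E0100–U+E01EF). It is not taken from Glassworm or any particular malware sample, and the function names are hypothetical.

```python
VS_BASE = 0xFE00    # variation selectors VS1 to VS16
VSS_BASE = 0xE0100  # variation selectors supplement VS17 to VS256


def byte_to_selector(b: int) -> str:
    """Map one byte value (0-255) to a single variation selector."""
    return chr(VS_BASE + b) if b < 16 else chr(VSS_BASE + b - 16)


def selector_to_byte(ch: str):
    """Reverse the mapping; return None for any other character."""
    cp = ord(ch)
    if 0xFE00 <= cp <= 0xFE0F:
        return cp - VS_BASE
    if 0xE0100 <= cp <= 0xE01EF:
        return cp - VSS_BASE + 16
    return None


def hide(carrier: str, payload: bytes) -> str:
    """Append the encoded payload after a visible carrier string."""
    return carrier + "".join(byte_to_selector(b) for b in payload)


def extract(text: str) -> bytes:
    """Collect every variation selector in the text and decode it."""
    decoded = (selector_to_byte(ch) for ch in text)
    return bytes(b for b in decoded if b is not None)


stego = hide("x", b"hi")
print(len(stego))      # 3 code points: the carrier plus one per payload byte
print(extract(stego))  # b'hi'
```

Compared with the binary scheme, this uses one invisible character per byte rather than eight. The hidden run is shorter and shows no repeating two-symbol pattern.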

3. Hangul Jamo Encoding

A newer technique uses half-width and full-width Hangul (Korean) character variants that some systems render as invisible or nearly invisible. Each ASCII byte is split into bits, and each bit is represented by a specific Hangul character.
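One likely reason this variant slips past simple filters is that Unicode classifies the characters as letters rather than format controls. The Python sketch below assumes the characters in question are the Hangul fillers U+3164 and U+FFA0. The article does not name them, so treat that choice as an assumption. The helper names are hypothetical.

```python
import unicodedata

# Assumed examples: two zero-width characters and two Hangul fillers.
SAMPLES = ["\u200b", "\u200c", "\u3164", "\uffa0"]


def describe(ch: str) -> tuple[str, str, str]:
    """Return the code point, Unicode name, and general category of a character."""
    return (f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.category(ch))


def format_only_filter(text: str) -> list[str]:
    """A naive detector that flags only format characters (category Cf)."""
    return [ch for ch in text if unicodedata.category(ch) == "Cf"]


for ch in SAMPLES:
    print(describe(ch))

# The naive filter catches the zero-width space but not the fillers,
# which carry the letter category Lo.
print(len(format_only_filter("a\u3164b\uffa0c\u200b")))  # 1
```

A scanner that relies on general category alone would therefore need an explicit list of such filler characters as well.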

The Decoder Problem

Hidden characters alone are inert. The attack requires a decoder — a small piece of visible JavaScript that:

Reads a string containing invisible characters
Maps them back to executable code
Passes the result to eval() or Function()

The decoder is typically 3–5 lines of code, often disguised as a string utility or configuration parser. It is the only visible part of the attack.
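Since the decoder is the visible part, one rough heuristic is to flag source text that contains both invisible characters and a dynamic-execution call. The Python sketch below is a simplification under stated assumptions. The character class is an illustrative subset, the regular expression only looks for eval( and Function(, and the name looks_like_decoder is hypothetical.

```python
import re

# Illustrative subset of invisible ranges (an assumption, not a full list).
INVISIBLE = re.compile(
    r"[\u200b-\u200f\u2060-\u2064\ufe00-\ufe0f\ufeff"
    r"\U000e0000-\U000e007f\U000e0100-\U000e01ef]"
)

# Matches calls such as eval(...), Function(...), and new Function(...).
SINK = re.compile(r"\b(?:eval|Function)\s*\(")


def looks_like_decoder(source: str) -> bool:
    """Flag text that has both hidden characters and a dynamic-execution call."""
    return bool(INVISIBLE.search(source)) and bool(SINK.search(source))


suspicious = 'const cfg = "v1\u200b\u200c\u200b"; eval(parse(cfg));'
benign_eval = 'eval("1 + 1");'
benign_text = 'const label = "a\u200bb";'

print(looks_like_decoder(suspicious))   # True
print(looks_like_decoder(benign_eval))  # False
print(looks_like_decoder(benign_text))  # False
```

The check will miss decoders that reach execution through other routes, so a hit is a prompt for manual review and a miss proves nothing.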

Where Steganographic Payloads Hide

Location	Why It Works
npm package source files	Developers rarely read every line of dependencies
VS Code extension code	Extensions run with full system access
GitHub repository files	Code review UIs may not render or flag invisible characters
AI-generated code	LLMs may propagate invisible chars from training data
Copy-pasted Stack Overflow answers	Browser copy can include hidden characters from page source
Configuration files (JSON, YAML)	Parsers may silently accept invisible characters

Why Traditional Tools Miss It

Linters (ESLint, Pylint) flag only a small subset of invisible Unicode by default, and some rules skip string literals
SAST scanners look for code patterns, not character encoding anomalies
Code review is visual — if you can't see it, you can't review it
grep can find them, but you need to know exactly which ranges to search for
AI code assistants may even generate invisible characters unknowingly

How to Detect Unicode Steganography

Vibe Check scans for all 14 invisible Unicode character ranges used in known steganographic attacks. It detects individual invisible characters (warning) and consecutive sequences of 3+ invisible characters (critical — almost certainly a payload). Everything runs in your browser.
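The two severity levels described above can be approximated with a short Python sketch. This is not Vibe Check's implementation. The article does not list the 14 ranges, so the character class is an illustrative subset, and the function name scan and the threshold parameter are hypothetical.

```python
import re

# Illustrative subset of invisible ranges (an assumption for this sketch).
INVISIBLE_CLASS = (
    r"[\u200b-\u200f\u202a-\u202e\u2060-\u2064\u2066-\u2069"
    r"\ufe00-\ufe0f\ufeff\U000e0000-\U000e007f\U000e0100-\U000e01ef]"
)
RUN = re.compile(INVISIBLE_CLASS + "+")


def scan(text: str, critical_run: int = 3):
    """Return (line, column, run_length, severity) for each invisible run."""
    findings = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for match in RUN.finditer(line):
            length = match.end() - match.start()
            severity = "critical" if length >= critical_run else "warning"
            # Columns are 1-based and counted in code points.
            findings.append((line_no, match.start() + 1, length, severity))
    return findings


source = "ok\nab\u200bcd\nx\u200b\u200c\u200dy"
for finding in scan(source):
    print(finding)
# (2, 3, 1, 'warning')
# (3, 2, 3, 'critical')
```

Legitimate text can contain these characters too, for example variation selectors after emoji, so a warning-level finding needs context before it is treated as hostile.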


Defense Checklist

Scan all code before execution — especially code from npm, GitHub, or AI assistants
Add pre-commit hooks that reject files containing suspicious Unicode ranges
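Enable invisible-character highlighting in your editor (in VS Code, editor.unicodeHighlight.invisibleCharacters and editor.renderControlCharacters; editor.renderWhitespace: all only reveals ordinary spaces and tabs)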
Monitor file sizes — steganographic payloads make files larger than their visible content
Audit eval() usage: the decoder typically needs eval, Function, or a similar dynamic-execution call to run the hidden payload
Use lockfiles and verify package checksums to prevent supply chain substitution
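The pre-commit and file-size items can be combined in one small check. The Python sketch below is one possible shape for such a hook. The character class is an illustrative subset, the names hidden_overhead and check_files are hypothetical, and wiring it into a specific hook framework is left out.

```python
import re
from pathlib import Path

# Illustrative subset of invisible ranges (an assumption for this sketch).
INVISIBLE = re.compile(
    r"[\u200b-\u200f\u202a-\u202e\u2060-\u2064\u2066-\u2069"
    r"\ufe00-\ufe0f\ufeff\U000e0000-\U000e007f\U000e0100-\U000e01ef]"
)


def hidden_overhead(text: str) -> tuple[int, int]:
    """Return (count, utf8_bytes) for the invisible characters in the text."""
    hidden = INVISIBLE.findall(text)
    return len(hidden), sum(len(ch.encode("utf-8")) for ch in hidden)


def check_files(paths) -> int:
    """Print a report per offending file; return 1 if any file fails, else 0."""
    status = 0
    for path in paths:
        text = Path(path).read_text(encoding="utf-8", errors="replace")
        count, size = hidden_overhead(text)
        if count:
            print(f"{path}: {count} invisible characters ({size} bytes)")
            status = 1
    return status


# A hook entry point could pass the staged file names to check_files and
# use the return value as the process exit code.
print(hidden_overhead("a\u200bb\U000e0041"))  # (2, 7)
```

The byte count is the gap between a file's size on disk and its visible content, which is the signal the file-size item relies on. Projects that legitimately use emoji or bidirectional text would need an allowlist to avoid false rejections.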
