Unicode Steganography Explained

Published: April 26, 2026

Steganography is the practice of hiding information inside something that looks ordinary. Unicode steganography hides executable code or data inside text by using characters that are valid Unicode but render as invisible: zero pixels wide, no glyph, and nothing for a reviewer to see unless the editor flags them.

Unlike traditional steganography (hiding data in images or audio), Unicode steganography embeds payloads directly in source code, configuration files, or text that developers copy and paste every day.

Why Unicode Makes This Possible

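The Unicode standard defines over 149,000 characters across 161 scripts. Among them are dozens of characters specifically designed to be invisible, including the zero-width space (U+200B), zero-width non-joiner (U+200C), zero-width joiner (U+200D), and word joiner (U+2060).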

These characters exist for legitimate typographic reasons. But attackers exploit the gap between what the Unicode standard allows and what code editors show.
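A minimal sketch of that gap, assuming Node.js or a browser console. The strings and variable names are illustrative only; the point is that two values can look the same on screen while differing in content.

```javascript
// Two strings that render identically in many editors and terminals.
const visible = "admin";
const tampered = "ad\u200Bmin"; // U+200B ZERO WIDTH SPACE between "d" and "m"

console.log(visible === tampered); // false
console.log(visible.length, tampered.length); // 5 6

// Listing code points exposes the hidden character.
const codePoints = [...tampered].map(
  (ch) => "U+" + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, "0")
);
console.log(codePoints.join(" "));
// U+0061 U+0064 U+200B U+006D U+0069 U+006E
```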

Three Attack Techniques

1. Binary Encoding with Zero-Width Characters

This is the simplest technique. Each byte of malicious code is converted to binary, then each bit is represented by one of two invisible characters, for example U+200B (zero-width space) for 0 and U+200C (zero-width non-joiner) for 1.

Eight invisible characters encode one byte. A 500-byte payload requires 4,000 invisible characters, all hidden between visible lines of code.

Example: With U+200B as 0 and U+200C as 1, the letter "A" (0x41 = 01000001) is encoded as: U+200B U+200C U+200B U+200B U+200B U+200B U+200B U+200C. Nothing is displayed in an editor that does not flag zero-width characters.
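The bit-per-character scheme can be sketched as follows. This is an illustration under stated assumptions, not a reconstruction of any specific attack: it assumes U+200B stands for 0 and U+200C for 1, with the most significant bit first, and the function names `encodeInvisible` and `decodeInvisible` are hypothetical. The decoder returns a plain string for inspection and never executes it.

```javascript
// Assumed bit mapping: U+200B = 0, U+200C = 1.
const ZERO = "\u200B"; // ZERO WIDTH SPACE
const ONE = "\u200C"; // ZERO WIDTH NON-JOINER

// Encode each UTF-8 byte as eight zero-width characters, most significant bit first.
function encodeInvisible(text) {
  const bytes = new TextEncoder().encode(text);
  let out = "";
  for (const byte of bytes) {
    for (let bit = 7; bit >= 0; bit--) {
      out += (byte >> bit) & 1 ? ONE : ZERO;
    }
  }
  return out;
}

// Recover the text for inspection. The result is returned as a string, not executed.
function decodeInvisible(hidden) {
  const bits = [...hidden].filter((ch) => ch === ZERO || ch === ONE);
  const bytes = [];
  for (let i = 0; i + 8 <= bits.length; i += 8) {
    let byte = 0;
    for (let j = 0; j < 8; j++) {
      byte = (byte << 1) | (bits[i + j] === ONE ? 1 : 0);
    }
    bytes.push(byte);
  }
  return new TextDecoder().decode(new Uint8Array(bytes));
}

const encodedA = encodeInvisible("A");
console.log(encodedA.length); // 8
console.log(decodeInvisible(encodedA)); // A
```

Running a decoder like this over a suspicious file is one way to read a hidden payload safely, since the output is only ever treated as text.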

2. Private Use Area Mapping

This is the technique used by Glassworm. Each ASCII character is mapped to a Unicode Private Use Area code point or to a character in the Variation Selectors Supplement block. The mapping is arbitrary but consistent, so a small decoder function can reverse it at runtime.

This technique is harder to detect because the invisible characters don't follow a simple binary pattern.
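To make the idea of a consistent mapping concrete, here is a hedged sketch using variation selectors. The byte-to-code-point mapping below is a hypothetical one chosen for illustration; it is not confirmed to be Glassworm's scheme, and the function names are assumptions. `extractHiddenBytes` is written from the defender's side: it pulls selector-encoded bytes out of a string so they can be read as text.

```javascript
// Hypothetical mapping for illustration (not confirmed as any real attack's scheme):
// byte 0-15   -> U+FE00..U+FE0F   (Variation Selectors)
// byte 16-255 -> U+E0100..U+E01EF (Variation Selectors Supplement)
function byteToSelector(byte) {
  return String.fromCodePoint(byte < 16 ? 0xfe00 + byte : 0xe0100 + (byte - 16));
}

function selectorToByte(codePoint) {
  if (codePoint >= 0xfe00 && codePoint <= 0xfe0f) return codePoint - 0xfe00;
  if (codePoint >= 0xe0100 && codePoint <= 0xe01ef) return codePoint - 0xe0100 + 16;
  return null; // not a variation selector
}

// Collect any selector-encoded bytes in a string so they can be inspected.
function extractHiddenBytes(text) {
  const bytes = [];
  for (const ch of text) {
    const byte = selectorToByte(ch.codePointAt(0));
    if (byte !== null) bytes.push(byte);
  }
  return bytes;
}

const carrier = "ok" + [...new TextEncoder().encode("hi")].map(byteToSelector).join("");
console.log([...carrier].length); // 4 code points: "o", "k", and two selectors
console.log(new TextDecoder().decode(new Uint8Array(extractHiddenBytes(carrier)))); // hi
```

Because there is no two-symbol pattern here, a detector has to look for the code point ranges themselves rather than for alternating characters.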

3. Hangul Jamo Encoding

A newer technique that uses half-width and full-width Hangul (Korean) character variants. Each ASCII byte is split into bits represented by specific Hangul characters that some systems render as invisible or near-invisible.
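A small detector for this family might look like the sketch below. The article does not list the exact characters involved, so the four Hangul filler code points here are an assumption: they are real Unicode characters that many fonts render as blank, but whether a given attack uses these particular ones is not confirmed. The function name `findHangulFillers` is hypothetical.

```javascript
// Hangul filler code points that many fonts render as blank.
// Which of these a given attack uses is an assumption, not stated in the article.
const HANGUL_FILLERS = new Map([
  [0x115f, "HANGUL CHOSEONG FILLER"],
  [0x1160, "HANGUL JUNGSEONG FILLER"],
  [0x3164, "HANGUL FILLER"],
  [0xffa0, "HALFWIDTH HANGUL FILLER"],
]);

// Report the 1-based line and column of every filler character.
function findHangulFillers(source) {
  const findings = [];
  source.split("\n").forEach((line, index) => {
    let column = 0;
    for (const ch of line) {
      column += 1;
      const name = HANGUL_FILLERS.get(ch.codePointAt(0));
      if (name) findings.push({ line: index + 1, column, name });
    }
  });
  return findings;
}

// One finding: line 2, column 8, HANGUL FILLER.
console.log(findHangulFillers("const a = 1;\nconst b\u3164 = 2;"));
```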

The Decoder Problem

Hidden characters alone are inert. The attack requires a decoder — a small piece of visible JavaScript that:

  1. Reads a string containing invisible characters
  2. Maps them back to executable code
  3. Passes the result to eval() or Function()

The decoder is typically 3–5 lines of code, often disguised as a string utility or configuration parser. It is the only visible part of the attack.
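Because the decoder is the visible part, it is a reasonable thing to search for. The sketch below is a line-based heuristic, not a parser: it flags files that contain both a dynamic-execution call and invisible characters. The regular expressions and the name `auditSource` are assumptions made for illustration, and the character set is a small subset.

```javascript
// Heuristic patterns for dynamic code execution. Indirect calls such as
// globalThis["ev" + "al"] will not match, so a clean result is inconclusive.
const DYNAMIC_EXEC = /\beval\s*\(|\bFunction\s*\(/;
const INVISIBLE = /[\u200B-\u200F\u2060-\u2064\uFEFF\uFE00-\uFE0F\u{E0100}-\u{E01EF}]/u;

function auditSource(source) {
  const execLines = [];
  const invisibleLines = [];
  source.split("\n").forEach((line, index) => {
    if (DYNAMIC_EXEC.test(line)) execLines.push(index + 1);
    if (INVISIBLE.test(line)) invisibleLines.push(index + 1);
  });
  return {
    execLines,
    invisibleLines,
    // Both signals in one file is the combination the decoder pattern needs.
    suspicious: execLines.length > 0 && invisibleLines.length > 0,
  };
}

// The sample is only ever handled as a string; it is never run.
const sample = 'const cfg = "\u200B\u200C";\nconst run = new Function(decode(cfg));';
console.log(auditSource(sample));
```

A match is a prompt for manual review rather than proof of an attack, since legitimate code also uses these calls.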

Where Steganographic Payloads Hide

Location | Why It Works
npm package source files | Developers rarely read every line of dependencies
VS Code extension code | Extensions run with full system access
GitHub repository files | Code review UIs hide invisible characters
AI-generated code | LLMs may propagate invisible chars from training data
Copy-pasted Stack Overflow answers | Browser copy can include hidden characters from page source
Configuration files (JSON, YAML) | Parsers may silently accept invisible characters

Why Traditional Tools Miss It

How to Detect Unicode Steganography

Vibe Check scans for all 14 invisible Unicode character ranges used in known steganographic attacks. It detects individual invisible characters (warning) and consecutive sequences of 3+ invisible characters (critical — almost certainly a payload). Everything runs in your browser.
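The two severity levels described above can be sketched in a few lines. This is not Vibe Check's implementation: its 14 ranges are not listed here, so the character class below is an assumed subset, and `scanInvisible` is a hypothetical name. Runs of one or two characters are reported as warnings and runs of three or more as critical.

```javascript
// Assumed character set for illustration; the tool's actual 14 ranges are not listed.
const INVISIBLE_RUN =
  /[\u00AD\u115F\u1160\u200B-\u200F\u2060-\u2064\u3164\uFE00-\uFE0F\uFEFF\uFFA0\u{E0000}-\u{E007F}\u{E0100}-\u{E01EF}]+/gu;

function scanInvisible(source) {
  const findings = [];
  source.split("\n").forEach((line, index) => {
    for (const match of line.matchAll(INVISIBLE_RUN)) {
      const count = [...match[0]].length; // count code points, not UTF-16 units
      findings.push({
        line: index + 1,
        offset: match.index, // 0-based UTF-16 offset within the line
        count,
        severity: count >= 3 ? "critical" : "warning",
      });
    }
  });
  return findings;
}

console.log(scanInvisible("a\u200Bb\nx\u200B\u200C\u200B\u200Cy"));
// line 1: one character (warning); line 2: run of four (critical)
```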


Defense Checklist

  1. Scan all code before execution — especially code from npm, GitHub, or AI assistants
  2. Add pre-commit hooks that reject files containing suspicious Unicode ranges
  3. Enable invisible-character highlighting in your editor (editor.unicodeHighlight.invisibleCharacters in VS Code; editor.renderWhitespace: all only marks spaces and tabs)
  4. Monitor file sizes — steganographic payloads make files larger than their visible content
  5. Audit eval() usage: the decoder needs a dynamic-execution call, most often eval() or Function(), to run the hidden payload
  6. Use lockfiles and verify package checksums to prevent supply chain substitution
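The file-size check in item 4 can be approximated by comparing a file's UTF-8 size with and without invisible characters. The sketch below is an assumption-laden illustration: the character class is a small subset, and `hiddenByteOverhead` is a hypothetical helper. Each zero-width character in the U+200B range takes three bytes in UTF-8, so a 4,000-character payload adds about 12,000 bytes.

```javascript
// Assumed illustrative subset of invisible characters.
const HIDDEN_CHARS = /[\u200B-\u200F\u2060-\u2064\uFEFF\uFE00-\uFE0F\u{E0100}-\u{E01EF}]/gu;

// Compare the UTF-8 size of the text with and without invisible characters.
function hiddenByteOverhead(text) {
  const encoder = new TextEncoder();
  const totalBytes = encoder.encode(text).length;
  const visibleBytes = encoder.encode(text.replace(HIDDEN_CHARS, "")).length;
  return { totalBytes, visibleBytes, hiddenBytes: totalBytes - visibleBytes };
}

// 4,000 zero-width characters (a 500-byte payload at 8 characters per byte)
// appended to a one-line file.
const padded = "const x = 1;" + "\u200B".repeat(4000);
console.log(hiddenByteOverhead(padded));
// totalBytes 12012, visibleBytes 12, hiddenBytes 12000
```

A large gap between the two sizes is a cheap signal to run a fuller scan on that file.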
