Cryptographic Document Attribution

The Cryptographic Document Attribution system is my concrete attempt to solve the information-leak problem raised in Emergent Steganography.

The question: how do you distribute a classified document while mitigating the risk of unauthorized disclosure?

My answer: hide the recipient's identity inside the document.

Before we start

This article is based on Cryptographic Document Attribution: A Forensic System via Emergent Steganography, a more formal paper that goes into more detail.

For an implemented proof-of-concept, head to CryptographicDocumentAttributionDemo.jl.

Here, I will only describe the main components and how they interact with each other.

Defining the pieces

The system has three core agents:

the Issuer,
the Timestamp Authority (TSA),
the Forensic Analyst.

The system acts as a mediator between those entities and is built directly upon the emergent steganography framework, where a Rule Chain RC serves as the secret key.

The TSA is the critical trust anchor that defeats the primary legal counter-argument: "The issuer fabricated the watermark after the leak." By obtaining a timestamp before distribution, the issuer proves the watermark's existence at that prior time.

A concrete scenario

Imagine you (the Issuer) are about to send a confidential report to three different partners.

You don’t just send the same PDF three times. For each recipient, you quietly generate a unique rule chain — a small, deterministic set of transformations that will embed a subtle identifier into their copy.

Let’s follow one recipient: R.

Before anything is sent

First, you generate:

A unique identifier ID_R
A rule chain RC_R that will determine how that identifier is embedded

Before modifying the document, you create a cryptographic commitment:

H = SHA256(RC_R || ID_R || Hash(Doc))

This hash commits you to three things at once:

the exact original document,
the exact identifier,
the exact embedding logic.
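To make the commitment concrete, here is a minimal Python sketch. It is not the code from the demo repository. The function names, the byte-string representation of RC_R and ID_R, and the length-prefix framing are all my assumptions: the formula only says "||", and plain concatenation would let two different (RC_R, ID_R) pairs produce the same byte stream.

```python
import hashlib


def _frame(part: bytes) -> bytes:
    # Assumed framing: prefix each field with its length (8 bytes, big-endian)
    # so the boundary between fields cannot be shifted without changing the hash.
    return len(part).to_bytes(8, "big") + part


def commitment(rule_chain: bytes, recipient_id: bytes, document: bytes) -> str:
    """H = SHA256(RC_R || ID_R || Hash(Doc)), with length-prefixed fields."""
    doc_hash = hashlib.sha256(document).digest()  # Hash(Doc)
    h = hashlib.sha256()
    for part in (rule_chain, recipient_id, doc_hash):
        h.update(_frame(part))
    return h.hexdigest()


# Illustrative values only.
doc = b"Confidential report, original version"
H = commitment(b"rule-chain-for-R", b"ID-R-0001", doc)
print(H)
```

Changing any one of the three inputs, even by a single byte, yields a different H, which is what makes the hash a commitment to all three at once.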

You then send this hash to a Timestamp Authority.

They return a signed timestamp token T, anchoring that commitment to a precise moment in time.
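The exchange with the TSA can be pictured with a toy stand-in. This is a sketch under heavy assumptions: a real Timestamp Authority (for example one following RFC 3161) signs with a private key so that anyone can verify the token, whereas the mock below uses an HMAC with a shared demo key purely to stay within the Python standard library. The names TSA_KEY, issue_token and token_is_valid are hypothetical.

```python
import hashlib
import hmac
import json
import time
from typing import Optional

# Hypothetical demo key. A real TSA would use an asymmetric signature instead.
TSA_KEY = b"demo-only-tsa-key"


def issue_token(commitment_hex: str, now: Optional[float] = None) -> dict:
    """Bind a commitment hash to a time and authenticate the pair."""
    issued_at = int(time.time() if now is None else now)
    payload = json.dumps({"hash": commitment_hex, "time": issued_at}, sort_keys=True)
    tag = hmac.new(TSA_KEY, payload.encode(), hashlib.sha256).hexdigest()
    return {"payload": payload, "tag": tag}


def token_is_valid(token: dict) -> bool:
    """Recompute the tag and compare it in constant time."""
    expected = hmac.new(TSA_KEY, token["payload"].encode(), hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, token["tag"])


# Illustrative commitment and a fixed clock value for reproducibility.
H = hashlib.sha256(b"example commitment input").hexdigest()
T = issue_token(H, now=1_700_000_000)
print(token_is_valid(T))
```

Note that the TSA only ever sees the hash H. It learns nothing about the document, the identifier, or the rule chain.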

At this stage, you haven’t accused anyone of anything. You’ve simply created a time-locked cryptographic witness.

Only then do you embed the identifier into the document using RC_R, producing Doc_R, and send it to the recipient.

Privately, you store (RC_R, ID_R, T) in a secure log.

That’s it. Distribution is complete.

Now imagine a leak appears

Months later, a scan of the report shows up online.

Someone forwards it to you.

You (or a forensic analyst) apply the relevant rule chain to the leaked copy and recover a candidate identifier ID'.

Now you recompute:

H' = SHA256(RC_R || ID' || Hash(Original_Doc))

If H' matches the original commitment H, then ID' is exactly the identifier you committed to before distribution.
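When several recipients received a copy, the comparison can be run against every entry in the private log. The sketch below assumes a few things that the text does not specify: the log is modelled as a mapping from recipient name to (RC_R, H), with H taken to be recoverable from the stored token T; fields are length-prefixed before hashing; and the extraction of ID' from the leaked copy has already happened elsewhere. All names are hypothetical.

```python
import hashlib
import hmac


def commitment(rc: bytes, ident: bytes, doc_hash: bytes) -> bytes:
    # SHA256(RC || ID || Hash(Doc)) with an assumed length-prefix framing.
    h = hashlib.sha256()
    for part in (rc, ident, doc_hash):
        h.update(len(part).to_bytes(8, "big") + part)
    return h.digest()


def attribute(candidate_id: bytes, doc_hash: bytes, log: dict):
    """Return the recipient whose logged commitment matches ID', or None."""
    for recipient, (rc, committed) in log.items():
        recomputed = commitment(rc, candidate_id, doc_hash)  # H'
        if hmac.compare_digest(recomputed, committed):
            return recipient
    return None


# Illustrative log for three partners.
doc_hash = hashlib.sha256(b"original report").digest()
issued = {
    "partner-a": (b"rc-a", b"ID-A"),
    "partner-b": (b"rc-b", b"ID-B"),
    "partner-c": (b"rc-c", b"ID-C"),
}
log = {name: (rc, commitment(rc, ident, doc_hash)) for name, (rc, ident) in issued.items()}

print(attribute(b"ID-B", doc_hash, log))
```

An identifier that was never committed, or one that was damaged during extraction, matches no entry and the function returns None rather than a best guess.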

Why this matters

An independent verifier can now check:

The timestamp token T is valid and was issued at time t.
The recomputed hash H' matches the original H.
The timestamp predates the leak.
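The three checks above can be written down as a small checklist function. This is only a sketch: the Evidence record and its field names are hypothetical, and the validity of T is taken as an input because verifying a real timestamp token depends on the TSA's own format.

```python
import hmac
from dataclasses import dataclass
from typing import List


@dataclass
class Evidence:
    token_valid: bool      # outcome of verifying the TSA's signature on T
    token_time: int        # time t stated in T (Unix seconds)
    committed_hash: str    # H, the hex digest certified by T
    recomputed_hash: str   # H', rebuilt from RC_R, ID' and Hash(Original_Doc)
    leak_first_seen: int   # earliest known appearance of the leak (Unix seconds)


def failed_checks(e: Evidence) -> List[str]:
    """Return the checks that fail; an empty list means all three hold."""
    failures = []
    if not e.token_valid:
        failures.append("timestamp token is not valid")
    if not hmac.compare_digest(e.committed_hash, e.recomputed_hash):
        failures.append("recomputed hash does not match the commitment")
    if not e.token_time < e.leak_first_seen:
        failures.append("timestamp does not predate the leak")
    return failures


# Illustrative case: valid token, matching hashes, leak observed later.
example = Evidence(True, 1_700_000_000, "ab" * 32, "ab" * 32, 1_710_000_000)
print(failed_checks(example))
```

Returning the list of failed checks, rather than a single boolean, lets a verifier state exactly which link in the chain is missing.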

If all of this holds, then the identifier extracted from the leaked document was already cryptographically bound to the original document at time t.

It wasn’t inserted after the leak. It wasn’t fabricated retroactively.

The system doesn’t prove motive. It doesn’t even prove intent.

It proves something narrower — and more powerful:

That a specific personalized version of the document existed at a specific point in time.

And that version matches the leaked copy.

Reversibility as structure

This protocol is not just about document tracing. It illustrates a broader principle: design structures that can be traversed both forward and backward without ambiguity.

By committing early, keeping the rules minimal, and anchoring transformations in time, you constrain what can be rewritten later.

The strength does not come from opacity or complexity, but from symmetry and pre-commitment. When a system can close on itself under scrutiny, it becomes less about accusation and more about structural coherence.

Content licensed under CC BY 4.0.

Code snippets licensed under MIT License.

Last Update — 30 Apr 2026