In 1887, a scholar named T.C. Mendenhall had a peculiar idea. He wondered if authors, like composers, had a unique “curve” or “diagram” that defined their work. He meticulously counted the lengths of words in the writings of Charles Dickens and William Makepeace Thackeray, plotting the frequencies on a graph. The results were striking: each author produced a distinct, recognizable curve. Without knowing it, Mendenhall had laid the groundwork for a fascinating field that treats language as a set of data and writing style as a kind of fingerprint: stylometry.
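Mendenhall's counting exercise is easy to reproduce in a few lines of code. The sketch below is a minimal illustration, not his original procedure: the function name `word_length_curve`, the cap of 12 letters, and the one-line sample (the opening of Dickens's A Tale of Two Cities) are all assumptions chosen for the example.

```python
import re
from collections import Counter

def word_length_curve(text, max_len=12):
    """Return the share of words at each length 1..max_len.

    Words longer than max_len are pooled into the last bucket.
    The name and the cap are illustrative choices, not Mendenhall's.
    """
    words = re.findall(r"[A-Za-z]+", text)
    counts = Counter(min(len(w), max_len) for w in words)
    total = len(words)
    return [counts[n] / total if total else 0.0 for n in range(1, max_len + 1)]

# Toy sample; a real comparison would need thousands of words per author.
sample = "It was the best of times, it was the worst of times"
curve = word_length_curve(sample)

for length, share in enumerate(curve, start=1):
    if share:
        print(f"{length:2d} letters: {share:.2f}")
```

Plotting one such list per author, with word length on the horizontal axis, gives the kind of curve Mendenhall drew by hand.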
We often think of our writing style as a conscious choice—a deliberate selection of powerful verbs or elegant phrases. But what if the most telling parts of our style are the ones we don’t notice? What if, hidden within our patterns of punctuation and our unconscious preference for “while” over “whilst”, there lies a secret, unforgeable signature? This is the core idea of the linguistic watermark, a hidden ID that can unmask anonymous authors, expose forgeries, and even solve crimes.
Unlike a watermark on paper, a linguistic watermark isn’t a physical mark. It’s a statistical profile of an author’s unique and consistent writing habits. Think of it as a “stylistic DNA”. While you might consciously choose to write a complex sentence, you probably don’t think about how often you use the word “of”, or whether you prefer to end a list with a serial comma. Yet, it’s these tiny, subconscious tics, repeated thousands of times over a body of work, that create a robust and surprisingly reliable identifier.
The science of measuring these features is called stylometry. Using computational power, stylometrists can analyze vast amounts of text and quantify these patterns, comparing an anonymous or disputed text against a set of known works by a potential author (a “corpus”).
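To make the comparison step concrete, here is a rough sketch of how a disputed text might be matched against known writing samples. Everything in it is an assumption made for illustration: the short word list, the helper names `profile`, `distance`, and `closest_author`, the invented sample sentences, and the choice of a simple mean absolute difference as the distance. Published stylometric methods use far larger word lists and more careful statistics.

```python
import re
from collections import Counter

# Illustrative word list only; real studies use dozens or hundreds of words.
FUNCTION_WORDS = ["of", "the", "by", "from", "to", "while", "whilst"]

def profile(text, words=FUNCTION_WORDS):
    """Rate of each chosen word per 1,000 tokens of text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    total = len(tokens) or 1  # avoid dividing by zero on empty input
    return [1000 * counts[w] / total for w in words]

def distance(p, q):
    """Mean absolute difference between two profiles (a simple stand-in metric)."""
    return sum(abs(a - b) for a, b in zip(p, q)) / len(p)

def closest_author(disputed, corpus):
    """Name of the candidate whose known writing is nearest to the disputed text."""
    target = profile(disputed)
    return min(corpus, key=lambda name: distance(target, profile(corpus[name])))

# Invented toy samples, far too short for any real conclusion.
corpus = {
    "author_a": "Whilst the report came from the committee, it was read by the clerk of the court.",
    "author_b": "While I walked to town I meant to stop and to talk to friends.",
}
disputed = "Whilst the letter came from the office, it was signed by the head of the firm."

print(closest_author(disputed, corpus))
```

On these toy sentences the disputed text lands closest to `author_a`, because both use "whilst", "by", "from", and "of" at the same rates. With samples this small the result means nothing statistically; the point is only the shape of the procedure.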
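So, what exactly are these features that give you away? They fall into several categories, and the most powerful methods use a combination of them. The examples in this article touch on several: word length, the frequency of function words, punctuation habits, sentence length, and the frequency of word pairs.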
The theory is fascinating, but its real power is demonstrated in its application. Stylometry has played a key role in some of the most intriguing literary and criminal cases.
The classic case is The Federalist Papers. Written in 1787-88 to promote the ratification of the U.S. Constitution, the essays were published under the pseudonym “Publius”. While it was known that the authors were Alexander Hamilton, James Madison, and John Jay, twelve of the essays were disputed, with authorship claimed for both Hamilton and Madison. In the 1960s, statisticians Frederick Mosteller and David Wallace analyzed the frequency of function words like “by”, “from”, and “to” in the disputed papers and compared them to known writings of Hamilton and Madison. The evidence pointed strongly to Madison as the author of all twelve.
In 2013, a debut crime novel called The Cuckoo’s Calling by an unknown author named Robert Galbraith received critical acclaim. When a journalist received an anonymous tip that Galbraith was actually J.K. Rowling, researchers Patrick Juola and Peter Millican were called in. They compared the book’s stylistic fingerprint to that of Rowling and several other authors. The analysis, which focused on word-pair frequencies and common word usage, pointed to Rowling as the closest match among the candidates. The linguistic watermark gave her away, and Rowling soon confirmed she was indeed the author.
Forensic linguistics, a close cousin of stylometry, played a crucial role in identifying Ted Kaczynski as the Unabomber. When the “Unabomber Manifesto” was published, Kaczynski’s brother, David, recognized the writing style and specific turns of phrase, such as the use of “cool-headed logicians”. This initial human recognition was later supported by detailed linguistic analysis that linked the Manifesto to Kaczynski’s other writings, becoming key evidence in the FBI’s investigation.
This raises a tantalizing question: if you know about these watermarks, can you consciously change your style to write anonymously? The answer is: it’s extremely difficult.
The field of “adversarial stylometry” explores this very idea. While you could certainly force yourself to use longer sentences or avoid em-dashes, maintaining that artificial style consistently over a long text is another matter entirely. The unconscious habits, especially the use of function words, are deeply ingrained. Trying to control them is like trying to consciously manage your breathing rate and blink frequency at the same time—you can do it for a little while, but you’ll eventually slip back into your natural rhythm.
However, stylometry isn’t foolproof. An author’s style can evolve over time or change depending on the genre. Analysis of very short texts, like tweets or text messages, is also much less reliable because there isn’t enough data to establish a stable pattern.
From centuries-old political documents to modern-day crime novels, the traces of our identity are woven into the very fabric of our language. Our words do more than just communicate ideas; they carry a hidden echo of who we are. The next time you write an email or a message, remember that you’re not just typing words—you’re leaving a trail of invisible, undeniable, and uniquely personal linguistic crumbs.