The World’s Most Mysterious Book: Can Linguistics Ever Decipher the Voynich Manuscript?

Imagine a book filled with drawings of impossible flowers, strange astronomical charts mapping unknown constellations, and bizarre diagrams of naked figures bathing in green liquid. Now, imagine that this entire book is written in a graceful, flowing script that has never been seen anywhere else on Earth. This isn’t the plot of a fantasy novel; this is the Voynich Manuscript, a 15th-century codex that has been called the world’s most mysterious book. For over a century, it has taunted and tantalized the world’s best cryptographers, historians, and linguists, steadfastly refusing to give up its secrets.

Acquired in 1912 by antiquarian bookseller Wilfrid Voynich from a Jesuit college near Rome, the manuscript is written on vellum that has been carbon-dated to the early 1400s. Its roughly 240 pages are a puzzle box. But while the illustrations are baffling, the true enigma lies in the text itself. Can the tools of modern linguistics ever hope to crack a code that has resisted all attempts at decryption?

The Ghost of a Language

At first glance, the script, often called “Voynichese,” looks like it could be meaningless gibberish—an elaborate, nonsensical prank. But when linguists began to analyze the text’s statistical properties, they found something astonishing. The text doesn’t behave like random scribbles. It behaves like a real, structured language.

One of the most frequently cited pieces of evidence is that Voynichese roughly obeys Zipf’s Law. In natural languages, from English to Swahili, the most frequent word appears about twice as often as the second most frequent word, three times as often as the third most frequent, and so on. If you plot frequency against rank on logarithmic axes, you get a nearly straight downward line. The word frequencies in the Voynich Manuscript follow this pattern closely. On its own this is not proof of meaning, since some simple random-text processes also produce Zipf-like curves, but it is consistent with a coherent system at play.
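For readers who want to try this themselves, here is a minimal Python sketch of a rank-frequency check. It assumes the text is already available as space-separated words (for example from a Latin-letter transliteration of the manuscript); the function names `rank_frequency` and `zipf_slope` are made up for illustration and do not come from any published analysis.

```python
import math
from collections import Counter

def rank_frequency(text):
    """Return (rank, word, count) tuples, most frequent word first."""
    counts = Counter(text.lower().split())
    return [(rank, word, count)
            for rank, (word, count) in enumerate(counts.most_common(), start=1)]

def zipf_slope(table):
    """Least-squares slope of log(count) against log(rank).

    An ideal Zipf distribution (count proportional to 1/rank) gives -1.
    Needs at least two ranks.
    """
    xs = [math.log(rank) for rank, _, _ in table]
    ys = [math.log(count) for _, _, count in table]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var

if __name__ == "__main__":
    # Toy sample only; a real test needs thousands of words.
    sample = "the cat and the dog and the bird saw the cat"
    for rank, word, count in rank_frequency(sample)[:3]:
        print(rank, word, count)
    print("slope:", zipf_slope(rank_frequency(sample)))
```

A slope near -1 on a large text is what "obeying Zipf’s Law" means in practice. A tiny sample like the one above will not land there, which is a useful reminder that the law only shows up at scale.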

Furthermore, the text’s entropy, a measure of how predictable each symbol is, falls close to the range of human languages, although analyses of character sequences have found Voynichese somewhat more predictable than typical European texts. It’s not so repetitive as to look like mechanical filler, and it’s not so chaotic as to suggest a one-time pad cipher or pure gibberish. (A simple substitution cipher, where ‘a’ always equals ‘x’, ‘b’ always equals ‘y’, etc., would simply inherit the entropy of its plaintext.) It has much of the same patterned, semi-predictable flow that you find in Latin, Arabic, or Chinese.
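Entropy sounds abstract, but it is easy to compute. The sketch below, in Python, measures two versions of it: how unpredictable a single character is, and how unpredictable a character is once you know the one before it. The function names are hypothetical, and treating each character of a transliteration as one symbol is an assumption; how to split the script into glyphs is itself debated.

```python
import math
from collections import Counter

def char_entropy(text):
    """Shannon entropy of single characters, in bits per character."""
    counts = Counter(text)
    total = sum(counts.values())
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

def conditional_entropy(text):
    """Entropy of a character given the one before it, in bits."""
    pairs = Counter(zip(text, text[1:]))   # adjacent character pairs
    firsts = Counter(text[:-1])            # how often each char starts a pair
    total = sum(pairs.values())
    h = 0.0
    for (first, _second), n in pairs.items():
        p_pair = n / total                 # P(first, second)
        p_next = n / firsts[first]         # P(second | first)
        h -= p_pair * math.log2(p_next)
    return h

if __name__ == "__main__":
    for sample in ("aaaaaaaa", "abababab", "the quick brown fox"):
        print(repr(sample), char_entropy(sample), conditional_entropy(sample))
```

The string "abababab" is a handy sanity check: each character on its own is a coin flip (one bit), but once you have seen one character the next is certain, so the conditional entropy drops to zero. Real languages sit between the extremes of total repetition and total randomness.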

Words with Rules

Beyond the high-level statistics, the “words” in Voynichese seem to have an internal structure, what linguists call morphology. Certain character combinations appear to act like prefixes and suffixes. For example, some common “words” seem to share a root, like the frequently cited examples:

qokedy
qokeey
qokeedy

This pattern suggests a grammatical system, where root words are modified to change their meaning or function, much like we add “-ed” to a verb to make it past tense in English. There are also strict rules about which characters can appear where. Some glyphs are almost exclusively found at the beginning of a word, others in the middle, and a distinct set at the end. Positional constraints like these are a common feature of real writing systems; Arabic letters, for instance, change shape depending on where they fall in a word.
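The positional rules are simple to tabulate. Here is a small Python sketch that counts, for each glyph, how often it appears at the start, in the middle, or at the end of a word. It uses the example words quoted above and assumes, purely for illustration, that each letter of the Latin-letter transliteration stands for one glyph; `positional_profile` is a hypothetical helper name.

```python
from collections import Counter, defaultdict

def positional_profile(words):
    """Map each glyph to a Counter of the word positions it occupies."""
    profile = defaultdict(Counter)
    for word in words:
        last = len(word) - 1
        for i, glyph in enumerate(word):
            if i == 0:
                slot = "initial"   # a one-glyph word counts as initial
            elif i == last:
                slot = "final"
            else:
                slot = "medial"
            profile[glyph][slot] += 1
    return profile

if __name__ == "__main__":
    words = ["qokedy", "qokeey", "qokeedy"]
    for glyph, slots in sorted(positional_profile(words).items()):
        print(glyph, dict(slots))
```

Even on three words the shape of the argument is visible: "q" only ever opens a word and "y" only ever closes one. On the full text, strongly lopsided counts like these are what researchers mean by positional constraints.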

These linguistic markers are so strong that they make the simplest explanation—that it’s all a hoax—surprisingly difficult to accept.

The Leading Theories: Language, Cipher, or Hoax?

Over the decades, three main theories have emerged, each with passionate advocates and frustratingly large holes.

Theory 1: An Unknown Natural Language

This theory posits that the manuscript is written in a real, but now-extinct or undiscovered, language, using a custom-made alphabet. The statistical patterns are the strongest evidence for this. The author may have been the sole speaker of a dying dialect or part of a small, isolated community. Some researchers have suggested it could be a lost Turkic or East Asian language, brought to Europe along the Silk Road.

The Problem: If it’s a real language, which one? Despite countless attempts, no one has been able to convincingly link Voynichese to any known language family. Furthermore, few of the plant illustrations have been confidently matched to known species, and the star charts do not clearly correspond to known constellations, making it hard to pin down a geographical origin.

Theory 2: A Sophisticated Cipher

Given the 15th century’s fascination with cryptography, it’s plausible that the text is an enciphered version of a known language like Latin or Italian. However, simple ciphers have been ruled out. More complex systems have also been proposed, such as a polyalphabetic cipher in the style of the later, 16th-century Vigenère cipher (which uses a keyword to vary the alphabet shift from letter to letter) or a codebook (where entire words are replaced by symbols). Perhaps the strange features, like the lack of very short words common in European languages (like “a” or “in”), are artifacts of the encryption method.

The Problem: The world’s best codebreakers, including William Friedman, whose team cracked Japan’s PURPLE cipher in WWII, have failed. The text’s internal linguistic structure is very hard to produce with most known historical ciphers. Producing Zipf-like word frequencies and consistent morphological patterns in the ciphertext would require a system far more advanced than anything known from that period.
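To see why the choice of cipher matters for the statistics, here is a toy Python sketch of the two families mentioned above: a simple substitution (a plain alphabet shift stands in for the general case) and a keyword-driven Vigenère cipher. It works on lowercase English letters only, the function names are invented for this example, and nothing here is a claim about how the manuscript was actually written.

```python
import string
from collections import Counter

ALPHABET = string.ascii_lowercase

def shift_substitute(plaintext, shift):
    """Simple substitution: each letter always maps to the same letter."""
    table = str.maketrans(ALPHABET, ALPHABET[shift:] + ALPHABET[:shift])
    return plaintext.translate(table)

def vigenere_encrypt(plaintext, keyword):
    """Vigenere: the shift changes letter by letter, driven by the keyword.

    The keyword must consist of lowercase letters; other plaintext
    characters (spaces, punctuation) pass through unchanged.
    """
    out = []
    key_pos = 0
    for ch in plaintext:
        if ch in ALPHABET:
            shift = ALPHABET.index(keyword[key_pos % len(keyword)])
            out.append(ALPHABET[(ALPHABET.index(ch) + shift) % 26])
            key_pos += 1
        else:
            out.append(ch)
    return "".join(out)

def frequency_profile(text):
    """Letter counts sorted high to low, ignoring which letter is which."""
    return sorted(Counter(c for c in text if c in ALPHABET).values(),
                  reverse=True)

if __name__ == "__main__":
    message = "attackatdawn"
    print(frequency_profile(message))
    print(frequency_profile(shift_substitute(message, 3)))
    print(frequency_profile(vigenere_encrypt(message, "lemon")))
```

The simple substitution leaves the frequency profile untouched, because it only relabels letters, while the Vigenère cipher flattens it. That is the tension in the cipher theory: methods that scramble the statistics enough to resist decoding tend to erase exactly the language-like patterns the manuscript displays.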

Theory 3: The World’s Most Elaborate Hoax

Could the Voynich Manuscript be the greatest prank in history? The theory goes that a clever forger created a beautiful and mysterious-looking book with the sole purpose of selling it to a wealthy, curious patron—perhaps the Holy Roman Emperor Rudolf II, a known collector of oddities. The text would be meaningless, designed only to look like a language.

The Problem: This theory almost requires more genius than the others. To create, by hand and without the aid of computers, a 240-page document that so closely mimics the subtle statistical properties of a real language would be a remarkable feat. The sheer effort and linguistic brilliance required to fake it so convincingly seems, to many, more incredible than actually writing it in a real system.

Can AI Finally Crack the Code?

In recent years, researchers have turned to artificial intelligence and machine learning to analyze the manuscript’s text. Statistical and machine-learning methods have surfaced patterns and structures that are hard to spot by eye. Several studies have made bold claims, with one algorithmic analysis pointing to Hebrew as the underlying language and a separate, widely publicized paper (based on conventional scholarship rather than AI) proposing a form of Proto-Romance. However, neither claim has led to a verifiable translation, and each has been met with significant skepticism from the wider community.

The Voynich Manuscript is now fully digitized and available online, allowing anyone from a tenured professor of linguistics to an amateur hobbyist to take a shot at solving it. It remains a humbling testament to the limits of our knowledge. It is a linguistic ghost, exhibiting all the behaviors of a language while remaining utterly silent.

Will we ever decipher it? Perhaps one day, a breakthrough in linguistics, a newly discovered historical text, or a clever AI will finally provide the key. But for now, the Voynich Manuscript remains a perfect mystery—a beautiful, unreadable book that challenges our fundamental understanding of language and communication, daring us to solve a puzzle laid down 600 years ago.