The Predictable Magic of Words
Let’s play a quick game. I’m going to start a sentence, and you have to guess the next letter. Ready?
T-H-E C-A-T S-A-T O-N T-H-E M-A-_
What’s your guess? If you said ‘T’, you’re almost certainly correct. You didn’t need to be a psychic; you just needed to be a speaker of English. You instinctively understand that language isn’t a random jumble of characters. It’s a system brimming with patterns, rules, and—most importantly—predictability.
This very predictability is the cornerstone of a fascinating field called entropic linguistics. It’s where the mathematical genius of Claude Shannon, the father of information theory, meets the messy, beautiful world of human language. By measuring this predictability, we can understand not only how we communicate but also how to crack the codes designed to hide that communication.
What is Entropy, Anyway?
In the 1940s, Claude Shannon was working on a fundamental problem: how can we quantify “information”? His answer was a concept he borrowed from thermodynamics: entropy. In information theory, entropy is a measure of uncertainty or surprise.
- High Entropy: A system with high entropy is unpredictable and chaotic. A coin flip has high entropy because you can’t be sure if it will be heads or tails. A string of random letters like "kpsw jyz fq" has very high entropy.
- Low Entropy: A system with low entropy is predictable and ordered. A trick coin with two heads has zero entropy—the outcome is certain. Our sentence, “The cat sat on the mat”, has very low entropy.
Shannon realized that the more surprising a message is, the more “information” it actually contains. If I tell you “the sun will rise tomorrow”, I’ve conveyed very little new information because the event is extremely predictable (low entropy). If I tell you “a blue squirrel just stole my wallet”, I’ve conveyed a huge amount of information because the event is wildly unpredictable (high entropy).
Language, as it turns out, sits in a sweet spot. It’s not completely predictable, but it’s far from random. It has a relatively low entropy, and this is its greatest strength—and its greatest weakness.
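If you want to see that in numbers, here is a minimal Python sketch that estimates entropy from character frequencies alone. It ignores context entirely (no grammar, no letter sequences), so it captures only a slice of the predictability discussed below, and the two strings are simply the examples from the list above.

```python
import math
from collections import Counter

def char_entropy(text: str) -> float:
    """Shannon entropy of the character distribution, in bits per character."""
    counts = Counter(text)
    total = len(text)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

# The random string spreads probability almost evenly across its characters;
# the English sentence keeps reusing 't', 'a', and the space, which pulls its score down.
# The gap widens with longer samples and with models that also look at letter context.
print(round(char_entropy("kpsw jyz fq"), 2))             # -> 3.28
print(round(char_entropy("the cat sat on the mat"), 2))  # -> 3.01
```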
The Redundancy of Language
Why is English (or any human language) so predictable? The answer is redundancy. Our language is filled with patterns at every level that reduce its entropy.
From Letters to N-grams
Think about the letters of the alphabet. If every letter were equally likely to appear, the entropy would be maxed out. But we know that’s not true. The letter ‘E’ appears far more often than ‘Z’ or ‘Q’.
It goes deeper. Certain letter combinations are far more common than others. This is the world of n-grams (sequences of n items).
- Bigrams (2-letter sequences): You’ll see ‘TH’, ‘HE’, ‘IN’, and ‘ER’ all the time. You will almost never see ‘QG’, ‘JZ’, or ‘XW’.
- Trigrams (3-letter sequences): ‘THE’ and ‘AND’ are a dime a dozen. ‘ZQX’ is not.
This structure means that even if you miss a letter in a word, you can often figure it out from context. If you see "predi_table", your brain immediately fills in the ‘c’ because that sequence is familiar and probable. This redundancy makes communication robust.
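A few lines of Python are enough to watch these n-gram patterns surface. The sample sentence below is just an illustration I’ve chosen; any stretch of ordinary English prose will show the same shape.

```python
from collections import Counter

def ngram_counts(text: str, n: int) -> Counter:
    """Count the n-letter sequences in a text (letters only, case-folded)."""
    letters = "".join(ch for ch in text.upper() if ch.isalpha())
    return Counter(letters[i:i + n] for i in range(len(letters) - n + 1))

sample = "The quick brown fox jumps over the lazy dog while the cat naps on the mat."
print(ngram_counts(sample, 2).most_common(5))  # 'TH' and 'HE' sit at the top even here
print(ngram_counts(sample, 3).most_common(3))  # 'THE' leads the trigrams
```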
Words, Grammar, and Context
The predictability continues at the word level. Grammar and syntax provide a powerful framework that limits our choices. After the words “The fluffy”, your mind doesn’t expect the next word to be “democracy” or “ran.” You anticipate a noun, likely something that can be described as fluffy, like “kitten”, “blanket”, or “cloud.”
Shannon estimated that due to all this redundancy—from letter frequency to grammar—the entropy of written English is only about 1 to 1.5 bits per character, rather than the 4.7 bits per character you’d expect from a random selection of 26 letters and a space. We are, in effect, using only a fraction of the theoretical information capacity of our alphabet.
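As a back-of-the-envelope check on those figures (taking 1.25 bits per character as a midpoint of Shannon’s range, and a 27-symbol alphabet of 26 letters plus the space):

```python
import math

alphabet_size = 27                   # 26 letters plus the space
max_bits = math.log2(alphabet_size)  # about 4.75 bits/char, the figure usually quoted as 4.7
english_bits = 1.25                  # a midpoint of Shannon's 1 to 1.5 bits/char estimate
redundancy = 1 - english_bits / max_bits

print(f"max = {max_bits:.2f} bits/char, redundancy = {redundancy:.0%}")  # roughly 74% redundant
```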
The Cryptographer’s Secret Weapon
For centuries, this low entropy of language has been the bane of secret-keepers and the boon of codebreakers. The entire field of classical cryptography is a battle against the inherent predictability of language.
Consider a simple substitution cipher, where every letter of the alphabet is consistently replaced by another. For example, every ‘A’ becomes a ‘G’, every ‘B’ becomes an ‘X’, and so on.
Plaintext: HELLO WORLD
Ciphertext: RZQQU JUKQM
On the surface, it looks like gibberish (high entropy). But the cipher has only masked the letters; it hasn’t changed the underlying structure of the language. The low-entropy patterns are still there, waiting to be found.
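Here is a minimal sketch of such a cipher in Python. The key is produced by shuffling the alphabet, so it will not match the example key above, but the behaviour is the same: a fixed one-to-one swap of letters that leaves every statistical fingerprint of the plaintext intact.

```python
import random
import string

def make_key(seed: int = 42) -> dict[str, str]:
    """Build a monoalphabetic key by shuffling the alphabet: each letter gets one substitute."""
    letters = list(string.ascii_uppercase)
    substitutes = letters[:]
    random.Random(seed).shuffle(substitutes)
    return dict(zip(letters, substitutes))

def encrypt(plaintext: str, key: dict[str, str]) -> str:
    """Swap every letter for its substitute; spaces and punctuation pass through unchanged."""
    return "".join(key.get(ch, ch) for ch in plaintext.upper())

key = make_key()
print(encrypt("HELLO WORLD", key))  # gibberish on the surface, English structure underneath
```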
Frequency Analysis: The Telltale ‘E’
A cryptanalyst breaking this code doesn’t need to guess the entire key. They start by exploiting the most basic redundancy: letter frequency. They’ll count every letter in the encrypted message. If the ciphertext is long enough, the most common symbol will almost certainly correspond to the most common letter in English: ‘E’. The next most common might be ‘T’, then ‘A’, ‘O’, ‘I’, and so on.
In our short example, ‘Q’ appears three times. A codebreaker might hypothesize that ‘Q’ stands for ‘E’, ‘O’, or another common letter, giving them a crucial first step. (Here it actually stands for ‘L’, itself a fairly common letter, which is why longer ciphertexts give more reliable statistics.)
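That first pass can be automated in a few lines. This sketch simply pairs the most common ciphertext letters with the most common English letters; on a message as short as ours most of the pairings will be wrong, but on a few paragraphs of ciphertext the top few are usually close.

```python
from collections import Counter

ENGLISH_ORDER = "ETAOINSHRDLCUMWFGYPBVKJXQZ"  # approximate English letter-frequency order

def first_guess(ciphertext: str) -> list[tuple[str, str]]:
    """Pair each ciphertext letter, most frequent first, with the most frequent English letters."""
    counts = Counter(ch for ch in ciphertext if ch.isalpha())
    ranked = [letter for letter, _ in counts.most_common()]
    return list(zip(ranked, ENGLISH_ORDER))

print(first_guess("RZQQU JUKQM"))  # 'Q', the most common symbol, gets paired with 'E' first
```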
Hunting for Patterns
The attack doesn’t stop there. Cryptanalysts also hunt for n-gram patterns. They might look for repeated three-letter groups in the ciphertext. If, in a longer message, they find one that appears over and over, like "XLF", they can be pretty sure it’s the encrypted version of "THE". Suddenly, they know that X=T, L=H, and F=E. With just those three letters, large portions of the message can be deciphered, revealing more patterns and causing the rest of the code to crumble.
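Spotting those repeats is just as easy to mechanize. The snippet below scans a toy ciphertext (invented purely for this illustration) for three-letter groups that occur more than once; in a real message, a group that recurs as often as "XLF" does here would be a prime suspect for "THE".

```python
from collections import Counter

def repeated_trigrams(ciphertext: str, min_count: int = 2) -> list[tuple[str, int]]:
    """Return the three-letter groups that recur in the ciphertext (spaces ignored)."""
    letters = "".join(ch for ch in ciphertext if ch.isalpha())
    counts = Counter(letters[i:i + 3] for i in range(len(letters) - 2))
    return [(gram, n) for gram, n in counts.most_common() if n >= min_count]

print(repeated_trigrams("XLF NSB XLF QPZ MKV XLF"))  # -> [('XLF', 3)]
```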
Every predictable feature of language—double letters (like the ‘LL’ in HELLO, which survives as the ‘QQ’ in our ciphertext), common prefixes and suffixes, grammatical rules—is a handhold for the cryptanalyst. They are, in essence, using Shannon’s entropy against the code-maker. The code is broken not by finding a flaw in the cipher itself, but by exploiting the unbreakable, low-entropy habits of the language it’s trying to hide.
Entropy in Our Digital World
The principles of entropic linguistics are not just for spies and scholars. They are deeply embedded in the technology we use every day.
- Data Compression: How does a ZIP file make a large text document smaller? It finds the redundant, low-entropy patterns (like the word ‘the’ or the letter ‘e’) and replaces them with shorter, more efficient codes. The predictability of language is what makes it so compressible, as the short sketch after this list demonstrates.
- Predictive Text: Your smartphone’s autocomplete and predictive text features are entropy-minimizing machines. They use statistical models to calculate the most probable next word based on what you’ve already typed, effectively surfing the low-entropy waves of your sentences.
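Here is a quick way to watch that compressibility for yourself, using nothing but Python’s standard library: a repetitive English passage squeezes down to a fraction of its size, while the same number of random bytes barely budges.

```python
import os
import zlib

english = (b"The cat sat on the mat. Predictable, redundant text like this "
           b"is exactly what a compressor loves. ") * 20
noise = os.urandom(len(english))  # maximum-entropy bytes: no patterns to exploit

print(len(english), "->", len(zlib.compress(english)))  # shrinks to a small fraction of the original
print(len(noise), "->", len(zlib.compress(noise)))      # stays about the same size, or grows slightly
```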
Claude Shannon gave us a mathematical lens to view language, revealing it as a beautiful dance between order and surprise. The predictable patterns that allow us to understand each other across a noisy room are the same ones that allow a cryptographer to unravel a secret message. Language is a code we all know how to use, and thanks to entropy, we can now measure just how wonderfully, and vulnerably, predictable it is.