Proving Languages Are Related

Ever been told that English is related to German? That seems plausible enough. But what if I told you it’s also a distant cousin to Russian, Greek, and even Sanskrit, the ancient classical language of India? And to make it stranger, it has no demonstrable genetic link to Finnish, a fellow European language.

How can linguists possibly know this for sure? Are they just spotting a few look-alike words? The answer is a resounding “no.” The certainty comes from a powerful and rigorous scientific toolkit known as the comparative method. It’s the linguistic equivalent of DNA testing, allowing us to reconstruct family trees and uncover the history hidden within our words.

Let’s demystify this fascinating process.

The Obvious Traps: Borrowing and Pure Coincidence

Before diving into how it’s done, we need to understand what doesn’t count as evidence. Newcomers to linguistics often fall into two common traps.

First is borrowing, or loanwords. English is full of them. We use words like sushi (from Japanese), ballet (from French), and algebra (from Arabic). These words tell us about cultural contact, trade, and influence, but they say nothing about a shared origin. English didn’t descend from Japanese just because we enjoy raw fish; we simply adopted the word along with the dish.

The second trap is chance resemblance. Humans can make only a limited number of sounds, and there are thousands of languages, so some words will sound similar just by coincidence. The Persian word for “bad” is, well, bad. The Malay word for “eye” is mata, uncannily similar to the Modern Greek mati. These are linguistic red herrings. A handful of look-alikes is not proof.
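To get a feel for why a few look-alikes prove nothing, here is a back-of-the-envelope sketch in Python. The numbers are assumptions chosen purely for illustration, not measured values: we pretend to compare 200 meanings between two unrelated languages, and we pretend each pair of words has a 1% chance of resembling each other by accident. The function name chance_lookalikes is made up for this example.

```python
def chance_lookalikes(num_meanings, match_probability):
    """Estimate accidental look-alikes between two unrelated languages.

    Treats every meaning as an independent trial with the same probability
    of an accidental resemblance (a simplifying assumption).
    """
    expected = num_meanings * match_probability
    at_least_one = 1 - (1 - match_probability) ** num_meanings
    return expected, at_least_one


# Illustrative inputs only: 200 meanings, 1% accidental-match rate.
expected, at_least_one = chance_lookalikes(num_meanings=200, match_probability=0.01)
print(f"expected chance look-alikes: {expected:.1f}")
print(f"probability of at least one: {at_least_one:.2f}")
```

Under these made-up inputs, a couple of accidental matches are expected, so a pair like Malay mata and Greek mati is exactly what chance alone would produce.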

To prove a genetic relationship, linguists need something much more robust: systematic, predictable patterns of difference.

The Gold Standard: The Comparative Method Step-by-Step

The comparative method is a meticulous process that looks past superficial similarities and hunts for regular, patterned sound correspondences in core vocabulary.

Step 1: Gather Your Suspects (Potential Cognates)

The first step is to assemble lists of words from the languages you’re comparing. But not just any words. You focus on the basic, core vocabulary: words that nearly every culture has and that are unlikely to be borrowed. This includes:

Family members (mother, father, brother)
Body parts (head, foot, eye)
Natural elements (sun, water, fire)
Simple verbs (eat, drink, see)
Numbers (one, two, three)

Words that come from a common ancestor are called cognates. For example, the English word mother is a cognate of the German Mutter and the Spanish madre. They weren’t borrowed from each other; they all evolved from a single, older word.

Step 2: Find the System (Sound Correspondences)

This is the heart of the method. Linguists lay out the potential cognates and look for patterns. Let’s take the words for “father” and “foot” across several languages we suspect are related (part of the Indo-European family).

Cognate Set 1: “Father”

Sanskrit: pitṛ́
Ancient Greek: patḗr
Latin: pater
Gothic (an old Germanic language): fadar
Old English: fæder

Cognate Set 2: “Foot”

Sanskrit: pád-
Ancient Greek: pod-
Latin: ped-
Gothic: fōtus
Old English: fōt

Do you see the pattern? It’s not that the words are identical, but that their differences are regular.

Where Sanskrit, Greek, and Latin have a [p] sound at the beginning of the word, Gothic and Old English consistently have an [f] sound. (Both are early Germanic languages; Old English is the ancestor of modern English, while Gothic is an extinct relative with no living descendants.) This isn’t a coincidence; it’s a systematic sound correspondence. This specific p → f shift is one of the most famous discoveries in linguistics, part of a set of changes known as Grimm’s Law.

Finding one such correspondence is interesting. Finding dozens that recur across hundreds of core words is scientific proof. Here is a second one: where Latin has a [d] (e.g., in decem, “ten”), English has a [t] (ten). It’s a system!
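The tallying described in this step can be sketched in a few lines of Python. This is a toy illustration under loud assumptions: it compares only the first letter of each spelled form (real work compares carefully transcribed sounds in every position), it uses just the three word pairs quoted above, and the names PAIRS and initial_correspondences are invented for the example.

```python
from collections import Counter

# (meaning, Latin form, Germanic form), all taken from the examples above.
# The Germanic column mixes Old English (fæder, fōt) with modern English
# (ten), exactly as the text does.
PAIRS = [
    ("father", "pater", "fæder"),
    ("foot", "ped-", "fōt"),
    ("ten", "decem", "ten"),
]


def initial_correspondences(pairs):
    """Count how often each (Latin initial, Germanic initial) pairing occurs."""
    counts = Counter()
    for _meaning, latin, germanic in pairs:
        # Assumption: the first letter stands in for the first sound.
        counts[(latin[0], germanic[0])] += 1
    return counts


correspondences = initial_correspondences(PAIRS)
for (latin_sound, germanic_sound), n in correspondences.most_common():
    print(f"Latin {latin_sound} : Germanic {germanic_sound}  ({n} word(s))")
```

With only three words the counts are tiny, but the shape of the evidence is the same as in real comparative work: the p : f pairing shows up more than once, and the more often a pairing recurs, the harder it is to blame on chance.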

Step 3: Reconstruct the Ancestor (The Proto-Word)

Once you have these correspondences, you can work backward and reconstruct the word in the parent language, or proto-language. A proto-language is a hypothetical ancestor, usually one that was never written down. We mark reconstructed words with an asterisk (*) to show they are inferred rather than attested.

Looking at our “father” set, which sound is more likely the original: [p] or [f]? Linguists know that the change from a “stop” sound like [p] to a “fricative” sound like [f] is a very common, natural type of sound change across the world’s languages. The reverse is much rarer. Therefore, the proto-language almost certainly had a [p] sound.
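That directionality argument can be written out as a small Python sketch. It rests on assumptions worth stating plainly: the list of “natural” changes contains only the single stop-to-fricative change discussed above, the candidates are limited to sounds actually attested in the daughter languages (real reconstructions sometimes posit a sound that survives nowhere), and the names FATHER_INITIALS, NATURAL_CHANGES, and viable_proto_sounds are invented for this example.

```python
# Initial sound of "father" in each language, read off the cognate set above.
FATHER_INITIALS = {
    "Sanskrit": "p",
    "Ancient Greek": "p",
    "Latin": "p",
    "Gothic": "f",
    "Old English": "f",
}

# Assumed, deliberately tiny list of changes treated as natural,
# written as (older sound, newer sound). Only stop -> fricative is listed.
NATURAL_CHANGES = {("p", "f")}


def viable_proto_sounds(reflexes, natural_changes):
    """Return the attested sounds that could be the ancestor of all reflexes.

    A candidate is viable if every attested sound is either identical to it
    or reachable from it by one change listed in natural_changes.
    """
    attested = set(reflexes.values())
    viable = []
    for candidate in sorted(attested):
        if all(s == candidate or (candidate, s) in natural_changes for s in attested):
            viable.append(candidate)
    return viable


proto_candidates = viable_proto_sounds(FATHER_INITIALS, NATURAL_CHANGES)
print(proto_candidates)
```

Starting from [p], every attested form is either unchanged or one natural step away. Starting from [f], the three languages with [p] would each need the rarer reverse change, so [f] is ruled out as the ancestor.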

By applying this logic to every sound in the word, linguists have reconstructed the Proto-Indo-European (PIE) word for “father” as something like *ph₂tḗr (often simplified as *pətér).

From this single ancient word, we get:

*pətér → Latin pater → Spanish padre, French père
*pətér → Sanskrit pitṛ́
*pətér → (via Grimm’s Law, p→f) → Proto-Germanic *fadēr → English father, German Vater

The Test Case: So What About Finnish?

Now we can answer our original question. Why isn’t Finnish in this family? Let’s look at its core vocabulary:

Father: isä
Foot: jalka
Water: vesi (compare English water, German Wasser, Russian voda from PIE *wódr̥)
Three: kolme (compare English three, Latin trēs from PIE *tréyes)

When you line up Finnish words with the Indo-European cognate sets, you find… nothing. No systematic correspondences. No predictable patterns. The vocabulary is fundamentally different. Finnish belongs to a completely separate family, the Uralic family, along with Hungarian and Estonian.
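Here is a rough Python sketch of that comparison, using only words quoted in this article. Treat it as an illustration of the procedure rather than a demonstration: the initial sounds are segmented by hand, “θ” is used for the “th” sound of English three, four words per language pair are far too few to prove anything, and the names below are invented for the example.

```python
from collections import Counter

# (meaning, initial sound in language A, initial sound in English).
LATIN_VS_ENGLISH = [
    ("father", "p", "f"),   # pater : father
    ("foot", "p", "f"),     # ped- : foot
    ("three", "t", "θ"),    # trēs : three
    ("ten", "d", "t"),      # decem : ten
]
FINNISH_VS_ENGLISH = [
    ("father", "i", "f"),   # isä : father
    ("foot", "j", "f"),     # jalka : foot
    ("water", "v", "w"),    # vesi : water
    ("three", "k", "θ"),    # kolme : three
]


def recurring_correspondences(pairs, min_count=2):
    """Keep only the sound pairings that occur at least min_count times."""
    counts = Counter((a, b) for _meaning, a, b in pairs)
    return {pair: n for pair, n in counts.items() if n >= min_count}


latin_recurring = recurring_correspondences(LATIN_VS_ENGLISH)
finnish_recurring = recurring_correspondences(FINNISH_VS_ENGLISH)
print("Latin/English recurring:", latin_recurring)
print("Finnish/English recurring:", finnish_recurring)
```

Even in this tiny sample, the Latin and English lists share a pairing that repeats (p : f), while every Finnish and English pairing is a one-off. Scaled up to hundreds of core words, that contrast is what separates a language family from a set of unrelated neighbors.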

History Written in Sound

The comparative method does more than just group languages. It builds family trees, showing which languages branched off from which and in what relative order. The fact that all Germanic languages share the p→f shift tells us that this change happened in a common ancestor, Proto-Germanic, after it had already split off from the ancestor of Latin and Greek.

So, when a linguist says English and Sanskrit are related, they’re not making a wild guess based on a few words. They are citing a massive, interlocking system of evidence built over 200 years of painstaking work. They are pointing to regular sound correspondences that run through hundreds of core words and connect the word “father” on a London street to its 3,500-year-old cousin, pitṛ́, in the sacred hymns of the Rigveda. And in doing so, they let us hear the faint but undeniable echoes of our shared linguistic ancestors, speaking to us from the deep past.