How the Deaf Read Lips: A Feat of Phonetics

How can anyone decode speech just by watching a face? The answer lies in the messy, beautiful world of phonetics.

What is Speechreading, Really?

First, let’s adjust our terminology. While “lip-reading” is the common term, most experts and community members prefer speechreading. This is because it’s not just about the lips. A skilled speechreader uses a whole constellation of cues: the movement of the jaw, the puff of the cheeks, the furrow of an eyebrow, the rhythm of the conversation, and the context of the situation. It’s a holistic skill that involves observing the entire face and body, not just the mouth.

But at its core, speechreading begins with the visible sounds. To understand its limits, we need to break speech down into its smallest parts: phonemes.

The Phonetic Building Blocks: Visemes

In linguistics, a phoneme is the smallest unit of sound that can distinguish one word from another (like the /p/ in “pat” vs. the /b/ in “bat”). However, from a visual perspective, many of these distinct sounds look identical. This is where the concept of a viseme comes in.

A viseme is a group of phonemes that look the same on the lips. It’s the visual equivalent of a phoneme. Understanding visemes is the key to understanding why speechreading is so challenging.

The “Easy-to-See” Sounds

Some sounds give the speechreader a solid, reliable visual anchor. These are typically sounds made at the front of the mouth.

  • Bilabials (/p/, /b/, /m/): These sounds are the poster children for “visible” speech. All three require the lips to press together completely. If you say the words “pie”, “buy”, and “my”, you’ll see and feel that distinctive lip closure. To a speechreader, this motion is a clear signal that one of these three sounds has been spoken.
  • Labiodentals (/f/, /v/): These sounds are also highly visible. They are formed by placing the top teeth on the bottom lip. Try saying “fan” and “van” while looking in a mirror. The gesture is unmistakable.
  • Lip-rounded Vowels: Certain vowels require dramatic lip shapes. The /u/ sound (as in “boot”) involves obvious rounding, while the /i/ sound (as in “beet”) involves stretching the lips wide. These can provide helpful clues.

When Sounds Become Invisible

If all sounds were as obvious as /p/ or /f/, speechreading would be much easier. Unfortunately, they’re the exception, not the rule. Experts estimate that only about 30-40% of English sounds are clearly visible on the lips. The rest are ambiguous or completely invisible.

The Viseme Ambiguity

Remember our “easy” bilabial group: /p/, /b/, and /m/? While the group is visible, distinguishing between the sounds within it is nearly impossible.

  • “Pat”
  • “Bat”
  • “Mat”

Say these three words aloud. Visually, they are identical. The difference lies in voicing (/b/ is voiced, /p/ is not) and nasality (/m/ sends air through the nose). Both voicing (the vibration of vocal cords) and nasality happen inside the throat and nasal cavity, making them completely invisible to a speechreader.

This single viseme group, {p, b, m}, creates a huge amount of ambiguity. Did they say “maybe” or “baby”? “Pole” or “mole”? The visual information is the same.
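This collapse of distinct phonemes into a single viseme can be made concrete with a toy model. The phoneme-to-viseme table below is a simplified sketch (real viseme inventories vary between studies), and the phoneme labels are an informal ARPAbet-style notation chosen for illustration:

```python
# Toy phoneme-to-viseme map (simplified; real inventories vary by study).
VISEME = {
    "p": "PBM", "b": "PBM", "m": "PBM",  # bilabials: lips pressed together
    "f": "FV",  "v": "FV",               # labiodentals: teeth on lower lip
    "t": "TDN", "d": "TDN", "n": "TDN",  # alveolars: tongue behind the teeth
    "k": "KG",  "g": "KG",               # velars: hidden at the back of the mouth
    "ey": "EY", "iy": "IY",              # vowels kept distinct in this sketch
}

def visemes(phonemes):
    """Map a phoneme sequence to the viseme sequence a speechreader sees."""
    return [VISEME[p] for p in phonemes]

# "maybe" /m ey b iy/ vs. "baby" /b ey b iy/
print(visemes(["m", "ey", "b", "iy"]))  # ['PBM', 'EY', 'PBM', 'IY']
print(visemes(["b", "ey", "b", "iy"]))  # the same sequence: visually identical
```

The two words differ in their first phoneme but not in their first viseme, which is exactly the ambiguity the speechreader faces.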

The Truly Hidden Sounds

Other sounds are made at the back of the mouth, completely hidden from view.

  • Velars (/k/, /g/, /ŋ/ as in “sing”): These are produced by raising the back of the tongue to the soft palate (the velum). There is no visible lip or front-of-mouth movement. This means “coat”, “goat”, and “king” offer no clear consonant cues at the point of articulation.
  • Alveolars (/t/, /d/, /n/, /s/, /z/, /l/): This massive group of common sounds is formed by placing the tongue on the alveolar ridge, the bump just behind your top teeth. The lips barely move. This makes pairs like “tot” and “dot”, or “night” and “light”, incredibly difficult to tell apart.
  • Glottals (/h/): The sound /h/ is just a puff of air from the glottis in your throat. It has no visual component whatsoever. The word “hat” looks identical to “at.”

The Homophene Trap: When Words Look the Same

When you combine these invisible and ambiguous sounds, you get homophenes—words that look identical on the lips but have different meanings. The English language is riddled with them, and they are the bane of a speechreader’s existence. A classic example is the set:

pet, bed, men

All three words begin with the {p, b, m} viseme and end with the {t, d, n} viseme. Visually, they are indistinguishable. The only way to know which word was said is to rely on something else entirely.

Some other famous homophene groups include:

  • “ship”, “chip”, “jeep”
  • “cheese”, “sheets”, “jeans”
  • “where”, “wear”, “were”

And the internet-famous one: “I love you” looks remarkably similar to “olive juice.” Go ahead, try it in a mirror. It’s a fun party trick, but it underscores a serious challenge for speechreaders.

The Brain: The Ultimate Super-Processor

So, if only 30-40% of speech is visible, how is speechreading possible at all? This is where the brain steps in, acting as a powerful prediction and inference engine.

1. Context is King: The brain is a master of using context to narrow down the possibilities. If a sentence ends with something that looks like “pet/bed/men”, the surrounding words decide it: in “Can you show me to the ___’s room?”, your brain instantly and subconsciously settles on “men”, while in “My dog sleeps on its ___”, it defaults to “bed.” Without context, speechreading is a near-impossible guessing game.

2. Filling in the Gaps: This cognitive process is closely related to phonemic restoration. The brain takes the limited visual data it receives—the visemes—and combines it with its vast knowledge of grammar, vocabulary, and social situations to “fill in” the missing sounds. It essentially generates a list of likely candidates and picks the one that makes the most sense. This happens in fractions of a second and is mentally exhausting.
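This candidate-and-context process can be sketched as a toy algorithm. Everything here is invented for illustration: the viseme signature, the homophene set, and the context-fit scores stand in for the statistical knowledge a real brain (or a real language model) would bring to bear:

```python
# Toy sketch of candidate filtering plus context scoring.
# All signatures, words, and scores are invented for illustration.

# Words sharing one viseme signature (they look identical on the lips).
HOMOPHENES = {"PBM-EH-TDN": ["pet", "bed", "men"]}

# Invented context-fit scores: how well a candidate follows a cue word.
CONTEXT_FIT = {
    ("sleeps", "bed"): 0.9, ("sleeps", "pet"): 0.3, ("sleeps", "men"): 0.1,
    ("room", "men"): 0.8,   ("room", "bed"): 0.4,   ("room", "pet"): 0.1,
}

def guess(signature, context_word):
    """Pick the homophene candidate that best fits the surrounding context."""
    candidates = HOMOPHENES[signature]
    return max(candidates, key=lambda w: CONTEXT_FIT.get((context_word, w), 0.0))

print(guess("PBM-EH-TDN", "sleeps"))  # bed
print(guess("PBM-EH-TDN", "room"))    # men
```

The same visual input yields different guesses depending on the context word, which mirrors the disambiguation the speechreader performs in fractions of a second.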

3. Beyond the Lips: Facial expressions provide a crucial layer of information. A questioning look, a smile, or a look of disgust can change the entire meaning of a sentence. The rhythm and cadence of speech also help differentiate questions from statements.

A Feat of Perception

Speechreading is not a passive act of watching; it’s an active, intellectually demanding process of reconstruction. It’s a testament to the brain’s incredible plasticity and its ability to make sense of a world with incomplete information. It’s a skill that requires immense concentration, guesswork, and an encyclopedic knowledge of language patterns.

So, the next time you speak to someone who is deaf or hard of hearing, remember the phonetic puzzle they’re solving in real-time. Face them, speak clearly (but don’t exaggerate your lip movements, as this distorts them), and don’t cover your mouth. By providing a clear visual signal, you’re handing them a few more crucial pieces to an incredibly complex puzzle.
