The Sound of a Vowel: What Are Formants?

Sing a note. Now, keeping that same pitch, slowly change the sound you’re making from an “ee” to an “oo,” and then to an “ah.” You can feel your tongue and jaw moving, and you can hear a distinct change in the sound’s character, or timbre. The pitch—the fundamental musical note produced by your vocal cords—hasn’t changed, yet the vowel is completely different. How is this possible?

This simple exercise reveals one of the most fundamental principles of acoustic phonetics. The identity of a vowel doesn’t live in its pitch. It lives in a set of acoustic fingerprints called formants. Understanding formants is like getting a backstage pass to see how the human voice really works.

The Source and the Filter: A Two-Part System

To understand what formants are, we first need to break down speech production into two basic components: the source and the filter.

  1. The Source: This is the raw sound generated by your vocal cords (also called vocal folds), which sit inside your larynx. As air from your lungs pushes through, your vocal cords vibrate rapidly, creating a buzzing sound. The rate of this vibration determines the fundamental frequency (F0), which we perceive as pitch. A faster vibration means a higher pitch; a slower one means a lower pitch. This source sound is complex, containing the fundamental frequency plus a whole series of higher-frequency overtones, or harmonics.
  2. The Filter: This is your vocal tract—the open space in your throat (pharynx), mouth (oral cavity), and sometimes nose (nasal cavity). This space acts as a resonance chamber. Like the body of a guitar or a violin, its specific shape amplifies certain frequencies from the source sound and dampens others.

Think of it this way: plucking a guitar string in open air produces a thin, quiet sound. That’s the source. But when that same string is attached to the hollow body of a guitar, the sound becomes rich, loud, and full. The guitar’s body, acting as the filter, resonates with the string’s vibrations and amplifies some frequencies more than others, creating the instrument’s characteristic tone. Our vocal tract does much the same thing with the buzz from our vocal cords.
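To make the source-filter idea concrete, here is a minimal sketch in Python. It is a toy model, not a real vocal tract simulation: the bell-shaped gain curve, the 1/k weakening of the harmonics, and the “ah”-like resonances at roughly 700 Hz and 1100 Hz are all assumptions chosen for illustration, and the function names are made up for this example.

```python
AH_FORMANTS = (700, 1100)  # assumed, approximate resonances for an "ah"-like shape

def resonance_gain(freq_hz, formants_hz, peak=10.0, half_bandwidth_hz=50.0):
    """Toy filter: gain is close to 1 far from a resonance and rises near each one."""
    gain = 1.0
    for center in formants_hz:
        gain += peak / (1.0 + ((freq_hz - center) / half_bandwidth_hz) ** 2)
    return gain

def filtered_harmonics(f0_hz, formants_hz, max_hz=4000):
    """Return (frequency, amplitude) for every harmonic of f0 after filtering."""
    result = []
    k = 1
    while k * f0_hz <= max_hz:
        freq = k * f0_hz
        source_amp = 1.0 / k  # assumed: the raw buzz gets weaker at higher harmonics
        result.append((freq, source_amp * resonance_gain(freq, formants_hz)))
        k += 1
    return result

def most_boosted(f0_hz, formants_hz, max_hz=4000):
    """Harmonic that the filter amplifies most, relative to the raw source."""
    harmonics = range(f0_hz, max_hz + 1, f0_hz)
    return max(harmonics, key=lambda f: resonance_gain(f, formants_hz))

# Same filter, two different pitches: the boosted region stays near the resonances.
for f0 in (100, 200):
    print(f0, "Hz pitch -> most boosted harmonic:", most_boosted(f0, AH_FORMANTS), "Hz")
```

Changing f0 changes which harmonics exist, but the harmonics that get the biggest boost are always the ones sitting closest to the filter’s resonances. That is the sense in which pitch and vowel quality are independent.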

So, What Exactly Is a Formant?

A formant is a resonance of the vocal tract, visible as a peak in the spectrum of the sound that comes out. When the source buzz from the vocal cords passes through the vocal tract, the frequencies that match the resonant properties of that particular mouth shape get a major boost in amplitude (loudness).

Imagine you have a hundred different tuning forks (the harmonics from your vocal cords) and you bring them near a wine glass (your vocal tract). Only the tuning forks whose frequencies match the resonant frequency of the glass will cause it to ring out loudly. Those loud, amplified frequencies are the formants.

By changing the shape of our vocal tract (moving our tongue, rounding our lips, and raising or lowering our jaw), we change its resonant frequencies. This, in turn, changes which harmonics get amplified. Our brains are exquisitely tuned to the pattern of these formant peaks, which is how we instantly and unconsciously identify the vowel we’re hearing across speakers who differ in pitch, accent, or age.
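A standard textbook idealization shows how geometry sets resonance: treat the vocal tract in a neutral position as a uniform tube, closed at the vocal cords and open at the lips. Such a tube resonates at odd multiples of a quarter wavelength, F_n = (2n - 1) * c / (4 * L). The sketch below assumes a tract length of 17.5 cm and a speed of sound of 350 m/s; both are round illustrative figures, and a real vocal tract is never a perfectly uniform tube.

```python
def tube_resonances(length_m, speed_of_sound=350.0, count=3):
    """Resonances (Hz) of a uniform tube closed at one end and open at the other."""
    return [(2 * n - 1) * speed_of_sound / (4 * length_m)
            for n in range(1, count + 1)]

# Assumed lengths: 17.5 cm for a longer tract, 14 cm for a shorter one.
for length in (0.175, 0.14):
    rounded = [round(f) for f in tube_resonances(length)]
    print(f"{length * 100:.1f} cm tube -> resonances near {rounded} Hz")
```

For the 17.5 cm tube the resonances land near 500, 1500, and 2500 Hz, and the shorter tube pushes all of them higher. Moving the tongue, lips, and jaw turns the uniform tube into a tube of varying width, which shifts each resonance away from these neutral values.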

The Big Two: F1 and F2, the Architects of Vowel Space

While there are many formants, for most languages, the first two formants (F1 and F2) are the most important for telling vowels apart. They correspond closely to the physical position of your tongue and jaw.

F1: The Vowel Height Formant

The first formant, F1, is primarily correlated with vowel height—that is, how high or low your tongue is in your mouth (which generally corresponds to how open or closed your jaw is).

  • A low F1 value corresponds to a high vowel, where the tongue is raised high in the mouth. Think of the vowel /i/ in “see” or /u/ in “sue”. To make these sounds, your jaw is relatively closed.
  • A high F1 value corresponds to a low vowel, where the tongue is low and the jaw is dropped. Think of the /æ/ in “cat” or the /ɑ/ in “father”.

A simple mnemonic is that F1 goes up as your jaw goes down. In other words, F1 is inversely related to vowel height: the higher the vowel, the lower the F1.

F2: The Vowel Backness Formant

The second formant, F2, is primarily correlated with vowel backness—how far forward or back your tongue is in your mouth.

  • A high F2 value corresponds to a front vowel. For the /i/ in “see”, the body of your tongue is pushed far forward and raised toward the hard palate. This shortens the front of the oral cavity, creating a high-frequency resonance.
  • A low F2 value corresponds to a back vowel. For the /u/ in “sue”, you pull your tongue back and round your lips. This creates a long, open resonating chamber in the front of your mouth, which results in a low-frequency resonance.

So, we can map vowels based on their F1 and F2 values. Let’s look at the three “corner vowels” that define the extremes of the vowel space:

  • /i/ (as in “see”): High tongue, front tongue. It has a low F1 and a high F2.
  • /u/ (as in “sue”): High tongue, back tongue. It has a low F1 and a low F2.
  • /ɑ/ (as in “spa”): Low tongue, back tongue. It has a high F1 and a low F2.
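The mapping above can be sketched as a tiny nearest-neighbor lookup in F1/F2 space. The target values below are rough, illustrative figures of the kind often reported for adult male speakers of American English; treat them as assumptions, since real values vary with the speaker and the dialect. The function name is made up for this example.

```python
import math

# Assumed, approximate (F1, F2) targets in Hz for the three corner vowels.
CORNER_VOWELS = {
    "i": (270, 2290),  # high front: low F1, high F2
    "u": (300, 870),   # high back:  low F1, low F2
    "ɑ": (730, 1090),  # low back:   high F1, low F2
}

def nearest_corner_vowel(f1_hz, f2_hz):
    """Return the corner vowel whose (F1, F2) target is closest in plain Hz distance."""
    def distance(vowel):
        target_f1, target_f2 = CORNER_VOWELS[vowel]
        return math.hypot(f1_hz - target_f1, f2_hz - target_f2)
    return min(CORNER_VOWELS, key=distance)

# Hypothetical measurements from three different vowel tokens.
for f1, f2 in [(300, 2200), (320, 900), (700, 1150)]:
    print(f"F1={f1} Hz, F2={f2} Hz -> /{nearest_corner_vowel(f1, f2)}/")
```

Plain distance in Hz is the simplest possible choice. Work on real speech usually rescales the frequencies to a perceptual scale and normalizes for speaker differences first, which this sketch deliberately skips.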

Visualizing the Sound: Vowel Charts and Spectrograms

This relationship between tongue position and formant frequencies is so reliable that linguists visualize it using the International Phonetic Alphabet (IPA) vowel chart. This chart isn’t just an arbitrary collection of symbols; it’s a schematic map of your mouth. The vertical axis represents vowel height (high to low), and the horizontal axis represents vowel backness (front to back).

Where a vowel like /e/ (“bait”) or /o/ (“boat”) sits on that chart is a direct reflection of its F1 and F2 values.

We can also see formants directly using a piece of software that generates a spectrogram. A spectrogram is a graph that visualizes sound, showing frequency on the y-axis, time on the x-axis, and intensity as darkness. When you look at a spectrogram of someone speaking, the vowels appear as distinct, dark horizontal bands. These bands are the formants! You can literally watch the bands shift up and down as the speaker moves from one vowel to another, for example, in the word “beautiful,” whose vowels run roughly /u/, /ɪ/, /ə/.

These visible bands show that formants aren’t just a theoretical concept: they are a physical, measurable property of sound that our brain effortlessly decodes into meaningful language.
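For the curious, here is a bare-bones sketch of what spectrogram software does under the hood: chop the signal into short frames, measure how much energy each frame has at each frequency, and stack the results over time. Everything here is an assumption made for illustration. The “vowels” are just pairs of pure tones placed where F1 and F2 might fall, the 8000 Hz sample rate and 320-sample frame are picked so that each frequency bin is exactly 25 Hz wide, and the slow hand-written transform stands in for the fast one real tools use.

```python
import math

SAMPLE_RATE = 8000  # Hz, assumed for this sketch
FRAME = 320         # samples per frame, so each frequency bin is 25 Hz wide

def tone_mix(freqs_and_amps, n_samples):
    """Sum of pure tones standing in for energy concentrated at formants."""
    return [sum(a * math.sin(2 * math.pi * f * t / SAMPLE_RATE)
                for f, a in freqs_and_amps)
            for t in range(n_samples)]

def frame_spectrum(frame):
    """Magnitude spectrum of one Hann-windowed frame, using a naive DFT."""
    n = len(frame)
    windowed = [x * 0.5 * (1 - math.cos(2 * math.pi * i / n))
                for i, x in enumerate(frame)]
    mags = []
    for k in range(n // 2 + 1):
        re = sum(x * math.cos(2 * math.pi * k * i / n) for i, x in enumerate(windowed))
        im = sum(x * math.sin(2 * math.pi * k * i / n) for i, x in enumerate(windowed))
        mags.append(math.hypot(re, im))
    return mags

def spectrogram(signal):
    """One spectrum per non-overlapping frame: rows are time, columns are frequency."""
    return [frame_spectrum(signal[s:s + FRAME])
            for s in range(0, len(signal) - FRAME + 1, FRAME)]

def strongest_peaks(mags, count=2):
    """Frequencies (Hz) of the strongest local maxima in one spectrum."""
    peaks = [k for k in range(1, len(mags) - 1)
             if mags[k] > mags[k - 1] and mags[k] > mags[k + 1]]
    top = sorted(peaks, key=lambda k: mags[k], reverse=True)[:count]
    return sorted(k * SAMPLE_RATE / FRAME for k in top)

# An "ah"-like stretch followed by an "ee"-like stretch (assumed tone positions).
signal = (tone_mix([(700, 1.0), (1100, 0.8)], 2 * FRAME)
          + tone_mix([(300, 1.0), (2300, 0.8)], 2 * FRAME))
frames = spectrogram(signal)
for index, mags in enumerate(frames):
    print("frame", index, "strongest bands near", strongest_peaks(mags), "Hz")
```

The printed bands jump from one pair of frequencies to another halfway through, which is the numerical version of watching the dark bands shift on a spectrogram. Real speech is messier: the energy sits on harmonics shaped by broad resonances, so dedicated formant-tracking methods are used instead of simple peak picking.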

Beyond the Basics

Of course, the acoustics of speech are richer than just F1 and F2. The third formant, F3, is also important, especially for distinguishing front rounded vowels (like the German /y/ in für) from their unrounded counterparts, and it’s famously low for the American English “r” sound. The higher formants, together with the overall formant pattern, also contribute to each person’s characteristic vocal timbre.

The next time you listen to someone speak or sing, pay close attention to the vowels. Remember that you’re not just hearing a melody of changing pitches. You’re hearing a complex and beautiful dance of resonant frequencies—a hidden acoustic code, shaped by the subtle movements of the tongue and jaw, that allows us to share the entire world of human thought.