Next: Diphthongs and Reduced Vowels Up: The Acoustics of Vowels Previous: Resonance

Vowels

(NOTE: We do not include the diphthongs in the definition of the vowel category; this class of sounds is more accurately termed the monophthongs, but for now we will call them the vowels. The American English vowels which we consider are, in Worldbet notation, the front vowels /i:/, /I/, /E/, /@/; the mid vowels /I_x/, /3r/, /&/, /&r/; and the back vowels /u/, /U/, />/, //, /A/.)

A vowel can be defined as a relatively long-lasting, unchanging sound in which the oral tract (with help from the nasal tract in the case of nasalized vowels, such as those in French) is kept relatively open from the glottis to the lips, allowing the vocal tract to act as a resonator. Remember from our discussion of the whisper that a vowel need not be voiced, but we consider only voiced vowels from this point on.
Vowels are stable segments during which the articulators do not move. They almost always carry the greatest energy in the speech signal, because during vowel phonation the vocal tract is most open. Because of these characteristics, vowels are probably the easiest speech category to recognize in a spectrogram.
What gives vowels their individual character is the existence of a different set of formants in the spectrogram for each vowel. Formants are those frequency ranges which emerge from the mouth and nose with the greatest relative amplitude. From the above discussion of resonance, formants may be recognized as the resonant frequencies of the vocal tract. For all voiced sounds, including vowels, it is usually sufficient to look at the three lowest-frequency formants to recognize the phoneme. Those formants are labelled F1, F2, and F3.
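As a rough illustration of formants as resonances, the sketch below uses a common textbook idealization that is not stated in this section: a uniform tube closed at the glottis and open at the lips, whose resonances fall at odd multiples of c/(4L). The tube length (17.5 cm), the speed of sound (350 m/s), and the function name are assumptions chosen for illustration only.

```python
def uniform_tube_formants(length_m=0.175, speed_of_sound=350.0, count=3):
    """Resonances of a tube closed at one end and open at the other.

    F_n = (2n - 1) * c / (4 * L), for n = 1, 2, 3, ...
    Both default values are assumed, not measured.
    """
    return [
        (2 * n - 1) * speed_of_sound / (4.0 * length_m)
        for n in range(1, count + 1)
    ]


f1, f2, f3 = uniform_tube_formants()
print(f"F1={f1:.0f} Hz, F2={f2:.0f} Hz, F3={f3:.0f} Hz")
# prints: F1=500 Hz, F2=1500 Hz, F3=2500 Hz
```

Under these assumptions the three lowest resonances are evenly spaced, which is roughly the picture one expects for an unconstricted, neutral tract; real vowels move these values by constricting the tube.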
For today's purpose, which is to gain some idea of the acoustic characteristics of vowels, it will be sufficient to take as examples the three so-called quantal vowels of American English plus the neutral vowel. See Figure 8 for an idea of where the quantal vowels fit in the vowel triangle (which is really a quadrilateral). The four vowels are the following:
- /i:/ A high front vowel having a high-frequency concentration of energy above 1800 Hz.
- /u/ A high back vowel having almost all its energy in the low frequencies below 1000 Hz.
- /A/ A low vowel having a tight concentration of energy in the mid-range of 800 Hz to 1800 Hz.
- /&/ A central vowel having a spread of energy among all frequencies.
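The energy descriptions in this list can be checked numerically by asking what share of a spectrum's energy lies in a given band. The following is only a sketch: the function name is made up for this example, the spectrum is represented as a short list of (frequency, amplitude) pairs, and the /u/-like values are hypothetical rather than measured.

```python
def band_energy_fraction(partials, low_hz=None, high_hz=None):
    """Fraction of total energy (amplitude squared) inside [low_hz, high_hz].

    partials is a list of (frequency_hz, amplitude) pairs.
    A bound of None leaves that side of the band open.
    """
    total = sum(amp * amp for _, amp in partials)
    if total == 0:
        return 0.0
    inside = sum(
        amp * amp
        for freq, amp in partials
        if (low_hz is None or freq >= low_hz)
        and (high_hz is None or freq <= high_hz)
    )
    return inside / total


# Hypothetical /u/-like spectrum: most energy sits below 1000 Hz.
u_like = [(300.0, 1.0), (900.0, 0.8), (2500.0, 0.1)]
low_share = band_energy_fraction(u_like, high_hz=1000.0)
high_share = band_energy_fraction(u_like, low_hz=1800.0)
print(f"below 1000 Hz: {low_share:.2f}, above 1800 Hz: {high_share:.2f}")
```

For a real recording the pairs would come from a measured spectrum; the band edges (1000 Hz, 1800 Hz) are the ones quoted in the list above.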
Figure 9 shows about 10 pitch periods for each of these quantal vowels. Figure 10 shows the traditional spectrogram of the same four vowels, along with a notion of where they fit in the vowel space. Figure 11 shows these four vowels in 3-D form. Observe the location of the formants F1, F2, and F3, which look like mountain peaks.
The rules which acousticians use to predict the formant structure of vowels can be summarized as follows:
1. The area of the major constriction determines the location of F1; as that area decreases, F1 also decreases. Contrast the small opening of /i:/ and /u/ with the widely open /A/ and the intermediate /&/.
2. The distance from the glottis to the major constriction determines the location of F2; as the distance increases, F2 also increases. Contrast the high F2 of /i:/ with the low F2 of /u/. See Figure 11.
3. For a given area and distance from the glottis of the major constriction in a vowel, lip rounding causes F1 and F2, especially F2, to fall. Thus, the back vowel /u/ has a lower F2 than rule 2 alone would predict because we round our lips when pronouncing this vowel.
The above rules are subject to further refinement.
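The three rules can be read as a small qualitative predictor. The sketch below compares two articulations and reports whether each rule pushes F1 and F2 up or down. The class, the field names, and all numeric values are hypothetical; as a simplification, rule 3 is applied even when area and distance also differ, and the result is reported as "indeterminate" when rules disagree.

```python
from dataclasses import dataclass


@dataclass
class Articulation:
    constriction_area_cm2: float   # area of the major constriction
    glottis_distance_cm: float     # distance from glottis to that constriction
    lips_rounded: bool = False


def _sign(x):
    return (x > 0) - (x < 0)


def _combine(*votes):
    """Merge per-rule votes (-1, 0, +1) into a qualitative label."""
    active = {v for v in votes if v != 0}
    if not active:
        return "same"
    if len(active) > 1:
        return "indeterminate"  # rules pull in opposite directions
    return "higher" if active.pop() > 0 else "lower"


def compare_formants(a, b):
    """Predict how F1 and F2 of b differ from those of a, using rules 1-3 only."""
    area_vote = _sign(b.constriction_area_cm2 - a.constriction_area_cm2)  # rule 1
    dist_vote = _sign(b.glottis_distance_cm - a.glottis_distance_cm)      # rule 2
    round_vote = -_sign(int(b.lips_rounded) - int(a.lips_rounded))        # rule 3
    return {
        "F1": _combine(area_vote, round_vote),
        "F2": _combine(dist_vote, round_vote),
    }


# Hypothetical articulations, chosen only to mirror the contrasts in the rules.
low_A = Articulation(constriction_area_cm2=3.0, glottis_distance_cm=6.0)
high_i = Articulation(constriction_area_cm2=0.5, glottis_distance_cm=12.0)
high_u = Articulation(constriction_area_cm2=0.5, glottis_distance_cm=9.0, lips_rounded=True)

print(compare_formants(low_A, high_i))  # {'F1': 'lower', 'F2': 'higher'}
print(compare_formants(high_i, high_u))  # {'F1': 'lower', 'F2': 'lower'}
```

The predictor is deliberately qualitative: the rules as stated give directions of change, not formant values in Hz.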
The importance of the Peterson and Barney article, "Control Methods Used in a Study of the Vowels", which appeared in 1952, is that it provided one of the first solid insights into how a listener's classification of a given vowel depends upon the formant frequencies in the speech signal. In the study, 70 subjects listened to 1520 recorded words from the set heed, hid, head, had, hod, hawed, hood, who'd, hud, heard, as pronounced by 76 different adult male, adult female, and child speakers. Many interesting statistics emerged, but one of the main conclusions is provided by Figure 12, which plots the F1 formant value versus the F2 formant value of a number of adult male and child speakers for the 10 vowels.
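One way to see how vowel identity can depend on position in the F1/F2 plane is a nearest-centroid classifier, sketched below. This is not the method of the study. The (F1, F2) pairs are approximate adult male averages as commonly quoted from that study, recalled here for illustration; they should be checked against the original table before any serious use. The function name is made up for this example.

```python
import math

# Approximate (F1, F2) averages in Hz for adult male speakers.
# Assumed values for illustration; verify against the original table.
CENTROIDS = {
    "heed": (270, 2290),
    "hid": (390, 1990),
    "head": (530, 1840),
    "had": (660, 1720),
    "hod": (730, 1090),
    "hawed": (570, 840),
    "hood": (440, 1020),
    "who'd": (300, 870),
    "hud": (640, 1190),
    "heard": (490, 1350),
}


def classify_vowel(f1_hz, f2_hz):
    """Return the word whose centroid is closest in the F1/F2 plane."""
    point = (f1_hz, f2_hz)
    return min(CENTROIDS, key=lambda word: math.dist(point, CENTROIDS[word]))


print(classify_vowel(280, 2250))  # heed
print(classify_vowel(310, 900))   # who'd
```

Plain Euclidean distance in Hz lets F2 differences dominate, since F2 spans a wider range than F1; a perceptual frequency scale would be a natural refinement. The classifier also ignores F3 and speaker differences, so adult female and child speakers, whose formants are higher, would need their own centroids.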



Ed Kaiser
Sat Mar 15 00:01:27 PST 1997