SPECTRO-TEMPORAL FEATURES OF VOWEL SEGMENTS

Rob van Son

Note: This thesis was originally written in Microsoft Word 5 on the Macintosh. Later, I found out that the original Word files could no longer be reliably converted to PDF. Several placeholders for figures were lost and some figures and tables overlap. Moreover, some symbols in formulas were lost too. As a result, in addition to the lost figures and tables, the page numbering is off. Given the endless problems I had keeping formatting and page numbering "alive" when I could least afford the lost time, it might not come as a surprise that I do not recommend any version of Microsoft Office for writing a PhD thesis. For your information, I have switched to LaTeX for anything more complex than a personal letter.

Summary

In this thesis we have investigated several aspects of the spectro-temporal structure of vowel segments, concerning both vowel production and vowel perception. Chapter 1 contains a summary of current models of vowel production and perception. Models of vowel pronunciation try to explain why vowel realizations vary so much in natural speech. It is known that vowel production is influenced in highly systematic ways by context, stress, and speaking style, among other factors. The classical explanation is the target-undershoot model. This model states that vowel articulation is limited by the speed of the articulators (e.g., jaw, tongue, lips). Each vowel has a unique target position for each of the articulators, which will produce the ideal, or canonical, realization of that vowel. When vowel realizations are very long, there is ample time for the different articulators to reach their respective target positions. However, when vowel duration is short and the context forces the articulators to cover relatively large distances, there is not enough time and the articulators are stopped short of their targets. The resulting vowel realizations show "undershoot" in their articulatory movements as well as in the resulting formant frequencies, hence the name of the model: target-undershoot.

The classical quantitative study of Lindblom (1963) on the relation between vowel duration and formant-undershoot is discussed in depth. It showed that formant-undershoot increased exponentially with a decrease in vowel duration. However, subsequent studies gave ambiguous results. Some studies did find clear evidence for articulatory- and formant-undershoot. Others showed that there were numerous cases where no relation between vowel duration and target-undershoot could be found. In particular, changes in stress and speaking style could bring about changes in duration that were not accompanied by changes in target-undershoot. In our opinion, these conflicting results can be explained by assuming that target-undershoot is planned by the speaker. In this view, the undershoot serves a purpose that depends on factors like context, prosody, and speaking style. From this it follows that, irrespective of vowel duration, the undershoot itself should not change if the purpose of the undershoot does not change, and vice versa.
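To make the duration-dependent reading of the target-undershoot model concrete, the following is a minimal sketch of one simple exponential form in which undershoot shrinks as vowel duration grows. The function names, the scaling constant k, the rate constant, and the example frequencies are illustrative assumptions; they are not taken from Lindblom (1963) or from the measurements in this thesis.

```python
import math


def observed_formant(target_hz, onset_hz, duration_ms, k=1.0, rate_per_ms=0.02):
    """Formant value reached in the vowel under a simple exponential model.

    Assumed form (illustrative only):
        observed = target + k * (onset - target) * exp(-rate * duration)
    The context-determined onset value pulls the realization away from the
    target, and that pull decays exponentially with vowel duration.
    """
    pull = k * (onset_hz - target_hz) * math.exp(-rate_per_ms * duration_ms)
    return target_hz + pull


def undershoot_hz(target_hz, onset_hz, duration_ms, k=1.0, rate_per_ms=0.02):
    """Absolute distance between the realized formant and its target."""
    realized = observed_formant(target_hz, onset_hz, duration_ms, k, rate_per_ms)
    return abs(realized - target_hz)


if __name__ == "__main__":
    # Hypothetical second-formant target and consonantal onset value, in Hz.
    target, onset = 2000.0, 1200.0
    for duration in (50.0, 100.0, 200.0, 400.0):
        print(duration, round(undershoot_hz(target, onset, duration), 1))
```

Under this purely duration-driven form, any shortening of the vowel must increase undershoot. The planned-undershoot view described above would instead correspond to the constants themselves being set by the speaker according to context, prosody, and speaking style, so that duration alone does not fix the outcome.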

Considering the conflicting reports in the literature, it seems that any test of the target-undershoot model should introduce changes in vowel duration without changing stress, speaking style, or other prosodic factors that are known to cause ambiguous results. In this study, we settled for changes in speaking rate. A long, meaningful text, read at a normal and at a fast rate, would induce a speaker to use the same stress assignments and the same "style" of speaking, irrespective of reading speed. At the same time, a difference in speaking rate would change the duration of all the vowels. In this study (chapters 2-4), we used all realizations of seven different vowels and some realizations of the schwa (/ə/). If vowel duration could control formant-undershoot all by itself, then an increase in speaking rate should induce an increase in undershoot. However, if formant-undershoot is planned, then a change in speaking rate should not necessarily result in a change in formant-undershoot.