Vowel reduction is a well-established phenomenon that has found its place in phonetics textbooks (e.g., O'Shaughnessy, 1987; Clark and Yallop, 1990). Briefly summarized, vowels are pronounced more "sloppily" and less distinctly when the speaking style is informal, or when the vowels are part of unstressed syllables. Essentially, vowels become more centralized and/or more like the phonemes that surround them. Although there is an ongoing debate about the details, vowel reduction is generally considered to be a universal phenomenon of speech (cf. Van Bergem, 1995).
To study whether and how consonants reduce, we decided to contrast speech from reading aloud with that of "spontaneous" storytelling. It is known that vowels spoken informally or spontaneously are severely reduced with respect to vowels that were read aloud from text. If there is a corresponding phenomenon that can be called consonant reduction, this too can be expected to show itself when informal speech is compared with read speech.
Consonant reduction can happen in several ways. At the broadest level, it would manifest itself as a loss of distinction in the manner of articulation and in the place of articulation. The former would result in blurring the borders between, for example, vowel-like consonants and vowels, or fricatives and plosives. The latter could result in, for example, palatalization of fricatives and plosives (Byrd, 1994), or a lack of distinction between alveolar and labio-dental consonants due to incomplete or inappropriate closure. As the perception of place and manner of articulation for consonants also depends on the transition between vowels and consonants, consonants might additionally be perceived differently due to changes in the neighbouring vowel segments alone. At the moment, little is understood about the way reduction affects the spectro-temporal structure of consonants and the way it influences consonant identification. Therefore, it is difficult to point to specific features of articulation where reduction will affect the phonemic distinction of consonants. We will limit ourselves in this paper to an inventory of aspects of consonant acoustics that parallel the vowel characteristics that are affected by vowel reduction. One important question that we want to answer is whether acoustic consonant reduction is indeed similar to vowel reduction.
          Back    Mid     Alve    LabD
Plos      k g     -       t d     p b
Fric      x V     J S     s z     f v
Nasal     N       -       n       m
V-like    -       -       l ~     -
          Back    Mid     Alve    LabD    Total
Plos      62      -       65      61      188
Fric      77      3       62      75      217
Nasal     14      -       72      63      149
V-like    -       -       44      -       44
Total     153     3       243     199     598
Four aspects of vowels and consonants are studied to characterize consonant reduction:
1. F2-locus equations
2. Duration
3. Center of Gravity of the spectrum (i.e., the "mean" frequency)
4. Sound energy difference between vowels and consonants
To be able to compare realizations across both speaking styles, we will ignore the limit of consonant reduction, i.e., complete deletion, where these aspects are undefined.
For this study we used speech material of an experienced newscaster who first told some stories and anecdotes to an interviewer (whom he knew quite well). This speech was transliterated and after some time he was asked to read the transcription. This way, we obtained two times 20 minutes of speech (spontaneous and read). The whole orthographic script was transcribed to phonetic symbols by the Grapheme-to-Phoneme conversion module of an experimental speech synthesizer developed at the Department of Phonetics at the University of Nijmegen. One of the authors checked the transcription and marked words for sentence accent by listening.
Phoneme boundaries were placed using a waveform display with audio feedback (Boersma, 1996) combined with linked displays of the harmonics-to-noise ratio, total energy, and the spectral balance, i.e., energy in the high- (above 3 kHz) versus low- (below 750 Hz), high- versus mid- (between 750 and 3000 Hz), and mid- versus low-frequency bands. In cases where none of the displays suggested a boundary, audio cues were used exclusively. The boundaries between vowels and consonants were placed, as much as possible, on waveform zero-crossings that corresponded to "visible" changes in the spectral composition of the waveform. If present, priority was given to spectral changes that indicated the start or end of a constriction (e.g., abrupt changes in the spectral balance). Formant analysis was done using an LPC-based formant extraction algorithm.
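The spectral balance measures used for boundary placement can be approximated as band-energy ratios. The following is a minimal sketch, not the display software used in this study: it assumes a naive discrete Fourier transform of a single short analysis frame and expresses each band comparison in dB. The function names, frame length, and sampling rate are illustrative assumptions; only the band edges (750 Hz and 3 kHz) are taken from the text.

```python
import cmath
import math


def power_spectrum(frame, fs):
    """Return (frequencies, power) for one frame using a naive DFT."""
    n = len(frame)
    freqs, power = [], []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                  for i, x in enumerate(frame))
        freqs.append(k * fs / n)
        power.append(abs(acc) ** 2)
    return freqs, power


def band_energies(frame, fs, low_edge=750.0, high_edge=3000.0):
    """Sum spectral power below 750 Hz, from 750 to 3000 Hz, and above 3 kHz."""
    freqs, power = power_spectrum(frame, fs)
    low = sum(p for f, p in zip(freqs, power) if f < low_edge)
    mid = sum(p for f, p in zip(freqs, power) if low_edge <= f <= high_edge)
    high = sum(p for f, p in zip(freqs, power) if f > high_edge)
    return low, mid, high


def spectral_balance_db(frame, fs):
    """Return the three band comparisons named in the text, in dB."""
    eps = 1e-12  # avoids log(0) for empty bands (an assumption of this sketch)
    low, mid, high = band_energies(frame, fs)

    def ratio_db(a, b):
        return 10.0 * math.log10((a + eps) / (b + eps))

    return {
        "high_vs_low": ratio_db(high, low),
        "high_vs_mid": ratio_db(high, mid),
        "mid_vs_low": ratio_db(mid, low),
    }


# Hypothetical test frame: a 500 Hz tone, so energy sits below 750 Hz.
FS = 8000
frame = [math.sin(2 * math.pi * 500 * i / FS) for i in range(256)]
balance = spectral_balance_db(frame, FS)
print(balance)
```

An abrupt change in any of these ratios between successive frames would be the kind of cue that the text associates with the start or end of a constriction.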
Vowel reduction is characterized by a centralization of the distribution of vowel realizations in the F1/F2 plane. The vowels from the spontaneous VCV segments used in this study show such a centralization with respect to those from read VCV segments (figure 1, see also the independent analysis of the same speech by Koopmans-Van Beinum, 1992). Unfortunately, there is no corresponding compact consonant formant space where reduction could show up. However, the formant transitions in the vowel onset are important for consonant identification in CV sequences. There is a regularity between the F2 frequency at the start of the vowel and the frequency inside the vowel kernel. If the F2 frequencies at the start of the vowel are plotted against the frequencies in the vowel center, then, for each consonant, the points cluster along a straight line.
Usually, the F2-locus equations are used only for plosives, but the regularities extend to other consonants as well (cf. the high correlation coefficients in the right panel of figure 2). No articulatory or perceptual correlates are known for these locus equations, but they do indicate a specific invariant distinctiveness in articulation. Therefore, the size of the correlation coefficient can be used to quantify the consistency of articulation of a certain consonant. The occurrence of consonant reduction should be visible as a less distinctive articulation resulting in lower correlation coefficients. In the right hand panel of figure 2 it can be seen that the correlation coefficients are indeed consistently lower in spontaneous speech for most consonants than in read speech, indicating reduction.
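The use of the correlation coefficient as a consistency measure can be illustrated with a small sketch. It assumes an ordinary least-squares fit of F2 at vowel onset against F2 in the vowel center, with Pearson's r as the correlation coefficient. The function name and all frequency values are hypothetical and are not data from this study.

```python
import math


def locus_equation(f2_center, f2_onset):
    """Fit f2_onset = slope * f2_center + intercept and return (slope, intercept, r)."""
    n = len(f2_center)
    mean_x = sum(f2_center) / n
    mean_y = sum(f2_onset) / n
    sxx = sum((x - mean_x) ** 2 for x in f2_center)
    syy = sum((y - mean_y) ** 2 for y in f2_onset)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(f2_center, f2_onset))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)
    return slope, intercept, r


# Hypothetical F2 values in Hz for one consonant (illustration only).
centers = [800, 1000, 1200, 1400, 1600, 1800, 2000, 2200]
# A perfectly consistent articulation: every onset lies on one line.
consistent = [0.5 * f + 900 for f in centers]
# A less consistent articulation: the same line plus scatter.
scatter = [200, -150, 180, -220, 160, -190, 210, -170]
variable = [y + d for y, d in zip(consistent, scatter)]

slope_c, intercept_c, r_c = locus_equation(centers, consistent)
slope_v, intercept_v, r_v = locus_equation(centers, variable)
print(f"consistent: r = {r_c:.3f}")
print(f"variable:   r = {r_v:.3f}")
```

In this toy case the scattered onsets give a lower r than the onsets that lie exactly on the line, which is the pattern the text interprets as less distinctive articulation.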
Duration is one of the strongest correlates of vowel reduction (e.g., Lindblom, 1963). As is to be expected, there is a very consistent decrease in vowel duration in the spontaneous members of each pair (figure 3, left hand panel for individual vowels, right hand panel for pooled values, see also Koopmans-Van Beinum, 1992). The odd one out is the /E:/, which is extremely rare in Dutch. The consonant realizations too are shorter in spontaneous speech. This holds for all consonantal categories, except for the vowel-like laterals, /l ~/, where duration seems to remain constant.
The center of gravity of a spectrum is, in a sense, the "mean" frequency. It is calculated as $\int f\,E(f)\,df \,/\, \int E(f)\,df$, where $E(f)$ is the spectral energy at frequency $f$. For sonorants, the center of gravity is related to the spectral slope: the steeper the slope, the lower the center of gravity. The steepness of the spectral slope, in turn, is determined by the steepness of the glottal pulse, which is a measure of speech effort. For turbulent noise, the center of gravity is determined by the quotient (air flow speed) / (constriction area), which again is determined by speech effort.
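The relation between spectral slope and center of gravity can be checked numerically. The sketch below assumes a sampled energy spectrum and trapezoidal integration. The function names, the frequency range, and the slopes of 6 and 12 dB per octave are illustrative assumptions, not values from this study.

```python
import math


def trapezoid(y, x):
    """Trapezoidal approximation of the integral of y over x."""
    return sum((x[i + 1] - x[i]) * (y[i] + y[i + 1]) / 2.0
               for i in range(len(x) - 1))


def center_of_gravity(freqs, energy):
    """Energy-weighted mean frequency: integral of f*E(f) over integral of E(f)."""
    weighted = [f * e for f, e in zip(freqs, energy)]
    return trapezoid(weighted, freqs) / trapezoid(energy, freqs)


def sloped_spectrum(freqs, db_per_octave, ref_hz=100.0):
    """Energy that falls by db_per_octave for each doubling of frequency."""
    return [10.0 ** (-db_per_octave * math.log2(f / ref_hz) / 10.0)
            for f in freqs]


# Hypothetical spectra sampled every 10 Hz from 100 Hz to 5000 Hz.
freqs = [100.0 + 10.0 * i for i in range(491)]
cog_shallow = center_of_gravity(freqs, sloped_spectrum(freqs, 6.0))
cog_steep = center_of_gravity(freqs, sloped_spectrum(freqs, 12.0))
print(f"shallow slope: {cog_shallow:.0f} Hz")
print(f"steep slope:   {cog_steep:.0f} Hz")
```

With these assumed spectra the steeper slope yields the lower center of gravity, in line with the statement for sonorants.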
In figure 4, a subdivision of the phonemes can be seen. Very high absolute frequencies are found for the center of gravity of obstruents (plosives and fricatives) due to the strong high frequency component in the noise. For fricatives, the absolute height of the center of gravity is inversely related to the size of the cavity in front of the noise source. For plosives the pattern is more intricate. The frequencies for /tdkg/ from spontaneous speech are indistinguishable from, or higher than, those from read speech (the difference is not statistically significant). The vowel-like frequencies for /pb/ show the mark of the open oral cavity behind the sound source.
Quite low frequencies are found for the sonorants (vowels and consonants) with vowels having higher values than consonants. For the consonant sonorants, the center of gravity is dominated by the damping of the higher frequencies due to their closed articulation.
One of the most salient differences between vowels and consonants is in their respective sound energy. Vowels generally have much higher energy levels than consonants. Vowel reduction decreases the maximal sound energy level of vowels. Whether the energy level of consonants changes by the same amount can be determined by measuring the sound energy, or the relative energy, of consonants with respect to their flanking vowels. The sound energy difference is measured as indicated in figure 5.
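The exact measurement is defined in figure 5, which is not reproduced here. As a rough sketch only, an intervocalic energy difference can be computed as the level of the flanking vowels minus the level of the consonant, in dB. The use of mean-square energy over whole segments and the averaging of the two vowel levels are assumptions of this sketch, not the procedure of this study. All names and sample values are hypothetical.

```python
import math


def level_db(samples):
    """Mean-square energy of a segment in dB (arbitrary reference)."""
    mean_square = sum(s * s for s in samples) / len(samples)
    return 10.0 * math.log10(mean_square)


def intervocalic_difference_db(vowel1, consonant, vowel2):
    """Average level of the flanking vowels minus the consonant level, in dB."""
    vowel_level = (level_db(vowel1) + level_db(vowel2)) / 2.0
    return vowel_level - level_db(consonant)


# Hypothetical VCV token: the consonant has one tenth of the vowel amplitude.
vowel1 = [1.0, -1.0] * 50
consonant = [0.1, -0.1] * 50
vowel2 = [1.0, -1.0] * 50
difference = intervocalic_difference_db(vowel1, consonant, vowel2)
print(f"{difference:.1f} dB")
```

Comparing this difference between read and spontaneous tokens of the same VCV segment would show whether the consonant weakens more or less than its neighbouring vowels.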
Four correlates of reduction have been studied for consonants with respect to speaking style: 1) F2-locus equations, 2) Duration, 3) Center of Gravity, and 4) Intervocalic sound energy difference.
In spontaneous speech, consonant realizations shorten like vowels. The decrease in duration of consonants is such that the relative duration, as a fraction of total VCV segment duration, remains unchanged (not shown). Therefore, the change in duration seems to be a "global" feature of the change in speaking style.
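The notion of an unchanged relative duration can be made concrete with a short sketch. The durations and the shortening factor below are hypothetical and chosen only to illustrate the arithmetic: if vowels and consonant shorten by the same factor, the consonant's share of the VCV segment stays the same.

```python
def relative_consonant_duration(v1_ms, c_ms, v2_ms):
    """Consonant duration as a fraction of the total VCV duration."""
    return c_ms / (v1_ms + c_ms + v2_ms)


# Hypothetical read token: vowel, consonant, vowel durations in ms.
read = (100.0, 80.0, 120.0)
# Hypothetical spontaneous token: every segment shortened by the same factor.
factor = 0.8
spontaneous = tuple(d * factor for d in read)

rel_read = relative_consonant_duration(*read)
rel_spontaneous = relative_consonant_duration(*spontaneous)
print(f"read:        {rel_read:.3f}")
print(f"spontaneous: {rel_spontaneous:.3f}")
```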
Except for the plosives, all consonants and vowels showed a decrease in center of gravity. This indicates that both the vowels and the non-plosive consonants show a diminishing source strength in spontaneous speech. This, in turn, implies a decrease in vocal and articulatory effort. As the center of gravity is strongly linked to the spectral slope at high frequencies, this lowering can be expected to correlate with a decrease in the perceived stress (Sluijter, 1995a,b).
In spontaneous speech, the sonorant consonants (nasals and the /l ~/) "weaken" somewhat more than the neighbouring vowels whereas non-sonorants (fricatives and plosives) "weaken" somewhat less than the vowels (figure 6). Combined with the results for the spectral center of gravity this implies that the slope of the spectrum determines the size of the difference in sound energy that results from a difference in speaking style.
Uttered in a more informal style, consonant realizations show reduction in terms of diminishing articulatory precision and global effort. Furthermore, consonant reduction resembles vowel reduction in both type and extent of the changes in the produced sounds. Details of the spectral changes in consonants due to speaking style depend on the source of the speech sound: vocal folds or fricative noise.
In our future research we will extend the coverage of the consonants used. Next to speaking style, we will also look at the effects of stress on reduction. The global measures of acoustic reduction presented here will be supplemented with more detailed types of analysis. Identification experiments to determine the effects of acoustic reduction on consonant identification are under way.
The authors want to thank Dr. Florien Koopmans-van Beinum for supplying the speech recordings and Dr. Noortje Blauw for her transliteration of the spontaneous speech. This research was made possible by grant 300-173-029 of the Dutch Organization of Research (NWO).