Institute of Phonetic Sciences,
University of Amsterdam,
Proceedings 19 (1995), 83-91.

ACOUSTIC CONSONANT REDUCTION: A COMPARISON[*]

R.J.J.H. van Son and Louis C.W. Pols

Abstract

Vowel reduction has been studied for years. It is a universal phenomenon that reduces the distinction of vowels in informal speech and unstressed syllables. How consonants behave in situations where vowels are reduced is not known. In this paper we compare durational and spectral data for both intervocalic consonants and vowels, uttered in read speech, with otherwise identical segments from spontaneous speech. At a global level, the comparison shows that consonants reduce like vowels when the speaking style becomes informal. At a more detailed level, there are differences related to the manner of articulation of the consonants.

1. Introduction

Vowel reduction is a well established phenomenon that has found its place in phonetics textbooks (e.g., O'Shaughnessy, 1987; Clark and Yallop, 1990). Briefly summarized, vowels are pronounced more "sloppily" and with less distinction when speaking style is informal, or when the vowels are part of unstressed syllables. Essentially, vowels become more centralized and/or more like the phonemes that surround them. Although there is an ongoing debate about the details, vowel reduction is generally considered to be a universal phenomenon of speech (cf. Van Bergem, 1995).

Given such a universal phenomenon, it is quite natural to ask how consonants behave in situations where vowels become reduced. In other words, does something like "consonant reduction" really exist, and if it does, how can it be described? It must be stressed that there are reasons to expect that consonants do not behave exactly like vowels. Because of their lower acoustic energy and their often intricate structure, consonants are often more difficult to perceive and identify than vowels (however, see Steeneken, 1992). Therefore, the penalty for reduction, in terms of reduced intelligibility, could be much higher for consonants than for vowels. There have been studies that investigated acoustic and articulatory consonant reduction in relation to vowel reduction, but these were generally limited to only a few classes of consonants, with only limited speech material (some recent papers are, e.g., Byrd, 1994; Keating et al., 1994; Duez, 1995; Farnetani, 1995; Schmidt and Flege, 1995). From these studies it is difficult to discern the general effects of consonant reduction in "normal" speech situations.

To study whether and how consonants reduce, we decided to contrast speech from reading aloud with that of "spontaneous" story telling. It is known that vowels spoken informally or spontaneously are severely reduced with respect to vowels that were read aloud from text. If a corresponding phenomenon exists that can be called consonant reduction, it too can be expected to show itself when informal speech is compared with read speech.

Consonant reduction can happen in several ways. At the broadest level, it would manifest itself as a loss of distinction in the manner of articulation and in the place of articulation. The former would result in blurring the borders between, for example, vowel-like consonants and vowels, or fricatives and plosives. The latter could result in, for example, palatalization of fricatives and plosives (Byrd, 1994), or a lack of distinction between alveolar and labio-dental consonants due to incomplete or inappropriate closure. As the perception of place and manner of articulation for consonants also depends on the transition between vowels and consonants, consonants might additionally be perceived differently due to changes in the neighbouring vowel segments alone. At the moment, any understanding of the way reduction affects the spectro-temporal structure of consonants and the way it influences consonant identification is seriously lacking. Therefore, it is difficult to point to specific features of articulation where reduction will affect the phonemic distinction of consonants. We will limit ourselves in this paper to an inventory of aspects of consonant acoustics that parallel the vowel characteristics that are affected by vowel reduction. One important question that we want to answer is whether acoustic consonant reduction is indeed similar to vowel reduction.


Table 1. Dutch consonants used in this paper. Columns: place of articulation (Back, Middle, Alveolar, and Labio-Dental). Rows: manner of articulation (Plosives, Fricatives, Nasals, and Vowel-like).
         Back    Mid     Alve    LabD   
 Plos    k g             t d     p b    
 Fric    x V     J S     s z     f v    
Nasal     N               n       m     
V-like                   l ~            


Table 2. Number of matched VCV pairs, separated on consonant identity.
         Back    Mid    Alve     LabD   Total   
 Plos     62            65        61     188    
 Fric     77      3     62        75     217    
Nasal     14            72        63     149    
V-like                  44                44    
Total    153      3     243      199     598    

Four aspects of vowels and consonants are studied to characterize consonant reduction:
1. Formant values
2. Duration
3. Center of Gravity of the spectrum (i.e., the "mean" frequency)
4. Sound energy difference between vowels and consonants

To be able to compare realizations across both speaking styles, we will ignore the limit of consonant reduction, i.e., complete deletion, where these aspects are undefined.

2. Material and methods

For this study we used speech material of an experienced newscaster who first told some stories and anecdotes to an interviewer (whom he knew quite well). This speech was transliterated and, after some time, he was asked to read the transcription. This way, we obtained 2 times 20 minutes of speech (spontaneous and read). The whole orthographic script was transcribed to phonetic symbols by the Grapheme-to-Phoneme conversion module of an experimental speech synthesizer developed at the Department of Phonetics at the University of Nijmegen. One of the authors checked the transcription and marked words for sentence accent by listening.

From the phonetic transcription, all Vowel-Consonant-Vowel (VCV) segments were located in the speech recordings (also those crossing word boundaries). In total, 1847 VCV realization pairs had both realizations originating from corresponding positions in the utterances, with identical syllable structure, syllable boundary type, and sentence and word stress. Of these VCV pairs, 598 have been analyzed in detail for this paper (see tables 1 and 2) and are used here to study consonant reduction.
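As a rough illustration of how VCV segments can be located in a phonetic transcription, the sketch below scans a flat list of phoneme symbols for consonants flanked by two vowels. The vowel inventory, the symbols, and the function name `find_vcv` are hypothetical and are not taken from the paper. The matching criteria for read and spontaneous pairs (syllable structure, boundary type, stress) are not modelled here.

```python
# Hypothetical vowel inventory, for illustration only; the transcription
# symbols actually used for the recordings are not listed in the text.
VOWELS = {"a", "e", "i", "o", "u", "@"}


def find_vcv(phonemes, vowels=VOWELS):
    """Return (index, (V1, C, V2)) for every consonant flanked by two vowels.

    Any symbol that is not in `vowels` is treated as a consonant, so pause
    or boundary markers should be removed or handled before calling this.
    """
    hits = []
    for i in range(1, len(phonemes) - 1):
        v1, c, v2 = phonemes[i - 1], phonemes[i], phonemes[i + 1]
        if v1 in vowels and c not in vowels and v2 in vowels:
            hits.append((i, (v1, c, v2)))
    return hits


# Made-up phoneme sequence, not data from the study.
example = ["d", "a", "t", "@", "n", "o", "k"]
print(find_vcv(example))
```

Because the list is flat, segments that cross word boundaries are found in the same way as word-internal ones, which matches the description above.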

Phoneme boundaries were placed using a waveform display with audio feedback (Boersma, 1996) combined with linked displays of the Harmonicity-to-noise ratio, total energy, and the spectral balance, i.e., energy in the high- (above 3 kHz) versus low- (below 750 Hz), high- versus mid- (between 750 and 3000 Hz), and mid- versus low- frequencies. In cases where none of the displays suggested a boundary, audio cues were used exclusively. The boundaries between vowels and consonants were placed, as much as possible, on waveform zero-crossings that corresponded to "visible" changes in the spectral composition of the waveform. If present, priority was given to spectral changes that indicated the start or end of a constriction (e.g., abrupt changes in the spectral balance). Formant analysis was done using an LPC-based formant extraction algorithm.
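The three spectral balance measures can be sketched as band-energy ratios in dB. The band edges of 750 Hz and 3 kHz come from the text. Everything else is an assumption made for this illustration: the input is a power spectrum given as parallel lists, a bin lying exactly on an edge is assigned to the mid band, and the function names are hypothetical. This is not the display software that was actually used.

```python
import math

LOW_EDGE_HZ = 750.0    # "low" band: below 750 Hz
HIGH_EDGE_HZ = 3000.0  # "high" band: above 3 kHz


def band_energies(freqs_hz, power):
    """Sum the spectral power in the low, mid, and high bands."""
    low = mid = high = 0.0
    for f, p in zip(freqs_hz, power):
        if f < LOW_EDGE_HZ:
            low += p
        elif f <= HIGH_EDGE_HZ:  # assumption: edge bins count as mid
            mid += p
        else:
            high += p
    return low, mid, high


def spectral_balance_db(freqs_hz, power):
    """Return the three band-energy ratios in dB."""
    low, mid, high = band_energies(freqs_hz, power)
    if min(low, mid, high) <= 0.0:
        raise ValueError("each band needs non-zero energy")
    return {
        "high_vs_low": 10.0 * math.log10(high / low),
        "high_vs_mid": 10.0 * math.log10(high / mid),
        "mid_vs_low": 10.0 * math.log10(mid / low),
    }


# Made-up three-bin spectrum, one bin per band.
print(spectral_balance_db([500.0, 1000.0, 4000.0], [100.0, 10.0, 1.0]))
```

An abrupt change in any of these ratios from one analysis frame to the next is the kind of cue that could mark the start or end of a constriction.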


Figure 1. Spectral reduction in Dutch vowel space. In the vowel plane, reduced realizations occupy a more central position. Illustrated here by the difference between vowels from Spontaneous and Read speech. Arrows indicate direction of change under reduction.

3. Results

3.1. Formant values

Vowel reduction is characterized by a centralization of the distribution of vowel realizations in the F1/F2 plane. The vowels from the spontaneous VCV segments used in this study show such a centralization with respect to those from read VCV segments (figure 1, see also the independent analysis of the same speech by Koopmans-Van Beinum, 1992). Unfortunately, there is no corresponding compact consonant formant space where reduction could show up. However, the formant transitions in the vowel onset are important for consonant identification in CV sequences. There is a regularity between the F2 frequency at the start of the vowel and the frequency inside the vowel kernel. If the F2 frequencies at the start of the vowel are plotted against the frequencies in the vowel center, then, for each consonant, the points cluster along a straight line.


Figure 2. Left panel: examples of plots corresponding to F2-locus equations for plosives. For each CV-transition, the F2-onset frequency is plotted against the F2-target frequency. For each individual consonant, the equation of the linear regression line through these points is the locus equation. Right panel: reduction in F2-locus equations of groups of Dutch consonants. The correlation coefficients of F2-locus equations indicate the "consistency" of CV (co-)articulation. The * symbol indicates a statistically significant difference between correlation coefficients in read versus spontaneous speech, p <= 0.01, two tailed test of Fisher Z-transforms of the correlation coefficients.

Figure 3. Durational reduction in Dutch vowels and consonants, left vowels (initial and final vowels in the VCV segments pooled), right consonants and mean values for vowels (V1: initial; V2: final; no #: excluding pauses). * p <= 0.01, two tailed Sign test (p<=0.001 for all vowels pooled)

The linear regression line through these points represents the F2-locus equation for that consonant (e.g., figure 2, left panel; Sussman et al., 1991, 1993, 1995). The correlation coefficients of the linear regression lines indicate the strength of the relation between onset and kernel values of the second formant and hence the consistency of articulation.
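A minimal sketch of how such a locus equation and its correlation coefficient could be computed for one consonant is given below. It fits an ordinary least-squares line with the F2-onset frequency as the dependent variable and the F2-target frequency as the independent variable, following the axes described in the caption of figure 2. The function name and the frequency values are invented for illustration and are not measurements from this study.

```python
import math


def locus_equation(f2_target_hz, f2_onset_hz):
    """Fit f2_onset = slope * f2_target + intercept by least squares.

    Returns (slope, intercept, r), where r is the Pearson correlation
    coefficient between onset and target values.
    """
    n = len(f2_target_hz)
    if n < 2 or n != len(f2_onset_hz):
        raise ValueError("need two equally long lists with at least 2 points")
    mean_x = sum(f2_target_hz) / n
    mean_y = sum(f2_onset_hz) / n
    sxx = sum((x - mean_x) ** 2 for x in f2_target_hz)
    syy = sum((y - mean_y) ** 2 for y in f2_onset_hz)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(f2_target_hz, f2_onset_hz))
    if sxx == 0.0 or syy == 0.0:
        raise ValueError("onset and target values must both vary")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)
    return slope, intercept, r


# Invented CV tokens of one consonant (Hz); these lie exactly on a line.
targets = [1000.0, 1500.0, 2000.0, 2500.0]
onsets = [1400.0, 1600.0, 1800.0, 2000.0]
slope, intercept, r = locus_equation(targets, onsets)
print(slope, intercept, r)
```

With real tokens the points scatter around the line, and it is the size of `r` that serves as the measure of articulatory consistency.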

Usually, the F2-locus equations are used only for plosives, but the regularities extend to other consonants as well (cf. the high correlation coefficients in the right panel of figure 2). No articulatory or perceptual correlates are known for these locus equations, but they do indicate a specific invariant distinctiveness in articulation. Therefore, the size of the correlation coefficient can be used to quantify the consistency of articulation of a certain consonant. The occurrence of consonant reduction should be visible as a less distinctive articulation resulting in lower correlation coefficients. In the right hand panel of figure 2 it can be seen that the correlation coefficients are indeed consistently lower in spontaneous speech for most consonants than in read speech, indicating reduction.
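The caption of figure 2 mentions a two-tailed test on Fisher Z-transforms of the correlation coefficients, but the exact procedure is not spelled out. The sketch below shows the common textbook version for two correlation coefficients, under the assumption that the two samples are independent. The function name, the sample sizes, and the correlation values are made up for illustration.

```python
import math


def fisher_z_test(r1, n1, r2, n2):
    """Two-tailed test for a difference between two correlations.

    Assumes two independent samples of sizes n1 and n2. Returns the
    normal test statistic and the two-tailed p-value.
    """
    if min(n1, n2) <= 3:
        raise ValueError("each sample needs more than 3 observations")
    if not (-1.0 < r1 < 1.0 and -1.0 < r2 < 1.0):
        raise ValueError("correlations must lie strictly between -1 and 1")
    z1 = math.atanh(r1)  # Fisher Z-transform
    z2 = math.atanh(r2)
    standard_error = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    statistic = (z1 - z2) / standard_error
    p_value = math.erfc(abs(statistic) / math.sqrt(2.0))
    return statistic, p_value


# Invented values: a higher correlation in read than in spontaneous speech.
statistic, p_value = fisher_z_test(0.9, 60, 0.7, 60)
print(statistic, p_value)
```

Since the read and spontaneous realizations in this study are matched pairs, the independence assumption is a simplification.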

3.2. Duration

Duration is one of the strongest correlates of vowel reduction (e.g., Lindblom, 1963). As is to be expected, there is a very consistent decrease in vowel duration in the spontaneous members of each pair (figure 3, left hand panel for individual vowels, right hand panel for pooled values, see also Koopmans-Van Beinum, 1992). The odd one out is the /E:/, which is extremely rare in Dutch. The consonant realizations too are shorter in spontaneous speech. This holds for all consonantal categories, except for the vowel-like laterals, /l ~/, where duration seems to remain constant.
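The significance marks in figure 3 refer to a two-tailed Sign test on the matched pairs. A minimal sketch of such a test is given below, assuming that tied pairs are dropped and that the p-value is computed exactly from the binomial distribution. The function name and the durations are invented for illustration.

```python
import math


def sign_test(read, spontaneous):
    """Exact two-tailed Sign test on paired values; ties are dropped."""
    if len(read) != len(spontaneous):
        raise ValueError("the two lists must be paired")
    positive = sum(1 for r, s in zip(read, spontaneous) if r > s)
    negative = sum(1 for r, s in zip(read, spontaneous) if r < s)
    n = positive + negative
    if n == 0:
        return 1.0  # only ties: no evidence of a difference
    k = min(positive, negative)
    # Probability of k or fewer of the rarer sign under Binomial(n, 0.5).
    tail = sum(math.comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2.0 * tail)


# Invented durations in ms for ten matched realizations of one phoneme.
read_ms = [95, 110, 80, 120, 105, 90, 100, 115, 85, 70]
spontaneous_ms = [80, 100, 75, 100, 90, 85, 95, 100, 80, 75]
print(sign_test(read_ms, spontaneous_ms))
```

The test only uses the direction of each pairwise difference, not its size, which makes it robust against the skewed distributions that are typical of segment durations.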

Both vowels and consonants become shorter when spoken spontaneously. Furthermore, they shorten in the same proportion: the relative duration of consonants in the VCV segments, i.e., as a fraction of the total, does not change when speaking style changes (not shown).

Figure 4. Reduction of the Center of Gravity for Dutch vowels and consonants, left vowels, right mean vowel values and consonants. Plotted are the averages of the maximal values within phoneme realizations. Note the logarithmic scale in the right hand plot. V1: mean of all initial vowels, V2: mean of all final vowels, no #: excluding pauses. * p <= 0.001, two tailed Sign test

3.3. Center of Gravity

The center of gravity of a spectrum is, in a sense, the "mean" frequency. It is calculated by dividing $\int f\,E(f)\,df$ by $\int E(f)\,df$. For sonorants, the center of gravity is related to the spectral slope: the steeper the slope, the lower the center of gravity. The steepness of the spectral slope, in turn, is determined by the steepness of the glottal pulse, which is a measure of speech effort. For turbulent noise, the center of gravity is determined by the size of the quotient (air flow speed) / (constriction area), which again is determined by speech effort.
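For a sampled spectrum, the two integrals can be approximated by sums over frequency bins. The sketch below assumes equally spaced bins, so that the bin width cancels in the quotient. The function name and the toy spectra are invented for illustration and do not come from the analysis software used in this study.

```python
def center_of_gravity(freqs_hz, energy):
    """Energy-weighted mean frequency of a sampled spectrum.

    Assumes equally spaced frequency bins, so the bin width cancels
    between numerator and denominator.
    """
    total = sum(energy)
    if total <= 0.0:
        raise ValueError("spectrum must contain some energy")
    return sum(f * e for f, e in zip(freqs_hz, energy)) / total


# Made-up three-bin spectra: one flat, one falling with frequency.
freqs = [100.0, 200.0, 300.0]
flat = center_of_gravity(freqs, [1.0, 1.0, 1.0])
steep = center_of_gravity(freqs, [3.0, 2.0, 1.0])
print(flat, steep)
```

The falling spectrum yields the lower value, in line with the statement that a steeper spectral slope lowers the center of gravity.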

For Dutch (and English), a more level spectral slope, i.e., a higher center of gravity, strongly correlates with perceived sentence accent (Sluijter, 1995a,b). As the de-accentuation of vowels strongly correlates with vowel reduction, we can predict that reduction will show up as a lower center of gravity. In figure 4 this prediction is borne out for the vowel realizations. For each and every vowel, spontaneous realizations have a lower center of gravity than the read realizations. For the sonorants and fricatives we see a similar picture (lower centers of gravity for spontaneous realizations). For the release bursts of the plosives we see an erratic behaviour that does not seem to indicate a definite difference in the center of gravity with respect to speaking style.

In figure 4, a subdivision of the phonemes can be seen. Very high absolute frequencies are found for the center of gravity of obstruents (plosives and fricatives) due to the strong high frequency component in the noise. For fricatives, the absolute height of the center of gravity is inversely related to the size of the cavity in front of the noise source. For plosives the pattern is more intricate. The frequencies for /tdkg/ from spontaneous speech are indistinguishable from or higher than those from read speech (statistically not significant). The vowel-like frequencies for /pb/ show the mark of the open oral cavity behind the sound source.

Quite low frequencies are found for the sonorants (vowels and consonants) with vowels having higher values than consonants. For the consonant sonorants, the center of gravity is dominated by the damping of the higher frequencies due to their closed articulation.

Figure 5. Definition of the intervocalic sound energy difference. Vmax = (V1,max + V2,max)/2. Plosives and fricatives: Vmax - Cmax; nasals and vowel-like: Vmax - Cmin.

3.4. Intervocalic sound energy difference

One of the most salient differences between vowels and consonants is in their respective sound energy. Vowels generally have much higher energy levels than consonants. Vowel reduction decreases the maximal sound energy level of vowels. Whether the energy level of consonants changes by the same amount can be determined by measuring the sound energy, or the relative energy, of consonants with respect to their flanking vowels. The sound energy difference is measured as indicated in figure 5.
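The definition in the caption of figure 5 can be written out as a short function. The sketch assumes that each segment is available as a list of energy values in dB, one per analysis frame. The manner labels, the function name, and the example values are hypothetical.

```python
OBSTRUENTS = {"plosive", "fricative"}
SONORANT_CONSONANTS = {"nasal", "vowel-like"}


def energy_difference_db(v1_db, c_db, v2_db, manner):
    """Intervocalic sound energy difference as defined with figure 5.

    Vmax is the mean of the maxima of the two flanking vowels. For
    obstruents the consonant maximum is subtracted, for sonorant
    consonants the consonant minimum.
    """
    v_max = (max(v1_db) + max(v2_db)) / 2.0
    if manner in OBSTRUENTS:
        return v_max - max(c_db)
    if manner in SONORANT_CONSONANTS:
        return v_max - min(c_db)
    raise ValueError(f"unknown manner of articulation: {manner!r}")


# Invented energy contours in dB for one VCV segment.
v1 = [60.0, 70.0, 65.0]
c = [50.0, 45.0, 55.0]
v2 = [62.0, 68.0, 66.0]
print(energy_difference_db(v1, c, v2, "plosive"))
print(energy_difference_db(v1, c, v2, "nasal"))
```

Using the maximum for obstruents and the minimum for sonorant consonants means that the same contour gives a smaller difference under the obstruent definition than under the sonorant one.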

Figure 6 displays the sound energy differences for read and spontaneous speech. For obstruents (plosives and fricatives) the difference is smaller in spontaneous speech than in read speech; for sonorant consonants (nasals and the /l ~/) it is larger. Altogether, the differences between the intervocalic sound energy differences in read and those in spontaneous speech seem to be small, of the order of 1 dB. Therefore, any change in the sound level of the vowels seems to be largely mirrored by the consonants. The differences between the behaviour of obstruents and sonorant consonants in this respect can, most likely, be traced back to the differences in the mechanism by which the speech sounds are generated. In sonorants, sound originates in the glottal pulses; in obstruents it originates in the air flow turbulence near an obstruction.

Figure 6. Reduction of intervocalic sound energy difference. * p <= 0.001, two tailed Sign test.

In section 3.3 (figure 4) it was shown that the spectral differences between read and spontaneous speech could be explained by a lower speech effort in spontaneous speech than in read speech. Such a reduced speech effort would not only result in a lower center of gravity, but also in a lower sound energy level. Figure 6 indicates that there is a relation between the center of gravity (i.e., the spectral slope) and the change in the sound energy: the lower the center of gravity (i.e., the steeper the spectral slope), the larger the difference in total sound energy between speaking styles. The center of gravity of vowels lies between that of nasals and that of fricatives (figure 4). The observed effect of a more informal speaking style on the intervocalic sound energy difference of nasals and fricatives implies that the absolute sound energy level of vowels changes by an amount between that of nasals and that of fricatives, i.e., less than nasals, more than fricatives (figure 6). In other words, a more informal speaking style results in an increase in the intervocalic sound energy difference for nasals (whose energy level reduces more than that of vowels) and a decrease for fricatives (whose energy level reduces less than that of vowels).

4. Discussion

Four correlates of reduction have been studied for consonants with respect to speaking style: 1) F2-locus equations, 2) Duration, 3) Center of Gravity, and 4) Intervocalic sound energy difference.

The generally lower correlation coefficients of the F2-locus equations in spontaneous speech indicate a loss of (articulatory) distinction. This is equivalent to the spectral effects of articulatory reduction found in vowel space.

In spontaneous speech, consonant realizations shorten like vowels. The decrease in duration of consonants is such that the relative duration, as a fraction of total VCV segment duration, remains unchanged (not shown). Therefore, the change in duration seems to be a "global" feature of the change in speaking style.

Except for the plosives, all consonants and vowels showed a decrease in center of gravity. This indicates that both the vowels and the non-plosive consonants show a diminishing source strength in spontaneous speech. This, in turn, implies a decrease in vocal and articulatory effort. As the center of gravity is strongly linked to the spectral slope at high frequencies, this lowering can be expected to correlate with a decrease in the perceived stress (Sluijter, 1995a,b).

In spontaneous speech, the sonorant consonants (nasals and the /l ~/) "weaken" somewhat more than the neighbouring vowels whereas non-sonorants (fricatives and plosives) "weaken" somewhat less than the vowels (figure 6). Combined with the results for the spectral center of gravity this implies that the slope of the spectrum determines the size of the difference in sound energy that results from a difference in speaking style.

5. Conclusions

Uttered in a more informal style, consonant realizations show reduction in terms of diminishing articulatory precision and global effort. Furthermore, consonant reduction resembles vowel reduction in both type and extent of the changes in the produced sounds. Details of the spectral changes in consonants due to speaking style depend on the source of the speech sound: vocal folds or fricative noise.

In our future research we will extend coverage of the consonants used. Next to speaking style, we will also look at the effects of stress on reduction. The global measures of acoustic reduction presented here will be supplemented with more detailed types of analysis. Identification experiments to determine the effects of acoustic reduction on consonant identification are under way.

6. Acknowledgements

The authors want to thank Dr. Florien Koopmans-van Beinum for supplying the speech recordings and Dr. Noortje Blauw for her transliteration of the spontaneous speech. This research was made possible by grant 300-173-029 of the Dutch Organization of Research (NWO).

7. References