Source-filter synthesis 4. Using existing sounds

1. How to extract the filter from an existing speech sound

You can separate source and filter with the help of the technique of linear prediction (see Sound: LPC analysis). This technique tries to approximate a given frequency spectrum with a small number of peaks, for which it finds the mid frequencies and the bandwidths. If we do this for an overlapping sequence of windowed parts of a sound signal (i.e. a short-term analysis), we get a quasi-stationary approximation of the signal's spectral characteristics as a function of time, i.e. a smoothed version of the Spectrogram.

For a speech signal, the peaks are identified with the resonances (formants) of the vocal tract. Since the spectrum of a vowel spoken by an average human being falls off with approximately 6 dB per octave, pre-emphasis is applied to the signal before the linear-prediction analysis, so that the algorithm will not try to match only the lower parts of the spectrum.

For an adult female human voice, tradition assumes five formants in the range between 0 and 5500 hertz, say at 550, 1650, 2750, 3850, and 4950 hertz. For the linear prediction in Praat, you will have to implement this 5500-Hz band-limiting by resampling the original speech signal to 11 kHz. For a male voice, you would use 10 kHz; for a young child, 20 kHz.

To perform the resampling, you use Sound: Resample...: you select a Sound object, and click Resample.... In the rest of this tutorial, I will use the syntax that you would use in a script, though you will usually do these things by clicking on objects and buttons. Thus:

    selectObject: "Sound hallo"
    Resample: 11000, 50

You can then perform a linear-prediction analysis on the resampled sound with Sound: To LPC (burg)...:

    selectObject: "Sound hallo_11000"
    To LPC (burg): 10, 0.025, 0.005, 50

This says that your analysis is done with 10 linear-prediction parameters (which will yield at most five formant-bandwidth pairs), with an analysis window effectively 25 milliseconds long, with time steps of 5 milliseconds (so that the windows will appreciably overlap), and with a pre-emphasis frequency of 50 Hz (which is the point above which the sound will be amplified by 6 dB/octave prior to the analysis proper).
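
The parameter choices above can be tied together in a short script. The following is a minimal sketch, assuming a Sound named hallo is in the list of objects; the variable names maximumFormant, numberOfFormants, resampled, and lpc are introduced here for illustration only:

    # Assumed values for an adult female voice (taken from the text above).
    maximumFormant = 5500
    numberOfFormants = 5
    selectObject: "Sound hallo"
    # Resample to twice the highest formant frequency of interest.
    resampled = Resample: 2 * maximumFormant, 50
    # Two linear-prediction parameters per formant-bandwidth pair.
    lpc = To LPC (burg): 2 * numberOfFormants, 0.025, 0.005, 50

For a male voice you would set maximumFormant to 5000, and for a young child to 10000, in line with the sampling frequencies mentioned earlier.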

As a result, an object called “LPC hallo_11000” will appear in the list of objects. This LPC object is a time function with 10 linear-prediction coefficients in each time frame. These coefficients are rather opaque even to the expert (try to view them with Inspect), but they are the raw material from which formant and bandwidth values can be computed. To see the smoothed Spectrogram associated with the LPC object, choose LPC: To Spectrogram...:

    selectObject: "LPC hallo_11000"
    To Spectrogram: 20, 0, 50
    Paint: 0, 0, 0, 0, 50, 0, 0, "yes"

Note that when drawing this Spectrogram, you will want to set the pre-emphasis to zero (the fifth 0 in the last line), because pre-emphasis has already been applied in the analysis.

You can get and draw the formant-bandwidth pairs from the LPC object, with LPC: To Formant and Formant: Speckle...:

    selectObject: "LPC hallo_11000"
    To Formant
    Speckle: 0, 0, 5500, 30, "yes"

Note that in converting the LPC into a Formant object, you may have lost some information about spectral peaks at very low frequencies (below 50 Hz) or at very high frequencies (near the Nyquist frequency of 5500 Hz). Such peaks usually try to fit an overall spectral slope (if the 6 dB/octave model is inappropriate), and are not seen as related to resonances in the vocal tract, so they are ignored in a formant analysis. For resynthesis purposes, they might still be important.

Instead of using the intermediate LPC object, you could have done a formant analysis directly on the original Sound, with Sound: To Formant (burg)...:

    selectObject: "Sound hallo"
    To Formant (burg): 0.005, 5, 5500, 0.025, 50

A Formant object has a fixed sampling (time step, frame length), and for every formant frame, it contains a number of formant-bandwidth pairs.

From a Formant object, you can create a FormantGrid with Formant: Down to FormantGrid. A FormantGrid object contains a number of tiers with time-stamped formant points and bandwidth points.
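
As a minimal sketch of how such an object can be queried and converted, assume that the object Formant hallo_11000 from the LPC-based analysis above exists and that the sound is longer than 0.5 seconds (the time 0.5 is an arbitrary choice for illustration, as are the variable names):

    selectObject: "Formant hallo_11000"
    numberOfFrames = Get number of frames
    # First formant and its bandwidth at 0.5 seconds, linearly interpolated.
    f1 = Get value at time: 1, 0.5, "hertz", "linear"
    b1 = Get bandwidth at time: 1, 0.5, "hertz", "linear"
    writeInfoLine: numberOfFrames, " frames; F1 = ", f1, " Hz, B1 = ", b1, " Hz"
    # Convert to a FormantGrid, which can be edited point by point.
    Down to FormantGrid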

Any of these three types (LPC, Formant, and FormantGrid) can represent the filter in source-filter synthesis.

2. How to extract the source from an existing speech sound

If you are only interested in the filter characteristics, you can get by with Formant objects. To get at the source signal, however, you need the raw LPC object: you select it together with the resampled Sound, and apply inverse filtering:

    selectObject: "Sound hallo_11000", "LPC hallo_11000"
    Filter (inverse)

A new Sound named hallo_11000 will appear in the list of objects (you could rename it to source). This is the estimated source signal. Since the LPC analysis was designed to yield a spectrally flat filter (through the use of pre-emphasis), this source signal represents everything in the speech signal that cannot be attributed to the resonating cavities. Thus, the “source signal” will consist of the glottal volume-velocity source (with an expected spectral slope of -12 dB/octave for vowels) and the radiation characteristics at the lips, which cause a 6 dB/octave spectral rise, so that the resulting spectrum of the “source signal” is actually the derivative of the glottal flow, with an expected spectral slope of -6 dB/octave.
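
The extraction steps so far can also be chained with object IDs instead of names, which avoids confusion between the resampled Sound and the source Sound that share the name hallo_11000. A minimal sketch, assuming a Sound named hallo exists; the variable names are illustrative:

    selectObject: "Sound hallo"
    resampled = Resample: 11000, 50
    lpc = To LPC (burg): 10, 0.025, 0.005, 50
    # Inverse filtering needs both the resampled Sound and its LPC.
    selectObject: resampled, lpc
    source = Filter (inverse)
    # The new Sound is now selected, so it can be renamed directly.
    Rename: "source"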

Note that with inverse filtering you cannot measure the actual spectral slope of the source signal. Even if the actual slope is very different from -6 dB/octave, formant extraction will try to match the pre-emphasized spectrum. Thus, by choosing a pre-emphasis of +6 dB/octave, you impose a slope of -6 dB/octave on the source signal.

3. How to do the synthesis

You can create a new Sound from a source Sound and a filter, in at least four ways.

If your filter is an LPC object, you select it and the source, and choose LPC & Sound: Filter...:

    selectObject: "Sound source", "LPC filter"
    Filter: "no"

If you had computed the source and filter from an LPC analysis, this procedure should give you back the original Sound, except that windowing has caused 25 milliseconds at the beginning and end of the signal to be set to zero.

If your filter is a Formant object, you select it and the source, and choose Sound & Formant: Filter:

    selectObject: "Sound source", "Formant filter"
    Filter

If you had computed the source and filter from an LPC analysis, this procedure will not generally give you back the original Sound, because some linear-prediction coefficients will have been ignored in the conversion to formant-bandwidth pairs.

If your filter is a FormantGrid object, you select it and the source, and choose Sound & FormantGrid: Filter:

    selectObject: "Sound source", "FormantGrid filter"
    Filter

Finally, you could just know the impulse response of your filter (in a Sound object). You then select both Sound objects, and choose Sounds: Convolve...:

    selectObject: "Sound source", "Sound filter"
    Convolve: "integral", "zero"

As a last step, you may want to bring the resulting sound within the [-1; +1] range:

    Scale peak: 0.99

4. How to manipulate the filter

You can hardly change the values in an LPC object in a meaningful way: you would have to manually change its rather opaque data with the help of Inspect.

A Formant object can be changed in a friendlier way, with Formant: Formula (frequencies)... and Formant: Formula (bandwidths).... For instance, to multiply all formant frequencies by 0.9, you do

    selectObject: "Formant filter"
    Formula (frequencies): "self * 0.9"

To add 200 hertz to all values of F2, you do

    Formula (frequencies): ~ if row = 2 then self + 200 else self fi
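
Once the filter has been changed, it can be applied to the source again as described in section 3. A minimal sketch, assuming objects named Sound source and Formant filter exist:

    selectObject: "Formant filter"
    # Lower all formant frequencies by 10 percent.
    Formula (frequencies): "self * 0.9"
    # Apply the modified filter to the source.
    selectObject: "Sound source", "Formant filter"
    Filter
    # Bring the result within the [-1; +1] range before listening.
    Scale peak: 0.99
    Play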

A FormantGrid object can be changed by adding or removing points:

FormantGrid: Add formant point...
FormantGrid: Add bandwidth point...
FormantGrid: Remove formant points between...
FormantGrid: Remove bandwidth points between...
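
A minimal sketch of such edits, assuming a FormantGrid named filter exists; the times, the frequency, and the bandwidth are arbitrary values chosen for illustration:

    selectObject: "FormantGrid filter"
    # Remove the F1 points between 0.3 and 0.7 seconds...
    Remove formant points between: 1, 0.3, 0.7
    # ...and put a single F1 point of 550 Hz at 0.5 seconds in their place.
    Add formant point: 1, 0.5, 550
    # Set the F1 bandwidth to 100 Hz at the same time.
    Add bandwidth point: 1, 0.5, 100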

5. How to manipulate the source signal

You can manipulate the source signal in the same way that you would manipulate any sound, for instance with the ManipulationEditor.
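
For scripted rather than interactive changes, one possibility is to go through a Manipulation object. The following is a minimal sketch, assuming a Sound named source exists; the pitch range of 75 to 600 Hz, the factor 1.2, and the variable names are arbitrary choices for illustration:

    selectObject: "Sound source"
    manipulation = To Manipulation: 0.01, 75, 600
    pitchTier = Extract pitch tier
    # Raise the pitch of the source by 20 percent.
    Formula: "self * 1.2"
    # Put the modified pitch tier back and resynthesize the source.
    selectObject: manipulation, pitchTier
    Replace pitch tier
    selectObject: manipulation
    Get resynthesis (overlap-add)

The resulting Sound can then be filtered in any of the ways described in section 3.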

© ppgb 20170828