Formant: Track...

A command to extract a specified number of formant tracks from each selected Formant object. The tracks represent the cheapest paths through the measured formant values in consecutive frames.

How to use

To produce three tracks (i.e. F1, F2, and F3), there must be at least three formant candidates in every frame of the Formant object. The typical use of this command, therefore, is to analyse five formants with Sound: To Formant (burg)... and then use the tracking command to extract three tracks.

When to use, when not

This command only makes sense if the whole of the formant contour makes sense. For speech, formant contours make sense only for vowels and the like. During some consonants, the Formant object may have fewer than three formant values, and trying to create three tracks through them will fail. You will typically use this command for the contours in diphthongs, if at all.

Settings

To be able to interpret the settings, you should know that the aim of the procedure is to minimize the sum of the costs associated with the three tracks.

Number of tracks
the number of formant tracks that the procedure must find. If this number is 3, the procedure will try to find tracks for F1, F2, and F3; if the Formant object contains a frame with fewer than three formants, the tracking procedure will fail.
Reference F1 (Hz)
the preferred value near which the first track wants to be. For average (i.e. adult female) speakers, this value will be around the average F1 for vowels of female speakers, i.e. 550 Hz.
Reference F2 (Hz)
the preferred value near which the second track wants to be. A good value will be around the average F2 for vowels of female speakers, i.e. 1650 Hz.
Reference F3 (Hz)
the preferred value near which the third track wants to be. A good value will be around the average F3 for vowels of female speakers, i.e. 2750 Hz. This argument will be ignored if you choose to have fewer than three tracks, i.e., if you are only interested in F1 and F2.
Reference F4 (Hz)
the preferred value near which the fourth track wants to be. A good value may be around 3850 Hz, but you will usually not want to track F4, because traditional formant lore tends to ignore it (however inappropriate this may be for the vowel [i]), and because Formant objects often contain not more than three formant values in some frames. So you will not usually specify a higher Number of tracks than 3, and in that case, this argument will be ignored.
Reference F5 (Hz)
the preferred value near which the fifth track wants to be. In the unlikely case that you want five tracks, a good value may be around 4950 Hz.
Frequency cost (per kiloHertz)
the local cost of having a formant value in your track that deviates from the reference value. For instance, if a candidate (i.e. any formant in a frame of the Formant object) has a formant frequency of 800 Hz, and Frequency cost is 1.0/kHz, the cost of putting this formant in the first track is 0.250, because the distance to the reference F1 of 550 Hz is 250 Hz. The cost of putting the formant in the second track would be 0.850 (= (1.650 kHz - 0.800 kHz) · 1.0/kHz), so we see that the procedure locally favours the inclusion of the 800 Hz candidate into the F1 track. But the next two cost factors may override this local preference.
Bandwidth cost
the local cost of having a bandwidth, relative to the formant frequency. For instance, if a candidate has a formant frequency of 400 Hz and a bandwidth of 80 Hz, and Bandwidth cost is 1.0, the cost of having this formant in any track is (80/400) · 1.0 = 0.200. So we see that the procedure locally favours the inclusion of candidates with low relative bandwidths.
Transition cost (per octave)
the cost of having two different consecutive formant values in a track. For instance, if a proposed track through the candidates has two consecutive formant values of 300 Hz and 424 Hz, and Transition cost is 1.0/octave, the cost of having this large frequency jump is (0.5 octave) · (1.0/octave) = 0.500.
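
The three worked examples above can be reproduced with a few lines of Python. This is only an illustrative sketch of the arithmetic described in the settings; the function names are hypothetical and are not part of Praat.

```python
import math

def frequency_cost(formant_hz, reference_hz, cost_per_khz):
    # Distance to the reference value, expressed in kHz, times the cost per kHz.
    return cost_per_khz * abs(formant_hz - reference_hz) / 1000.0

def bandwidth_cost(formant_hz, bandwidth_hz, cost):
    # Bandwidth relative to the formant frequency, times the bandwidth cost.
    return cost * bandwidth_hz / formant_hz

def transition_cost(from_hz, to_hz, cost_per_octave):
    # Size of the frequency jump in octaves, times the cost per octave.
    return cost_per_octave * abs(math.log2(from_hz / to_hz))

# The 800 Hz candidate against the reference F1 (550 Hz) and F2 (1650 Hz).
print(frequency_cost(800.0, 550.0, 1.0))    # 0.25
print(frequency_cost(800.0, 1650.0, 1.0))   # 0.85
# A 400 Hz formant with an 80 Hz bandwidth.
print(bandwidth_cost(400.0, 80.0, 1.0))     # 0.2
# A jump from 300 Hz to 424 Hz is very nearly half an octave.
print(round(transition_cost(300.0, 424.0, 1.0), 3))
```

Note that the transition cost is symmetric: a jump from 424 Hz down to 300 Hz costs the same as the jump up.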

Algorithm

This command uses a Viterbi algorithm with multiple planes. For instance, if the selected Formant object contains up to five formants per frame, and you request three tracks, the Viterbi algorithm will have to choose between ten candidates (the number of combinations of three out of five) for each frame.
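
The count of ten can be checked by enumerating the combinations. The sketch below assumes that the candidates in a frame are ordered by frequency and that each combination keeps that order, so that a lower track never receives a higher candidate than the track above it; this ordering is an assumption of the illustration, and the function name is hypothetical.

```python
from itertools import combinations

def track_assignments(number_of_candidates, number_of_tracks):
    # Each assignment is a tuple of candidate indices, one per track,
    # in increasing order (track 1 gets the lowest chosen candidate).
    return list(combinations(range(number_of_candidates), number_of_tracks))

assignments = track_assignments(5, 3)
print(len(assignments))   # 10
print(assignments[0])     # (0, 1, 2): the three lowest candidates
print(assignments[-1])    # (2, 3, 4): the three highest candidates
```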

The formula for the cost of e.g. track 3, with proposed values F3i (i = 1...N, where N is the number of frames) is:

∑i=1..N frequencyCost·|F3i - referenceF3|/1000
+ ∑i=1..N bandWidthCost·B3i/F3i
+ ∑i=1..N-1 transitionCost·|log2(F3i/F3,i+1)|

Analogous formulas compute the cost of track 1 and track 2. The procedure will assign those candidates to the three tracks that minimize the sum of three track costs.
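
The minimization can be sketched as a small Viterbi search in Python. This is not Praat's source code, only a minimal illustration of the cost formula above, assuming that each frame is a list of (frequency, bandwidth) candidates sorted by frequency and that tracks keep that order; the name track_formants and its arguments are hypothetical.

```python
import math
from itertools import combinations

def track_formants(frames, references, freq_cost=1.0, bw_cost=1.0, trans_cost=1.0):
    """frames: list of frames, each a list of (frequency_hz, bandwidth_hz)
    candidates sorted by frequency. references: one reference frequency
    per requested track. Returns (total_cost, per-frame frequency tuples)."""
    n_tracks = len(references)
    if not frames or any(len(frame) < n_tracks for frame in frames):
        raise ValueError("every frame needs at least as many candidates as tracks")

    def local_cost(frame, combo):
        # Frequency cost plus bandwidth cost, summed over all tracks.
        total = 0.0
        for track, index in enumerate(combo):
            f, b = frame[index]
            total += freq_cost * abs(f - references[track]) / 1000.0
            total += bw_cost * b / f
        return total

    def jump_cost(prev_frame, prev_combo, frame, combo):
        # Transition cost in octaves, summed over all tracks.
        return sum(trans_cost * abs(math.log2(prev_frame[p][0] / frame[c][0]))
                   for p, c in zip(prev_combo, combo))

    # One Viterbi state per combination of candidates in each frame.
    states = [list(combinations(range(len(frame)), n_tracks)) for frame in frames]
    best = [local_cost(frames[0], s) for s in states[0]]
    back = []
    for t in range(1, len(frames)):
        new_best, pointers = [], []
        for s in states[t]:
            costs = [best[k] + jump_cost(frames[t - 1], p, frames[t], s)
                     for k, p in enumerate(states[t - 1])]
            k_min = min(range(len(costs)), key=costs.__getitem__)
            new_best.append(costs[k_min] + local_cost(frames[t], s))
            pointers.append(k_min)
        best = new_best
        back.append(pointers)

    # Trace the cheapest path back from the last frame to the first.
    k = min(range(len(best)), key=best.__getitem__)
    total = best[k]
    path = [k]
    for pointers in reversed(back):
        k = pointers[k]
        path.append(k)
    path.reverse()
    tracks = [tuple(frames[t][i][0] for i in states[t][path[t]])
              for t in range(len(frames))]
    return total, tracks

# Two made-up frames; the second contains a spurious wide-band 1000 Hz candidate.
frames = [
    [(500.0, 60.0), (1500.0, 90.0), (2500.0, 120.0), (3500.0, 200.0), (4500.0, 300.0)],
    [(520.0, 60.0), (1000.0, 400.0), (1550.0, 90.0), (2600.0, 120.0), (3600.0, 200.0)],
]
total, tracks = track_formants(frames, [550.0, 1650.0, 2750.0])
print(tracks)   # [(500.0, 1500.0, 2500.0), (520.0, 1550.0, 2600.0)]
```

In this made-up example the 1000 Hz candidate is skipped: it lies far from every reference value, has a large relative bandwidth, and would require a large jump from the previous frame.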


© ppgb, March 8, 2002