pitch analysis by raw autocorrelation

Purpose

to perform a pitch analysis based on an autocorrelation method.

Usage

Raw autocorrelation is the pitch analysis method of choice if you want to measure the raw periodicity of a signal.

Note that the preferred method for speech (intonation, vocal fold vibration) is pitch analysis by filtered autocorrelation. See how to choose a pitch analysis method for details.

Algorithm

The algorithm performs an acoustic periodicity detection on the basis of an accurate autocorrelation method, as described in Boersma (1993). This method is more accurate, more noise-resistant, and more robust than methods based on cepstrum or combs, or than the original autocorrelation methods. Those other methods were invented because it had not been recognized that if you want to estimate a signal's short-term autocorrelation function on the basis of a windowed signal, you should divide the autocorrelation function of the windowed signal by the autocorrelation function of the window:

r_x(τ) ≈ r_xw(τ) / r_w(τ)

The pitch is basically determined as the inverse of the time (lag) where the autocorrelation function r has its maximum. However, there are likely to be multiple peaks in r, and all of these can be pitch candidates. For each moment in time (e.g. every 10 ms), the algorithm determines the (typically) 15 highest peaks in r, regards these as candidates, and then tracks an optimal path through the candidates over time.
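
The window correction and the peak search can be illustrated for a single frame with a short Python sketch. This is not Praat's implementation: the function names are made up for illustration, only the single highest peak is returned (instead of a set of candidates), and there is no interpolation between samples, so the result is limited to whole-sample lags.

```python
import math

def hann_window(n):
    """Return a Hann (Hanning) window of n samples, strictly positive everywhere."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * (i + 0.5) / n) for i in range(n)]

def autocorrelation(x, max_lag):
    """Normalized autocorrelation r(lag) for lag = 0..max_lag, so that r(0) = 1."""
    r0 = sum(v * v for v in x)
    return [sum(x[i] * x[i + lag] for i in range(len(x) - lag)) / r0
            for lag in range(max_lag + 1)]

def corrected_autocorrelation(frame, max_lag):
    """Estimate r_x(lag) as r_xw(lag) / r_w(lag) for one analysis frame."""
    n = len(frame)
    mean = sum(frame) / n
    w = hann_window(n)
    xw = [(s - mean) * wi for s, wi in zip(frame, w)]  # windowed, mean-free signal
    r_xw = autocorrelation(xw, max_lag)                # autocorrelation of windowed signal
    r_w = autocorrelation(w, max_lag)                  # autocorrelation of the window
    return [a / b for a, b in zip(r_xw, r_w)]

def strongest_lag(r, min_lag, max_lag):
    """Lag (in samples) of the highest local maximum of r within the lag bounds."""
    best = None
    for lag in range(max(min_lag, 1), min(max_lag, len(r) - 2) + 1):
        if r[lag] > r[lag - 1] and r[lag] >= r[lag + 1]:
            if best is None or r[lag] > r[best]:
                best = lag
    return best

def estimate_pitch(frame, sample_rate_hz, pitch_floor_hz=75.0, pitch_ceiling_hz=600.0):
    """Pitch of one frame as the inverse of the lag with the strongest peak."""
    max_lag = int(sample_rate_hz / pitch_floor_hz)                  # longest period considered
    min_lag = int(math.ceil(sample_rate_hz / pitch_ceiling_hz))     # shortest period considered
    r = corrected_autocorrelation(frame, max_lag)
    lag = strongest_lag(r, min_lag, max_lag)
    return None if lag is None else sample_rate_hz / lag

# Usage: a 100 Hz sine sampled at 8000 Hz, in a frame of 3 / 75 = 0.04 seconds.
sample_rate = 8000
frame = [math.sin(2.0 * math.pi * 100.0 * i / sample_rate) for i in range(320)]
print(estimate_pitch(frame, sample_rate))
```

Without the division by r_w, the peaks of the autocorrelation would shrink as the lag grows, because the window tapers the signal; the division undoes that tapering.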

Settings

Several settings are already described in Intro 4.2. Configuring the pitch contour. The explanations below assume that you have gone through that part of the Intro.

The settings that control the recruitment of the candidates are:

Time step (s) (standard value: 0.0)
the measurement interval (frame duration), in seconds. If you supply 0, Praat will use a time step of 0.75 / (pitch floor), e.g. 0.01 seconds if the pitch floor is 75 Hz; in this example, Praat computes 100 pitch values per second.
Pitch floor (Hz) (standard value: 75 Hz)
candidates below this frequency will not be recruited. This parameter determines the effective length of the analysis window: it will be 3 longest periods long, i.e., if the pitch floor is 75 Hz, the window will be effectively 3/75 = 0.04 seconds long.

Note that if you set the time step to zero, the analysis windows for consecutive measurements will overlap appreciably: Praat will always compute 4 pitch values within one window length, i.e., the degree of oversampling is 4.

Pitch ceiling (Hz) (standard value: 600 Hz)
candidates above this frequency will be ignored.
Max. number of candidates (standard value: 15)
each frame will contain at most this many pitch candidates. One of them is the “unvoiced candidate”; the others correspond to time lags over which the signal is more or less similar to itself.
Very accurate (standard value: off)
if off, the window is a Hanning window with a physical length of 3 / (pitch floor). If on, the window is a Gaussian window with a physical length of 6 / (pitch floor), i.e. twice the effective length.
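
The arithmetic that ties the time step and the window length to the pitch floor can be written out as a small Python sketch. The function name and the dictionary keys are chosen here for illustration only; the numbers are the ones given in the descriptions above.

```python
def analysis_frame_settings(pitch_floor_hz, time_step_s=0.0, very_accurate=False):
    """Derive frame timing and window size from the settings described above."""
    if time_step_s <= 0.0:
        # A time step of 0 means "automatic": 0.75 / (pitch floor).
        time_step_s = 0.75 / pitch_floor_hz
    # The effective window length is three periods of the pitch floor.
    effective_window_s = 3.0 / pitch_floor_hz
    # "Very accurate" switches to a Gaussian window of twice that physical length.
    physical_window_s = (6.0 if very_accurate else 3.0) / pitch_floor_hz
    return {
        "time_step_s": time_step_s,
        "frames_per_second": 1.0 / time_step_s,
        "effective_window_s": effective_window_s,
        "frames_per_window": effective_window_s / time_step_s,
        "window_shape": "Gaussian" if very_accurate else "Hanning",
        "physical_window_s": physical_window_s,
    }

# Usage: the standard pitch floor of 75 Hz with an automatic time step.
for key, value in analysis_frame_settings(75.0).items():
    print(key, value)
```

With an automatic time step, the number of frames per effective window is (3 / floor) / (0.75 / floor) = 4, whatever the pitch floor is.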

A post-processing algorithm seeks the cheapest path through the candidates. The settings that determine the cheapest path are:

Silence threshold (standard value: 0.03)
frames that do not contain amplitudes above this threshold (relative to the global maximum amplitude) are probably silent.
Voicing threshold (standard value: 0.45)
the strength of the unvoiced candidate, relative to the maximum possible autocorrelation. If the amount of periodic energy in a frame is more than this fraction of the total energy (the remainder being noise), then Praat will prefer to regard this frame as voiced; otherwise as unvoiced. To increase the number of unvoiced decisions, increase the voicing threshold.
Octave cost (standard value: 0.01 per octave)
degree of favouring of high-frequency candidates, relative to the maximum possible autocorrelation. This is necessary because even (or: especially) in the case of a perfectly periodic signal, all undertones of F0 are just as strong candidates as F0 itself. To more strongly favour recruitment of high-frequency candidates, increase this value.
Octave-jump cost (standard value: 0.35)
degree of disfavouring of pitch changes, relative to the maximum possible autocorrelation. To decrease the number of large frequency jumps, increase this value. In contrast with what is described in the article, this value will be corrected for the time step: multiply by 0.01 s / TimeStep to get the value in the way it is used in the formulas in the article.
Voiced / unvoiced cost (standard value: 0.14)
degree of disfavouring of voiced/unvoiced transitions, relative to the maximum possible autocorrelation. To decrease the number of voiced/unvoiced transitions, increase this value. In contrast with what is described in the article, this value will be corrected for the time step: multiply by 0.01 s / TimeStep to get the value in the way it is used in the formulas in the article.
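
How the two transition costs steer the path can be sketched in Python with a small dynamic-programming search. This is an illustration under assumptions, not Praat's code: the transition cost (the octave-jump cost times the distance in octaves between two voiced candidates, the voiced/unvoiced cost when exactly one of the two is voiced) is one reading of the formulas in the article, the candidate strengths are taken as given, and all function names are hypothetical. A frequency of 0 marks the unvoiced candidate.

```python
import math

def corrected_cost(setting_value, time_step_s):
    """Correct a cost setting for the time step: multiply by 0.01 s / TimeStep."""
    return setting_value * 0.01 / time_step_s

def transition_cost(f1, f2, octave_jump_cost, voiced_unvoiced_cost):
    """Cost of moving from candidate f1 to candidate f2 (assumed form, see lead-in)."""
    voiced1, voiced2 = f1 > 0.0, f2 > 0.0
    if voiced1 and voiced2:
        return octave_jump_cost * abs(math.log2(f1 / f2))
    if voiced1 != voiced2:
        return voiced_unvoiced_cost
    return 0.0

def best_path(frames, octave_jump_cost=0.35, voiced_unvoiced_cost=0.14, time_step_s=0.01):
    """Pick one candidate per frame, maximizing strengths minus transition costs.

    frames is a list of frames; each frame is a list of (frequency_hz, strength).
    """
    jump = corrected_cost(octave_jump_cost, time_step_s)
    vuv = corrected_cost(voiced_unvoiced_cost, time_step_s)
    # score[i]: best total of any path that ends in candidate i of the current frame
    score = [strength for _, strength in frames[0]]
    back = []
    for prev, cur in zip(frames, frames[1:]):
        new_score, pointers = [], []
        for f2, s2 in cur:
            options = [score[i] - transition_cost(f1, f2, jump, vuv)
                       for i, (f1, _) in enumerate(prev)]
            best = max(range(len(options)), key=options.__getitem__)
            new_score.append(options[best] + s2)
            pointers.append(best)
        score = new_score
        back.append(pointers)
    # Trace back from the best final candidate.
    idx = max(range(len(score)), key=score.__getitem__)
    path = [idx]
    for pointers in reversed(back):
        idx = pointers[idx]
        path.append(idx)
    path.reverse()
    return [frames[t][i][0] for t, i in enumerate(path)]

# Usage: in the middle frame the 200 Hz candidate is locally the strongest.
frames = [
    [(0.0, 0.45), (100.0, 0.90), (200.0, 0.50)],
    [(0.0, 0.45), (100.0, 0.80), (200.0, 0.85)],
    [(0.0, 0.45), (100.0, 0.90), (200.0, 0.50)],
]
print(best_path(frames))            # standard costs
print(best_path(frames, 0.0, 0.0))  # no transition costs
```

With the standard costs the path stays at 100 Hz throughout, because two octave jumps would cost more than the small gain in strength in the middle frame; with both costs set to zero, each frame simply takes its locally strongest candidate.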

The standard settings are best in most cases. For some pathological voices, you will want to set the voicing threshold to much less than the standard of 0.45, in order to get pitch values even in irregular parts of the signal. For prevoiced plosives, you may want to lower the silence threshold from 0.03 to 0.01 or so.

Availability in Praat

Pitch analysis by raw autocorrelation is available in two ways in Praat:

• via Sound: To Pitch (raw autocorrelation)... from the Analyse periodicity menu in the Objects window when you select a Sound object;
• via Show Pitch and Pitch analysis method is raw autocorrelation from the Pitch menu when you are viewing a Sound or TextGrid object (SoundEditor, TextGridEditor).

© Paul Boersma 1996, 2001–2003, 2022–2024