Sound: To Pitch (raw ac)...

A command that creates a Pitch object from every selected Sound object.

Purpose

to perform a pitch analysis based on an autocorrelation method.

Usage

Normally, you will instead use Sound: To Pitch..., which uses the same method. The command described here is mainly for experimenting with the parameters, or for the analysis of non-speech signals, which may require settings that differ from the standard values.

Algorithm

The algorithm performs an acoustic periodicity detection on the basis of an accurate autocorrelation method, as described in Boersma (1993). This method is more accurate, noise-resistant, and robust than methods based on cepstrum or combs, or the original autocorrelation methods. Those other methods were invented because it had not been recognized that, if you want to estimate a signal's short-term autocorrelation function on the basis of a windowed signal, you should divide the autocorrelation function of the windowed signal by the autocorrelation function of the window:

r_x(τ) ≈ r_xw(τ) / r_w(τ)
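
The division above can be tried out on a synthetic frame. The following is a minimal sketch, not Praat's implementation: the function names (`hann`, `autocorr`, `corrected_autocorr`), the 8000 Hz sampling frequency, and the 100 Hz test sine are assumptions made for illustration, and the interpolation steps described in Boersma (1993) are left out.

```python
import math

def hann(n):
    """Hanning window sampled at the centres of n equal bins."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * (i + 0.5) / n) for i in range(n)]

def autocorr(x, max_lag):
    """Autocorrelation for lags 0..max_lag, normalized so that lag 0 gives 1."""
    energy = sum(v * v for v in x)
    return [sum(x[i] * x[i + lag] for i in range(len(x) - lag)) / energy
            for lag in range(max_lag + 1)]

def corrected_autocorr(frame, max_lag):
    """Estimate r_x as r_xw / r_w; returns (r_x, r_xw) for comparison."""
    mean = sum(frame) / len(frame)
    w = hann(len(frame))
    # Subtract the mean, then apply the window.
    xw = [(v - mean) * wi for v, wi in zip(frame, w)]
    r_xw = autocorr(xw, max_lag)   # autocorrelation of the windowed signal
    r_w = autocorr(w, max_lag)     # autocorrelation of the window itself
    r_x = [a / b for a, b in zip(r_xw, r_w)]
    return r_x, r_xw

# Hypothetical test signal: a 100 Hz sine sampled at 8000 Hz,
# in one frame of 0.04 s (the effective window length for a 75 Hz pitch floor).
fs = 8000
frame = [math.sin(2.0 * math.pi * 100.0 * i / fs) for i in range(320)]
r_x, r_xw = corrected_autocorr(frame, 160)
period_lag = fs // 100  # 80 samples = one period of the sine
print(round(r_xw[period_lag], 2), round(r_x[period_lag], 2))
```

At the lag of one period, the uncorrected value r_xw is pulled well below 1 by the taper of the window, whereas the corrected value r_x comes out close to 1, as it should for a periodic signal.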

Settings

The settings that control the recruitment of the candidates are:

Time step (s) (standard value: 0.0)
the measurement interval (frame duration), in seconds. If you supply 0, Praat will use a time step of 0.75 / (pitch floor), e.g. 0.01 seconds if the pitch floor is 75 Hz; in this example, Praat computes 100 pitch values per second.
Pitch floor (Hz) (standard value: 75 Hz)
candidates below this frequency will not be recruited. This parameter determines the effective length of the analysis window: the window will be three times as long as the longest period, i.e., if the pitch floor is 75 Hz, the window will be effectively 3/75 = 0.04 seconds long.

Note that if you set the time step to zero, the analysis windows for consecutive measurements will overlap appreciably: Praat will always compute 4 pitch values within one window length, i.e., the degree of oversampling is 4.

Pitch ceiling (Hz) (standard value: 600 Hz)
candidates above this frequency will be ignored.
Max. number of candidates (standard value: 15)
each frame will contain at most this many pitch candidates. One of them is the “unvoiced candidate”; the others correspond to time lags over which the signal is more or less similar to itself.
Very accurate (standard value: off)
if off, the window is a Hanning window with a physical length of 3 / (pitch floor). If on, the window is a Gaussian window with a physical length of 6 / (pitch floor), i.e. twice the effective length.
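
The arithmetic behind the time step and the window lengths can be written out as follows. This is only a sketch of the relations stated above; the function name `analysis_timing` and its argument names are hypothetical and are not part of Praat.

```python
def analysis_timing(pitch_floor_hz, time_step_s=0.0, very_accurate=False):
    """Return (time step, effective window length, physical window length) in seconds."""
    if time_step_s == 0.0:
        # A time step of 0 means: let the pitch floor determine it.
        time_step_s = 0.75 / pitch_floor_hz
    effective = 3.0 / pitch_floor_hz  # three longest periods
    # Hanning window: physical length 3 / floor; Gaussian window: 6 / floor.
    physical = (6.0 if very_accurate else 3.0) / pitch_floor_hz
    return time_step_s, effective, physical

step, effective, physical = analysis_timing(75.0)
# For a 75 Hz pitch floor: a time step of about 0.01 s, an effective window
# of about 0.04 s, and hence about 4 pitch values per window length.
print(step, effective, effective / step)
```

With Very accurate switched on, only the physical length changes; the effective length stays at 3 / (pitch floor).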

A post-processing algorithm seeks the cheapest path through the candidates. The settings that determine the cheapest path are:

Silence threshold (standard value: 0.03)
frames that do not contain amplitudes above this threshold (relative to the global maximum amplitude) are probably silent.
Voicing threshold (standard value: 0.45)
the strength of the unvoiced candidate, relative to the maximum possible autocorrelation. If the amount of periodic energy in a frame is more than this fraction of the total energy (the remainder being noise), then Praat will prefer to regard this frame as voiced; otherwise as unvoiced. To increase the number of unvoiced decisions, increase the voicing threshold.
Octave cost (standard value: 0.01 per octave)
degree of favouring of high-frequency candidates, relative to the maximum possible autocorrelation. This is necessary because even (or: especially) in the case of a perfectly periodic signal, all undertones of F0 are just as strong candidates as F0 itself. To more strongly favour recruitment of high-frequency candidates, increase this value.
Octave-jump cost (standard value: 0.35)
degree of disfavouring of pitch changes, relative to the maximum possible autocorrelation. To decrease the number of large frequency jumps, increase this value. Unlike in the article (Boersma 1993), this value is corrected for the time step: multiply it by 0.01 s / TimeStep to get the value as it is used in the formulas in the article.
Voiced / unvoiced cost (standard value: 0.14)
degree of disfavouring of voiced/unvoiced transitions, relative to the maximum possible autocorrelation. To decrease the number of voiced/unvoiced transitions, increase this value. Unlike in the article (Boersma 1993), this value is corrected for the time step: multiply it by 0.01 s / TimeStep to get the value as it is used in the formulas in the article.
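
How the two transition costs steer the path can be illustrated with a small dynamic-programming search. This is a sketch under stated assumptions, not Praat's code: candidates are given as hypothetical (frequency, strength) pairs with 0 Hz standing for the unvoiced candidate, the strengths are supplied directly rather than computed, and the form of the transition cost (a flat cost for a voiced/unvoiced change, and a cost per octave of frequency change) is assumed from the descriptions of the settings above.

```python
import math

def transition_cost(f1, f2, octave_jump_cost, voiced_unvoiced_cost):
    """Assumed cost of going from candidate f1 to candidate f2 (0 Hz = unvoiced)."""
    if f1 == 0 and f2 == 0:
        return 0.0
    if f1 == 0 or f2 == 0:
        return voiced_unvoiced_cost
    return octave_jump_cost * abs(math.log2(f1 / f2))

def cheapest_path(frames, time_step=0.01, octave_jump_cost=0.35, voiced_unvoiced_cost=0.14):
    """Return the frequencies on the cheapest path; frames[t] lists (frequency, strength)."""
    correction = 0.01 / time_step  # time-step correction described above
    jump = octave_jump_cost * correction
    vuv = voiced_unvoiced_cost * correction
    # best[j]: lowest total cost of a path ending in candidate j of the current frame;
    # a strong candidate is cheap, so its strength is subtracted.
    best = [-strength for _, strength in frames[0]]
    back = []
    for prev, cur in zip(frames, frames[1:]):
        new_best, pointers = [], []
        for f2, s2 in cur:
            costs = [best[i] + transition_cost(f1, f2, jump, vuv)
                     for i, (f1, _) in enumerate(prev)]
            i_min = min(range(len(costs)), key=costs.__getitem__)
            new_best.append(costs[i_min] - s2)
            pointers.append(i_min)
        best = new_best
        back.append(pointers)
    # Trace the cheapest path back from the last frame.
    j = min(range(len(best)), key=best.__getitem__)
    path = [j]
    for pointers in reversed(back):
        j = pointers[j]
        path.append(j)
    path.reverse()
    return [frames[t][j][0] for t, j in enumerate(path)]

# Hypothetical frames: in the middle frame, the 200 Hz candidate is locally
# slightly stronger than the 100 Hz candidate.
frames = [
    [(0, 0.45), (100, 0.90)],
    [(0, 0.45), (100, 0.85), (200, 0.88)],
    [(0, 0.45), (100, 0.90)],
]
print(cheapest_path(frames))
print(cheapest_path(frames, octave_jump_cost=0.0))
```

With the standard octave-jump cost, the path stays at 100 Hz throughout, because two jumps of one octave cost more than the small gain in strength; with the cost set to zero, the locally strongest candidate wins in every frame.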

© Paul Boersma 1996,2001–2003,2022,2023