Sounds: Cross-correlate...

A command available in the Combine menu when you select two Sound objects. This command cross-correlates two selected Sound objects with each other. As a result, a new Sound will appear in the list of objects; this new Sound is the cross-correlation of the two original Sounds.


Amplitude scaling
Here you can choose between the `principled' options integral, sum, and normalize, which are explained in 1, 2 and 3 below. There is also a `pragmatic' option, namely peak 0.99, which scales the resulting sound in such a way that its absolute peak becomes 0.99, so that the sound tends to be clearly audible without distortion when you play it (see Sound: Scale peak...).
Signal outside time domain is...
Here you can choose whether outside their time domains the sounds are considered to be zero (the standard value), or similar to the sounds within the time domains. This is explained in 4 below.

1. Cross-correlation as an integral

The cross-correlation of two continuous time signals f(t) and g(t) is a function of the lag time τ, and defined as the integral

cross-corr (f, g) (τ) ≡ ∫ f(t) g(t+τ) dt

If f and g are sampled signals (as Sounds are in Praat), with the same sampling period Δt, the definition is discretized as

cross-corr (f, g) [τ] ≡ ∑_t f[t] g[t+τ] Δt

where t and t+τ are the discrete times at which f and g are defined, respectively.

Cross-correlation is not a commutative operation: cross-corr (g, f) is the time reversal of cross-corr (f, g). The order in which the two Sounds appear in the object list therefore matters.
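The discretized definition and the time-reversal property can be checked with a small sketch. It is written in plain Python rather than as a Praat script, the function name cross_correlate_integral is hypothetical, and samples outside either signal are assumed to be zero; it illustrates the formula and is not Praat's implementation.

```python
def cross_correlate_integral(f, g, dt):
    """Direct evaluation of sum over t of f[t] * g[t + lag] * dt.

    f and g are lists of samples with the same sampling period dt.
    Samples outside either list are treated as zero.
    Returns (lags in samples, cross-correlation values).
    """
    n1, n2 = len(f), len(g)
    lags = list(range(-(n1 - 1), n2))  # n1 + n2 - 1 lags in total
    values = []
    for lag in lags:
        total = 0.0
        for i in range(n1):
            j = i + lag  # index into g that pairs with f[i]
            if 0 <= j < n2:
                total += f[i] * g[j]
        values.append(total * dt)
    return lags, values


f = [1.0, 2.0, 3.0]
g = [0.5, -1.0]
dt = 0.1

lags_fg, r_fg = cross_correlate_integral(f, g, dt)
lags_gf, r_gf = cross_correlate_integral(g, f, dt)
# Swapping the arguments reverses the result in time.
```

Here r_gf holds the same values as r_fg in reverse order, which is the non-commutativity described above.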

2. Cross-correlation as a sum

You can see in the formula above that if both input Sounds are expressed in units of Pa, the resulting Sound should ideally be expressed in Pa²·s. Nevertheless, Praat will express it in Pa, because Sounds cannot be expressed otherwise.

This basically means that it is impossible to get the amplitude of the resulting Sound correct for all purposes. For this reason, Praat considers a different definition of cross-correlation as well, namely as the sum

cross-corr (f, g) [τ] ≡ ∑_t f[t] g[t+τ]

The difference between the integral and sum definitions is that in the sum definition the resulting sound is divided by Δt.

3. Normalized cross-correlation

The normalized cross-correlation is defined as

norm-cross-corr (f, g) (τ) ≡ ∫ f(t) g(t+τ) dt / √ (∫ f²(t) dt ∫ g²(t) dt)
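The four amplitude scaling options differ only in a constant factor applied to the same raw sum, which the following sketch makes explicit. It is plain Python, the names raw_sum and cross_correlate are hypothetical, and the signals are assumed to be zero outside their time domains. In the discretized normalized form the factors Δt cancel, so the normalization uses plain sums of squares.

```python
import math


def raw_sum(f, g):
    """Sum definition: for each lag, the sum over t of f[t] * g[t + lag]."""
    n1, n2 = len(f), len(g)
    return [
        sum(f[i] * g[i + lag] for i in range(n1) if 0 <= i + lag < n2)
        for lag in range(-(n1 - 1), n2)
    ]


def cross_correlate(f, g, dt, scaling="integral"):
    """Apply one of the scaling options to the raw sum."""
    r = raw_sum(f, g)
    if scaling == "sum":
        factor = 1.0
    elif scaling == "integral":
        factor = dt  # the integral definition is the sum times the sampling period
    elif scaling == "normalize":
        norm = math.sqrt(sum(x * x for x in f) * sum(x * x for x in g))
        factor = 1.0 / norm if norm > 0 else 1.0
    elif scaling == "peak 0.99":
        peak = max(abs(x) for x in r)
        factor = 0.99 / peak if peak > 0 else 1.0
    else:
        raise ValueError("unknown scaling: " + scaling)
    return [x * factor for x in r]


f = [1.0, 2.0, 3.0]
dt = 0.1
auto_sum = cross_correlate(f, f, dt, "sum")
auto_integral = cross_correlate(f, f, dt, "integral")
auto_norm = cross_correlate(f, f, dt, "normalize")
auto_peak = cross_correlate(f, f, dt, "peak 0.99")
```

Cross-correlating a sound with itself gives a normalized value of 1 at lag zero (the middle element of auto_norm), which is a convenient sanity check.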

4. Shape scaling

The boundaries of the integral in 1 are -∞ and +∞. However, f and g are Sound objects in Praat and therefore have finite time domains. If f runs from t1 to t2 and is assumed to be zero before t1 and after t2, and g runs from t3 to t4 and is assumed to be zero outside that domain, then the cross-correlation will be zero before t3 - t2 and after t4 - t1, while between t3 - t2 and t4 - t1 it is

cross-corr (f, g) (τ) = ∫_{t1}^{t2} f(t) g(t+τ) dt

In this formula, the argument of f runs from t1 to t2, but the argument of g runs from t1 + (t3 - t2) to t2 + (t4 - t1), i.e. from t3 - (t2 - t1) to t4 + (t2 - t1). This means that the integration is performed over two equal stretches of time during which g must be taken zero, namely a time stretch before t3 and a time stretch after t4, both of duration t2 - t1. Equivalent equations can be formulated that rely on two stretches of t4 - t3 of zeroes in f rather than in g, or on a stretch of t2 - t1 of zeroes in g and a stretch of t4 - t3 of zeroes in f.

If you consider the sounds outside their time domains as similar to what they are within their time domains, instead of zero, the discretized formula in 1 should be based on the average over the jointly defined values of f[t] and g[t+τ], without counting any multiplications of values outside the time domains. Suppose that f is defined on the time domain [0, 1.2] with the value 1 everywhere, and g is defined on the time domain [0, 3] with the value 1 everywhere. Their cross-correlation under the assumption that they are zero elsewhere is then a trapezoid on the lag domain [-1.2, 3]: it rises linearly from 0 at τ = -1.2 to 1.2 at τ = 0, stays at 1.2 until τ = 1.8, and falls linearly back to 0 at τ = 3.

Under the assumption that the sounds are similar (i.e. 1) elsewhere, their cross-correlation should instead be a constant value of 1.2 over the whole lag domain. This is what you get by choosing the similar option; if f is shorter than g, the first and last parts of the cross-correlation will be divided by a straight line of duration t2 - t1 to compensate for the fact that the cross-correlation has been computed over fewer values of f and g there.
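The example with two constant sounds can be reproduced numerically. The sketch below is plain Python under stated assumptions: a sampling period of 0.1 s (so f has 12 samples and g has 30), a hypothetical function name cross_correlate_outside, and an edge compensation that multiplies each lag by the ratio of the shorter sound's sample count to the number of jointly defined samples. This is one reading of the description above, not a copy of Praat's code.

```python
def cross_correlate_outside(f, g, dt, outside="zero"):
    """Integral-scaled cross-correlation with a choice for the signal
    outside the time domain: "zero" or "similar".
    """
    n1, n2 = len(f), len(g)
    n_min = min(n1, n2)
    result = []
    for lag in range(-(n1 - 1), n2):
        # Only jointly defined samples contribute to this lag.
        products = [f[i] * g[i + lag] for i in range(n1) if 0 <= i + lag < n2]
        value = sum(products) * dt
        if outside == "similar":
            # Compensate for lags where fewer than n_min products were available.
            value *= n_min / len(products)
        result.append(value)
    return result


dt = 0.1
f = [1.0] * 12  # value 1 on [0, 1.2]
g = [1.0] * 30  # value 1 on [0, 3]

zero = cross_correlate_outside(f, g, dt, "zero")        # trapezoid, peak 1.2
similar = cross_correlate_outside(f, g, dt, "similar")  # constant 1.2
```

With "zero" the values ramp up, stay flat, and ramp down; with "similar" every lag has the value 1.2 up to rounding.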

5. Behaviour

The start time of the resulting Sound will be the start time of g minus the end time of f (t3 - t2 in 4 above), and its end time will be the end time of g minus the start time of f (t4 - t1). Likewise, the time of the first sample of the resulting Sound will be the time of the first sample of g minus the time of the last sample of f, and the time of the last sample will be the time of the last sample of g minus the time of the first sample of f. The number of samples in the resulting Sound will be the sum of the numbers of samples of f and g minus 1.

6. Behaviour for stereo and other multi-channel sounds

You can cross-correlate e.g. a 10-channel sound either with another 10-channel sound or with a 1-channel (mono) sound.

If both Sounds have more than one channel, the two Sounds have to have the same number of channels; each channel of the resulting Sound is then computed as the cross-correlation of the corresponding channels of the original Sounds. For instance, if you cross-correlate two 10-channel sounds, the resulting sound will have 10 channels, and its 9th channel will be the cross-correlation of the 9th channels of the two original sounds.

If one of the original Sounds has multiple channels and the other Sound has only one channel, the resulting Sound will have multiple channels; each of these is computed as the cross-correlation of the corresponding channel of the multiple-channel original and the single channel of the single-channel original. For instance, if you cross-correlate a 10-channel sound with a mono sound, the resulting sound will have 10 channels, and its 9th channel will be the cross-correlation of the mono sound with the 9th channel of the original 10-channel sound.

The amplitude scaling factor will be the same for all channels, so that the relative amplitude of the channels will be preserved in the resulting sound. For the normalize scaling, for instance, the norm of f in the formula above is taken over all channels of f. For the peak 0.99 scaling, the resulting sound will typically have an absolute peak of 0.99 in only one channel, and lower absolute peaks in the other channels.
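The channel pairing and the shared scaling factor can be sketched as follows. This is plain Python with hypothetical names (raw_sum, cross_correlate_channels); each sound is assumed to be a list of channels, each channel a list of samples, and only the peak 0.99 scaling is shown.

```python
def raw_sum(f, g):
    """Sum definition for one pair of channels, zero outside the time domain."""
    n1, n2 = len(f), len(g)
    return [
        sum(f[i] * g[i + lag] for i in range(n1) if 0 <= i + lag < n2)
        for lag in range(-(n1 - 1), n2)
    ]


def cross_correlate_channels(f_channels, g_channels, peak=0.99):
    """Cross-correlate multi-channel sounds channel by channel.

    A one-channel sound is paired with every channel of the other sound.
    One scaling factor is shared by all output channels.
    """
    nf, ng = len(f_channels), len(g_channels)
    if nf > 1 and ng > 1 and nf != ng:
        raise ValueError("channel counts must be equal, or one of them must be 1")
    out = []
    for c in range(max(nf, ng)):
        f = f_channels[c if nf > 1 else 0]
        g = g_channels[c if ng > 1 else 0]
        out.append(raw_sum(f, g))
    # The overall absolute peak across channels sets the common factor.
    overall_peak = max(abs(x) for channel in out for x in channel)
    factor = peak / overall_peak if overall_peak > 0 else 1.0
    return [[x * factor for x in channel] for channel in out]


stereo = [[1.0, 0.0], [0.5, 0.0]]
mono = [[1.0, 1.0]]
result = cross_correlate_channels(stereo, mono)
```

In this example only the first output channel reaches an absolute peak of 0.99; the second keeps its relative level at half of that.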

7. Algorithm

The computation makes use of the fact that cross-correlation in the time domain corresponds, in the frequency domain, to multiplication of the complex conjugate of the spectrum of f with the spectrum of g (for a real-valued signal, conjugating the spectrum is equivalent to reversing the signal in time). We first pad f with a stretch of t4 - t3 of zeroes and g with a stretch of t2 - t1 of zeroes (see 4 above), so that both sounds obtain a duration of (t2 - t1) + (t4 - t3). We then calculate the spectra of the two zero-padded sounds by Fourier transformation, multiply the complex conjugate of the spectrum of f with the spectrum of g, and finally Fourier-transform the result of this multiplication back to the time domain; the result will again have a duration of (t2 - t1) + (t4 - t3).
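The frequency-domain route can be illustrated with a minimal sketch in plain Python. It is not Praat's code: the function names are hypothetical, the small recursive radix-2 FFT is included only to keep the example self-contained, and the padded length is assumed to be the next power of two that holds at least n1 + n2 - 1 samples, which is an assumption made here because this FFT needs a power-of-two length.

```python
import cmath


def fft(x, inverse=False):
    """Recursive radix-2 FFT; len(x) must be a power of two.
    The inverse transform is returned without the 1/N factor.
    """
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def cross_correlate_fft(f, g, dt):
    """Integral-scaled cross-correlation of real signals via zero-padded FFTs."""
    n1, n2 = len(f), len(g)
    nfft = 1
    while nfft < n1 + n2 - 1:
        nfft *= 2
    spectrum_f = fft([complex(v) for v in f] + [0j] * (nfft - n1))
    spectrum_g = fft([complex(v) for v in g] + [0j] * (nfft - n2))
    # Conjugate of the spectrum of f times the spectrum of g.
    product = [a.conjugate() * b for a, b in zip(spectrum_f, spectrum_g)]
    circular = [v.real / nfft for v in fft(product, inverse=True)]
    # Non-negative lags sit at the start, negative lags wrap to the end.
    return [circular[lag % nfft] * dt for lag in range(-(n1 - 1), n2)]


f = [1.0, 2.0, 3.0]
g = [0.5, -1.0]
r_fft = cross_correlate_fft(f, g, 0.1)
```

Because of the zero padding, the circular result contains no wrapped-around products, so r_fft agrees with the direct sum over jointly defined samples up to rounding error.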


© djmw & ppgb, April 4, 2010