
A command available in the Combine menu when you select two Sound objects. This command crosscorrelates two selected Sound objects with each other. As a result, a new Sound will appear in the list of objects; this new Sound is the crosscorrelation of the two original Sounds.
The crosscorrelation of two continuous time signals f(t) and g(t) is a function of the lag time τ, and is defined as the integral
crosscorr (f, g) (τ) ≡ ∫ f(t) g(t+τ) dt 
If f and g are sampled signals (as Sounds are in Praat), with the same sampling period Δt, the definition is discretized as
crosscorr (f, g) [τ] ≡ ∑_{t} f[t] g[t+τ] Δt 
where t and t+τ are the discrete times at which f and g are defined, respectively.
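The discretized definition can be sketched directly in plain Python. This is only an illustration of the formula, not Praat's implementation; the function name `crosscorr_integral` and the convention that the returned list covers the lags from -(len(f) - 1) to len(g) - 1 samples are assumptions made for this example. Samples outside either list are treated as zero.

```python
def crosscorr_integral(f, g, dt):
    """Discretized crosscorr(f, g)[lag] = sum_t f[t] * g[t + lag] * dt.

    f and g are lists of samples with the same sampling period dt.
    The result holds one value per lag, from -(len(f) - 1) to len(g) - 1.
    """
    n_f, n_g = len(f), len(g)
    result = []
    for lag in range(-(n_f - 1), n_g):
        total = 0.0
        for t in range(n_f):
            k = t + lag  # index into g
            if 0 <= k < n_g:  # g is taken to be zero outside its samples
                total += f[t] * g[k]
        result.append(total * dt)
    return result


f = [1.0, 2.0]
g = [3.0, 4.0, 5.0]
print(crosscorr_integral(f, g, dt=0.5))  # [3.0, 5.5, 7.0, 2.5]
```

The four values correspond to the lags -1, 0, 1 and 2 samples.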
Crosscorrelation is not a commutative operation, i.e. crosscorr (g, f) equals the time reversal of crosscorr (f, g). This means that the order in which you put the two Sounds in the object list does matter: the two results are each other's time reversals.
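The time-reversal relation is easy to check numerically. The sketch below uses a hypothetical helper `crosscorr` with a unit sampling period (an assumption for brevity) and compares the two argument orders.

```python
def crosscorr(f, g):
    """Sum over t of f[t] * g[t + lag], for lags -(len(f) - 1) .. len(g) - 1."""
    n_f, n_g = len(f), len(g)
    return [
        sum(f[t] * g[t + lag] for t in range(n_f) if 0 <= t + lag < n_g)
        for lag in range(-(n_f - 1), n_g)
    ]


f = [1.0, 2.0]
g = [3.0, 4.0, 5.0]
fg = crosscorr(f, g)
gf = crosscorr(g, f)
print(fg)              # [6.0, 11.0, 14.0, 5.0]
print(gf)              # [5.0, 14.0, 11.0, 6.0]
print(gf == fg[::-1])  # True
```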
You can see in the formula above that if both input Sounds are expressed in units of Pa, the resulting Sound should ideally be expressed in Pa^{2}s. Nevertheless, Praat will express it in Pa, because Sounds cannot be expressed otherwise.
This basically means that it is impossible to get the amplitude of the resulting Sound correct for all purposes. For this reason, Praat considers a different definition of crosscorrelation as well, namely as the sum
crosscorr (f, g) [τ] ≡ ∑_{t} f[t] g[t+τ] 
The difference between the integral and sum definitions is that in the sum definition the resulting sound is divided by Δt.
The normalized crosscorrelation is defined as
normcrosscorr (f, g) (τ) ≡ ∫ f(t) g(t+τ) dt / √ (∫ f^{2}(t) dt ∫ g^{2}(t) dt) 
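In a discretized version of the normalized definition, the factor Δt cancels between numerator and denominator, so plain sums suffice. The following sketch assumes that neither signal is entirely zero; the name `normalized_crosscorr` is made up for this illustration.

```python
import math


def normalized_crosscorr(f, g):
    """Crosscorrelation divided by sqrt(sum(f^2) * sum(g^2)).

    Assumes that f and g each contain at least one nonzero sample.
    """
    n_f, n_g = len(f), len(g)
    norm = math.sqrt(sum(x * x for x in f) * sum(x * x for x in g))
    return [
        sum(f[t] * g[t + lag] for t in range(n_f) if 0 <= t + lag < n_g) / norm
        for lag in range(-(n_f - 1), n_g)
    ]


f = [1.0, -2.0, 3.0]
r = normalized_crosscorr(f, f)
print(r[len(f) - 1])  # value at lag 0: 1.0
```

Correlating a signal with itself gives exactly 1 at lag 0, and by the Cauchy-Schwarz inequality no value can exceed 1 in absolute value.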
The boundaries of the integral in the definition above are −∞ and +∞. However, f and g are Sound objects in Praat and therefore have finite time domains. If f runs from t_{1} to t_{2} and is assumed to be zero before t_{1} and after t_{2}, and g runs from t_{3} to t_{4} and is assumed to be zero outside that domain, then the crosscorrelation will be zero before t_{3} − t_{2} and after t_{4} − t_{1}, while between t_{3} − t_{2} and t_{4} − t_{1} it is
crosscorr (f, g) (τ) = ∫_{t1}^{t2} f(t) g(t+τ) dt 
In this formula, the argument of f runs from t_{1} to t_{2}, but the argument of g runs from t_{1} + (t_{3} − t_{2}) to t_{2} + (t_{4} − t_{1}), i.e. from t_{3} − (t_{2} − t_{1}) to t_{4} + (t_{2} − t_{1}). This means that the integration is performed over two equal stretches of time during which g must be taken zero, namely a time stretch before t_{3} and a time stretch after t_{4}, both of duration t_{2} − t_{1} (equivalent equations can be formulated that rely on two stretches of t_{4} − t_{3} of zeroes in f rather than in g, or on a stretch of t_{2} − t_{1} of zeroes in g and a stretch of t_{4} − t_{3} of zeroes in f).
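As a small worked example of these bounds, the sketch below computes the lag range and the range of the argument of g for arbitrary illustrative time domains (f on [0.5, 1.5] and g on [2.0, 5.0]; these numbers and the function name `lag_domain` are assumptions, not taken from Praat).

```python
def lag_domain(t1, t2, t3, t4):
    """Lag range and g-argument range for f on [t1, t2] and g on [t3, t4]."""
    lag_start = t3 - t2          # the crosscorrelation is zero before this lag
    lag_end = t4 - t1            # the crosscorrelation is zero after this lag
    g_arg_start = t1 + lag_start  # equals t3 - (t2 - t1)
    g_arg_end = t2 + lag_end      # equals t4 + (t2 - t1)
    return lag_start, lag_end, g_arg_start, g_arg_end


print(lag_domain(0.5, 1.5, 2.0, 5.0))  # (0.5, 4.5, 1.0, 6.0)
```

Here the lag range has a duration of 4.0, which is the sum of the two durations 1.0 and 3.0, and the argument of g extends 1.0 (the duration of f) beyond each side of [2.0, 5.0].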
If you consider the sounds outside their time domains as similar to what they are within their time domains, instead of zero, the discretized formula above should be based on the average over the jointly defined values of f[t] and g[t+τ], without counting any multiplications of values outside the time domains. Suppose that f is defined on the time domain [0, 1.2] with the value 1 everywhere, and g is defined on the time domain [0, 3] with the value 1 everywhere. Their crosscorrelation under the assumption that they are zero elsewhere is then a trapezoid that rises linearly from 0 at τ = −1.2 to 1.2 at τ = 0, stays at 1.2 until τ = 1.8, and falls linearly back to 0 at τ = 3, but under the assumption that the sounds are similar (i.e. 1) elsewhere, their crosscorrelation should be flat, i.e. a constant value of 1.2. This is what you get by choosing the similar option; if f is shorter than g, the first and last parts of the crosscorrelation will be divided by a straight line of duration t_{2} − t_{1} to compensate for the fact that the crosscorrelation has been computed over fewer values of f and g there.
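The example can be reproduced numerically. The sketch below assumes a sampling period of 0.1, so that f has 12 samples and g has 30, and it models the compensation by rescaling each lag by the ratio between the maximal overlap and the actual overlap in samples. This is one plausible reading of the description, not necessarily the exact procedure used by Praat, and the function name is hypothetical.

```python
def crosscorr_zero_and_similar(f, g, dt):
    """Return (zero, similar): crosscorrelation with the signals taken as
    zero outside their domains, and with the edge compensation applied."""
    n_f, n_g = len(f), len(g)
    edge = min(n_f, n_g)  # largest possible number of overlapping samples
    zero, similar = [], []
    for lag in range(-(n_f - 1), n_g):
        lo = max(0, -lag)          # first t with t + lag inside g
        hi = min(n_f, n_g - lag)   # one past the last such t
        overlap = hi - lo          # always at least 1 in this lag range
        value = sum(f[t] * g[t + lag] for t in range(lo, hi)) * dt
        zero.append(value)
        similar.append(value * edge / overlap)
    return zero, similar


dt = 0.1
f = [1.0] * 12   # value 1 on a domain of duration 1.2
g = [1.0] * 30   # value 1 on a domain of duration 3
zero, similar = crosscorr_zero_and_similar(f, g, dt)
print(round(min(zero), 6), round(max(zero), 6))        # 0.1 1.2
print(round(min(similar), 6), round(max(similar), 6))  # 1.2 1.2
```

Under the zero assumption the values ramp up to 1.2 and back down, while the compensated version stays at 1.2 for every lag.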
The start time of the resulting Sound will be the start time of g minus the end time of f (i.e. t_{3} − t_{2}), and its end time will be the end time of g minus the start time of f (i.e. t_{4} − t_{1}), in line with the lag range derived above. Likewise, the time of the first sample of the resulting Sound will be the time of the first sample of g minus the time of the last sample of f, and the time of its last sample will be the time of the last sample of g minus the time of the first sample of f. The number of samples in the resulting Sound will be the sum of the numbers of samples of f and g minus 1.
You can crosscorrelate e.g. a 10-channel sound either with another 10-channel sound or with a 1-channel (mono) sound.
If both Sounds have more than one channel, the two Sounds have to have the same number of channels; each channel of the resulting Sound is then computed as the crosscorrelation of the corresponding channels of the original Sounds. For instance, if you crosscorrelate two 10-channel sounds, the resulting sound will have 10 channels, and its 9th channel will be the crosscorrelation of the 9th channels of the two original sounds.
If one of the original Sounds has multiple channels and the other Sound has only one channel, the resulting Sound will have multiple channels; each of these is computed as the crosscorrelation of the corresponding channel of the multiple-channel original and the single channel of the single-channel original. For instance, if you crosscorrelate a 10-channel sound with a mono sound, the resulting sound will have 10 channels, and its 9th channel will be the crosscorrelation of the mono sound with the 9th channel of the original 10-channel sound.
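The channel-pairing rules can be sketched as follows. Sounds are represented here as lists of channels, each channel being a list of samples; this representation and the function names are assumptions for the illustration only.

```python
def crosscorr(f, g):
    """Sum over t of f[t] * g[t + lag], for lags -(len(f) - 1) .. len(g) - 1."""
    n_f, n_g = len(f), len(g)
    return [
        sum(f[t] * g[t + lag] for t in range(n_f) if 0 <= t + lag < n_g)
        for lag in range(-(n_f - 1), n_g)
    ]


def crosscorr_multichannel(f_channels, g_channels):
    """Pair channels one to one, or reuse the single channel of a mono sound."""
    n_f, n_g = len(f_channels), len(g_channels)
    if n_f > 1 and n_g > 1 and n_f != n_g:
        raise ValueError("channel counts must be equal, or one sound must be mono")
    result = []
    for c in range(max(n_f, n_g)):
        f = f_channels[c] if n_f > 1 else f_channels[0]
        g = g_channels[c] if n_g > 1 else g_channels[0]
        result.append(crosscorr(f, g))
    return result


stereo = [[1.0, 0.0], [0.0, 1.0]]
mono = [[1.0, 2.0]]
print(crosscorr_multichannel(stereo, mono))
# [[0.0, 1.0, 2.0], [1.0, 2.0, 0.0]]
```

Crosscorrelating a 2-channel sound with a 3-channel sound raises an error in this sketch, mirroring the requirement that the numbers of channels be equal unless one sound is mono.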
The amplitude scaling factor will be the same for all channels, so that the relative amplitude of the channels will be preserved in the resulting sound. For the normalize scaling, for instance, the norm of f in the formula above is taken over all channels of f. For the peak 0.99 scaling, the resulting sound will typically have an absolute peak of 0.99 in only one channel, and lower absolute peaks in the other channels.
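For the peak 0.99 scaling, a single common factor might be applied along the following lines. This is a sketch under the assumption that the factor is derived from the largest absolute sample over all channels; the function name `scale_to_peak` is hypothetical.

```python
def scale_to_peak(channels, peak=0.99):
    """Scale all channels by one common factor so that the largest
    absolute sample over all channels becomes `peak`."""
    max_abs = max(abs(x) for channel in channels for x in channel)
    if max_abs == 0.0:
        return [list(channel) for channel in channels]  # nothing to scale
    factor = peak / max_abs
    return [[x * factor for x in channel] for channel in channels]


channels = [[0.0, 2.0, -4.0], [1.0, -1.0, 0.5]]
scaled = scale_to_peak(channels)
peaks = [max(abs(x) for x in channel) for channel in scaled]
print(peaks)  # [0.99, 0.2475]
```

Only the first channel reaches 0.99; the second keeps its amplitude relative to the first, a quarter of the peak.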
The computation makes use of the fact that crosscorrelation in the time domain corresponds to multiplication of the time reversal of f with g in the frequency domain: we first pad f with a stretch of t_{4} − t_{3} of zeroes and g with a stretch of t_{2} − t_{1} of zeroes (see the discussion of the time domains above), so that both sounds obtain a duration of (t_{2} − t_{1}) + (t_{4} − t_{3}); we then calculate the spectra of the two zero-padded sounds by Fourier transformation, then multiply the complex conjugate of the spectrum of f with the spectrum of g, and finally Fourier-transform the result of this multiplication back to the time domain; the result will again have a duration of (t_{2} − t_{1}) + (t_{4} − t_{3}).
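The frequency-domain procedure can be illustrated with a small self-contained sketch. It uses a simple recursive radix-2 FFT written for this example, and therefore pads to the next power of two that is at least len(f) + len(g) - 1 samples; that padding length, like all function names here, is an assumption of the sketch rather than a statement about Praat's code.

```python
import cmath


def fft(x, inverse=False):
    """Recursive radix-2 FFT; len(x) must be a power of two. The inverse
    transform is left unnormalized (the caller divides by len(x))."""
    n = len(x)
    if n == 1:
        return list(x)
    sign = 1 if inverse else -1
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    out = [0j] * n
    for k in range(n // 2):
        twiddled = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddled
        out[k + n // 2] = even[k] - twiddled
    return out


def crosscorr_fft(f, g):
    """Crosscorrelation of real sequences for lags -(len(f) - 1) .. len(g) - 1."""
    n_f, n_g = len(f), len(g)
    nfft = 1
    while nfft < n_f + n_g - 1:
        nfft *= 2
    # Zero-pad both signals to the common length and transform them.
    spec_f = fft([complex(v) for v in f] + [0j] * (nfft - n_f))
    spec_g = fft([complex(v) for v in g] + [0j] * (nfft - n_g))
    # Multiply the complex conjugate of the spectrum of f with that of g.
    product = [a.conjugate() * b for a, b in zip(spec_f, spec_g)]
    circular = [v.real / nfft for v in fft(product, inverse=True)]
    # Negative lags end up at the end of the circular result; move them first.
    return circular[nfft - (n_f - 1):] + circular[:n_g]


result = crosscorr_fft([1.0, 2.0], [3.0, 4.0, 5.0])
print([round(v, 9) for v in result])  # [6.0, 11.0, 14.0, 5.0]
```

The rounded output agrees with a direct evaluation of the sum ∑_{t} f[t] g[t+τ] for the lags -1, 0, 1 and 2 samples.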
© djmw & ppgb, April 4, 2010