blind source separation

Blind source separation (BSS) is a technique for estimating individual source components from their mixtures at multiple sensors. It is called blind because we don't use any other information besides the mixtures.

For example, imagine a room with a number of persons present and a number of microphones for recording. When one or more persons are speaking at the same time, each microphone registers a different mixture of the individual speakers' audio signals. It is the task of BSS to untangle these mixtures into their sources, i.e. the individual speakers' audio signals. In general, this is a difficult problem because of several complicating factors:

• Different locations of speakers and microphones in the room: the individual speakers' audio signals do not reach all microphones at the same time.
• Room acoustics: the signal that reaches a microphone is composed of the signal that travels directly to the microphone and parts that come from room reverberations and echoes.
• Varying distances to microphones: one or more speakers might be moving. This makes the mixing time-dependent.

If the number of sensors is larger than the number of sources, we speak of an overdetermined problem. If the number of sensors and the number of sources are equal, we speak of a determined problem. The more difficult problem is the underdetermined one, where the number of sensors is smaller than the number of sources.

Typology of mixtures

In general two different types of mixtures are considered in the literature: instantaneous mixtures and convolutive mixtures.

Instantaneous mixtures
are mixtures where the mixing is instantaneous; they correspond to the model Y=A·X. In this model Y is a matrix with the recorded microphone sounds, A is a so-called mixing matrix and X is a matrix with the independent source signals. Essentially the model says that the signal that each microphone records is a (possibly different) linear combination of the same source signals. If we knew the mixing matrix A, we could easily solve the model above for X by standard means. However, in general we know neither A nor X, and there is an infinite number of possible decompositions of Y. The problem becomes solvable, however, if we make some (mild) assumptions about A and X.
Convolutive mixtures
are mixtures where the mixing is of convolutive nature, i.e. the model is
y_i(n) = Σ_{j=1}^{d} Σ_{τ=0}^{M_ij−1} h_ij(τ)·x_j(n−τ) + N_i(n), for i = 1..m.
Here y_i(n) is the n-th sample of the i-th microphone signal, m is the number of microphones, d is the number of sources, x_j(n) is the n-th sample of the j-th source signal, h_ij(τ) is the multi-input multi-output linear filter with the source-microphone impulse responses that characterize the propagation of the sound in the room, M_ij is the number of coefficients of the filter from source j to microphone i, and N_i is a noise source. This model is typically much harder to solve than the previous one because the filter term h_ij(τ) can have thousands of coefficients. For example, the typical reverberation time of a room is approximately 0.3 s, which corresponds to 0.3 × 8000 = 2400 samples, i.e. filter coefficients, for a sound sampled at 8 kHz.
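
The two mixing models above can be simulated in a few lines. The following is a minimal sketch in Python with NumPy (neither is part of the original text). The function names mix_instantaneous, mix_convolutive and filter_length are hypothetical, and the sketch assumes that all filters have the same number of coefficients M and that the sources are zero before the first sample.

```python
import numpy as np


def mix_instantaneous(A, X):
    """Instantaneous model Y = A·X.

    A: mixing matrix, shape (m, d); X: source signals, shape (d, n_samples).
    """
    return A @ X


def mix_convolutive(h, X, noise=None):
    """Convolutive model y_i(n) = sum_j sum_tau h_ij(tau) x_j(n - tau) + N_i(n).

    h: impulse responses, shape (m, d, M). Assumption: one common length M.
    X: source signals, shape (d, n_samples).
    noise: optional array of shape (m, n_samples) playing the role of N_i(n).
    """
    m, d, _ = h.shape
    n_samples = X.shape[1]
    Y = np.zeros((m, n_samples))
    for i in range(m):
        for j in range(d):
            # Full linear convolution, truncated to the recording length.
            Y[i] += np.convolve(X[j], h[i, j])[:n_samples]
    if noise is not None:
        Y = Y + noise
    return Y


def filter_length(reverberation_time_s, sampling_rate_hz):
    """Number of filter coefficients needed to span the reverberation time."""
    return int(round(reverberation_time_s * sampling_rate_hz))
```

With filters of length M = 1 the convolutive model reduces to the instantaneous one, which makes a convenient sanity check; filter_length(0.3, 8000) reproduces the 2400 coefficients mentioned above.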

Solving the blind source separation for instantaneous mixtures

Various techniques exist for solving the blind source separation problem for instantaneous mixtures. Very popular ones make use of second-order statistics (SOS) by trying to simultaneously diagonalize a large number of cross-correlation matrices. Other techniques, like independent component analysis, use higher-order statistics (HOS) to find the independent components, i.e. the sources.
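
To give an impression of the second-order approach, here is a minimal sketch in Python with NumPy (neither is part of the original text). It is deliberately simplified: instead of jointly diagonalizing many cross-correlation matrices, it whitens the mixtures and diagonalizes one single time-lagged covariance matrix, a variant commonly known as AMUSE. The function name separate_single_lag and the default lag are hypothetical choices. The sketch assumes a determined problem, zero-mean uncorrelated sources, and sources whose autocorrelations at the chosen lag differ.

```python
import numpy as np


def separate_single_lag(Y, lag=1):
    """Estimate sources from instantaneous mixtures Y of shape (m, n_samples).

    Returns (S_est, W) with S_est = W @ (Y minus its row means). The result
    is determined only up to permutation, scaling and sign of the components.
    """
    Y = Y - Y.mean(axis=1, keepdims=True)
    n = Y.shape[1]

    # Step 1: whiten, so that the covariance at lag 0 becomes the identity.
    C0 = Y @ Y.T / n
    eigenvalues, E = np.linalg.eigh(C0)
    whitener = (E / np.sqrt(eigenvalues)).T  # diag(eigenvalues^-1/2) @ E.T
    Z = whitener @ Y

    # Step 2: diagonalize one symmetrized time-lagged covariance matrix.
    C_lag = Z[:, :-lag] @ Z[:, lag:].T / (n - lag)
    C_lag = (C_lag + C_lag.T) / 2
    _, V = np.linalg.eigh(C_lag)

    W = V.T @ whitener
    return W @ Y, W
```

Using many lags at once, as the methods mentioned above do, makes the estimate less dependent on a lucky choice of a single lag, at the price of an iterative joint diagonalization step that is not shown here.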

Given the decomposition problem Y=A·X, we can see that the solution is determined only up to a permutation and a scaling of the components. This is called the indeterminacy problem of BSS. It can be seen as follows: given a permutation matrix P, i.e. a matrix which contains only zeros except for one 1 in every row and column, and a nonsingular diagonal scaling matrix D, any scaling and permutation of the independent components Xn=(D·P)·X can be compensated by the inverse scaling and permutation of the mixing matrix An=A·(D·P)⁻¹, because An·Xn = A·(D·P)⁻¹·(D·P)·X = A·X = Y.
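
The argument can be checked numerically. The following is a minimal sketch in Python with NumPy (neither is part of the original text); the function name alternative_decomposition and the example matrices are hypothetical.

```python
import numpy as np


def alternative_decomposition(A, X, P, D):
    """Return (A_n, X_n) with X_n = (D·P)·X and A_n = A·(D·P)^-1.

    P: permutation matrix; D: diagonal scaling matrix with nonzero diagonal.
    """
    DP = D @ P
    X_n = DP @ X
    A_n = A @ np.linalg.inv(DP)
    return A_n, X_n


# Example data: three sources, three sensors, arbitrary values.
rng = np.random.default_rng(0)
A = rng.normal(size=(3, 3))
X = rng.normal(size=(3, 1000))
Y = A @ X

P = np.array([[0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0],
              [1.0, 0.0, 0.0]])   # reorders the components
D = np.diag([2.0, -0.5, 10.0])    # rescales them (including a sign flip)

A_n, X_n = alternative_decomposition(A, X, P, D)
same_mixtures = np.allclose(A_n @ X_n, Y)
```

Both (A, X) and (A_n, X_n) reproduce the same mixtures Y, so the mixtures alone cannot tell us which order or scale of the components is the "true" one.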

Solving the blind source separation for convolutive mixtures

Solutions for convolutive mixture problems are much harder to achieve. One normally starts by transforming the problem to the frequency domain, where the convolution is turned into a multiplication. The problem then translates into a separate instantaneous mixing problem for each frequency in the frequency domain. It is here that the indeterminacy problem hits us: the permutation and scaling of the independent components may differ from one frequency bin to the next, so it is not clear beforehand how to combine the independent components of the different frequency bins.
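
The statement that a convolution turns into a multiplication in the frequency domain can be checked numerically. The following is a minimal sketch in Python with NumPy (neither is part of the original text); the function name convolve_via_fft is hypothetical. It uses one zero-padded discrete Fourier transform over the whole signal, for which the relation is exact. Practical methods typically work with short-time transforms, where the relation holds only approximately.

```python
import numpy as np


def convolve_via_fft(x, h):
    """Linear convolution of x and h computed as a product of spectra."""
    n_out = len(x) + len(h) - 1    # length of the full linear convolution
    X = np.fft.rfft(x, n_out)      # zero-pads x to n_out samples
    H = np.fft.rfft(h, n_out)      # zero-pads h to n_out samples
    # One complex multiplication per frequency bin replaces the sum over tau.
    return np.fft.irfft(X * H, n_out)
```

For a source signal x and an impulse response h, the result agrees with np.convolve(x, h) up to rounding errors. In each frequency bin the filter has become a single complex factor, which is why the mixing in each bin has the instantaneous form.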

© djmw 20120907