Hearing is one of the most important senses for human communication. But its usefulness goes far beyond the mere transmission of linguistic information. Our sense of hearing helps us to orientate ourselves and lets us filter the relevant signal components out of a multitude of sounds. Transferring these abilities (localisation and separation) to technical systems would open up applications in many areas of our lives. The separation of the signals in particular poses major problems. Using time-frequency representations, this project aims to develop methods for signal separation.
Example of signal separation
Here you can find a small example of the separation of acoustic signals. The data set was made available for the Signal Separation Evaluation Campaign (SiSEC 2010). The reverberation time was 250 ms.
Mixed signals:
- Signal at the left microphone

Reconstructed signals:
- Speaker 1
- Speaker 2
- Speaker 3
Licensing note: These files are made available under the terms of the Creative Commons Attribution-NonCommercial-ShareAlike 2.0 license. The authors are Another Dreamer and Alex Q for the music source signals, and Hiroshi Sawada, Shoko Araki and Emmanuel Vincent for the mixture signals.
Motivation and problem definition
The human ear perceives a multitude of different sounds every second. The proportion of interfering signals is particularly high in noisy environments and in crowds of people. Nevertheless, we can disentangle this mix of different sounds and concentrate on our conversation partner. We can also determine the origin of the sounds.
An efficient transfer of these human capabilities (localisation and separation of acoustic signals) to technical systems would find applications in many areas.
The difficulty of a technical realisation lies in the indeterminacy of the system: only the sensor signals recorded by a certain number of microphones are known. From this information alone, the transmission matrix must be estimated and the original source signals reconstructed.
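To make the problem concrete, here is a minimal sketch of the underlying convolutive mixing model in Python. All quantities (sampling rate, sources, room impulse responses) are invented placeholders; in the real problem only the microphone signals x are observed, while h and s are unknown.

```python
# Sketch of the convolutive mixing model (placeholder data, for illustration only).
import numpy as np

fs = 16000                                  # sampling rate in Hz (assumed)
s = np.random.randn(2, 5 * fs)              # two unknown source signals
h = np.random.randn(2, 2, 2048) * 0.01      # unknown room impulse responses h[m, n]

# Each microphone m records a sum of the sources convolved with the
# corresponding room impulse response: x_m(t) = sum_n (h_mn * s_n)(t).
x = np.zeros((2, s.shape[1] + h.shape[2] - 1))
for m in range(2):                          # microphones
    for n in range(2):                      # sources
        x[m] += np.convolve(s[n], h[m, n])
# Blind source separation must recover s (and implicitly h) from x alone.
```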
Problem-solving concepts
Intensive research has been conducted in this field for about ten years, and various methods for the separation of unknown signals (blind source separation) already exist. However, the application of these methods is subject to various restrictions. In particular, for real environments (with reflections from objects, etc.) and an unknown number of sources, hardly any practical methods are available.
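As a point of reference, the following sketch shows classical blind source separation for the idealised instantaneous (reverberation-free) case using FastICA from scikit-learn. The signals and the mixing matrix are invented for illustration; this is not the method developed in the project.

```python
# Blind source separation in the idealised instantaneous case (illustration only).
import numpy as np
from sklearn.decomposition import FastICA

t = np.linspace(0, 1, 16000)
s = np.stack([np.sin(2 * np.pi * 440 * t),            # source 1: a tone
              np.sign(np.sin(2 * np.pi * 3 * t))])    # source 2: a square wave

A = np.array([[1.0, 0.6],                             # unknown mixing matrix
              [0.4, 1.0]])
x = A @ s                                             # observed microphone signals

ica = FastICA(n_components=2, random_state=0)
s_est = ica.fit_transform(x.T).T                      # estimated sources, recovered
                                                      # only up to permutation/scaling
```

The permutation and scaling ambiguity visible here is inherent to blind source separation and becomes a central issue when the separation is carried out per frequency band.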
The best results are obtained using time-frequency representations. Through this transformation, the convolution of a source signal with the room impulse response becomes, approximately, a multiplication in the frequency domain. However, due to the non-stationarity of speech signals, only limited time intervals can be considered. The short-time Fourier transform is used particularly often. The disadvantage of this transform is its fixed time-frequency resolution. This problem can be addressed by using other time-frequency representations, in particular Analytical Wavelet Packets. In addition, the influence of reflections can be reduced by suitable preprocessing of the signals.
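The following sketch illustrates this step with the short-time Fourier transform from SciPy; the microphone signals, window length and overlap are placeholder assumptions, not project settings.

```python
# Transforming microphone signals into the time-frequency domain (illustration).
import numpy as np
from scipy.signal import stft

fs = 16000
x = np.random.randn(2, 5 * fs)              # placeholder microphone signals
nperseg = 1024                              # window length (assumed)

f, frames, X = stft(x, fs=fs, nperseg=nperseg, noverlap=nperseg // 2)
# X has shape (n_mics, n_freq_bins, n_frames). In each frequency bin k the
# convolutive mixture is approximately instantaneous,
#   X[:, k, :] ~ H(f_k) @ S[:, k, :],
# so separation can be carried out per bin, followed by resolving the
# permutation and scaling ambiguities across bins.
```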
Project goals and tasks
The project aims to improve existing signal-separation methods and to develop new algorithms. Of special interest is improving the separation results under real environmental conditions (reflections, unknown number of sound sources). In addition to separating the signals, the individual sources shall also be localized. The individual research topics within the project follow from these requirements:
- Separation of the signals
- Echo suppression
- Localization of sound sources (see the sketch after this list)
- Object tracking
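To give an idea of the localization task, the following sketch estimates the time difference of arrival between two microphones with the generic GCC-PHAT method; the signals and the artificial 20-sample delay are placeholders, and this is not the project's localization algorithm.

```python
# Time-difference-of-arrival estimation with GCC-PHAT (generic illustration).
import numpy as np

def gcc_phat(x1, x2, fs):
    """Estimate the delay (in seconds) of x2 relative to x1."""
    n = len(x1) + len(x2)
    X1 = np.fft.rfft(x1, n=n)
    X2 = np.fft.rfft(x2, n=n)
    r = X2 * np.conj(X1)
    r /= np.abs(r) + 1e-12                  # PHAT weighting: keep only the phase
    cc = np.fft.irfft(r, n=n)
    max_shift = n // 2
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    return (np.argmax(np.abs(cc)) - max_shift) / fs

# Placeholder example: microphone 2 receives the same signal 20 samples later.
fs = 16000
sig = np.random.randn(fs)
x1 = sig
x2 = np.concatenate((np.zeros(20), sig[:-20]))
print(gcc_phat(x1, x2, fs) * fs)            # approximately 20 samples
```

Together with the microphone spacing and the speed of sound, such a delay estimate can be converted into a direction of arrival for a sound source.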