Studies on Binaural and Monaural Signal Analysis – Methods and Applications

Sampo Vesa

Dissertation for the degree of Doctor of Science in Technology to be presented with due permission of the Faculty of Information and Natural Sciences for public examination and debate in Auditorium T1 at Helsinki University of Technology (Espoo, Finland) on the 4th of December, 2009, at 12 noon.

Overview in PDF format (ISBN 978-952-248-239-6)
Dissertation is also available in print (ISBN 978-952-248-238-9)


Sound signals can contain a lot of information about the environment and the sound sources present in it. This thesis presents novel contributions to the analysis of binaural and monaural sound signals. Some new applications are introduced in this work, but the emphasis is on analysis methods. The three main topics of the thesis are computational estimation of sound source distance, analysis of binaural room impulse responses, and applications intended for augmented reality audio.

A novel method for binaural sound source distance estimation is proposed. The method is based on learning the coherence between the sounds entering the left and right ears. Comparisons to an earlier approach are also made. It is shown that these kinds of learning methods can correctly recognize the distance of a speech sound source in most cases.

Methods for analyzing binaural room impulse responses are investigated. These methods are able to locate the early reflections in time and also to estimate their directions of arrival. This challenging problem could not be solved completely, but this part of the work is an important step towards accurate estimation of the individual early reflections from a binaural room impulse response.

As the third part of the thesis, applications of sound signal analysis are studied. The most notable contributions are a novel eyes-free user interface controlled by finger snaps, and an investigation into the importance of features in audio surveillance.

The results of this thesis are steps towards building machines that can obtain information about the surrounding environment based on sound. In particular, the research into sound source distance estimation serves as important basic research in this area. The applications presented could be valuable in future telecommunications scenarios, such as augmented reality audio.

This thesis consists of an overview and of the following 7 publications:

  1. Sampo Vesa. 2007. Sound source distance learning based on binaural signals. In: Proceedings of the 2007 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA 2007). New Paltz, NY, USA. 21-24 October 2007, pages 271-274. © 2007 IEEE. By permission.
  2. Sampo Vesa. 2009. Binaural sound source distance learning in rooms. IEEE Transactions on Audio, Speech, and Language Processing, volume 17, number 8, pages 1498-1507. © 2009 IEEE. By permission.
  3. Sampo Vesa and Tapio Lokki. 2006. Detection of room reflections from a binaural room impulse response. In: Proceedings of the 9th International Conference on Digital Audio Effects (DAFx 2006). Montreal, Canada. 18-20 September 2006, pages 215-220. © 2006 by authors.
  4. Sampo Vesa and Tapio Lokki. 2009. Segmentation and analysis of early reflections from a binaural room impulse response. Espoo, Finland: Helsinki University of Technology, Department of Media Technology. 10 pages. TKK Reports in Media Technology, Technical Report TKK-ME-R-1. © 2009 by authors.
  5. Sampo Vesa and Aki Härmä. 2005. Automatic estimation of reverberation time from binaural signals. In: Proceedings of the 30th IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2005). Philadelphia, PA, USA. 18-23 March 2005, volume 3, pages 281-284. © 2005 IEEE. By permission.
  6. Sampo Vesa and Tapio Lokki. 2005. An eyes-free user interface controlled by finger snaps. In: Proceedings of the 8th International Conference on Digital Audio Effects (DAFx 2005). Madrid, Spain. 20-22 September 2005, pages 262-265. © 2005 by authors.
  7. Sampo Vesa. 2007. The effect of features on clustering in audio surveillance. In: Proceedings of the AES 30th International Conference on Intelligent Audio Environments. Saariselkä, Finland. 15-17 March 2007. 10 pages. © 2007 Audio Engineering Society (AES). By permission.

Errata of publications 1, 3, 5, 6 and 7

Keywords: audio signal analysis, audio signal processing, augmented reality audio, binaural signals, sound source distance, room impulse responses, reverberation time, eyes-free user interfaces, audio surveillance

This publication is copyrighted. You may download, display and print it for your own personal use. Commercial use is prohibited.

© 2009 Helsinki University of Technology
