The doctoral dissertations of the former Helsinki University of Technology (TKK) and Aalto University Schools of Technology (CHEM, ELEC, ENG, SCI) published in electronic format are available in the electronic publications archive of Aalto University - Aaltodoc.

Perceptual and Modeling Studies on Spatial Sound

Toni Hirvonen

Dissertation for the degree of Doctor of Science in Technology to be presented with due permission of the Department of Electrical and Communications Engineering for public examination and debate in Auditorium S4 at Helsinki University of Technology (Espoo, Finland) on the 7th of December, 2007, at 12 o'clock noon.

Overview in PDF format (ISBN 978-951-22-9051-2)   [647 KB]
Dissertation is also available in print (ISBN 978-951-22-9050-5)


Humans have the ability to perceive various spatial auditory attributes, such as the localization and width of sound sources. The study of spatial hearing is important not only in terms of basic perceptual research, but also because ever more sophisticated audio reproduction algorithms and systems are being introduced to consumers. From such systems, listeners regularly perceive complicated spatial auditory scenes involving several simultaneous sounds from different directions. These scenes can be thought of as complex, as opposed to the perception of a single, point-like source in an anechoic environment.

The first part of this thesis investigates the perceptual issues related to such complex sound scenes via subjective listening tests. A single anechoic source results in localization cues that most listeners unambiguously interpret as indicating the actual direction of the sound. In the case of several interfering sound sources, the cues may vary greatly as a function of frequency. As illustrated by the results presented here, this is a common occurrence in modern multichannel reproduction systems. To gain further insight into this little-researched phenomenon, specific test cases in which localization cues were manipulated as a function of frequency in the horizontal plane were investigated. The subjects reported the localization and width of the complex sounds, and these responses revealed several interesting phenomena. Most importantly, the listeners always perceived a horizontally wide sound source as being much narrower than its physical width. Strong perceptual contrasts were also found to be significant.

Another focus of this thesis is auditory modeling. The stimuli used in the listening experiments were simulated using established auditory modeling techniques. The simulation results did not correspond entirely with the psychoacoustical results in all cases, prompting additional weighting of different frequencies in the modeling. This thesis also introduces a novel, general auditory model concept inspired by recent psychoacoustical results that partly contradict the previous modeling approaches. The model's capacity to account for common spatial hearing phenomena was examined. The initial simulation results validate the proposed concept. Quantitative comparisons with psychoacoustical results, including the data obtained from the listening tests performed in this thesis, are planned as future work.

This thesis consists of an overview and of the following 7 publications:

  1. Pulkki, V. and Hirvonen, T., Localization of Virtual Sources in Multichannel Audio Reproduction, IEEE Transactions on Speech and Audio Processing, Vol. 13, No. 1, Jan. 2005, pp. 105-119. © 2005 IEEE. By permission.
  2. Hirvonen, T. and Pulkki, V., Center and Spatial Extent of Auditory Events as Caused by Multiple Sound Sources in Frequency-Dependent Directions, Acta Acustica united with Acustica, Vol. 92, No. 2, Jan. 2006, pp. 320-330. © 2006 S. Hirzel Verlag. By permission.
  3. Hirvonen, T., Segregation of Two Simultaneously Arriving Narrowband Noise Signals as a Function of Spatial and Frequency Separation, in Proceedings of the 8th International Conference on Digital Audio Effects (DAFx'05), Madrid, Spain, September 20-22, 2005. © 2005 by author.
  4. Hirvonen, T. and Pulkki, V., Perception and Analysis of Selected Auditory Events with Frequency-Dependent Directions, Journal of the Audio Engineering Society, Vol. 54, No. 9, Sep. 2006, pp. 803-814. © 2006 Audio Engineering Society. By permission.
  5. Hirvonen, T. and Pulkki, V., Interaural Coherence Estimation with Instantaneous ILD, in Proceedings of the 7th Nordic Signal Processing Symposium (NORSIG 2006), Reykjavik, Iceland, June 7-9, 2006, pp. 122-125. © 2006 IEEE. By permission.
  6. Hirvonen, T. and Pulkki, V., Predicting Binaural Masking Level Difference and Dichotic Pitch Using Instantaneous ILD Model, in Proceedings of the AES 30th International Conference on Intelligent Audio Environments, Saariselkä, Finland, March 15-17, 2007. © 2007 Audio Engineering Society. By permission.
  7. Pulkki, V. and Hirvonen, T., Computational Count-Comparison Models for ITD and ILD decoding, in Proceedings of the 19th International Congress on Acoustics (ICA 2007), Madrid, Spain, September 2-7, 2007. © 2007 by authors.

Errata of publication 5

Keywords: spatial hearing, auditory modeling, virtual sources

This publication is copyrighted. You may download, display and print it for your own personal use. Commercial use is prohibited.

© 2007 Helsinki University of Technology

Last update 2011-05-26