The doctoral dissertations of the former Helsinki University of Technology (TKK) and Aalto University Schools of Technology (CHEM, ELEC, ENG, SCI) published in electronic format are available in the electronic publications archive of Aalto University - Aaltodoc.

Studies on Auditory Processing of Spatial Sound and Speech by Neuromagnetic Measurements and Computational Modeling

Kalle Palomäki

Dissertation for the degree of Doctor of Science in Technology to be presented with due permission of the Department of Electrical and Communications Engineering for public examination and debate in Auditorium S4 at Helsinki University of Technology (Espoo, Finland) on the 17th of June, 2005, at 12 o'clock noon.

Overview in PDF format (ISBN 951-22-7717-4) [1693 KB]
Dissertation is also available in print (ISBN 951-22-7716-6)


This thesis addresses the auditory processing of spatial sound and speech. It consists of two research branches: first, magnetoencephalographic (MEG) brain measurements on spatial localization and speech perception, and second, the construction of computational auditory scene analysis models that exploit spatial cues and other cues that are robust in reverberant environments. In the MEG research branch, we addressed the processing of spatial stimuli in the auditory cortex through studies concentrating on the following issues: the processing of sound source location with realistic spatial stimuli, the spatial processing of speech vs. non-speech stimuli, and finally the processing of a range of spatial location cues in the auditory cortex. Our main findings are as follows. Both auditory cortices respond more vigorously to contralaterally presented sound, and the responses exhibit systematic tuning to the sound source direction. Responses and response dynamics are generally larger in the right hemisphere, which indicates right-hemispheric specialization in spatial processing. These observations hold over the range of speech and non-speech stimuli. The responses to speech sounds decrease markedly if the natural periodic speech excitation is replaced by a random noise sequence. Moreover, the activation strength of the right auditory cortex seems to reflect the processing of spatial cues: the dynamical differences are larger and the angular organization is more orderly for realistic spatial stimuli than for impoverished spatial stimuli (e.g. isolated interaural time and level difference cues).
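
For readers unfamiliar with the interaural cues mentioned above, the following is a minimal sketch of how an interaural time difference (ITD) and an interaural level difference (ILD) could be estimated from a two-channel signal. It is not taken from the thesis: the function name `estimate_itd_ild`, the 1 ms search range, and the sign conventions are assumptions chosen for illustration, and the approach (broadband cross-correlation and an energy ratio) is only one common textbook way to compute these cues.

```python
import numpy as np


def estimate_itd_ild(left, right, fs, max_itd_s=1e-3):
    """Estimate ITD (seconds) and ILD (dB) from left/right ear signals.

    Assumed conventions (illustrative, not from the thesis):
    - positive ITD: the sound reaches the left ear first
    - positive ILD: the left ear signal has more energy
    """
    left = np.asarray(left, dtype=float)
    right = np.asarray(right, dtype=float)

    # Full cross-correlation: c[k] = sum_n right[n + k] * left[n].
    # A peak at positive lag k means the right channel lags the left.
    xcorr = np.correlate(right, left, mode="full")
    lags = np.arange(-(len(left) - 1), len(right))

    # Restrict the search to physically plausible lags (assumed +/- 1 ms).
    max_lag = int(round(max_itd_s * fs))
    plausible = np.abs(lags) <= max_lag
    best_lag = lags[plausible][np.argmax(xcorr[plausible])]
    itd = best_lag / fs

    # ILD as the energy ratio between the ears, in decibels.
    ild = 10.0 * np.log10(np.sum(left ** 2) / np.sum(right ** 2))
    return itd, ild
```

In this sketch, a right-ear signal that is a delayed and attenuated copy of the left-ear signal yields a positive ITD and a positive ILD, which corresponds to a source on the listener's left under the assumed conventions.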

In the auditory modeling part, we constructed models for the recognition of speech in the presence of interference. First, we constructed a system that uses binaural cues to segregate target speech from spatially separated interference, and showed that the system outperforms a conventional approach at low signal-to-noise ratios. Second, we constructed a single-channel system that is robust in room reverberation, using strong speech modulations as robust cues, and showed that it outperforms a baseline approach in the most reverberant test conditions. In this case, the baseline approach was specifically optimized for the recognition of speech in reverberation. In summary, this thesis addresses the auditory processing of spatial sound and speech in both brain measurement and auditory modeling. The studies aim to clarify the cortical processes of sound localization, and to construct computational auditory models for sound segregation that exploit spatial cues as well as strong speech modulations as robust cues in reverberation.
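
The idea of using binaural cues to select the parts of a signal that belong to the target can be sketched as a binary time-frequency mask. The example below is a generic illustration under stated assumptions, not the processor described in the publications: the function name `binaural_reliability_mask`, the tolerance values, and the assumption that per-cell ITD and ILD estimates are already available are all hypothetical.

```python
import numpy as np


def binaural_reliability_mask(itd, ild, target_itd, target_ild,
                              itd_tol=1e-4, ild_tol=3.0):
    """Mark time-frequency cells whose cues match the target direction.

    itd, ild: arrays of the same shape (e.g. channels x frames) holding
    per-cell ITD estimates in seconds and ILD estimates in dB.
    The tolerances are hypothetical values chosen for illustration.
    Returns a boolean array: True = treated as reliable (target-dominated).
    """
    itd = np.asarray(itd, dtype=float)
    ild = np.asarray(ild, dtype=float)
    if itd.shape != ild.shape:
        raise ValueError("itd and ild must have the same shape")

    # A cell is kept only if both cues are close to the target's cues.
    itd_ok = np.abs(itd - target_itd) <= itd_tol
    ild_ok = np.abs(ild - target_ild) <= ild_tol
    return itd_ok & ild_ok
```

In a missing-data recognizer, a mask of this kind would tell the recognizer which spectral features to treat as reliable evidence and which to treat as missing; how the mask is actually estimated and used in the thesis is described in the publications listed below.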

This thesis consists of an overview and of the following 6 publications:

  1. Palomäki K., Alku P., Mäkinen V., May P. and Tiitinen H., 2000. Sound localization in the human brain: neuromagnetic observations. NeuroReport 11 (7), pages 1535-1538.
  2. Palomäki K. J., Tiitinen H., Mäkinen V., May P. and Alku P., 2002. Cortical processing of speech sounds and their analogues in a spatial auditory environment. Cognitive Brain Research 14 (2), pages 294-299. © 2002 Elsevier Science. By permission.
  3. Alku P., Sivonen P., Palomäki K. J. and Tiitinen H., 2001. The periodic structure of vowel sounds is reflected in human electromagnetic brain responses. Neuroscience Letters 298 (1), pages 25-28. © 2001 Elsevier Science. By permission.
  4. Palomäki K. J., Tiitinen H., Mäkinen V., May P. and Alku P., 2005. Spatial processing in human auditory cortex: the effects of 3D, ITD, and ILD stimulation techniques. Cognitive Brain Research, accepted for publication. © 2005 Elsevier Science. By permission.
  5. Palomäki K. J., Brown G. J. and Wang D. L., 2004. A binaural processor for missing data speech recognition in the presence of noise and small-room reverberation. Speech Communication 43 (4), pages 361-378. © 2004 Elsevier Science. By permission.
  6. Palomäki K. J., Brown G. J. and Barker J., 2004. Techniques for handling convolutional distortion with 'missing data' automatic speech recognition. Speech Communication 43 (1-2), pages 123-142. © 2004 Elsevier Science. By permission.

Errata of publications 1, 2 and 6

Keywords: spatial localization, auditory cortex, MEG, N1m, binaural models, CASA, missing data speech recognition

This publication is copyrighted. You may download, display and print it for your own personal use. Commercial use is prohibited.

© 2005 Helsinki University of Technology

Last update 2011-05-26