The doctoral dissertations of the former Helsinki University of Technology (TKK) and Aalto University Schools of Technology (CHEM, ELEC, ENG, SCI) published in electronic format are available in the electronic publications archive of Aalto University - Aaltodoc.

Language Models for Automatic Speech Recognition: Construction and Complexity Control

Vesa Siivola

Dissertation for the degree of Doctor of Science in Technology to be presented with due permission of the Department of Computer Science and Engineering for public examination and debate in Auditorium T2 at Helsinki University of Technology (Espoo, Finland) on the 3rd of September, 2007, at 12 o'clock noon.

Overview in PDF format (ISBN 978-951-22-8894-6)   [740 KB]
Dissertation is also available in print (ISBN 978-951-22-8893-9)


The language model is one of the key components of a large vocabulary continuous speech recognition system. Huge text corpora can be used for training the language models. In this thesis, methods for extracting the essential information from the training data and expressing the information as a compact model are studied.

The thesis is divided into three main parts. The first part examines the choice of the best base modeling unit for the prevalent language modeling method, n-gram language modeling. The experiments focus on morpheme-like subword units, although syllables are also tried. Rule-based grammatical methods and unsupervised statistical methods for finding morphemes are compared with the baseline word model. The Finnish cross-entropy and speech recognition experiments show that significantly more efficient models can be created using automatically induced morpheme-like subword units as the basis of the language model.
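When a word model and a morph model are compared by cross-entropy, one common approach is to normalize the total log-probability by the number of words in both cases, because the morph model emits more tokens for the same text. The following is a minimal sketch of that normalization. The function name and all probabilities are hypothetical, chosen only for illustration; they are not results from the thesis.

```python
import math


def cross_entropy_per_word(token_probs, num_words):
    """Return cross-entropy in bits per word.

    token_probs: probabilities the model assigned to each token
                 (words or morphs) of the test text.
    num_words:   number of words in the test text, used as the common
                 normalizer regardless of the modeling unit.
    """
    total_bits = -sum(math.log2(p) for p in token_probs)
    return total_bits / num_words


# Hypothetical probabilities for the same two-word test text.
word_model_probs = [0.01, 0.002]             # one token per word
morph_model_probs = [0.05, 0.2, 0.01, 0.5]   # the same words split into four morphs

word_bits = cross_entropy_per_word(word_model_probs, num_words=2)
morph_bits = cross_entropy_per_word(morph_model_probs, num_words=2)
```

Dividing by the word count rather than the token count keeps the two figures comparable; dividing the morph model's total by four tokens would understate its cost per word.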

In the second part, methods for choosing the n-grams that have explicit probability estimates in the n-gram model are studied. Two new methods specialized for selecting the n-grams of Kneser-Ney smoothed n-gram models are presented, one for pruning and one for growing the model. The methods are compared with entropy-based pruning and Kneser pruning. Experiments on Finnish and English text corpora show that the proposed pruning method gives considerable improvements over the previous pruning algorithms for Kneser-Ney smoothed models and is also better than an entropy-pruned Good-Turing smoothed model. Using the growing algorithm to create a starting point for the pruning algorithm further improves the results. The improvements in Finnish speech recognition over the other Kneser-Ney smoothed models were significant as well.
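The general idea behind pruning an n-gram model can be sketched as follows: an n-gram keeps its explicit estimate only if dropping it, and falling back to the lower-order estimate, would cost enough log-probability on the training data. The sketch below uses a deliberately simplified criterion, the count-weighted difference between the explicit and backed-off log-probabilities. It is not the pruning method proposed in the thesis, nor a full implementation of entropy-based or Kneser pruning, and it does not renormalize backoff weights after pruning. All names and numbers are hypothetical.

```python
import math


def pruning_score(count, p_explicit, p_backoff):
    """Approximate training-data log-probability lost by removing the n-gram.

    count:      training count of the n-gram
    p_explicit: probability given by the explicit n-gram estimate
    p_backoff:  probability the model would give via backoff instead
    """
    return count * (math.log(p_explicit) - math.log(p_backoff))


def prune(ngrams, threshold):
    """Keep only the n-grams whose removal would cost at least `threshold`.

    ngrams: dict mapping an n-gram tuple to (count, p_explicit, p_backoff).
    """
    return {
        gram: stats
        for gram, stats in ngrams.items()
        if pruning_score(*stats) >= threshold
    }


# Hypothetical toy model: (count, explicit probability, backoff probability).
toy_ngrams = {
    ("the", "cat"): (10, 0.20, 0.05),  # backoff is a poor substitute
    ("a", "cat"): (2, 0.06, 0.05),     # backoff is almost as good
}

kept = prune(toy_ngrams, threshold=1.0)
```

Raising the threshold yields a smaller model at the cost of a worse fit, which is the trade-off that the pruning and growing methods in this part of the thesis are designed to control.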

To extract more information from the training corpus, words should not be treated as independent tokens. The syntactic and semantic similarities of the words should be taken into account in the language model. The last part of this thesis explores how these similarities can be modeled by mapping the words into continuous space representations. A language model formulated in the state-space modeling framework is presented. Theoretically, the state-space language model has several desirable properties. The state dimension should determine how much the model is forced to generalize. The need to learn long-term dependencies should be automatically balanced with the need to remember the short-term dependencies in detail. The experiments show that training a model that fulfills all the theoretical promises is hard: the training algorithm has high computational complexity and it mainly finds local minima. These problems still need further research.
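To make the role of the state dimension concrete, the following is a generic toy sketch of a language model with a continuous hidden state. It is not the formulation or the training algorithm used in the thesis. The class name, the tanh update, the softmax output, and the random, untrained parameters are all assumptions made for illustration.

```python
import math
import random


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def softmax(scores):
    """Turn arbitrary scores into a probability distribution."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def random_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


class ToyStateSpaceLM:
    """Hypothetical model: the state summarizes the word history."""

    def __init__(self, vocab, state_dim, seed=0):
        rng = random.Random(seed)
        self.vocab = list(vocab)
        self.index = {w: i for i, w in enumerate(self.vocab)}
        self.state_dim = state_dim
        self.A = random_matrix(state_dim, state_dim, rng)        # state -> next state
        self.B = random_matrix(state_dim, len(self.vocab), rng)  # column i: vector for word i
        self.C = random_matrix(len(self.vocab), state_dim, rng)  # state -> word scores

    def initial_state(self):
        return [0.0] * self.state_dim

    def step(self, state, word):
        """Update the state after observing `word`."""
        i = self.index[word]
        carried = matvec(self.A, state)
        return [math.tanh(c + row[i]) for c, row in zip(carried, self.B)]

    def next_word_probs(self, state):
        """Distribution over the next word given the current state."""
        return softmax(matvec(self.C, state))


model = ToyStateSpaceLM(["the", "cat", "sat"], state_dim=2)
state = model.initial_state()
for word in ["the", "cat"]:
    state = model.step(state, word)
probs = model.next_word_probs(state)
```

Because the entire history must pass through `state_dim` numbers, a small state forces histories to share representations, while a larger state leaves room to remember more detail.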

This thesis consists of an overview and of the following 6 publications:

  1. Vesa Siivola, Teemu Hirsimäki, Mathias Creutz, and Mikko Kurimo. Unlimited vocabulary speech recognition based on morphs discovered in an unsupervised manner. In Proceedings of the 8th European Conference on Speech Communication and Technology (Eurospeech 2003), pages 2293-2296, Geneva, Switzerland, September 2003. © 2003 by authors.
  2. Teemu Hirsimäki, Mathias Creutz, Vesa Siivola, Mikko Kurimo, Sami Virpioja, and Janne Pylkkönen. Unlimited vocabulary speech recognition with morph language models applied to Finnish. Computer Speech and Language, volume 20 (4), pages 515-541, 2006.
  3. Vesa Siivola and Bryan L. Pellom. Growing an n-gram language model. In Proceedings of the 9th European Conference on Speech Communication and Technology (Interspeech 2005), pages 1309-1312, Lisbon, Portugal, September 2005. © 2005 by authors.
  4. Vesa Siivola, Teemu Hirsimäki, and Sami Virpioja. On growing and pruning Kneser–Ney smoothed N-Gram models. IEEE Transactions on Audio, Speech, and Language Processing, volume 15 (5), pages 1617-1624, 2007. © 2007 IEEE. By permission.
  5. Vesa Siivola. Language modeling based on neural clustering of words. Technical report IDIAP-COM 00-02, IDIAP, Martigny, Switzerland, 2000.
  6. Vesa Siivola and Antti Honkela. A state-space method for language modeling. In Proceedings of the 8th IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU 2003), pages 548-553, St. Thomas, U.S. Virgin Islands, November 2003. © 2003 IEEE. By permission.

Keywords: language model, speech recognition, subword unit, morpheme segmentation, variable order n-gram model, pruning, growing, state-space language model

This publication is copyrighted. You may download, display and print it for your own personal use. Commercial use is prohibited.

© 2007 Helsinki University of Technology

Last update 2011-05-26