The doctoral dissertations of the former Helsinki University of Technology (TKK) and Aalto University Schools of Technology (CHEM, ELEC, ENG, SCI) published in electronic format are available in the electronic publications archive of Aalto University - Aaltodoc.

Objects Extraction and Recognition for Camera-Based Interaction: Heuristic and Statistical Approaches

Hao Wang

Dissertation for the degree of Doctor of Science in Technology to be presented with due permission of the Department of Electrical and Communications Engineering for public examination and debate in Auditorium D at Helsinki University of Technology (Espoo, Finland) on the 14th of December, 2007, at 12 noon.

Overview in PDF format (ISBN 978-951-22-9134-2)   [846 KB]
Dissertation is also available in print (ISBN 978-951-22-9137-3)


In this thesis, heuristic and probabilistic methods are applied to a number of problems in camera-based interaction. The goal is to provide solutions for a vision-based system that can extract and analyze objects of interest in camera images and use that information for various interactions in mobile use. New methods, and new combinations of existing methods, are developed for different applications, including text extraction from complex scene images, bar code reading on camera phones, and face and facial feature detection and facial expression manipulation.

The application-driven problems of camera-based interaction cannot be captured by a single, straightforward model that relies on strong simplifications of reality. The solutions we found to be efficient first apply heuristic, easy-to-implement approaches to reduce the complexity of the problem and to explore possible solutions, and then use well-developed statistical learning approaches to handle the remaining difficult but well-defined problems with much better accuracy. The process can evolve in some or all of its stages, and the combination of approaches is problem-dependent.
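The two-stage idea described above can be sketched as follows. This is only an illustrative outline and not the implementation used in the thesis: the `Region` type, the size and aspect-ratio thresholds in `heuristic_filter`, and the `extract_objects` function are all hypothetical names and values chosen for the example. A cheap geometric test discards implausible candidates first, so that the more expensive statistical classifier runs only on the survivors.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Region:
    """A candidate image region (hypothetical representation)."""
    width: int
    height: int
    features: List[float]  # input to the statistical stage


def heuristic_filter(region: Region) -> bool:
    """Stage 1: cheap geometric test. Thresholds are illustrative assumptions."""
    if region.width < 4 or region.height < 4:
        return False  # too small to carry useful content
    aspect = region.width / region.height
    return 0.1 <= aspect <= 10.0  # reject extremely elongated regions


def extract_objects(
    regions: List[Region],
    classifier: Callable[[List[float]], bool],
) -> List[Region]:
    """Stage 2: run the learned classifier only on regions that pass stage 1."""
    survivors = [r for r in regions if heuristic_filter(r)]
    return [r for r in survivors if classifier(r.features)]
```

In this sketch the classifier is passed in as a plain function, so any learned model (for example a boosted ensemble) could be plugged into the second stage.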

The contribution of this thesis lies in two aspects: first, new features and approaches are proposed, either as heuristics or as statistical means, for concrete applications; second, engineering design that combines several methods for system optimization is studied. Geometrical characteristics and the alignment of text, texture features of bar codes, and structures of faces can all be extracted as heuristics for object extraction and further recognition. The boosting algorithm is one suitable choice for performing probabilistic learning and achieving the desired accuracy. New feature selection techniques are proposed for constructing the weak learner and for applying the boosting output in concrete applications. Subspace methods such as manifold learning algorithms are introduced and tailored for facial expression analysis and synthesis. A modified generalized learning vector quantization method is proposed to deal with the blurring of bar code images. Efficient implementations that join the approaches at a well-chosen point are presented and the results are illustrated.
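For readers unfamiliar with boosting, the following is a minimal sketch of the standard discrete AdaBoost algorithm with decision stumps as weak learners. It is a textbook formulation given for orientation only; it does not reproduce the feature selection techniques proposed in the thesis, and the function names (`stump_predict`, `best_stump`, `adaboost`, `predict`) are assumptions made for this example. Labels are taken to be +1 or -1.

```python
import math
from typing import List, Tuple

Stump = Tuple[int, float, int]  # (feature index, threshold, polarity)


def stump_predict(stump: Stump, x: List[float]) -> int:
    """Weak learner: predicts `polarity` if x[j] > threshold, else the opposite."""
    j, threshold, polarity = stump
    return polarity if x[j] > threshold else -polarity


def best_stump(X, y, w) -> Tuple[Stump, float]:
    """Exhaustively pick the stump with the lowest weighted training error."""
    best, best_err = None, float("inf")
    for j in range(len(X[0])):
        for threshold in sorted({x[j] for x in X}):
            for polarity in (1, -1):
                stump = (j, threshold, polarity)
                err = sum(
                    wi for xi, yi, wi in zip(X, y, w)
                    if stump_predict(stump, xi) != yi
                )
                if err < best_err:
                    best, best_err = stump, err
    return best, best_err


def adaboost(X, y, rounds: int) -> List[Tuple[float, Stump]]:
    """Train a weighted ensemble of stumps; y must contain +1 / -1 labels."""
    n = len(X)
    w = [1.0 / n] * n  # start with uniform sample weights
    ensemble = []
    for _ in range(rounds):
        stump, err = best_stump(X, y, w)
        err = min(max(err, 1e-10), 1 - 1e-10)  # avoid log(0) and division by zero
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, stump))
        # Increase the weight of misclassified samples, then renormalize.
        w = [
            wi * math.exp(-alpha * yi * stump_predict(stump, xi))
            for xi, yi, wi in zip(X, y, w)
        ]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble


def predict(ensemble, x: List[float]) -> int:
    """Sign of the alpha-weighted vote of all stumps."""
    score = sum(alpha * stump_predict(stump, x) for alpha, stump in ensemble)
    return 1 if score >= 0 else -1
```

The exhaustive stump search is quadratic in the number of samples per feature, which is acceptable for a small illustration but would normally be replaced by a sorted single-pass search in practice.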

This thesis consists of an overview and of the following 7 publications:

  1. Hao Wang, Jari Kangas, Text location in color scene images for information acquisition by mobile terminals, Proceedings of the 5th World Multi-Conference on Systemics, Cybernetics and Informatics (WMSCI 2001), Vol. 6, pp. 436-441, Orlando, Florida, USA, 2001, IIIS. © 2001 International Institute of Informatics and Systemics (IIIS). By permission.
  2. Hao Wang, Jari Kangas, Character-like region verification for extracting text in scene images, Proceedings of the 6th International Conference on Document Analysis and Recognition (ICDAR 2001), pp. 957-962, Seattle, WA, USA, 2001, IEEE.
  3. Kongqiao Wang, Yanming Zou, Hao Wang, 1D bar code reading on camera phones, International Journal of Image and Graphics, Vol. 7, No. 3, pp. 529-550, 2007, World Scientific Publishing, ISSN 0219-4678. © 2007 World Scientific Publishing Company. By permission.
  4. Hao Wang, Yanming Zou, 2D bar codes reading: solutions for camera phones, International Journal of Signal Processing, Vol. 3, No. 3, pp. 164-170, 2006, World Academy of Science, Engineering and Technology, ISSN 1304-4478. © 2006 World Academy of Science, Engineering and Technology (WASET). By permission.
  5. Hao Wang, Kongqiao Wang, Facial feature extraction and image-based face drawing, Proceedings of the 6th International Conference on Signal Processing (ICSP 2002), Vol. 1, pp. 699-702, Beijing, China, 2002, IEEE.
  6. Hao Wang, Image-based face drawing using active shape models and parametric morphing, Proceedings of the 2003 IEEE International Conference on Neural Networks and Signal Processing (ICNNSP 2003), Vol. 2, pp. 1017-1020, Nanjing, China, 2003, IEEE.
  7. Hao Wang, Kongqiao Wang, Affective interaction based on person-independent facial expression space, Neurocomputing, Special Issue for Vision Research, Vol. 71, No. 10-12, pp. 1889-1901, 2008, Elsevier, ISSN 0925-2312. © 2008 by authors and © 2008 Elsevier Science. By permission.

Keywords: camera-based interaction, text extraction, bar code, facial expression, boosting, manifold learning

This publication is copyrighted. You may download, display and print it for your own personal use. Commercial use is prohibited.

© 2007 Helsinki University of Technology

Last update 2011-05-26