The doctoral dissertations of the former Helsinki University of Technology (TKK) and Aalto University Schools of Technology (CHEM, ELEC, ENG, SCI) published in electronic format are available in the electronic publications archive of Aalto University - Aaltodoc.

Visual Category Detection: An Experimental Perspective

Ville Viitaniemi

Doctoral dissertation for the degree of Doctor of Science in Technology to be presented with due permission of the School of Science for public examination and debate in Auditorium TU2 at the Aalto University School of Science (Espoo, Finland) on the 9th of May 2012 at 12 noon.

Overview in PDF format (ISBN 978-952-60-4586-3) [4274 KB]
Dissertation is also available in print (ISBN 978-952-60-4585-6)

Abstract

Huge volumes of digital visual data are constantly being produced and archived. Automatically distilling useful information from such masses of data requires answering the long-standing grand question of computer vision: how can computers be made to understand images?

In this thesis, the visual content analysis problem is treated as a category detection problem. In the category detection formulation, the general image content understanding task is partitioned into a number of small binary decision tasks. In each of the sub-tasks, one decides whether an image belongs to some pre-defined category. A category could be defined, for example, to consist of images taken indoors. By defining an appropriate set of categories, the visual content of an image can be described at a desired level of granularity by determining the image's membership in each of the categories.
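
The partitioning into binary decisions can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the thesis: the image is represented as a dictionary of precomputed attributes, and the detectors, attribute names and thresholds are hypothetical toy stand-ins for trained detectors.

```python
from typing import Callable, Dict

# Hypothetical image representation: a dict of precomputed attributes.
Image = Dict[str, float]
# A detector answers one binary question: does the image belong to the category?
Detector = Callable[[Image], bool]


def describe(image: Image, detectors: Dict[str, Detector]) -> Dict[str, bool]:
    """Describe an image by its membership in each pre-defined category.

    Each category is decided independently of the others.
    """
    return {name: detect(image) for name, detect in detectors.items()}


# Toy detectors with made-up thresholds (illustrative only, not from the thesis).
detectors: Dict[str, Detector] = {
    "indoor": lambda img: img["mean_brightness"] < 0.4,
    "contains_sky": lambda img: img["blue_fraction"] > 0.3,
}

image: Image = {"mean_brightness": 0.25, "blue_fraction": 0.05}
print(describe(image, detectors))  # {'indoor': True, 'contains_sky': False}
```

Adding a category to the description only requires adding one more detector to the dictionary, which is how the set of categories controls the granularity of the description.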

This thesis discusses a framework for visual category detection that consists of three major components: feature extraction, feature-wise detection and fusion of the detection results. The point of view in the discussion is empirical: the framework is validated by the good performance that systems implementing it have demonstrated in various benchmark tasks of visual analysis. A body of experiments is described that compares various technological alternatives for implementing the three major components of the framework. In addition to comparing implementation techniques, the experiments demonstrate that the discussed generic category detection architecture is very versatile: a set of diverse visual analysis problems can be addressed using the same visual category detection system as a backbone component by equipping the system with a task-specific front-end.
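
The three components named above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the system described in the thesis: the toy image format, the two hand-written features, the scoring functions and the weighted-average fusion rule are all hypothetical placeholders chosen to keep the example self-contained.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Toy grayscale image: rows of pixel intensities in [0, 1] (an assumption).
Image = List[List[float]]
Extractor = Callable[[Image], List[float]]
Scorer = Callable[[Sequence[float]], float]


def mean_intensity(image: Image) -> List[float]:
    """Hypothetical global feature: average pixel intensity."""
    pixels = [p for row in image for p in row]
    return [sum(pixels) / len(pixels)]


def intensity_range(image: Image) -> List[float]:
    """Hypothetical global feature: spread between brightest and darkest pixel."""
    pixels = [p for row in image for p in row]
    return [max(pixels) - min(pixels)]


def detect_category(
    image: Image,
    extractors: Dict[str, Extractor],
    scorers: Dict[str, Scorer],
    weights: Dict[str, float],
    threshold: float = 0.5,
) -> Tuple[bool, float]:
    # Stage 1: feature extraction, one feature vector per feature type.
    features = {name: extract(image) for name, extract in extractors.items()}
    # Stage 2: feature-wise detection, one score in [0, 1] per feature type.
    scores = {name: scorers[name](vec) for name, vec in features.items()}
    # Stage 3: fusion. A weighted average is used here purely as a stand-in.
    total = sum(weights[name] for name in scores)
    fused = sum(weights[name] * scores[name] for name in scores) / total
    return fused >= threshold, fused


# Toy setup for a hypothetical "dark, low-contrast image" category.
extractors = {"mean": mean_intensity, "range": intensity_range}
scorers = {"mean": lambda f: 1.0 - f[0], "range": lambda f: 1.0 - f[0]}
weights = {"mean": 0.5, "range": 0.5}

decision, fused_score = detect_category(
    [[0.1, 0.2], [0.3, 0.2]], extractors, scorers, weights
)
print(decision, round(fused_score, 3))
```

Because each stage only depends on the dictionaries passed in, alternative extractors, detectors or fusion rules can be swapped in and compared without touching the others, which is the kind of comparison the experiments are described as making.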

From the experiments and discussion in the thesis, one can conclude that the category detection formulation is a useful way of approaching the general image content understanding problem. In category detection, state-of-the-art performance can be achieved using the presented fusion-based system architecture and the implementation technologies of the system components.

This thesis consists of an overview and of the following 11 publications:

  1. Ville Viitaniemi and Jorma Laaksonen. Techniques for still image scene classification and object detection. In Proceedings of the International Conference on Artificial Neural Networks (ICANN 2006), Part II, pages 35-44, Athens, Greece, September 2006.
  2. Ville Viitaniemi and Jorma Laaksonen. Techniques for image classification, object detection and object segmentation. In Proceedings of the 10th International Conference on Visual Information Systems (VISUAL 2008), pages 231-234, Salerno, Italy, September 2008.
  3. Ville Viitaniemi and Jorma Laaksonen. Evaluating the performance in automatic image annotation: example case by adaptive fusion of global image features. Signal Processing: Image Communication, Volume 22, issue 6, pages 557-568, July 2007.
  4. Ville Viitaniemi and Jorma Laaksonen. Improving the accuracy of global feature fusion based image categorisation. In Proceedings of the 2nd International Conference on Semantic and Digital Media Technologies (SAMT 2007), pages 1-14, Genova, Italy, December 2007.
  5. Mats Sjöberg, Markus Koskela, Ville Viitaniemi and Jorma Laaksonen. Indoor location recognition using fusion of SVM-based visual classifiers. In Proceedings of the 2010 IEEE International Workshop on Machine Learning for Signal Processing, pages 343-348, Kittilä, Finland, August-September 2010.
  6. Ville Viitaniemi, Mats Sjöberg, Markus Koskela and Jorma Laaksonen. Concept-based video search with the PicSOM multimedia retrieval system. Technical report TKK-ICS-R39, Aalto University, December 2010.
  7. Ville Viitaniemi and Jorma Laaksonen. Experiments on selection of codebooks for local image feature histograms. In Proceedings of the 10th International Conference on Visual Information Systems (VISUAL 2008), pages 126-137, Salerno, Italy, September 2008.
  8. Ville Viitaniemi and Jorma Laaksonen. Combining local feature histograms of different granularities. In Proceedings of the 16th Scandinavian Conference on Image Analysis (SCIA 2009), pages 636-645, Oslo, Norway, June 2009.
  9. Ville Viitaniemi and Jorma Laaksonen. Spatial extensions to bag of visual words. In Proceedings of the ACM International Conference on Image and Video Retrieval (CIVR 2009), Fira, Greece, July 2009.
  10. Ville Viitaniemi and Jorma Laaksonen. Region matching techniques for spatial bag of visual words based image category recognition. In Proceedings of the 20th International Conference on Artificial Neural Networks (ICANN 2010), Part I, pages 531-540, Thessaloniki, Greece, September 2010.
  11. Ville Viitaniemi and Jorma Laaksonen. Representing images with χ² distance based histograms of SIFT descriptors. In Proceedings of the 19th International Conference on Artificial Neural Networks (ICANN 2009), Part II, pages 694-703, Limassol, Cyprus, September 2009.

Keywords: computer vision, image analysis, visual category, feature fusion, local image descriptor

This publication is copyrighted. You may download, display and print it for your own personal use. Commercial use is prohibited.

© 2012 Aalto University


Last update 2012-10-31