The doctoral dissertations of the former Helsinki University of Technology (TKK) and Aalto University Schools of Technology (CHEM, ELEC, ENG, SCI) published in electronic format are available in the electronic publications archive of Aalto University - Aaltodoc.

Dimensionality Reduction for Visual Exploration of Similarity Structures

Jarkko Venna

Dissertation for the degree of Doctor of Science in Technology to be presented with due permission of the Department of Computer Science and Engineering for public examination and debate in Auditorium T2 at Helsinki University of Technology (Espoo, Finland) on the 8th of June, 2007, at 12 o'clock noon.

Overview in PDF format (ISBN 978-951-22-8752-9)   [5813 KB]
Dissertation is also available in print (ISBN 978-951-22-8751-2)

Abstract

Visualizations of similarity relationships between data points are commonly used in exploratory data analysis to gain insight into new data sets. They help answer questions such as: Does the data consist of separate groups of points? How do previously known interesting data points relate to other data points? Which points are similar to the points known to be of interest? Visualizations can be used both to amplify the cognition of the analyst and to help communicate interesting similarity structures found in the data to other people.

One of the main problems faced in information visualization is that while the data is typically very high-dimensional, the display is limited to only two or at most three dimensions. Thus, for visualization, the dimensionality of the data has to be reduced. In general, it is not possible to preserve all pairwise relationships between data points in the dimensionality reduction process. This has led to the development of a large number of dimensionality reduction methods that focus on preserving different aspects of the data. Most of these methods were not developed to be visualization methods, which makes it hard to assess their suitability for the task of visualizing similarity structures. This problem is made more severe by the lack of suitable quality measures in the information visualization field.

In this thesis, a new visualization task, visual neighbor retrieval, is introduced. It formulates information visualization as an information retrieval task. To assess the performance of dimensionality reduction methods in this task, two pairs of new quality measures are introduced, and the performance of several dimensionality reduction methods is analyzed. Based on the insight gained into the existing methods, three new dimensionality reduction methods (NeRV, fNeRV and LocalMDS), aimed at the visual neighbor retrieval task, are introduced. All three new methods outperform other methods in numerical experiments; they vary in their speed and accuracy.
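
To illustrate the retrieval view of visualization, the following is a minimal sketch in Python and not the quality measures defined in the thesis. It assumes that the relevant items for a point are its k nearest neighbors in the original data, and that the retrieved items are its nearest neighbors on the display. Precision and recall are then computed as in ordinary information retrieval. The function names `knn_sets` and `neighbor_precision_recall`, the use of Euclidean distance, and the hard neighborhood sizes are all assumptions made for this illustration.

```python
import numpy as np


def knn_sets(points, k):
    """For each row, return the index set of its k nearest other rows (Euclidean)."""
    diffs = points[:, None, :] - points[None, :, :]
    dist = np.sqrt((diffs ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)  # a point is not its own neighbor
    nearest = np.argsort(dist, axis=1)[:, :k]
    return [set(row.tolist()) for row in nearest]


def neighbor_precision_recall(X, Y, k_relevant, k_retrieved):
    """Mean precision and recall of neighbor retrieval from the display Y.

    X: original data, shape (n, d). Y: low-dimensional display, shape (n, 2 or 3).
    Relevant = k_relevant nearest neighbors in X (illustrative assumption).
    Retrieved = k_retrieved nearest neighbors in Y (illustrative assumption).
    """
    relevant = knn_sets(np.asarray(X, dtype=float), k_relevant)
    retrieved = knn_sets(np.asarray(Y, dtype=float), k_retrieved)
    hits = np.array([len(r & q) for r, q in zip(relevant, retrieved)])
    precision = hits.mean() / k_retrieved  # share of retrieved points that are relevant
    recall = hits.mean() / k_relevant      # share of relevant points that are retrieved
    return float(precision), float(recall)
```

In this simplified setting, a display that shows false neighbors lowers precision, and a display that separates true neighbors lowers recall, so the two numbers describe different kinds of error made by a dimensionality reduction method.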

A new color coding scheme, similarity-based color coding, is introduced in this thesis for visualization of similarity structures, and the applicability of the new methods in the task of creating graph layouts is studied. Finally, new approaches to visually studying the results and convergence of Markov Chain Monte Carlo methods are introduced.

This thesis consists of an overview and of the following 9 publications:

  1. Samuel Kaski, Jarkko Venna, and Teuvo Kohonen. Coloring that reveals cluster structures in multivariate data. Australian Journal of Intelligent Information Processing Systems, 6: 82-88, 2000. © 2000 The University of Western Australia, Centre for Intelligent Information Processing Systems (CIIPS). By permission.
  2. Jarkko Venna and Samuel Kaski. Neighborhood preservation in nonlinear projection methods: An experimental study. In Georg Dorffner, Horst Bischof, and Kurt Hornik, editors, Proceedings of the 11th International Conference on Artificial Neural Networks (ICANN 2001), Vienna, Austria, August 21-25, pp. 485-491, Springer, Berlin, 2001. © 2001 by authors and © 2001 Springer Science+Business Media. By permission.
  3. Samuel Kaski, Janne Nikkilä, Merja Oja, Jarkko Venna, Petri Törönen, and Eero Castrén. Trustworthiness and metrics in visualizing similarity of gene expression. BMC Bioinformatics, 4: 48, 2003. © 2003 by authors.
  4. Jarkko Venna and Samuel Kaski. Comparison of visualization methods for an atlas of gene expression data sets. Information Visualization, to appear. © 2007 by authors and © 2007 Palgrave Macmillan. By permission.
  5. Jarkko Venna and Samuel Kaski. Local multidimensional scaling. Neural Networks, 19: 889-899, 2006.
  6. Jarkko Venna and Samuel Kaski. Visualizing gene interaction graphs with local multidimensional scaling. In Michel Verleysen, editor, Proceedings of the 14th European Symposium on Artificial Neural Networks (ESANN 2006), Bruges, Belgium, April 26-28, pp. 557-562, d-side, Evere, Belgium, 2006. © 2006 d-side publications. By permission.
  7. Jarkko Venna and Samuel Kaski. Nonlinear dimensionality reduction as information retrieval. In Marina Meila and Xiaotong Shen, editors, Proceedings of the 11th International Conference on Artificial Intelligence and Statistics (AISTATS 2007), San Juan, Puerto Rico, March 21-24, pp. 568-575, 2007. © 2007 by authors.
  8. Jarkko Venna and Samuel Kaski. Visualizing high-dimensional posterior distributions in Bayesian modeling. In O. Kaynak, E. Alpaydin, E. Oja, and L. Xu, editors, Supplementary Proceedings of the Joint International Conference on Artificial Neural Networks and Neural Information Processing (ICANN/ICONIP 2003), Istanbul, Turkey, June 26-29, pp. 165-168, 2003.
  9. Jarkko Venna, Samuel Kaski, and Jaakko Peltonen. Visualizations for assessing convergence and mixing of MCMC. In N. Lavrac, D. Gamberger, H. Blockeel, and L. Todorovski, editors, Proceedings of the 14th European Conference on Machine Learning (ECML 2003), Cavtat - Dubrovnik, Croatia, September 22-26, pp. 432-443, Springer, Berlin, 2003. © 2003 by authors and © 2003 Springer Science+Business Media. By permission.

Keywords: dimensionality reduction, exploratory data analysis, information retrieval, information visualization, manifold learning, Markov Chain Monte Carlo

This publication is copyrighted. You may download, display and print it for your own personal use. Commercial use is prohibited.

© 2007 Helsinki University of Technology


Last update 2011-05-26