The doctoral dissertations of the former Helsinki University of Technology (TKK) and Aalto University Schools of Technology (CHEM, ELEC, ENG, SCI) published in electronic format are available in the electronic publications archive of Aalto University - Aaltodoc.
Data Exploration with Learning Metrics
Jaakko Peltonen
Dissertation for the degree of Doctor of Science in Technology to be presented with
due permission of the Department of Computer Science and Engineering for public
examination and debate in Auditorium T2 at Helsinki University of Technology
(Espoo, Finland) on the 17th of November, 2004, at 2 o'clock p.m.
Overview in PDF format (ISBN 951-22-7345-4) [4109 KB]
Dissertation is also available in print (ISBN 951-22-7344-6)
Abstract
A crucial problem in exploratory analysis of data is that it is difficult for computational
methods to focus on interesting aspects of data. Traditional methods of
unsupervised learning cannot differentiate between interesting and noninteresting
variation, and hence may model, visualize, or cluster parts of data that are not
interesting to the analyst. This wastes the computational power of the methods
and may mislead the analyst.
In this thesis, a principle called "learning metrics" is used to develop visualization
and clustering methods that automatically focus on the interesting aspects, based on
auxiliary labels supplied with the data samples. The principle yields non-Euclidean
(Riemannian) metrics that are data-driven, widely applicable, versatile, invariant
to many transformations, and in part invariant to noise.
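As a rough illustration of the principle, the sketch below computes a local distance of the form d^2(x, x+dx) = dx^T J(x) dx, where J(x) is the Fisher information matrix of the conditional label distribution p(c|x) taken with respect to x. The linear-softmax model for p(c|x), its parameters W and b, and all function names are assumptions made only for this example; they are not taken from the thesis, which estimates the label distribution from data.

```python
import numpy as np

def class_posteriors(x, W, b):
    """p(c|x) under a hypothetical linear-softmax model (illustrative assumption)."""
    z = W @ x + b
    z = z - z.max()              # subtract the maximum for numerical stability
    p = np.exp(z)
    return p / p.sum()

def fisher_matrix(x, W, b):
    """J(x) = sum_c p(c|x) g_c g_c^T, where g_c = grad_x log p(c|x)."""
    p = class_posteriors(x, W, b)
    # For a softmax model, grad_x log p(c|x) = W[c] - sum_k p(k|x) W[k].
    g = W - p @ W                # shape (n_classes, n_dims)
    return (g * p[:, None]).T @ g

def local_distance_sq(x, dx, W, b):
    """Squared local distance dx^T J(x) dx between x and x + dx."""
    return float(dx @ fisher_matrix(x, W, b) @ dx)

# Two classes whose posteriors depend only on the first coordinate.
W = np.array([[1.0, 0.0], [-1.0, 0.0]])
b = np.zeros(2)
x = np.zeros(2)

print(local_distance_sq(x, np.array([0.1, 0.0]), W, b))  # about 0.01
print(local_distance_sq(x, np.array([0.0, 0.1]), W, b))  # 0.0
```

In this toy setting a step along the second coordinate leaves p(c|x) unchanged and therefore has zero length, whereas the Euclidean metric would treat both steps as equally long. This is the sense in which such a metric concentrates on variation that is relevant to the auxiliary labels.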
Learning metric methods are introduced for five tasks: nonlinear visualization
by Self-Organizing Maps and Multidimensional Scaling, linear projection, and clustering
of discrete data and multinomial distributions. The resulting methods either
explicitly estimate distances in the Riemannian metric, or optimize a tailored cost
function which is implicitly related to such a metric. The methods have rigorous
theoretical relationships to information geometry and probabilistic modeling, and
are empirically shown to yield good practical results in exploratory and information
retrieval tasks.
This thesis consists of an overview and the following eight publications:
- Samuel Kaski, Janne Sinkkonen, and Jaakko Peltonen, 2001. Bankruptcy analysis
with self-organizing maps in learning metrics. IEEE Transactions on Neural
Networks 12, number 4, pages 936-947.
© 2001 IEEE. By permission.
- Jaakko Peltonen, Arto Klami, and Samuel Kaski, 2002. Learning more accurate
metrics for Self-Organizing Maps. In: José R. Dorronsoro (editor),
Proceedings of the International Conference on Artificial Neural Networks (ICANN 2002). Madrid, Spain,
27-30 August 2002. Berlin, Springer-Verlag. Lecture Notes in Computer Science 2415, pages 999-1004.
© 2002 Springer-Verlag. By permission.
- Jaakko Peltonen, Janne Sinkkonen, and Samuel Kaski, 2002. Discriminative clustering
of text documents. In: Lipo Wang, Jagath C. Rajapakse, Kunihiko
Fukushima, Soo-Young Lee, and Xin Yao (editors), Proceedings of the 9th
International Conference on Neural Information Processing (ICONIP'02).
Singapore, 18-22 November 2002. Piscataway, NJ, IEEE, volume 4, pages 1956-1960.
© 2002 IEEE. By permission.
- Jarkko Venna, Samuel Kaski, and Jaakko Peltonen, 2003. Visualizations for assessing
convergence and mixing of MCMC. In: Nada Lavrač, Dragan Gamberger, Ljupco Todorovski, and Hendrik
Blockeel (editors), Proceedings of the 14th European Conference
on Machine Learning (ECML 2003). Cavtat - Dubrovnik, Croatia, 22-26 September 2003. Berlin, Springer-Verlag.
Lecture Notes in Artificial Intelligence 2837, pages 432-443.
© 2003 Springer-Verlag. By permission.
- Samuel Kaski and Jaakko Peltonen, 2003. Informative discriminant analysis. In:
Tom Fawcett and Nina Mishra (editors), Proceedings of the Twentieth International
Conference on Machine Learning (ICML-2003). Washington DC, USA, 21-24 August 2003.
Menlo Park, CA, AAAI Press, pages 329-336.
© 2003 American Association for Artificial Intelligence (AAAI). By permission.
- Jaakko Peltonen, Janne Sinkkonen, and Samuel Kaski, 2004. Sequential information
bottleneck for finite data. In: Russ Greiner and Dale Schuurmans (editors),
Proceedings of the Twenty-First International Conference on Machine
Learning (ICML 2004). Banff, Canada, 4-8 July 2004. Madison, WI, Omnipress, pages 647-654.
© 2004 by authors.
- Jaakko Peltonen, Arto Klami, and Samuel Kaski. Improved learning of Riemannian
metrics for exploratory analysis. Neural Networks, accepted for
publication.
© 2004 by authors and © 2004 Elsevier Science. By permission.
- Jaakko Peltonen and Samuel Kaski. Discriminative components of data.
IEEE Transactions on Neural Networks, accepted for publication.
© 2004 IEEE. By permission.
Keywords: clustering, data mining, exploratory data analysis, learning metrics, supervision, visualization
This publication is copyrighted. You may download, display, and print it for your own personal use. Commercial use is prohibited.
© 2004 Helsinki University of Technology