Probabilistic Analysis of the Human Transcriptome with Side Information

Leo Lahti

Doctoral dissertation for the degree of Doctor of Science in Technology to be presented with due permission of the Faculty of Information and Natural Sciences for public examination and debate in Auditorium AS1 at the Aalto University School of Science and Technology (Espoo, Finland) on the 17th of December 2010 at 13 o'clock.

Overview in PDF format (ISBN 978-952-60-3368-6) [1673 KB]
Dissertation is also available in print (ISBN 978-952-60-3367-9)


Recent advances in high-throughput measurement technologies and efficient sharing of biomedical data through community databases have made it possible to investigate the complete collection of genetic material, the genome, which encodes the heritable genetic program of an organism. This has opened up new perspectives on the study of living organisms and has had a profound impact on biological research.

Functional genomics is a subdiscipline of molecular biology that investigates the functional organization of genetic information. This thesis develops computational strategies to investigate a key functional layer of the genome, the transcriptome. The time- and context-specific transcriptional activity of the genes regulates the function of living cells through protein synthesis. Efficient computational techniques are needed to extract useful information from high-dimensional genomic observations that are associated with high levels of complex variation. Statistical learning and probabilistic models provide the theoretical framework for combining statistical evidence across multiple observations with the wealth of background information in genomic data repositories.

This thesis addresses three key challenges in transcriptome analysis. First, new preprocessing techniques that utilize side information in genomic sequence databases and microarray collections are developed to improve the accuracy of high-throughput microarray measurements. Second, a novel exploratory approach is proposed to construct a global view of cell-biological network activation patterns and functional relatedness between tissues across the normal human body. Information in genomic interaction databases is used to derive constraints that help to focus the modeling on those parts of the data that are supported by known or potential interactions between the genes, and to scale up the analysis. Third, novel approaches are developed to model dependency between co-occurring measurement sources. The methods are used to study cancer mechanisms and transcriptome evolution; integrative analysis of the human transcriptome and other layers of genomic information allows the identification of functional mechanisms and interactions that could not be detected from the individual measurement sources alone. Open source implementations of the key methodological contributions have been released to facilitate their further adoption by the research community.

This thesis consists of an overview and the following six publications:

  1. Laura L. Elo, Leo Lahti, Heli Skottman, Minna Kyläniemi, Riitta Lahesmaa, and Tero Aittokallio. 2005. Integrating probe-level expression changes across generations of Affymetrix arrays. Nucleic Acids Research, volume 33, number 22, e193, 10 pages. © 2005 by authors.
  2. Leo Lahti, Laura L. Elo, Tero Aittokallio, and Samuel Kaski. 2011. Probabilistic analysis of probe reliability in differential gene expression studies with short oligonucleotide arrays. IEEE/ACM Transactions on Computational Biology and Bioinformatics, volume 8, number 1, pages 217-225. © 2011 Institute of Electrical and Electronics Engineers (IEEE). By permission.
  3. Leo Lahti, Juha E. A. Knuuttila, and Samuel Kaski. 2010. Global modeling of transcriptional responses in interaction networks. Bioinformatics, volume 26, number 21, pages 2713-2720. © 2010 by authors.
  4. Leo Lahti, Samuel Myllykangas, Sakari Knuutila, and Samuel Kaski. 2009. Dependency detection with similarity constraints. In: Tülay Adali, Jocelyn Chanussot, Christian Jutten, and Jan Larsen (editors). Proceedings of the 19th IEEE International Workshop on Machine Learning for Signal Processing (MLSP 2009). Grenoble, France. 1-4 September 2009. Piscataway, NJ, USA. IEEE. Pages 89-94. ISBN 978-1-4244-4947-7. © 2009 Institute of Electrical and Electronics Engineers (IEEE). By permission.
  5. Janne Sinkkonen, Janne Nikkilä, Leo Lahti, and Samuel Kaski. 2004. Associative clustering. In: Jean-François Boulicaut, Floriana Esposito, Fosca Giannotti, and Dino Pedreschi (editors). Proceedings of the 15th European Conference on Machine Learning (ECML 2004). Pisa, Italy. 20-24 September 2004. Berlin, Heidelberg, Germany. Springer. Lecture Notes in Computer Science, volume 3201, pages 396-406. ISBN 3-540-23105-6. © 2004 by authors and © 2004 Springer Science+Business Media. By permission.
  6. Samuel Kaski, Janne Nikkilä, Janne Sinkkonen, Leo Lahti, Juha E. A. Knuuttila, and Christophe Roos. 2005. Associative clustering for exploring dependencies between functional genomics data sets. IEEE/ACM Transactions on Computational Biology and Bioinformatics: Special Issue on Machine Learning for Bioinformatics - Part 2, volume 2, number 3, pages 203-216. © 2005 Institute of Electrical and Electronics Engineers (IEEE). By permission.

Keywords: data integration, exploratory data analysis, functional genomics, probabilistic modeling, transcriptomics

This publication is copyrighted. You may download, display and print it for your own personal use. Commercial use is prohibited.

© 2010 Aalto University School of Science and Technology

Last update 2011-05-26