The doctoral dissertations of the former Helsinki University of Technology (TKK) and Aalto University Schools of Technology (CHEM, ELEC, ENG, SCI) published in electronic format are available in the electronic publications archive of Aalto University - Aaltodoc.

Methodologies for Time Series Prediction and Missing Value Imputation

Antti Sorjamaa

Doctoral dissertation for the degree of Doctor of Science in Technology to be presented with due permission of the Faculty of Information and Natural Sciences for public examination and debate in Auditorium T2 at the Aalto University School of Science and Technology (Espoo, Finland) on the 19th of November 2010 at 12 noon.

Overview in PDF format (ISBN 978-952-60-3453-9)   [1269 KB]
The dissertation is also available in print (ISBN 978-952-60-3452-2)

Abstract

The amount of data collected worldwide is increasing continuously. More sophisticated measuring instruments and growing computer processing power produce more and more data, which in turn demands more capacity for collection, transmission and storage.

Even though computers are faster, large databases also need good and accurate methodologies in order to be useful in practice. Some techniques cannot feasibly be applied to very large databases, and others cannot provide the necessary accuracy.

As the title states, this thesis focuses on two problems encountered with databases: time series prediction and missing value imputation. The first is a function approximation and regression problem, but in some cases it can also be formulated as a classification task. Accurate prediction of future values depends heavily not only on a good model that is well trained and validated, but also on preprocessing, input variable selection or projection, and the choice of output approximation strategy. The importance of all these choices increases as the prediction horizon is extended further into the future.
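The regression formulation and the role of the output approximation strategy can be illustrated with a minimal sketch. It is not taken from the thesis: it assumes a plain linear least-squares model as a stand-in for whatever model is actually used, and the function names (`make_regressor_matrix`, `recursive_forecast`, `direct_forecast`) are hypothetical. It contrasts two common ways of producing a multi-step forecast, a recursive strategy (one one-step model whose predictions are fed back as inputs) and a direct strategy (one model per horizon).

```python
import numpy as np


def make_regressor_matrix(series, n_lags, horizon=1):
    """Turn a series into a regression problem.

    Each row of X holds n_lags consecutive values; the matching entry of y
    is the value `horizon` steps after the last value in that row.
    """
    series = np.asarray(series, dtype=float)
    n_rows = len(series) - n_lags - horizon + 1
    X = np.array([series[i:i + n_lags] for i in range(n_rows)])
    start = n_lags + horizon - 1
    y = series[start:start + n_rows]
    return X, y


def fit_linear(X, y):
    """Least-squares fit of y ~ X @ w + b (illustrative stand-in model)."""
    A = np.column_stack([X, np.ones(len(X))])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


def predict_linear(coef, x):
    """Evaluate the fitted linear model on one input window."""
    return float(np.dot(coef[:-1], x) + coef[-1])


def recursive_forecast(series, n_lags, steps):
    """One one-step-ahead model; each prediction is fed back as an input."""
    X, y = make_regressor_matrix(series, n_lags, horizon=1)
    coef = fit_linear(X, y)
    window = [float(v) for v in series[-n_lags:]]
    forecast = []
    for _ in range(steps):
        nxt = predict_linear(coef, window[-n_lags:])
        forecast.append(nxt)
        window.append(nxt)
    return forecast


def direct_forecast(series, n_lags, steps):
    """A separate model for every horizon, all fed the same last window."""
    last_window = np.asarray(series[-n_lags:], dtype=float)
    forecast = []
    for h in range(1, steps + 1):
        X, y = make_regressor_matrix(series, n_lags, horizon=h)
        forecast.append(predict_linear(fit_linear(X, y), last_window))
    return forecast


if __name__ == "__main__":
    t = np.arange(200)
    noisy_sine = np.sin(0.2 * t) + 0.05 * np.random.default_rng(0).normal(size=200)
    print(recursive_forecast(noisy_sine, n_lags=8, steps=5))
    print(direct_forecast(noisy_sine, n_lags=8, steps=5))
```

In the recursive variant any one-step error is reused as an input to later steps, while the direct variant avoids that feedback at the cost of training one model per horizon. This is one way to see why the choice of strategy matters more as the horizon grows.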

The second focus area deals with missing values in a database. Missing values can be a nuisance, but they can also prohibit the use of certain methodologies and degrade the performance of others. Hence, missing value imputation is a necessary part of preprocessing a database. The imputation has to be done carefully in order to retain the integrity of the database and to avoid inserting unwanted artifacts that would complicate the final data analysis. Furthermore, even though accuracy is always the main requirement for a good methodology, computational time has to be considered alongside precision.
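As a rough illustration of what imputation involves, the sketch below fills missing entries with an iterated truncated-SVD (low-rank) reconstruction. This is a generic technique chosen for brevity, not the methodology developed in the thesis; the function name `svd_impute` and its parameters `rank` and `n_iter` are assumptions made for this example. Missing entries are marked with NaN, and observed entries are never modified.

```python
import numpy as np


def svd_impute(data, rank=1, n_iter=50):
    """Fill NaN entries using an iterated low-rank SVD reconstruction.

    Illustrative sketch only. Assumes every column has at least one
    observed value, so that the column-mean initialisation is defined.
    """
    X = np.array(data, dtype=float)
    missing = np.isnan(X)

    # Start from column means computed over the observed entries.
    col_means = np.nanmean(X, axis=0)
    filled = np.where(missing, col_means, X)

    for _ in range(n_iter):
        # Rank-limited reconstruction of the current completed matrix.
        U, s, Vt = np.linalg.svd(filled, full_matrices=False)
        approx = (U[:, :rank] * s[:rank]) @ Vt[:rank]
        # Only the missing positions are overwritten.
        filled[missing] = approx[missing]
    return filled


if __name__ == "__main__":
    table = np.outer([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0])
    table[1, 1] = np.nan  # hide one value
    print(svd_impute(table, rank=1))
```

The trade-off mentioned above shows up even here: a higher `rank` or more iterations can fit the observed data more closely, but each iteration costs a full SVD, which becomes expensive for large databases.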

In this thesis, a large variety of strategies for output approximation and variable processing in time series prediction is presented. New methodologies and tools for solving the problem of missing values are also presented in detail. The strategies and methodologies are compared against state-of-the-art methods and shown to be accurate and useful in practice.

This thesis consists of an overview and the following 10 publications:

  1. Antti Sorjamaa, Jin Hao, Nima Reyhani, Yongnan Ji, and Amaury Lendasse. 2007. Methodology for long-term prediction of time series. Neurocomputing, volume 70, numbers 16-18, pages 2861-2869.
  2. Antti Sorjamaa, Yoan Miche, Robert Weiss, and Amaury Lendasse. 2008. Long-term prediction of time series using NNE-based projection and OP-ELM. In: Proceedings of the 2008 IEEE International Joint Conference on Neural Networks (IJCNN 2008), part of the 5th IEEE World Congress on Computational Intelligence (WCCI 2008). Hong Kong. 1-6 June 2008. Chennai, India. Research Publishing Services. Pages 2675-2681. ISBN 978-1-4244-1821-3.
  3. Antti Sorjamaa and Amaury Lendasse. 2006. Time Series Prediction using DirRec Strategy. In: Michel Verleysen (editor). Proceedings of the 14th European Symposium on Artificial Neural Networks (ESANN 2006). Bruges, Belgium. 26-28 April 2006. Bruges, Belgium. d-side publications. Pages 143-148. ISBN 2-930307-06-4.
  4. Souhaib Ben Taieb, Antti Sorjamaa, and Gianluca Bontempi. 2010. Multiple-output modeling for multi-step-ahead time series forecasting. Neurocomputing, volume 73, numbers 10-12, pages 1950-1957.
  5. Antti Sorjamaa, Amaury Lendasse, Yves Cornet, and Eric Deleersnijder. 2010. An improved methodology for filling missing values in spatiotemporal climate data set. Computational Geosciences, volume 14, number 1, pages 55-64.
  6. Antti Sorjamaa, Paul Merlin, Bertrand Maillet, and Amaury Lendasse. 2009. A non-linear approach for computing missing values in temporal databases. European Journal of Economic and Social Systems, volume 22, number 1, pages 99-117.
  7. Antti Sorjamaa and Amaury Lendasse. 2007. Time series prediction as a problem of missing values: Application to ESTSP2007 and NN3 competition benchmarks. In: Proceedings of the 2007 International Joint Conference on Neural Networks (IJCNN 2007). Orlando, Florida, USA. 12-17 August 2007. Eau Claire, Wisconsin, USA. Documation LLC. Pages 2948-2953. ISBN 1-4244-1380-X.
  8. Paul Merlin, Antti Sorjamaa, Bertrand Maillet, and Amaury Lendasse. 2010. X-SOM and L-SOM: A double classification approach for missing value imputation. Neurocomputing, volume 73, numbers 7-9, pages 1103-1108.
  9. Antti Sorjamaa, Francesco Corona, Yoan Miche, Paul Merlin, Bertrand Maillet, Eric Séverin, and Amaury Lendasse. 2009. Sparse linear combination of SOMs for data imputation: Application to financial database. In: José C. Príncipe and Risto Miikkulainen (editors). Proceedings of the 7th International Workshop on Advances in Self-Organizing Maps (WSOM 2009). St. Augustine, Florida, USA. 8-10 June 2009. Berlin, Heidelberg, Germany. Springer. Lecture Notes in Computer Science, volume 5629, pages 290-297. ISBN 978-3-642-02396-5.
  10. Antti Sorjamaa and Amaury Lendasse. 2010. Fast missing value imputation using ensemble of SOMs. Espoo, Finland: Aalto University School of Science and Technology. 20 pages. TKK Reports in Information and Computer Science, Report TKK-ICS-R33. ISBN 978-952-60-3247-4. ISSN 1797-5034.

Keywords: time series prediction, missing values, large databases, prediction strategy, variable selection, nonlinear imputation, EOF pruning, ensemble of SOMs

This publication is copyrighted. You may download, display and print it for your own personal use. Commercial use is prohibited.

© 2010 Aalto University School of Science and Technology


Last update 2011-05-26