The doctoral dissertations of the former Helsinki University of Technology (TKK) and Aalto University Schools of Technology (CHEM, ELEC, ENG, SCI) published in electronic format are available in the electronic publications archive of Aalto University - Aaltodoc.
Dissertation for the degree of Doctor of Technology to be presented with due permission of the Department of Computer Science and Engineering for public examination and debate in Auditorium T2 at Helsinki University of Technology (Espoo, Finland) on the 19th of September, 2003, at 12 o'clock noon.
Overview in PDF format (ISBN 951-22-6670-9) [1204 KB]
Dissertation is also available in print (ISBN 951-22-6669-5)
Although the engineers of industry have access to process data, they seldom use advanced statistical tools to solve process control problems. Why this reluctance? I believe the reason lies in the history of the development of statistical tools, which were created in an era of rigorous mathematical modelling, manual computation and small data sets. That era produced sophisticated tools, but engineers do not understand the requirements of these algorithms related, for example, to the pre-processing of data. If algorithms are fed unsuitable data, or parameterized poorly, they produce unreliable results, which may lead an engineer to reject statistical analysis altogether.
This thesis looks for algorithms that probably do not impress the champions of statistics, but that serve process engineers. It advocates three properties in an algorithm: supervised operation, robustness and understandability. Supervised operation allows and requires the user to explicate the goal of the analysis, which lets the algorithm discover results that are relevant to the user. Robust algorithms allow engineers to analyse raw process data collected from the automation system of the plant. The third aspect is understandability: the user must understand how to parameterize the model, what the principle of the algorithm is, and how to interpret the results.
The above criteria are justified with theories of human learning. The basis is the theory of constructivism, which defines learning as the construction of mental models. I then discuss theories of organisational learning, which show how mental models influence the behaviour of groups of people. The next level discusses statistical methodologies of data analysis and binds them to the theories of organisational learning. The last level discusses individual statistical algorithms and introduces the methodology and the algorithms proposed by this thesis. This methodology uses three types of algorithms: visualization, variable selection and feature extraction. The goal of the proposed methodology is to reliably and understandably provide the user with information related to a problem the user has defined as interesting.
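The three algorithm types above can be illustrated with a minimal sketch. This is not the thesis's own method: it uses hypothetical synthetic data, simple correlation ranking as a stand-in for supervised variable selection, and SVD-based projection as a stand-in for feature extraction (the thesis itself also employs tools such as the Self-Organizing Map).

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical process data: 200 samples, 10 measured variables.
X = rng.normal(size=(200, 10))
# Hypothetical quality target: depends on variables 0 and 3 plus noise.
y = 2.0 * X[:, 0] - 1.5 * X[:, 3] + 0.1 * rng.normal(size=200)

# Supervised variable selection: rank variables by |correlation| with
# the user-defined target, so results stay relevant to the user's goal.
corr = np.array([abs(np.corrcoef(X[:, j], y)[0, 1])
                 for j in range(X.shape[1])])
selected = np.argsort(corr)[::-1][:2]   # keep the two best variables

# Feature extraction: project the selected variables onto their
# principal directions via SVD (a simple, understandable projection).
Xs = X[:, selected] - X[:, selected].mean(axis=0)
_, _, Vt = np.linalg.svd(Xs, full_matrices=False)
features = Xs @ Vt.T                    # extracted features, ready to plot
```

The extracted `features` could then be visualized, for example as a scatter plot coloured by the target, completing the visualization step of the pipeline.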
The above methodology is illustrated by the analysis of an industrial case: the concentrator of the Hitura mine. This case illustrates how to define the problem with off-line laboratory data, and how to search the on-line data for solutions. A major advantage of the algorithmic study of data is efficiency: the manual approach reported earlier took approximately six man-months; the automated approach of this thesis produced comparable results in a few weeks.
This thesis consists of an overview and the following eight publications:
Keywords: human learning, visualization, variable selection, feature selection, feature extraction, Self-Organizing Map, data mining, statistical analysis
This publication is copyrighted. You may download, display and print it for your own personal use. Commercial use is prohibited.
© 2003 Helsinki University of Technology