The doctoral dissertations of the former Helsinki University of Technology (TKK) and Aalto University Schools of Technology (CHEM, ELEC, ENG, SCI) published in electronic format are available in the electronic publications archive of Aalto University - Aaltodoc.
Doctoral dissertation for the degree of Doctor of Science in Technology to be presented with due permission of the School of Science for public examination and debate in Auditorium T2 at the Aalto University School of Science (Espoo, Finland) on the 30th of March 2012 at 12 noon.
Overview in PDF format (ISBN 978-952-60-4516-0) [2475 KB]
Dissertation is also available in print (ISBN 978-952-60-4515-3)
When facing a typical pattern recognition task, one usually comes up with a number of so-called features: properties that describe the objects to be recognised. Based on these features, the task of the classifier-building algorithm is to find rules that are useful for the recognition of new objects.
Feature selection is a process where one tries to identify the useful features from among a potentially large set of candidates. The task is notoriously hard, and researchers have been tackling it for decades. Solving the problem properly may be more important today than ever before, because in many applications dataset sizes seem to grow faster than the processing power of computers does. For example, in the domain of genetic microarray data, there can easily be thousands of features.
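To make the process concrete, the following is a minimal sketch of one simple family of feature selection methods, the so-called filter approach. This is an illustrative example only, not a method taken from the thesis: each feature is scored independently (here by the absolute difference of its class means in a two-class problem), and the top-k features are kept.

```python
# Minimal filter-style feature selection sketch (illustrative only, not a
# method from the thesis): score each feature independently by the absolute
# difference of its class means, then keep the k best-scoring features.

def score_feature(values, labels):
    """Absolute difference between the feature's class means (labels 0/1)."""
    mean0 = sum(v for v, y in zip(values, labels) if y == 0) / labels.count(0)
    mean1 = sum(v for v, y in zip(values, labels) if y == 1) / labels.count(1)
    return abs(mean1 - mean0)

def select_top_k(X, labels, k):
    """Rank the features (columns of X) by score; return the best k indices."""
    n_features = len(X[0])
    scores = [score_feature([row[j] for row in X], labels)
              for j in range(n_features)]
    ranked = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return ranked[:k]

# Toy data: feature 0 separates the two classes, feature 1 is pure noise.
X = [[0.1, 5.0], [0.2, 4.9], [0.9, 5.1], [1.0, 5.0]]
y = [0, 0, 1, 1]
print(select_top_k(X, y, 1))  # -> [0]: feature 0 has the larger class-mean gap
```

Such filters are examples of the simple and fast algorithms referred to below; the computationally intensive alternatives instead search over feature subsets, evaluating each candidate subset with the classifier itself.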
Several research groups have published comparisons aiming to identify the feature selection method that is universally the best. Unfortunately, such comparisons are too often carried out in a methodologically flawed way. Based on the results of such studies, the computationally intensive search algorithms appear to perform much better than the simple approaches. However, this thesis shows that when the comparison is done properly, the simple and fast algorithms very often give results that are just as good, if not better.
In addition, many studies suggest that excluding some of the features is much more useful than it actually is. This observation matters in practice, because the selection process typically consumes considerable time and computing resources; it would therefore be very convenient not to have to carry it out at all. This thesis shows that, when measured correctly, the benefits obtained can be negligible compared to what has been reported previously.
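The kind of measurement error alluded to above can be sketched in a few lines. The example below is illustrative code, not taken from the thesis: on pure-noise data, picking the feature that looks best on the training set and then reporting its accuracy on that same training set yields an optimistic estimate, while an independent held-out set reveals that the "selected" feature is no better than chance.

```python
# Sketch of the evaluation pitfall (illustrative only, not from the thesis):
# on random data with random labels, selecting the best-looking feature and
# scoring it on the very same data produces an optimistic accuracy estimate.

import random

random.seed(0)

def accuracy(feature, labels):
    """Accuracy of a trivial threshold rule: feature > 0.5 -> class 1."""
    preds = [1 if v > 0.5 else 0 for v in feature]
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)

def column(X, j):
    return [row[j] for row in X]

n_train, n_test, n_features = 20, 20, 200
train = [[random.random() for _ in range(n_features)] for _ in range(n_train)]
test = [[random.random() for _ in range(n_features)] for _ in range(n_test)]
y_train = [random.randint(0, 1) for _ in range(n_train)]
y_test = [random.randint(0, 1) for _ in range(n_test)]

# Select the feature that looks best on the training data ...
best = max(range(n_features),
           key=lambda j: accuracy(column(train, j), y_train))

# ... then compare the optimistic in-sample estimate with a held-out one.
train_acc = accuracy(column(train, best), y_train)
test_acc = accuracy(column(test, best), y_test)
print(f"training estimate: {train_acc:.2f}, held-out estimate: {test_acc:.2f}")
```

Because the maximum over 200 noise features is taken on the training data, the in-sample estimate is far above the 50 % that any feature can actually achieve here; only the held-out estimate is honest.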
Moreover, the thesis presents a better-performing approach for accuracy estimation when the amount of data is small. Further, extensions are discussed from feature selection to generic model selection, from the selection of individual features to that of the sensors measuring them, and from classification to regression. Finally, an industrial application is described in which the discussed methods prove useful.
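The thesis's own accuracy-estimation approach is not reproduced here. As background for the small-data setting, a standard baseline it can be contrasted with is leave-one-out cross-validation, where each sample serves once as the test set while all the others form the training data:

```python
# Background sketch only: leave-one-out cross-validation splits, a standard
# baseline for small datasets (not the approach proposed in the thesis).

def leave_one_out_splits(n):
    """Yield (train_indices, test_index) pairs for a dataset of n samples."""
    for i in range(n):
        train = [j for j in range(n) if j != i]
        yield train, i

splits = list(leave_one_out_splits(4))
print(splits[0])  # -> ([1, 2, 3], 0)
```

With n samples this yields n train/test splits, so almost all of the scarce data is used for training in every split, at the cost of n model fits.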
This thesis consists of an overview and the following five publications:
Keywords: machine learning, feature selection, variable selection, overfitting, search algorithms, comparison of algorithms, model selection, classification, regression
This publication is copyrighted. You may download, display and print it for your own personal use. Commercial use is prohibited.
© 2012 Aalto University