Crisp, Fuzzy, and Probabilistic Faceted Semantic Search

Markus Holi

Doctoral dissertation for the degree of Doctor of Science in Technology to be presented with due permission of the Faculty of Information and Natural Sciences for public examination and debate in Auditorium AS1 at the Aalto University School of Science and Technology (Espoo, Finland) on the 9th of June 2010 at 12 noon.

This dissertation presents contributions to the development of the faceted semantic search (FSS) paradigm. First, two fundamental solutions to FSS that have been widely used since their development are presented. The first is the projection of search facets from annotation ontologies using logical rules. The second is the logic rule-based generation of recommendation links between search items based on the semantic relations of these items.

The rest of the dissertation addresses three deficiencies of FSS: the inability to model uncertainty, the inability to rank search results according to relevance, and the usability problems that result from naively using annotation ontology concepts as search categories. Two sets of solutions to these problems are presented.

First, a fuzzy faceted semantic search (FFSS) framework is developed that extends the crisp set basis of FSS to fuzzy sets. This framework rests on two main ingredients: weighted annotations, which are used to determine the membership degrees of search items in annotation concepts, and fuzzy mappings of separate end-user categories onto the annotation concepts.
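How the two FFSS ingredients can work together is illustrated by the minimal Python sketch below. It assumes that weighted annotations and fuzzy category mappings are stored as dictionaries of degrees in [0, 1], and it combines them with max-min composition, a standard operator for composing fuzzy relations. That operator, the data layout, and the names `membership` and `rank` are illustrative assumptions, not the exact formulation used in the dissertation.

```python
from typing import Dict, List, Tuple

# Hypothetical data layout; all weights and degrees lie in [0, 1].
Annotations = Dict[str, Dict[str, float]]  # item -> {annotation concept: weight}
Mapping = Dict[str, Dict[str, float]]      # end-user category -> {annotation concept: degree}


def membership(item: str, category: str, ann: Annotations, mapping: Mapping) -> float:
    """Degree to which `item` belongs to the end-user `category`.

    Assumption: max-min (sup-min) composition of the category's fuzzy
    mapping with the item's weighted annotations.
    """
    item_weights = ann.get(item, {})
    degrees = [
        min(degree, item_weights.get(concept, 0.0))
        for concept, degree in mapping.get(category, {}).items()
    ]
    return max(degrees, default=0.0)


def rank(category: str, ann: Annotations, mapping: Mapping) -> List[Tuple[str, float]]:
    """Return (item, degree) pairs with a nonzero degree, best match first."""
    scored = [(item, membership(item, category, ann, mapping)) for item in ann]
    hits = [pair for pair in scored if pair[1] > 0.0]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)
```

With hypothetical annotations `{"doc1": {"Portrait": 0.9, "Landscape": 0.4}, "doc2": {"Landscape": 1.0}, "doc3": {"Sculpture": 0.7}}` and a category mapping `{"Paintings": {"Portrait": 0.8, "Landscape": 0.6}}`, `rank("Paintings", ...)` orders `doc1` (0.8) before `doc2` (0.6) and omits `doc3`. When every weight and degree is 0 or 1, the membership degree collapses to a yes/no answer, which corresponds to crisp FSS.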

Second, a probabilistic faceted semantic search (PFSS) framework is developed that incorporates weighted annotations, modeling of uncertainty in Semantic Web taxonomies, sophisticated mappings of end-user facets onto annotation ontologies, and the combination of evidence from multiple ranking schemes.

The ranking methods of both frameworks were analyzed empirically. According to a preliminary evaluation, both significantly improve the quality of search results compared to crisp FSS. Both also outperformed a heuristic ranking method currently in use; however, in the case of FFSS this difference did not reach statistical significance.

Keywords: Semantic Web, ontology, fuzzy sets, probability theory, faceted search

