Project description >> ClassDB component

Classifications are tree-like hierarchies where each node is assigned a natural language label. These labels describe concepts, and the edge structure of a classification encodes how these concepts are interrelated in the domain modeled by the classification. Classifications have always been used by humans as the most effective and intuitive way to organize their knowledge according to their (subjective) view of a domain of interest. Classification hierarchies are used pervasively: they are embedded into users’ personal classifications of email messages, favorite web sites, and other files classified in the users’ file systems; they are used to organize categories of goods in e-commerce web sites, to classify web pages into web directories (e.g., DMOZ), and in many other fields. Classifications are an integral part of users’ everyday experience of navigating, searching, and organizing data and information.

Because their labels are expressed in natural language and the semantics of their edges does not necessarily represent an ontological relation (and can often be generalized to the “intersection” relation), classifications are very hard for automated computer agents to reason about. The goal of the ClassDB system is to overcome this limitation by making explicit the semantics encoded in the classification’s labels and structure. In particular, ClassDB makes it possible to convert a classification into a formal classification: a structurally identical hierarchy whose labels are expressed in a propositional description logic (DL) language. Converting classifications into formal classifications makes it possible to automate some essential operations that help users better manage their data repositories. Examples of such operations include:

  • Encoding of a (generic) document classification algorithm into an equivalent satisfiability problem;
  • Definition of (custom) user classification choices and encoding them as DL formulas assigned to the classification nodes;
  • Automatic classification of documents according to the predefined classification algorithm and user classification choices. This operation assumes that documents to be classified are assigned DL formulas which describe their contents;
  • Semantic search for documents and nodes which satisfy a given search criterion expressed as a DL formula;
  • Rational reconstruction of the classification structure according to taxonomic relationships that hold between nodes of the classification;
  • Identification of nodes with equivalent meaning located in different parts of the classification tree.
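
To make some of the operations above more concrete, the following is a minimal sketch and not the ClassDB implementation. It assumes a deliberately simplified formal classification in which every label formula is a conjunction of atomic concepts (modelled as a set), and in which the concept of a node is the conjunction of the labels on the path from the root. The `Node` class, the function names, the "most specific node" classification rule, and the sample tree are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A classification node; `atoms` is a simplified stand-in for a DL label formula."""
    label: str
    atoms: frozenset
    children: list = field(default_factory=list)


def concepts(node, inherited=frozenset(), path=""):
    """Yield (path, concept) pairs. A concept is the conjunction (here: set union)
    of the atoms found on the path from the root to the node (an assumption)."""
    concept = inherited | node.atoms
    here = f"{path}/{node.label}"
    yield here, concept
    for child in node.children:
        yield from concepts(child, concept, here)


def classify(doc_atoms, root):
    """Return the paths of the most specific nodes whose concept the document entails."""
    matches = {p: c for p, c in concepts(root) if c <= doc_atoms}
    return sorted(p for p, c in matches.items()
                  if not any(c < other for other in matches.values()))


def equivalent_nodes(root):
    """Group the paths of nodes that carry exactly the same concept."""
    groups = {}
    for p, c in concepts(root):
        groups.setdefault(c, []).append(p)
    return [sorted(ps) for ps in groups.values() if len(ps) > 1]


# Hypothetical classification with two branches that end up meaning the same thing.
root = Node("Top", frozenset(), [
    Node("Europe", frozenset({"europe"}), [
        Node("Pictures", frozenset({"picture"})),
    ]),
    Node("Pictures", frozenset({"picture"}), [
        Node("Europe", frozenset({"europe"})),
    ]),
])

print(classify(frozenset({"picture", "europe", "castle"}), root))
print(equivalent_nodes(root))
```

In this sketch, a document described by "picture", "europe", and "castle" is placed in both leaf nodes, and the same two leaves are reported as equivalent because their path concepts coincide. Full propositional formulas (with disjunction and negation) would need a real reasoner rather than set comparison.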

The core enabling factors for these and many other ClassDB operations are the availability of an Oracle, which defines base forms for natural language words and shows how they are syntactically related, and the availability of a satisfiability reasoning engine, which allows for sound and complete propositional reasoning.
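
As a rough illustration of how a satisfiability engine can support such operations, the sketch below reduces a subsumption check to an unsatisfiability test: formula A is subsumed by formula B exactly when "A and not B" has no satisfying assignment. This is a brute-force truth-table check written for readability, not the reasoning engine used by ClassDB. The tuple-based formula encoding, the function names, and the background axiom "dog implies animal" (standing in for knowledge that an Oracle might supply) are all assumptions made for the example.

```python
from itertools import product

# Formulas are nested tuples (a hypothetical encoding):
# ("atom", name), ("not", f), ("and", f, g), ("or", f, g)


def atoms(f):
    """Collect the names of all atomic concepts occurring in a formula."""
    if f[0] == "atom":
        return {f[1]}
    return set().union(*(atoms(sub) for sub in f[1:]))


def evaluate(f, model):
    """Evaluate a formula under a truth assignment (dict: atom name -> bool)."""
    op = f[0]
    if op == "atom":
        return model[f[1]]
    if op == "not":
        return not evaluate(f[1], model)
    if op == "and":
        return evaluate(f[1], model) and evaluate(f[2], model)
    if op == "or":
        return evaluate(f[1], model) or evaluate(f[2], model)
    raise ValueError(f"unknown operator: {op}")


def satisfiable(f):
    """Brute-force check: try every truth assignment over the formula's atoms."""
    names = sorted(atoms(f))
    return any(evaluate(f, dict(zip(names, values)))
               for values in product([False, True], repeat=len(names)))


def implies(a, b):
    """Material implication, expressed with the operators above."""
    return ("or", ("not", a), b)


def subsumed_by(a, b, axioms=()):
    """a is subsumed by b under the axioms iff (axioms and a and not b) is unsatisfiable."""
    f = ("and", a, ("not", b))
    for axiom in axioms:
        f = ("and", axiom, f)
    return not satisfiable(f)


dog, animal, picture = ("atom", "dog"), ("atom", "animal"), ("atom", "picture")
background = [implies(dog, animal)]   # hypothetical background knowledge

query = ("and", animal, picture)      # search criterion: pictures of animals
node = ("and", dog, picture)          # node concept: pictures of dogs

print(subsumed_by(node, query, background))
print(subsumed_by(node, query))
```

With the background axiom, the "pictures of dogs" node satisfies the "pictures of animals" query; without it, the check fails, which shows why the reasoner depends on the relations between words being made explicit. A truth-table search is exponential in the number of atoms, so a practical system would hand the same formula to a dedicated SAT solver.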

The following paper and presentation give better insight into our work and provide some concrete examples of ClassDB operations:

F. Giunchiglia, M. Marchese, and I. Zaihrayeu: Encoding classifications into lightweight ontologies. In Proceedings of ESWC'06, Budva, Montenegro, June 2006. Springer.

F. Giunchiglia: Towards a Theory of Formal Classification. Invited talk at ESWC’05.

Contact: Viktor Pravdin (component manager)