Kernel-based Learning to Rank with Syntactic and Semantic Structures
In recent years, machine learning (ML) has been used increasingly to solve complex tasks in different disciplines, ranging from Data Mining to Information Retrieval (IR) and Natural Language Processing (NLP). These tasks often require the processing of structured input. For example, NLP applications deal critically with syntactic and semantic structures. Modeling the latter in terms of feature vectors for ML algorithms requires considerable expertise, intuition and deep knowledge of the target linguistic phenomenon. Kernel Methods (KMs) are powerful ML techniques (see, e.g., [5]) that can alleviate the data representation problem: they replace the scalar product between feature vectors with similarity functions (kernels) defined directly between training/test instances, e.g., syntactic trees, so that explicit features are no longer needed. Additionally, kernel engineering, i.e., the composition or adaptation of several prototype kernels, facilitates the design of the similarity functions required for new tasks, e.g., [1, 2]. KMs can be very valuable for IR research: for example, they allow us to easily exploit syntactic/semantic structures, e.g., dependency, constituency or shallow semantic structures, in learning to rank algorithms [3, 4]. In general, KMs can ease the use of NLP techniques in IR tasks.
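The central idea above, that a kernel is defined directly on structured instances so that no feature vectors need to be engineered, can be sketched with a kernel perceptron. The word-overlap kernel and the toy sentences below are illustrative choices, not material from the tutorial:

```python
# A minimal sketch: a kernel defined directly on sentences (structured
# instances), used inside a kernel perceptron. No feature vectors are
# ever built. Toy data and kernel are illustrative assumptions.

def word_overlap_kernel(s, t):
    """Number of distinct shared words: equals a dot product of binary
    bag-of-words vectors, hence a valid kernel."""
    return len(set(s.split()) & set(t.split()))

# Toy training set: topic A (pets, +1) vs. topic B (finance, -1).
train = [("the cat sat on the mat", +1),
         ("a cat chased the mouse", +1),
         ("my cat likes fresh fish", +1),
         ("the stock market fell today", -1),
         ("investors sold stock shares", -1),
         ("the market rallied on earnings", -1)]

# Kernel perceptron: the model is a vector of alphas over the training
# instances; every dot product is replaced by the kernel.
alphas = [0.0] * len(train)

def score(x):
    return sum(a * y * word_overlap_kernel(s, x)
               for a, (s, y) in zip(alphas, train))

for _ in range(10):                      # a few epochs over the data
    for i, (s, y) in enumerate(train):
        if y * score(s) <= 0:            # mistake-driven update
            alphas[i] += 1.0

print(score("the cat is on the mat") > 0)   # → True (pet topic)
```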
This tutorial aims to introduce the essential, simplified theory of Support Vector Machines (SVMs) and KMs for the design of practical applications. It describes effective kernels for easily engineering automatic classifiers and learning to rank algorithms, also using structured data and semantic processing. Examples are drawn from well-known tasks, e.g., Question Answering and Passage Re-ranking, Short and Long Text Categorization, Relation Extraction, Named Entity Recognition, and Co-reference Resolution. Moreover, some practical demonstrations are given with the SVM-Light-TK (tree kernel) toolkit. In more detail, best practices for successfully using KMs for IR and NLP are presented according to the following outline:
- (i) A very brief introduction to SVMs (explained from an application viewpoint) and KM theory (the essential content for understanding practical procedures).
- (ii) Presentation of kernel engineering building blocks, such as linear, polynomial, lexical, sequence and tree kernels, focusing on their function, accuracy and efficiency rather than their mathematical characterization, so that they can be easily understood.
- (iii) Illustration of important applications for which kernels achieve the state of the art, i.e., Question Classification, Question and Answer (passage) Reranking, Relation Extraction, Coreference Resolution and Hierarchical Text Categorization. From this perspective, kernels for reranking are presented as an efficient and effective approach to learning dependencies between structured input and output.
- (iv) A practical exercise on the quick design of ML systems using the SVM-Light-TK toolkit, which encodes several kernels in SVMs.
- (v) A summary of the key points for engineering innovative and effective kernels, starting from basic kernels and using systematic data transformations.
- (vi) Presentation of the latest KM findings: large-scale kernel-based learning with fast SVMs, generalized structural and semantic kernels, and reverse kernel engineering.
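The kernel engineering idea in point (ii) rests on closure properties: sums and products of valid kernels are again valid kernels, so task-specific similarities can be composed from prototypes. A small sketch (the base kernels and toy data are my own choices, not from the tutorial):

```python
# Kernel engineering by composition: combining base kernels with sums
# and products yields a new valid kernel, verifiable via its Gram
# matrix (symmetric, positive semi-definite).
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(8, 3))            # 8 toy instances in R^3

def gram(kernel, X):
    return np.array([[kernel(x, y) for y in X] for x in X])

linear = lambda x, y: x @ y
poly   = lambda x, y: (x @ y + 1.0) ** 2
rbf    = lambda x, y: np.exp(-np.sum((x - y) ** 2))

# Engineered kernel: a weighted sum plus a product of base kernels.
combo = lambda x, y: 0.5 * linear(x, y) + poly(x, y) * rbf(x, y)

K = gram(combo, X)
# A valid (Mercer) kernel must yield a symmetric PSD Gram matrix.
print(np.allclose(K, K.T), np.linalg.eigvalsh(K).min() >= -1e-9)
```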
Outline and Motivation (10 min)
Kernel Machines (30 min)
- Perceptron
- Support Vector Machines
- Kernel Definition (Kernel Trick)
- Mercer's Conditions
- Kernel Operators
- Efficiency issue: when can we use kernels?
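The kernel trick covered in this session can be sketched in a few lines: the degree-2 polynomial kernel computes a dot product in an explicit 6-dimensional feature space without ever constructing it (the numbers below are illustrative):

```python
# The kernel trick: (x.y + 1)^2 equals phi(x).phi(y) for an explicit
# degree-2 feature map phi, so the kernel implicitly works in that
# feature space at the cost of a single dot product.
import numpy as np

def poly2(x, y):
    return (x @ y + 1.0) ** 2

def phi(x):
    """Explicit degree-2 feature map for 2-d input."""
    x1, x2 = x
    r2 = np.sqrt(2.0)
    return np.array([x1 * x1, x2 * x2, r2 * x1 * x2, r2 * x1, r2 * x2, 1.0])

x = np.array([1.0, 2.0])
y = np.array([3.0, -1.0])
print(poly2(x, y), phi(x) @ phi(y))    # both give the same value (4.0)
```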
Basic Kernels and their Feature Spaces (35 min)
- Linear Kernel
- Polynomial Kernel
- Lexical Kernels
- String and Word Sequence Kernels
- Syntactic Tree Kernel, Partial Tree Kernel (PTK), Semantic Syntactic Tree Kernel, Smoothed PTK
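The word sequence kernels in this session count shared word subsequences. A brute-force sketch via an explicit feature map (real implementations use dynamic programming and gap-decay weighting, both omitted here):

```python
# A brute-force word sequence kernel: K_n(s, t) counts the word
# subsequences of length n shared by two sentences, with multiplicity.
# Counting via an explicit feature map (a Counter over subsequences)
# makes the construction obviously a valid kernel.
from collections import Counter
from itertools import combinations

def subseq_counts(sentence, n):
    words = sentence.split()
    return Counter(combinations(words, n))   # all length-n subsequences

def seq_kernel(s, t, n=2):
    cs, ct = subseq_counts(s, n), subseq_counts(t, n)
    return sum(cs[u] * ct[u] for u in cs)    # dot product of count vectors

print(seq_kernel("the cat sat on the mat",
                 "the cat slept on a mat"))
```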
Classification with Kernels (30 min)
- Question Classification using constituency, dependency and semantic structures
- Question Classification (QC) in Jeopardy!
- Relation Extraction
- Coreference Resolution
Break (30 min)
Practical Exercise with SVM-Light-TK (30 min)
- The kernel toolkit, SVM-Light-TK
- Experiments in the classroom on Question Classification
- Inspection of the input, output and model files
- Passage reranking exercise (time permitting)
Reranking with Kernels (40 min)
- Classification of Question/Answer (QA) pairs
- Preference Reranking Kernel
- Reranking NLP tasks:
  - Named Entities in a Sentence
  - Predicate Argument Structures
  - Segmentation and labeling of Speech Transcriptions
- Reranking the output of a hierarchical text classifier
- Reranking passages with relational representations: the IBM Watson system case
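The Preference Reranking Kernel listed above compares pairs of candidates: a pair (a, b) encodes "a should rank above b", and any base kernel K lifts to pairs as PK((a,b),(c,d)) = K(a,c) + K(b,d) - K(a,d) - K(b,c). A sketch with a word-overlap base kernel of my own choosing:

```python
# Preference reranking kernel built on top of an arbitrary base kernel.
# A pair (a, b) means "candidate a is preferred over candidate b".

def K(s, t):
    """Base kernel on candidate passages: distinct shared words."""
    return len(set(s.split()) & set(t.split()))

def PK(p1, p2):
    (a, b), (c, d) = p1, p2
    return K(a, c) + K(b, d) - K(a, d) - K(b, c)

p = ("who wrote hamlet", "stock market news")
q = ("shakespeare wrote hamlet", "market news today")
print(PK(p, q))                 # agreement between the two preferences
# Swapping a pair flips the sign: PK((b, a), q) == -PK((a, b), q),
# which is exactly what lets an SVM learn pairwise orderings.
```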
Large-scale Learning with Kernels (15 min)
- Cutting Plane Algorithm for SVMs
- Sampling methods (uSVMs)
- Compacting the space with DAGs
Reverse Kernel Engineering (15 min)
- Model linearization
- Question Classification
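The model linearization step above has a simple core in the linear-kernel case: the dual model f(x) = Σ_i α_i y_i ⟨x_i, x⟩ collapses into one explicit weight vector w = Σ_i α_i y_i x_i, so scoring no longer touches the support vectors. A sketch with made-up toy values:

```python
# Model linearization: turning a dual (kernel-based) linear model into
# a single explicit weight vector with identical predictions.
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(5, 4))            # 5 toy support vectors in R^4
y = np.array([1, -1, 1, 1, -1])        # their labels
alpha = np.array([0.5, 1.0, 0.2, 0.7, 0.3])   # toy dual coefficients

def dual_score(x):                     # kernel scoring: O(#SV * dim)
    return sum(a * yi * (xi @ x) for a, yi, xi in zip(alpha, y, X))

w = (alpha * y) @ X                    # linearized model: scoring is O(dim)

x_new = rng.normal(size=4)
print(np.isclose(dual_score(x_new), w @ x_new))   # → True
```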
Conclusions and Future Directions (5 min)
Additional Documentation
Machine Learning Lectures
- Statistical Learning Theory: linear classifiers
- Support Vector Machines
- Structured Output Spaces
- Kernel Methods
- Kernel Methods for Natural Language Processing
As reference text, please use my new chapter:
Kernel-Based Machines for Abstract and Easy Modeling of Automatic Learning
along with the older book (which contains some typos):
Roberto Basili and Alessandro Moschitti, Automatic Text Categorization: from Information Retrieval to Support Vector Learning. Aracne editrice, Rome, Italy.
Some Natural Language Processing Lectures
- POS-Tagging and Named Entity Recognition
- Syntactic Parsing
- Semantic Role Labeling
- UIMA Introduction
- Coreference Resolution
- Latent Semantic Analysis
Laboratory Lectures
- Answer reranking in Answerbag
- Zip file for the exercise (this is Exercise 2)
- Answerbag dataset
Previous Tutorials
State-of-the-Art Kernels for Natural Language Processing
Kernel Engineering for Fast and Easy Design of Natural Language Applications