Information and Communication Technology Department

Tutorial: Standards, Tools and Architectures for Natural Language Spoken Dialogue Systems

A scalable approach to speech applications

Giuseppe Di Fabbrizio (AT&T Labs - Research)

7-8 September 2006 - Povo, Trento

Tutorial Chair

Giuseppe Riccardi, University of Trento

Registration

Limited seats available.

Registration is required. To register, send an email to: HLT06-unitn@dit.unitn.it

This tutorial is organized by the Adaptive Multimodal Interface Lab (Department of Information and Communication Technology, University of Trento) and supported by the Marie Curie Actions grant “Technology and Architecture for Spoken Dialog Technology”.

Thursday, September 7th, 2006 – Room 105

9:30 am – 12:30 pm Lecture 1

2:30 pm – 4:30 pm Lab 1

Advanced Speech Technology Series Tutorial (Part I)

Friday, September 8th, 2006 – Room 105

9:30 am – 12:30 pm Lecture 2

2:30 pm – 4:30 pm Lab 2

Advanced Speech Technology Series Tutorial (Part II)

Spoken Dialogue Systems (SDS) have been receiving a great deal of attention from both the research community and industry. SDS allow individuals to interact with computer systems using spoken natural language to perform specific tasks, as they would with human agents. Examples include voice portals for retrieving information such as sports, weather, news, and stock quotes. More challenging systems, such as automated customer care and help desks, endeavor to automatically fulfill customer transactions that are typically handled by human agents. The promise of automating human-operated services is an attractive proposition for enterprises whose call center toll-free numbers receive crowds of customers every day. The recent widespread adoption of W3C standards in the speech industry has fostered the notion that ‘well-established’ web authoring approaches would easily apply to speech-enabled applications. However, properly capturing user intentions and successfully orchestrating a human-machine dialogue is neither an easy nor a scalable task with current standards alone.

Part of the challenge is the integration of interdisciplinary techniques into a general and flexible development framework. For example, current state-of-the-art SDS research relies on several advanced components, such as automatic speech recognition (ASR), natural language understanding (NLU), text to speech (TTS), natural language generation (NLG), dialogue management (DM), and general database backend access. Each of these components is typically tightly integrated into a telephony platform that offers the execution environment and the needed interfaces to the public switched telephone network (PSTN). Current speech standards address only part of the authoring task, while the general design, the NLU, and the language modeling required by a modern conversational system are still more art than science.
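As a rough illustration of how the components named above hand off to one another within a single dialogue turn, the following Python sketch chains toy stand-ins for ASR, NLU, DM, NLG, and TTS. Every function name, the `DialogueState` class, and the keyword-spotting logic are assumptions made for this example only; they are not part of the tutorial material or of any real platform.

```python
from dataclasses import dataclass, field


@dataclass
class DialogueState:
    """Slots collected so far in the conversation (hypothetical structure)."""
    slots: dict = field(default_factory=dict)


def recognize(audio: str) -> str:
    # ASR stand-in: the "audio" is already a transcript, so only normalize it.
    return audio.lower().strip()


def understand(text: str) -> dict:
    # NLU stand-in: simple keyword spotting instead of a trained model.
    frame = {}
    if "weather" in text:
        frame["intent"] = "weather"
    for city in ("trento", "rome"):
        if city in text:
            frame["city"] = city
    return frame


def manage(state: DialogueState, frame: dict) -> dict:
    # DM stand-in: merge new information, then ask for whatever is missing.
    state.slots.update(frame)
    if "intent" not in state.slots:
        return {"act": "ask", "slot": "intent"}
    if "city" not in state.slots:
        return {"act": "ask", "slot": "city"}
    return {"act": "inform", "city": state.slots["city"]}


def generate(action: dict) -> str:
    # NLG stand-in: template-based wording of the system action.
    if action["act"] == "ask":
        prompts = {"intent": "How can I help you?", "city": "Which city?"}
        return prompts[action["slot"]]
    return f"Looking up the weather for {action['city'].title()}."


def synthesize(text: str) -> str:
    # TTS stand-in: a real system would return audio; here the text passes through.
    return text


def turn(state: DialogueState, audio: str) -> str:
    """Run one user turn through ASR -> NLU -> DM -> NLG -> TTS."""
    return synthesize(generate(manage(state, understand(recognize(audio)))))
```

Calling `turn` repeatedly with the same `DialogueState` lets the dialogue manager fill the missing slots over several turns, which is the part of the design that the authoring standards alone do not prescribe.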

This tutorial aims to help speech researchers and practitioners overcome those limitations. It will introduce general speech design principles, standards, tools, architectures, and protocols in a coherent environment driven by the latest research advances and industry trends. Based upon the lessons learned from large natural language speech application deployments, this class will recast the task of designing and implementing speech-enabled services in a flexible and scalable way. The tutorial is divided into four modules, organized as two morning lectures and two afternoon hands-on projects with practical assignments.

The tutorial and the teaching material are in English.

Lecture 1

Historical background
Speech-enabled application components and architecture
Elements of web programming
Voice application life cycle
VoiceXML markup language, overview with examples
SSML markup language
ABNF/XML speech grammars and language modeling
NLSML/EMMA overview

Lecture 2

Natural language processing with GATE
“The whole nine yards”: A NL SDS case study
Advanced Topics
Voice over IP
Multimodal/Multimedia Systems
Stochastic Spoken and Natural Language Generation
Question/Answering Systems

Lab 1

Getting started
VoiceXML tools
Setting up the application framework
A simple speech recognition application (form interpretation)
ABNF/XML grammar examples
VoiceXML events/error management

Lab 2

Dynamic VoiceXML generation
Cookies and session management
GATE integration examples
VoIP examples