Information and Communication Technology Department




Tutorial: Standards, Tools and Architectures for Natural Language Spoken Dialogue Systems

A scalable approach to speech applications

Giuseppe Di Fabbrizio (AT&T Labs - Research)

7-8 September 2006, Povo, Trento



Tutorial Chair

Giuseppe Riccardi, University of Trento



Limited seats available.

Please register and send an email to:



This tutorial is organized by the Adaptive Multimodal Interface Lab (Department of Information and Communication Technology, University of Trento) and supported by the Marie Curie Actions grant “Technology and Architecture for Spoken Dialog Technology”.




Thursday, September 7th, 2006 – Room 105

9:30 am – 12:30 pm Lecture 1

2:30 pm – 4:30 pm Lab 1

Advanced Speech Technology Series Tutorial (Part I)


Friday, September 8th, 2006 – Room 105

9:30 am – 12:30 pm Lecture 2

2:30 pm – 4:30 pm Lab 2

Advanced Speech Technology Series Tutorial (Part II)



Spoken Dialogue Systems (SDS) have been receiving a great deal of attention from both the research community and industry. SDS allow individuals to interact with computer systems using spoken natural language to perform specific tasks, as they would with human agents. Typical interactions include retrieving information about sports, weather, news, or stock quotes (voice portals). More challenging systems, such as automated customer care and help desks, endeavor to automatically fulfill customer transactions that are typically handled by human agents. The promise of automating human-operated services is an attractive proposition for enterprises whose call centers' toll-free numbers are flooded with customer calls every day. The recent widespread adoption of W3C standards in the speech industry has fostered the notion that ‘well-established’ web authoring approaches would easily apply to speech-enabled applications. However, properly capturing user intentions and successfully orchestrating a human-machine dialogue is neither an easy nor a scalable task with current standards alone.

Part of the challenge is the integration of interdisciplinary techniques into a general and flexible development framework. For example, current state-of-the-art SDS research relies on several advanced components, such as automatic speech recognition (ASR), natural language understanding (NLU), text-to-speech (TTS), natural language generation (NLG), dialogue management (DM), and general database backend access. Each of these components is typically tightly integrated into a telephony platform that offers the execution environment and the necessary interfaces to the public switched telephone network (PSTN). Current speech standards address only part of the authoring task, while the general design, the NLU, and the language modeling required by a modern conversational system are still more art than science.
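As a rough illustration of how the components named above fit together in a single dialogue turn, the following minimal Python sketch chains stand-ins for ASR, NLU, DM, backend access, NLG, and TTS. Every function name, the DialogueState class, the keyword-matching logic, and the canned forecast table are hypothetical placeholders invented for this sketch; they do not correspond to any real toolkit or to the tutorial material, and real components would be far more elaborate.

```python
from dataclasses import dataclass, field


@dataclass
class DialogueState:
    """Information collected so far in the conversation (hypothetical)."""
    slots: dict = field(default_factory=dict)


def asr(audio: bytes) -> str:
    # Stand-in for a speech recognizer: the "audio" is simply UTF-8 text.
    return audio.decode("utf-8").lower()


def nlu(text: str) -> dict:
    # Toy keyword spotting in place of a real understanding component.
    intent = "get_weather" if "weather" in text else "unknown"
    cities = [c for c in ("trento", "rome") if c in text]
    return {"intent": intent, "city": cities[0] if cities else None}


def backend_lookup(city: str) -> str:
    # Stand-in for database backend access; the values are made up.
    canned = {"trento": "sunny", "rome": "cloudy"}
    return canned.get(city, "unknown")


def dm(state: DialogueState, meaning: dict) -> dict:
    # Dialogue manager: update the state, then choose the next system action.
    if meaning["intent"] != "unknown":
        state.slots["intent"] = meaning["intent"]
    if meaning["city"]:
        state.slots["city"] = meaning["city"]
    if "intent" not in state.slots:
        return {"act": "ask_task"}
    if "city" not in state.slots:
        return {"act": "ask_city"}
    city = state.slots["city"]
    return {"act": "inform", "city": city, "forecast": backend_lookup(city)}


def nlg(action: dict) -> str:
    # Template-based generation of the system prompt.
    if action["act"] == "ask_task":
        return "How may I help you?"
    if action["act"] == "ask_city":
        return "Which city are you interested in?"
    return f"The forecast for {action['city'].title()} is {action['forecast']}."


def tts(text: str) -> bytes:
    # Stand-in for a synthesizer: returns the prompt as bytes, not audio.
    return text.encode("utf-8")


def turn(state: DialogueState, audio: bytes) -> bytes:
    # One user turn flows through ASR -> NLU -> DM -> NLG -> TTS.
    return tts(nlg(dm(state, nlu(asr(audio)))))
```

In this sketch a caller would create one DialogueState per call and pass each user utterance to turn(); the state carries the intent across turns, so a follow-up utterance containing only a city name is enough to complete the request.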

This tutorial seeks to teach speech researchers and practitioners how to overcome those limitations. It will introduce general speech design principles, standards, tools, architectures, and protocols in a coherent environment driven by the latest research advances and industry trends. Based on lessons learned from large natural language speech application deployments, this class will recast the task of designing and implementing speech-enabled services in a flexible and scalable way. The tutorial is divided into four modules, organized as two morning lectures and two afternoon hands-on projects with practical assignments.

The tutorial and the teaching material are in English.


Lecture 1

  • Historical background
  • Speech-enabled application components and architecture
  • Elements of web programming
  • Voice application life cycle
  • VoiceXML markup language, overview with examples
  • SSML markup language
  • ABNF/XML speech grammars and language modeling
  • NLSML/EMMA overview

Lecture 2

  • Natural language processing with GATE
  • “The whole nine yards”: A NL SDS case study
  • Advanced Topics
  • Voice over IP
  • Multimodal/Multimedia Systems
  • Stochastic Spoken and Natural Language Generation
  • Question/Answering Systems



Lab 1

  • Getting started
  • VoiceXML tools
  • Setting up the application framework
  • A simple speech recognition application (form interpretation)
  • ABNF/XML grammar examples
  • VoiceXML events/error management



Lab 2

  • Dynamic VoiceXML generation
  • Cookies and session management
  • GATE integration examples
  • VoIP examples