The Loqui project is funded by the National Science Foundation under awards IIS-0745369, IIS-0744904 and IIS-084966.
Automated telephone dialogue systems depend heavily
on accurate transcription of the speech signal into
readable text. As automatic speech recognition (ASR)
performance degrades, dialogue system performance often
falls off sharply. Our project seeks better dialogue strategies
that are less dependent on accurate ASR, and that degrade
gracefully. Our novel methodology, wizard ablation, collects
simulated human-system dialogues that vary in controlled ways.
Our testbed application, the CheckItOut dialogue system, is
modeled on a corpus of telephone transactions between patrons
and librarians that we collected at
New York City’s Andrew
Heiskell Braille & Talking Book Library. This application
has appropriately limited complexity, and potentially
broad social benefit. We based CheckItOut on
the Olympus spoken dialogue system architecture and the RavenClaw
dialogue manager developed at Carnegie Mellon University.
Originally, in Wizard-of-Oz studies, unsuspecting users
interacted with human wizards “behind the screen”
to provide data on humans interacting with
(what they believed to be) machines. In ablated wizard studies,
wizards are restricted to a subset of the data
available to the dialogue system; wizard and
system collaborate to produce responses to users.
This allows us to model wizard behavior using system features.
We study multiple wizards to identify the more
successful dialogue strategies, and to learn models from the best
wizard teachers. Our experiments show that wizards differ
in the accuracy of their interpretations of ASR, and that we can
model the best wizards using a combination of features from ASR,
voice search (database query with the ASR), semantic parsing,
and dialogue state.
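To illustrate the voice-search component, the following is a minimal sketch, not the project's actual implementation: it ranks catalogue titles by string similarity to a noisy ASR hypothesis. The catalogue entries and the use of `difflib.SequenceMatcher` as the similarity measure are illustrative assumptions.

```python
from difflib import SequenceMatcher

# Hypothetical catalogue excerpt; the real system queries the
# Heiskell Library's book database.
CATALOGUE = [
    "The Old Man and the Sea",
    "The Sound and the Fury",
    "A Farewell to Arms",
]

def voice_search(asr_hypothesis, catalogue=CATALOGUE, top_n=2):
    """Rank catalogue titles by string similarity to a noisy ASR string.

    SequenceMatcher stands in here for whatever matching metric the
    deployed system uses; the point is that imperfect ASR output can
    still retrieve the intended title.
    """
    scored = [
        (SequenceMatcher(None, asr_hypothesis.lower(), title.lower()).ratio(),
         title)
        for title in catalogue
    ]
    return sorted(scored, reverse=True)[:top_n]

# A misrecognized request can still rank the intended title first:
results = voice_search("the old man in the sea")
```

The similarity score of the top match, alongside ASR confidence and parse features, is the kind of feature a wizard model can condition on.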
The project has generated two ablated wizard corpora that will
be released after the project ends. The first is a corpus of
approximately 4,200 wizard-caller turn exchanges in which callers
request a book by title. The second consists of 913 full
dialogues (20,422 user utterances) with 6 wizards and 10 callers.
Callers requested four books per dialogue, by title, author or
catalogue number. The key to learning a machine-usable model of
wizard behavior is the selection of an appropriate set of
features: data available to the system at decision time that
characterizes the user utterance and the dialogue context. Such
selection is non-trivial. The datasets our project produces support
a wide range of research and engineering goals. Our current research
applies a variety of dialogue-specific feature selection methods to feature
sets much larger than those commonly used to learn dialogue strategies from corpora.
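As a concrete sketch of one simple feature-selection criterion, the code below ranks discrete features by information gain with respect to a wizard's action. The toy turn-exchange data, the feature names, and the action labels are all hypothetical; the project's actual methods and feature sets are richer.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (bits) of a list of discrete labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(feature_values, labels):
    """Reduction in label entropy after splitting on a discrete feature."""
    n = len(labels)
    cond = 0.0
    for v in set(feature_values):
        subset = [lab for f, lab in zip(feature_values, labels) if f == v]
        cond += len(subset) / n * entropy(subset)
    return entropy(labels) - cond

# Hypothetical turn exchanges: (high ASR confidence?, single database
# match?) paired with the wizard's next action.
rows = [
    (1, 1, "offer_book"),
    (1, 1, "offer_book"),
    (1, 0, "confirm"),
    (0, 1, "offer_book"),
    (0, 0, "reprompt"),
    (0, 0, "reprompt"),
]
labels = [r[2] for r in rows]
gains = {
    "high_asr_confidence": information_gain([r[0] for r in rows], labels),
    "single_db_match": information_gain([r[1] for r in rows], labels),
}
```

In this toy data the database-match feature predicts the wizard's action better than ASR confidence alone, the kind of ranking a feature-selection pass makes explicit.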
To create more habitable dialogue, our work has begun to move toward an
architecture that integrates utterance interpretation and
dialogue management in a way that profits more fully from the rich set of
features we now use to model wizard behavior.