The Loqui Corpus will be released for public distribution at the close of the project.
During July and August of 2006 (prior to the current NSF award), Rebecca J. Passonneau and Esther Levin recorded approximately 175 telephone dialogues at the Andrew Heiskell Braille and Talking Book Library of New York City, under IRB protocol AAAB8075 from the Columbia University Morningside IRB. The calls were placed by patrons of the library, or their representatives (such as a spouse or social worker making a call for a patron). The calls were taken by one of several librarians on duty at the Heiskell Library Readers Adviser desk.
Between March 2008 and April 2009, eighty-two of the calls were selected, based on their content and audio quality, to create the human-human segment of the Loqui corpus. All calls in this subset pertain to book orders. The digitized recordings were transcribed, and utterances were aligned with the speech signal, using Transcriber, a speech transcription tool available from Sourceforge.net. The transcriptions and audio files were edited to replace all personally identifying information. Names, phone numbers, and spellings of names were replaced in the transcription with tags; the relevant intervals of the audio file were replaced with spoken tags of the same duration (e.g., "patron first name").
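The de-identification step described above can be illustrated with a minimal Python sketch. It is not the project's actual tooling: the function names, the bracketed tag format, the character-offset spans, and the zero-padding of a short spoken tag are all assumptions made for illustration. The only property taken from the description is that a transcript span is replaced by a tag and that the audio replacement has the same duration as the interval it covers.

```python
def redact_transcript(text, spans):
    """Replace each (start, end, tag) character span with '[tag]'.

    The bracketed tag format is an assumption for this sketch.
    """
    out = []
    cursor = 0
    for start, end, tag in sorted(spans):
        if start < cursor:
            raise ValueError("overlapping spans")
        out.append(text[cursor:start])
        out.append(f"[{tag}]")
        cursor = end
    out.append(text[cursor:])
    return "".join(out)


def splice_audio(samples, rate, start_sec, end_sec, tag_samples):
    """Replace an audio interval with a spoken-tag recording of equal length.

    `samples` and `tag_samples` are plain lists of sample values (hypothetical
    representation). The tag is trimmed or zero-padded so that the total
    duration of the recording is unchanged.
    """
    lo = int(round(start_sec * rate))
    hi = int(round(end_sec * rate))
    n = hi - lo
    fitted = (list(tag_samples) + [0] * n)[:n]
    return samples[:lo] + fitted + samples[hi:]


# Invented example utterance; "Jane" occupies characters 11 to 14.
utterance = "Hi this is Jane calling"
redacted = redact_transcript(utterance, [(11, 15, "patron first name")])

# Toy audio: 10 samples at 10 samples per second, interval 0.2 s to 0.5 s.
audio = list(range(10))
spliced = splice_audio(audio, 10, 0.2, 0.5, [9, 9])
```

Keeping the replacement the same length as the original interval means that the utterance alignments produced during transcription remain valid after editing.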
Forty-eight of the eighty-two dialogues in the Loqui Human-Human corpus were annotated. The annotation scheme, developed by Rebecca Passonneau and Owen Rambow, consists of Dialogue Function Units (DFUs), which are intended to represent abstract units of interaction. It builds on our previous work in intention-based segmentation (Passonneau and Litman, 1997), and on mixing a formal schema with natural language descriptions (Nenkova et al., 2007).
The annotators worked from a combination of the transcription and the audio. Three annotators were trained together, annotated up to a dozen dialogues independently, then discussed, adjudicated and merged ten of them. During this phase, the annotation guidelines were refined and revised. One of the three annotators subsequently annotated 38 additional dialogues.
Our unit of annotation is the DFU. Each DFU has an extent, a dialogue act (DA) label along with a description, and possibly one or more forward and/or backward links. For further details, see Hu, Passonneau and Rambow (2009).
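As a rough illustration of the components just listed, a DFU could be represented as a small record. This is a sketch under stated assumptions rather than the project's annotation format: the field names, the use of utterance indices for the extent, the example DA labels, and the reading of a forward link as pointing to a later DFU (with a backward link pointing to an earlier one) are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DFU:
    """One Dialogue Function Unit (field names are hypothetical)."""
    dfu_id: str
    start_utt: int      # extent: index of the first utterance covered
    end_utt: int        # extent: index of the last utterance covered (inclusive)
    dialogue_act: str   # DA label; the labels used below are invented
    description: str    # natural language description of the unit
    forward_links: List[str] = field(default_factory=list)
    backward_links: List[str] = field(default_factory=list)


def link(earlier: DFU, later: DFU) -> None:
    """Record a forward link on the earlier DFU and a backward link on the later one."""
    earlier.forward_links.append(later.dfu_id)
    later.backward_links.append(earlier.dfu_id)


# Invented example: a request and the response it is linked to.
request = DFU("dfu1", 0, 1, "request-info", "Patron asks for a book by title")
answer = DFU("dfu2", 2, 2, "inform", "Librarian confirms the book is available")
link(request, answer)
```

Storing links as identifiers rather than object references keeps each record easy to serialize alongside the transcript.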
Our transcription guidelines were created to instruct our undergraduates in conventions for transcribing our spoken, digitized corpus, and for aligning the transcript with the audio signal.