Department of Psychology

Corpus Transcription and Analysis

A detailed description of the procedures and conventions used in creating the Buckeye corpus can be found in the corpus manual (The Variation in Conversation (ViC) Project: Creation of the Buckeye Corpus of Conversational Speech; Kiesling, Dilley, & Raymond).

Archiving the sound files and corresponding text transcriptions

The digital recordings of the collected conversations were transferred to a PC through a digital I/O card. This preserved the quality of the recordings by eliminating the need to resample them through an A/D card, which can introduce additional noise into the speech files. Undergraduate transcribers then transcribed the recorded conversations into written English text using Soundscriber software; the English transcriptions were completed in Fall 2001. The transcripts are stored as ASCII text files along with the sound files to which they are keyed (see below).

Automatic word and phone alignment: A first pass at identification and segmentation

The sound files and the corresponding written transcriptions were input to an automatic phonetic transcription program, Entropics Aligner. Aligner uses acoustic phone models that were trained on the TIMIT corpus of spoken English. It comes with a dictionary that lists several alternative pronunciations for many words, and it provides a facility for adding newly encountered words and pronunciations to that dictionary. The phonetic alphabet used by Aligner is essentially that of the DARPA standard used in research on automatic speech recognition. Research assistants use Aligner to select the best-fitting pronunciation of each word from among the alternatives listed in the dictionary and to align the selected words and their phones to the corresponding portion of the sound wave.
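The dictionary behaviour described above (several alternative pronunciations per word, plus a way to add new ones) can be sketched as a small data structure. This is only an illustration, not Aligner's actual dictionary format or interface: the class name PronunciationDictionary, its methods, and the example phone strings are all hypothetical, and the acoustic scoring that picks the best-fitting alternative is not modelled.

```python
from collections import defaultdict
from typing import DefaultDict, List, Tuple

# A pronunciation is a sequence of phone symbols (illustrative symbols only).
Pronunciation = Tuple[str, ...]


class PronunciationDictionary:
    """Hypothetical word -> alternative pronunciations lookup."""

    def __init__(self) -> None:
        self._entries: DefaultDict[str, List[Pronunciation]] = defaultdict(list)

    def add(self, word: str, phones: str) -> None:
        """Add a pronunciation given as space-separated phone symbols."""
        key = word.lower()
        pronunciation = tuple(phones.split())
        # Skip exact duplicates so each alternative is listed once.
        if pronunciation not in self._entries[key]:
            self._entries[key].append(pronunciation)

    def alternatives(self, word: str) -> List[Pronunciation]:
        """Return all known pronunciations of a word (empty if unknown)."""
        return list(self._entries.get(word.lower(), []))


lexicon = PronunciationDictionary()
# Assumed example entries; not taken from Aligner's dictionary.
lexicon.add("probably", "p r aa b ax b l iy")
lexicon.add("probably", "p r aa b l iy")
# A newly encountered reduced form is added as a further alternative.
lexicon.add("probably", "p r aa l iy")
# Re-adding an existing pronunciation leaves the entry unchanged.
lexicon.add("Probably", "p r aa b l iy")
```

Storing alternatives as an ordered list per word keeps the listing stable, so a later selection step (here left out) could refer to alternatives by position.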

Hand realignment: Creating a detailed word and phone transcription

Errors in the automatic alignment of words and phones produced by Aligner are corrected by phonetically trained research assistants. Identification and segmentation conventions are described in detail in the manual (see Publications). In general, corrections are needed when Aligner's labels are placed at the wrong locations or when a label that is not part of Aligner's segmental repertoire is required. During hand alignment, the appropriate transcription of a given sequence is decided by inspecting combined waveform and spectrographic displays of the signal in Entropics waves+ or Wavesurfer software.

The .words, .phones and .log label files

The alignment procedure creates three ASCII text 'label' files for each sound file. The first contains the word labels and their offset times (measured in seconds from the beginning of the sound file). The second contains the phone labels and their offset times. The third is a log of notes supplied by the labelers, marking instances of unusual voice quality, manner of speaking, nasality, etc. In addition to word and phone labels, the files contain labels that mark a variety of other events, for example pauses, dysfluencies (such as production errors and cutoffs), and stretches of time when the interviewer was talking. See the phonetic alphabet listing in the manual for details.
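As a rough illustration of how such label files might be consumed, the sketch below turns a list of labels and offset times into intervals with durations. It rests on assumptions not stated here: the line layout ("offset in seconds, then label") is a simplified, hypothetical stand-in for the real file format documented in the manual, each time is taken to mark the end of its labelled stretch, and the labels PAUSE and INTERVIEWER are invented placeholders for the event labels.

```python
from typing import List, NamedTuple


class Interval(NamedTuple):
    start: float  # seconds from the beginning of the sound file
    end: float    # offset time read from the label file
    label: str


def parse_labels(text: str) -> List[Interval]:
    """Parse lines of the assumed form '<offset_seconds> <label>'."""
    intervals: List[Interval] = []
    previous_end = 0.0
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        time_field, label = line.split(maxsplit=1)
        end = float(time_field)
        # Assumption: each interval starts where the previous one ended.
        intervals.append(Interval(previous_end, end, label))
        previous_end = end
    return intervals


# Hypothetical contents of a word-label file.
sample_words = """\
0.50 PAUSE
0.82 okay
1.31 INTERVIEWER
"""

intervals = parse_labels(sample_words)
durations = [(iv.label, round(iv.end - iv.start, 2)) for iv in intervals]
```

The same function could be applied to the phone-label file under the same assumptions, since both files pair labels with offset times.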

