Speech Corpora


Speech corpus: a large collection of audio recordings of spoken language. Most speech corpora also have additional text files containing transcriptions of the words spoken and the time each word occurred in the recording.

When you conduct research on speech you can either (1) record your own data or (2) use a ready-made speech corpus.

Recording your own data: Linguists usually collect their own data in a phonetics laboratory where there is a sound-attenuated booth and high-quality recording equipment. They ask speakers to read words or phrases that have been chosen specifically for the experiment. Words are read in the same carrier phrase in order to control for outside factors.

    Say heed two times.
    Say hid two times.

Using a speech corpus: If you decide to use a speech corpus for your research, the Linguistics Department at Stanford has many available. Corpora are located either on:

- the AFS server
- the corpus computer in the Linguistics Department
- CDs, which can be checked out

See the corpora webpage for detailed information about the corpora available and how to gain access: http://www.stanford.edu/dept/linguistics/corpora/

Speech corpora can be divided into two types:

(1) Read speech
- Excerpts from books
- News broadcasts
- Word lists
- Number sequences

(2) Spontaneous speech
- Dialogs and meetings: free conversations between 2 or more people
- Narratives: one person telling a story
- Map-tasks: two people are each given a map that the other person cannot see. The maps are identical, except that one has a route specified. The person with the route must explain it to the other person.
- Appointment-tasks: two people are given individual schedules and are supposed to find a free time to meet.
- Wizard of Oz simulations: modeling a real-life situation, like booking a flight

Examples of English Speech Corpora in the Linguistics Department

Speech Corpus   Type of data                     Size                               Type of Annotation
TIMIT           Read sentences                   630 speakers each reading          Phonetic
                                                 10 sentences
                                                 8 US dialects
Broadcast News  News reports                     104 hours of television and
                                                 radio broadcasts
TIDIGITS        Connected digit sequences        326 speakers each reading
                                                 77 digit sequences
Switchboard     Phone conversations between      2400 conversations                 Some phonetic
                strangers on an assigned topic   543 speakers
                                                 Many US dialects
CallHome        Phone conversations with family  120 conversations
                and close friends                Up to 30 min each
ICSI meetings   Weekly meetings of various       72 hours
                research groups                  53 speakers
HCRC Map Task   Map-task                         18 hours
                                                 62 speakers (mainly Scots English)
ATIS            Flight booking                   36 speakers

The vast majority of corpora are in English, but other languages are available as well: Arabic, Bulgarian, Cantonese, Czech, Farsi, French, German, Hindi, Japanese, Korean, Mandarin, Portuguese, Russian, Spanish, Tamil, Vietnamese.

Advantages of using a speech corpus:
(1) Time saving: no need to collect and process recordings
(2) Large amounts of data
(3) Searchability
(4) Real language usage

Disadvantages of using a speech corpus:
(1) Recording quality is often lower than in a phonetics laboratory
(2) Too much information: you may need to work on subsets
(3) Messy: not as controlled as speech collected in a phonetics laboratory
(4) Currently only available for mainstream languages

Types of Annotation

In order for speech corpora to be useful for research they need to be labeled in some way. At a minimum, the words spoken are transcribed in standard orthography. Sometimes additional linguistic information is provided: syllables, sounds, intonation, disfluencies, filled pauses (um, uh). Phonetic transcription is usually done in ARPABET (see chart below).

Typically the actual recordings and the annotations are in separate files linked by a common filename. Orthographic and phonetic transcriptions are usually simple text files. You may need to write small scripts to process the transcriptions or at least be able to use simple search commands such as grep.

Audio recording:

    [Waveform figure; horizontal axis: Time (s)]

Transcription (not time-aligned):

    A: What I was doing at, at home, is like I work nights here, so that's another long story that we will talk about. It's funny that I got you though.

Transcription (time-aligned):

    A 6.40 0.14 It's
    A 6.54 0.20 funny
    A 6.74 0.06 that
    A 6.80 0.12 I
    A 6.92 0.14 got
    A 7.06 0.18 you
    A 7.24 0.18 though.

Phonetic transcription (IPA: [ɪts fʌ̃i ðʌ ɾai ɣatʃu ðoʊ]):

    0.334407 121 h#
    0.460000 121 ih t s
    0.591176 121 f ah_n
    0.650000 121 iy
    0.732149 121 dh ah
    0.828198 121 dx ay
    0.940895 121 g_ap aa
    1.140000 121 ch uw
    1.339699 121 dh ow
    1.464997 121 h#

Examples of phonetic research with speech corpora:

- Comparing pronunciations in different dialects
- Comparing pronunciation by males and females
- Flapping across word boundaries in spontaneous speech
- The effect of disfluencies on neighboring words
- Duration of sounds at the end of an utterance
- Pronunciation of unstressed vowels
- The omission of sounds (sound deletion)
- Palatalization across word boundaries (whatcha, gotcha, wouja)
- Intonational patterns

In addition to general linguistic research, speech corpora play a crucial role in automatic speech recognition and speech synthesis.

To work with speech, I recommend using Praat. It can be downloaded for free from http://www.praat.org and works on all platforms. (It's a good idea to go through the tutorial first.) Praat lets you measure the following things (you will learn about these later in the course):

- Duration
- Vowel formants
- Fundamental frequency (Pitch)
- Intensity (Loudness)

Practice with spontaneous speech

The best part of speech corpora is having physical evidence of how we actually speak on a daily basis. Spontaneous speech is full of surprises! It's fascinating to compare how we think a phrase is pronounced with how someone actually says it in real conversation. You will hear the following utterances. Transcribe them phonetically using the IPA.

Example 1: It's funny that I got you though.
Example 2: Yeah I guess that about does it.
Example 3: What's what's your most recent one that you've seen.
Example 4: is you sit down at the table.
Example 5: On Monday I wear the worst looking one.
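The first practice utterance is the one shown in time-aligned form under Types of Annotation. As a rough illustration of the "small scripts" mentioned there, here is a minimal Python sketch that reads lines in that format and reports when a word sequence was spoken. The column meanings (speaker, start time, duration, word) are an assumption inferred from the sample, where each start time equals the previous start plus the previous duration. The names Word, parse_aligned, and find_phrase are made up for this example and do not come from any corpus toolkit.

```python
from dataclasses import dataclass


@dataclass
class Word:
    speaker: str
    start: float     # assumed: seconds from the start of the recording
    duration: float  # assumed: seconds
    text: str

    @property
    def end(self) -> float:
        # Rounded to two decimals (the precision of the sample data)
        # to hide floating-point noise.
        return round(self.start + self.duration, 2)


def parse_aligned(lines):
    """Parse lines such as "A 6.40 0.14 It's" into Word records."""
    words = []
    for line in lines:
        fields = line.split(maxsplit=3)
        if len(fields) != 4:
            continue  # skip blank or malformed lines
        speaker, start, duration, text = fields
        words.append(Word(speaker, float(start), float(duration), text))
    return words


def normalize(token):
    """Lower-case a token and strip surrounding punctuation for matching."""
    return token.strip('.,?!"').lower()


def find_phrase(words, phrase):
    """Return a (start, end) time pair for each occurrence of a word sequence."""
    target = [normalize(t) for t in phrase.split()]
    if not target:
        return []
    tokens = [normalize(w.text) for w in words]
    hits = []
    for i in range(len(tokens) - len(target) + 1):
        if tokens[i:i + len(target)] == target:
            hits.append((words[i].start, words[i + len(target) - 1].end))
    return hits


# The time-aligned sample from the handout.
SAMPLE = """\
A 6.40 0.14 It's
A 6.54 0.20 funny
A 6.74 0.06 that
A 6.80 0.12 I
A 6.92 0.14 got
A 7.06 0.18 you
A 7.24 0.18 though.
""".splitlines()

words = parse_aligned(SAMPLE)
print(find_phrase(words, "got you"))  # [(6.92, 7.24)]
```

The returned times can then be used to locate the same stretch of audio in Praat. Real corpus files may use a different column order or extra fields, so check the corpus documentation before relying on this layout.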

ARPABET and approximate IPA equivalents

If you work with a phonetically transcribed corpus, most likely the sounds will be transcribed using the ARPABET (developed by the Advanced Research Projects Agency). Since you are learning the IPA in Ling 110, you may find this conversion chart useful for your project.

    ARPABET  IPA        ARPABET  IPA
    p        pʰ, p      l        l
    b        b          r        ɹ
    t        tʰ, t      w        w
    d        d          y        j
    k        kʰ, k      er       ɝ
    g        g          iy       i
    f        f          ih       ɪ
    v        v          ey       ei
    th       θ          eh       ɛ
    dh       ð          ae       æ
    s        s          aa       ɑ
    z        z          ah       ʌ
    sh       ʃ          ax       ə
    zh       ʒ          ao       ɔ
    hh       h          ow       ou
    ch       tʃ         uh       ʊ
    jh       dʒ         uw       u
    m        m          ay       ai
    n        n          aw       au
    ng       ŋ          oy       ɔi

Sample Searches

Searching for examples of the word "probably" in the Switchboard Corpus:

    % cd /afs/ir/data/linguistic-data/switchboard/switchboard-transcripts/swb1/trans
    % grep -i probably phase*/disc*/*.txt

Searching for the sequence "what you" in the Switchboard Corpus:

    % cd /afs/ir/data/linguistic-data/switchboard/switchboard-transcripts/swb1/trans
    % grep -i "what you" phase*/disc*/*.txt

Many searches, however, may require a bit of programming to process the data. If this seems daunting you can ask around; someone may already have written the program that you need.
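As one example of the kind of small program this refers to, here is a minimal Python sketch that converts ARPABET labels into approximate IPA using the conversion chart in this handout. Several choices are assumptions made for illustration: the plain (unaspirated) variant is used for p, t, and k; r, er, ah, and ax take their standard ARPABET values; dx (flap) and the h# silence marker are not in the chart but appear in the phonetic transcription sample; and anything after an underscore (as in ah_n or g_ap) is treated as a modifier and dropped. The function name arpabet_to_ipa is hypothetical.

```python
# ARPABET to approximate IPA, following the handout's conversion chart.
ARPABET_TO_IPA = {
    # Consonants (plain variants chosen for p, t, k: an assumption)
    "p": "p", "b": "b", "t": "t", "d": "d", "k": "k", "g": "g",
    "f": "f", "v": "v", "th": "θ", "dh": "ð", "s": "s", "z": "z",
    "sh": "ʃ", "zh": "ʒ", "hh": "h", "ch": "tʃ", "jh": "dʒ",
    "m": "m", "n": "n", "ng": "ŋ", "l": "l", "r": "ɹ", "w": "w", "y": "j",
    # Vowels
    "er": "ɝ", "iy": "i", "ih": "ɪ", "ey": "ei", "eh": "ɛ", "ae": "æ",
    "aa": "ɑ", "ah": "ʌ", "ax": "ə", "ao": "ɔ", "ow": "ou", "uh": "ʊ",
    "uw": "u", "ay": "ai", "aw": "au", "oy": "ɔi",
    # Not in the chart, but used in the transcription sample (flap)
    "dx": "ɾ",
}


def arpabet_to_ipa(label):
    """Convert a space-separated ARPABET string to approximate IPA."""
    out = []
    for symbol in label.split():
        # Assumption: text after an underscore is a modifier; ignore it.
        base = symbol.split("_", 1)[0].lower()
        if base == "h#":
            continue  # silence marker: nothing to transcribe
        # Unknown symbols are kept visible in angle brackets.
        out.append(ARPABET_TO_IPA.get(base, f"<{symbol}>"))
    return "".join(out)


print(arpabet_to_ipa("ih t s"))  # ɪts
print(arpabet_to_ipa("ch uw"))   # tʃu
```

Because modifiers are dropped, the output is broader than a hand transcription: nasalization on ah_n, for instance, is lost. Treat the result as a starting point to check against the audio.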