Analyzing Human and Machine Performance In Resolving Ambiguous Spoken Sentences

Hussein Ghaly (1) and Michael Mandel (2)
(1) Graduate Center, City University of New York; (2) Brooklyn College, City University of New York

Motivation
When an ambiguous sentence is spoken, what information does the speech carry that the text alone does not? Our goal is to examine this information by analyzing how humans disambiguate both text and speech for different types of ambiguity, and to develop a model that uses this information for automatic disambiguation.

Research Summary
- Record sentences containing an ambiguity, with the speaker aware of the intended interpretation
- Subjects hear or read the sentences and choose the intended interpretation
- Analyze the acoustic features of each utterance, including multiple recordings of the same sentence
- Develop a machine learning approach that predicts the intended reading from the acoustic features

Types of Ambiguity
- Lexical ambiguity ("I forgot my bag at the bank")
- Syntactic ambiguity ("old men and women")
- Comma ambiguity
- PP-attachment ambiguity
- NP ambiguity
- Coordination ambiguity, etc.

Comma Ambiguity
A woman without her man is nothing.
A woman: without her, man is nothing.

Comma Ambiguity
- Without punctuation (e.g., in ASR output), text can be ambiguous
- Can humans disambiguate the written text?
- Can humans disambiguate the spoken sentence?
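A minimal Python sketch of why unpunctuated text is ambiguous: if ASR-style output is approximated by lowercasing and deleting punctuation (the `asr_style` normalizer is a hypothetical stand-in, not the behavior of any particular recognizer), both readings of the woman/man example collapse into a single string.

```python
import re


def asr_style(text: str) -> str:
    """Hypothetical normalizer: lowercase, drop punctuation, collapse whitespace."""
    text = re.sub(r"[^\w\s']", " ", text.lower())
    return " ".join(text.split())


readings = [
    "A woman without her man is nothing.",
    "A woman: without her, man is nothing.",
]
stripped = {asr_style(r) for r in readings}
print(stripped)  # {'a woman without her man is nothing'}
```

Only the prosody of the spoken sentence is left to tell the two readings apart.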

PP-Attachment
I saw [the boy with the telescope]
I saw the boy [with the telescope]
(Figure: who has the telescope, "me" or "the boy")

PP-Attachment
This sentence has two possible interpretations, i.e., it is structurally ambiguous.

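PP-Attachment - Early vs. Late Closure
- Late closure: the principle that new words (or "incoming lexical items") tend to be associated with the phrase or clause currently being processed rather than with structures farther back in the sentence.* Here, "with the telescope" attaches to "the boy", so the boy has the telescope.
- Early closure: the incoming phrase attaches to a structure farther back; here, "with the telescope" attaches to the verb "saw", so the speaker has the telescope.
*https://www.thoughtco.com/late-closure-sentence-processing-1691101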
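The two attachments can be made concrete with a small CKY parser that keeps every parse. This is a sketch: the grammar and lexicon below are hypothetical, written only for "I saw the boy with the telescope" (not a grammar used in this study). Each parse is labeled by where the PP attaches: inside the most recent NP (late closure) or to the verb phrase (early closure).

```python
from itertools import product

# Hypothetical toy grammar in Chomsky normal form, for this one sentence only.
BINARY = {
    ("NP", "VP"): ["S"],
    ("V", "NP"): ["VP"],
    ("VP", "PP"): ["VP"],   # PP attached to the verb phrase (early closure)
    ("NP", "PP"): ["NP"],   # PP attached to the most recent noun phrase (late closure)
    ("Det", "N"): ["NP"],
    ("P", "NP"): ["PP"],
}
LEXICON = {"I": ["NP"], "saw": ["V"], "the": ["Det"], "boy": ["N"],
           "with": ["P"], "telescope": ["N"]}


def cky_parses(words):
    """Return every S tree over `words`; trees are (label, left, right) or (label, word)."""
    n = len(words)
    # chart[i][j] maps a category to the list of trees spanning words[i:j]
    chart = [[{} for _ in range(n + 1)] for _ in range(n + 1)]
    for i, word in enumerate(words):
        for cat in LEXICON[word]:
            chart[i][i + 1].setdefault(cat, []).append((cat, word))
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):
                pairs = product(chart[i][k].items(), chart[k][j].items())
                for (lcat, ltrees), (rcat, rtrees) in pairs:
                    for parent in BINARY.get((lcat, rcat), []):
                        for lt, rt in product(ltrees, rtrees):
                            chart[i][j].setdefault(parent, []).append((parent, lt, rt))
    return chart[0][n].get("S", [])


def closure_type(tree):
    """Label a parse by where its PP attaches: 'late' (inside an NP) or 'early' (to the VP)."""
    if len(tree) == 3:  # internal node: (label, left, right)
        label, left, right = tree
        if right[0] == "PP":
            return "late" if label == "NP" else "early"
        return closure_type(left) or closure_type(right)
    return None  # leaf: (label, word)


words = "I saw the boy with the telescope".split()
for tree in cky_parses(words):
    print(closure_type(tree))  # prints: late, then early
```

The parser finds exactly two parses, one per reading, which is why text alone cannot settle who has the telescope.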

Hypotheses
- When a sentence is ambiguous and the speaker is aware of the intended reading, the speaker will convey that reading through certain prosodic cues.
- Listeners will be able to use these cues to identify the intended reading better than readers can.
- These prosodic cues can be measured, analyzed, and used as features for an automatic disambiguation system based on machine learning.

Previous Research - Psychology
Snedeker and Trueswell, 2003 - "Using prosody to avoid ambiguity: Effects of speaker awareness and referential context"
- Informative prosodic cues depend on the speaker's knowledge of the situation: speakers provide prosodic cues when they are needed, and listeners use these cues when they are present.
- Prosodic cues include pauses and word durations, as shown in the utterances of a speaker who is aware of the intended meaning:
  - tap [the frog with the flower] (modifier)
  - tap the frog [with the flower] (instrument)

Previous Research - NLP
Levi et al., 2012 - "The effect of pitch, intensity and pause duration in punctuation detection"
- Predicted punctuation from different prosodic cues in speech using neural networks
- Cues included pitch, intensity, and pause duration
- Achieved a punctuation detection rate of 54%

Data
- We created a collection of 26 constructed sentences: 6 pairs with comma ambiguity and 7 pairs with PP-attachment ambiguity
- A native speaker recorded each sentence five times (130 recordings in total)
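The counts fit together as follows. This small Python check (variable names are illustrative) also shows where the 20 productions in the acoustic analysis and the 10 audio files per sentence in the machine evaluation come from: each PP-attachment sentence is recorded in two contexts, five takes each.

```python
COMMA_PAIRS = 6          # pairs of comma-ambiguous sentences
PP_PAIRS = 7             # pairs of PP-attachment sentences (same sentence, two contexts)
TAKES_PER_SENTENCE = 5   # each sentence recorded five times

sentences = 2 * (COMMA_PAIRS + PP_PAIRS)
recordings = sentences * TAKES_PER_SENTENCE

# One PP-attachment pair yields 2 readings x 5 takes of the same target sentence.
files_per_pp_sentence = 2 * TAKES_PER_SENTENCE
# The acoustic analysis averages over two such sentences.
productions_analyzed = 2 * files_per_pp_sentence

print(sentences, recordings, files_per_pp_sentence, productions_analyzed)  # 26 130 10 20
```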

Comma Ambiguity - Speaker Tasks
Record 6 pairs of constructed comma-ambiguous sentences. Example:
3a: John, said Mary, was the nicest person at the party.
3b: John said Mary was the nicest person at the party.

Comma Ambiguity - Listener Tasks
For each comma-ambiguous sentence, identify the intended meaning:
- Task 1: using text only
- Task 2: using audio only
Example:
Sentence: John, said Mary, was the nicest person at the party.
Question: Who was said to be the nicest person at the party? A- John B- Mary

PP-Attachment Ambiguity - Speaker Tasks
Record 7 pairs of sentences with PP-attachment ambiguity; each sentence in a pair is preceded by a different context that supports one reading. Example:
4a: One of the boys got a telescope. I saw the boy with the telescope.
4b: I have a new telescope. I saw the boy with the telescope.

PP-Attachment Ambiguity - Listener Tasks
In each of the following settings, identify the intended meaning by answering a question. In the last setting, the sentence recordings were trimmed to remove the preceding context.
Question: Who has the telescope? A- The boy B- The speaker
Setting                 Presentation
Text with context       I have a new telescope. I saw the boy with the telescope.
Audio with context      (audio)
Text without context    I saw the boy with the telescope.
Audio without context   (audio, preceding context trimmed)

Results - Human Evaluation
Ambiguity                        Modality   Accuracy
Comma                            Text       99.3%
Comma                            Audio      94.7%
PP-attachment, with context      Text       93.1%
PP-attachment, with context      Audio      97.1%
PP-attachment, without context   Text       52.0%
PP-attachment, without context   Audio      74.4%
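A quick Python sketch that reads the accuracies off the human-evaluation table and computes, for each condition, the audio-minus-text gap and how far text accuracy sits above chance. The 50% chance level is an assumption based on each question having two answer options (A or B).

```python
# Listener accuracy (%) copied from the human-evaluation table above.
ACCURACY = {
    "Comma": {"Text": 99.3, "Audio": 94.7},
    "PP-attachment, with context": {"Text": 93.1, "Audio": 97.1},
    "PP-attachment, without context": {"Text": 52.0, "Audio": 74.4},
}
CHANCE = 50.0  # assumed: two answer options (A or B) per question


def audio_minus_text(table):
    """Accuracy gain (percentage points) of audio over text for each ambiguity type."""
    return {name: round(s["Audio"] - s["Text"], 1) for name, s in table.items()}


for name, gap in audio_minus_text(ACCURACY).items():
    text_over_chance = round(ACCURACY[name]["Text"] - CHANCE, 1)
    print(f"{name}: audio - text = {gap:+.1f} pts; text above chance = {text_over_chance:+.1f} pts")
```

For PP-attachment without context, audio adds 22.4 points, while text is only 2 points above chance; for comma ambiguity, audio trails punctuated text by 4.6 points.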

Results - PP-Attachment - Acoustic Analysis
(Figure labels: preceding silent pause, preposition, following NP)
Acoustic feature values averaged over the 20 productions of the following sentences:
- They discussed the mistakes in the second meeting.
- The lawyer contested the proceedings in the third hearing.
Feature                        Late closure   Early closure
Preposition duration (ms)      147            143
Preceding silent pause (ms)    0              48
Intensity (dB)                 57.8           56.4
Following NP duration (ms)     579            640

Acoustic Analysis - Early vs. Late Closure
(Figure panels: Early Closure, Late Closure)

Results - PP-Attachment - Machine Evaluation
Feature matrix, extracted manually from 10 audio files of the sentence "They discussed the mistakes in the second meeting.":
Preposition    Preceding      Following NP    Preposition       Closure
duration (ms)  silence (ms)   duration (ms)   intensity (dB)    type
160            0              690             56.6              early
175            0              660             59.0              late
120            0              470             56.2              late
140            80             620             55.6              early
145            0              600             58.7              late
140            90             635             57.8              early
135            0              510             61.1              late
150            110            600             57.9              early
130            0              620             61.0              late
140            60             580             58.8              early
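The per-class averaging behind the acoustic analysis can be sketched in Python on the rows of this feature matrix. Note the assumption: only these 10 rows (one sentence) are used, so the means are not expected to equal the 20-production averages reported for both sentences.

```python
from statistics import mean

# Rows of the feature matrix above: preposition duration (ms), preceding silence (ms),
# following NP duration (ms), preposition intensity (dB), closure type.
ROWS = [
    (160, 0, 690, 56.6, "early"),
    (175, 0, 660, 59.0, "late"),
    (120, 0, 470, 56.2, "late"),
    (140, 80, 620, 55.6, "early"),
    (145, 0, 600, 58.7, "late"),
    (140, 90, 635, 57.8, "early"),
    (135, 0, 510, 61.1, "late"),
    (150, 110, 600, 57.9, "early"),
    (130, 0, 620, 61.0, "late"),
    (140, 60, 580, 58.8, "early"),
]
FEATURES = ("prep_ms", "silence_ms", "np_ms", "intensity_db")


def class_means(rows):
    """Average each acoustic feature separately for early- and late-closure productions."""
    result = {}
    for label in ("early", "late"):
        columns = zip(*(row[:4] for row in rows if row[4] == label))
        result[label] = {name: round(mean(col), 2) for name, col in zip(FEATURES, columns)}
    return result


for label, feats in class_means(ROWS).items():
    print(label, feats)
```

For this sentence, early closure has a mean preceding silence of 68 ms (late: 0 ms) and a longer following NP (625 ms vs. 572 ms), matching the direction of the averaged results. The preposition-duration difference is small in both cases, and here its sign is reversed (146 ms early vs. 141 ms late).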

Machine Evaluation
Using decision trees on 20 data points with 5-fold cross-validation: 80% average accuracy in predicting early vs. late closure.
All sentences were used in both training and testing in each fold.
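A minimal decision-tree sketch with 5-fold cross-validation, written from scratch in Python. The deck does not specify the tree variant or how folds were formed, so the following are assumptions: Gini-impurity splits at midpoints between neighbouring feature values (CART-style), a maximum depth of 3, and round-robin fold assignment (row i is tested in fold i % 5). It uses only the 10 rows listed for one sentence, because the other sentence's values are not given.

```python
from statistics import mean

# Feature matrix for "They discussed the mistakes in the second meeting."
# (preposition ms, preceding silence ms, following NP ms, intensity dB, closure type)
ROWS = [
    (160, 0, 690, 56.6, "early"), (175, 0, 660, 59.0, "late"),
    (120, 0, 470, 56.2, "late"), (140, 80, 620, 55.6, "early"),
    (145, 0, 600, 58.7, "late"), (140, 90, 635, 57.8, "early"),
    (135, 0, 510, 61.1, "late"), (150, 110, 600, 57.9, "early"),
    (130, 0, 620, 61.0, "late"), (140, 60, 580, 58.8, "early"),
]


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))


def best_split(X, y):
    """Return the (feature, threshold) with the lowest weighted Gini, or None if no split helps."""
    best, best_score = None, gini(y)
    for f in range(len(X[0])):
        values = sorted({row[f] for row in X})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2  # candidate threshold halfway between neighbouring values
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if score < best_score:
                best, best_score = (f, t), score
    return best


def build_tree(X, y, depth=0, max_depth=3):
    """Grow a small CART-style tree; a leaf is the majority label (ties broken alphabetically)."""
    majority = max(sorted(set(y)), key=y.count)
    split = best_split(X, y) if depth < max_depth else None
    if split is None:
        return majority
    f, t = split
    go_left = [row[f] <= t for row in X]
    left = build_tree([r for r, g in zip(X, go_left) if g],
                      [lab for lab, g in zip(y, go_left) if g], depth + 1, max_depth)
    right = build_tree([r for r, g in zip(X, go_left) if not g],
                       [lab for lab, g in zip(y, go_left) if not g], depth + 1, max_depth)
    return (f, t, left, right)


def predict(tree, row):
    """Follow thresholds from the root until reaching a leaf label."""
    while isinstance(tree, tuple):
        f, t, left, right = tree
        tree = left if row[f] <= t else right
    return tree


def cross_validate(X, y, k=5, max_depth=3):
    """Round-robin k-fold CV (row i is tested in fold i % k); returns per-fold accuracy."""
    scores = []
    for fold in range(k):
        train = [i for i in range(len(X)) if i % k != fold]
        test = [i for i in range(len(X)) if i % k == fold]
        tree = build_tree([X[i] for i in train], [y[i] for i in train], max_depth=max_depth)
        scores.append(sum(predict(tree, X[i]) == y[i] for i in test) / len(test))
    return scores


X = [row[:4] for row in ROWS]
y = [row[4] for row in ROWS]
fold_scores = cross_validate(X, y)
print(f"mean accuracy over 5 folds: {mean(fold_scores):.2f}")
```

Because this sketch sees only 10 of the 20 data points and makes its own modeling choices, its score is not expected to reproduce the 80% reported above. A library implementation (e.g., a standard decision-tree classifier with k-fold splitting) would be the natural replacement for the hand-written tree.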

Conclusions
- Humans can disambiguate sentences with comma ambiguity from audio alone almost as well as from text containing punctuation
- Humans can disambiguate spoken sentences with PP-attachment ambiguity without context, but perform near chance (52%) on the same sentences presented as text
- When speakers are aware of the intended meaning, they can produce sentences in a way that can:
  - be disambiguated by listeners, even without context
  - be identified through certain acoustic cues
  - be disambiguated to some extent by machines; initial results are promising

Thank you! Questions?