A TAG-based noisy channel model of speech repairs

A TAG-based noisy channel model of speech repairs. Mark Johnson and Eugene Charniak, Brown University. ACL 2004. Supported by NSF grants LIS 9720368 and IIS 0095940.

Talk outline
Goal: apply parsing technology and deeper linguistic analysis to (transcribed) speech.
Problem: spoken language contains a wide variety of disfluencies and speech errors.
Why speech repairs are problematic for statistical syntactic models:
- Statistical syntactic models capture nested head-to-head dependencies.
- Speech repairs involve crossing rough-copy dependencies between sequences of words.
A noisy channel model of speech repairs:
- The source model captures syntactic dependencies.
- The channel model introduces speech repairs.
- Tree adjoining grammar can formalize the non-CFG dependencies in speech repairs.

Speech errors in (transcribed) speech
- Filled pauses: "I think it's, uh, refreshing to see the, uh, support..."
- Parentheticals: "But, you know, I was reading the other day..."
- Speech repairs: "Why didn't he, why didn't she stay at home?"
- Ungrammatical constructions, i.e. non-standard English: "My friends is visiting me?" (Note: this really isn't a speech error.)
Bear, Dowding and Shriberg (1992); Charniak and Johnson (2001); Heeman and Allen (1997, 1999); Nakatani and Hirschberg (1994); Stolcke and Shriberg (1996).

Special treatment of speech repairs
- Filled pauses are easy to recognize (in transcripts).
- Parentheticals appear in our training data, and our parsers identify them fairly well.
- Filled pauses and parentheticals are useful for identifying constituent boundaries (just as punctuation is); our parser performs slightly better with parentheticals and filled pauses present than with them removed.
- Ungrammaticality and non-standard English aren't necessarily fatal: statistical parsers learn how to map sentences to their parses from a training corpus.
- ...but speech repairs warrant special treatment, since our parser never recognizes them even though they appear in the training data.
Engel, Charniak and Johnson (2002), "Parsing and Disfluency Placement", EMNLP.

The structure of speech repairs
...a flight [to Boston], [uh, I mean], [to Denver] on Friday...
            (reparandum)  (interregnum)  (repair)
- The interregnum is usually lexically (and prosodically) marked, but can be empty.
- Repairs don't respect syntactic structure: "Why didn't she, uh, why didn't he stay at home?"
- The repair is often roughly a copy of the reparandum, so we can identify repairs by looking for rough copies.
- The reparandum is often 1-2 words long (hence a word-by-word classifier).
- The reparandum and repair can also be completely unrelated.
Shriberg (1994), "Preliminaries to a Theory of Speech Disfluencies".

Representation of repairs in the treebank
(Tree omitted: the Switchboard parse of "and you can get, you get a system", in which the reparandum "you can get" is dominated by an EDITED node under the root.)
- Speech repairs are indicated by EDITED nodes in the corpus.
- The internal syntactic structure of EDITED nodes is highly unusual.

Speech repairs and interpretation
- Speech repairs are indicated by EDITED nodes in the corpus, but the parser does not posit any EDITED nodes even though the training corpus contains them:
  - The parser is based on context-free headed trees and head-to-argument dependencies.
  - Repairs involve rough-copy dependencies that cross constituent boundaries: "Why didn't he, uh, why didn't she stay at home?"
  - Finite-state and context-free grammars cannot generate ww copy languages (but tree adjoining grammars can).
- The interpretation of a sentence with a speech repair is (usually) the same as with the repair excised.
- So: identify and remove EDITED words before parsing, either
  - using a classifier to classify each word as EDITED or not EDITED (Charniak and Johnson, 2001), or
  - using a noisy channel model to generate/remove repairs.

The noisy channel model
- Source model P(X): a bigram/parsing language model generates the source signal x, e.g. "a flight to Denver on Friday".
- Noisy channel P(U | X): a TAG transducer maps x to the noisy signal u, e.g. "a flight to Boston uh I mean to Denver on Friday".
Decoding recovers the most probable source:
  argmax_x P(x | u) = argmax_x P(u | x) P(x)
The source language model is trained on treebank trees with EDITED nodes removed.
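
The decision rule above can be sketched in a few lines of Python. Everything here is a made-up stand-in: in the talk, P(x) is a bigram/parser language model over fluent strings and P(u | x) is the TAG channel model, whereas the dictionaries below just hard-code plausible log probabilities for two candidate sources.

```python
import math

# Toy illustration of the noisy channel decision rule
#   argmax_x P(x | u) = argmax_x P(u | x) P(x)
# All probabilities are invented for illustration only.

def best_source(u, candidates, channel_logprob, source_logprob):
    """Return the candidate source x maximizing log P(u|x) + log P(x)."""
    return max(candidates, key=lambda x: channel_logprob(u, x) + source_logprob(x))

u = "a flight to Boston uh I mean to Denver on Friday"
candidates = ["a flight to Denver on Friday",  # repair excised
              u]                               # passed through unchanged

channel = {  # hypothetical log P(u | x)
    (u, candidates[0]): math.log(0.01),  # channel inserts the rough-copy repair
    (u, candidates[1]): math.log(0.9),   # channel inserts nothing
}
source = {   # hypothetical log P(x): the LM strongly prefers the fluent string
    candidates[0]: math.log(1e-6),
    candidates[1]: math.log(1e-16),
}

best = best_source(u, candidates,
                   lambda u_, x: channel[(u_, x)],
                   lambda x: source[x])
print(best)  # a flight to Denver on Friday
```

Even though the channel prefers doing nothing, the language model's strong preference for the fluent string makes the repaired source win overall, which is exactly the division of labor the model relies on.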

Helical structure of speech repairs
...a flight [to Boston], [uh, I mean], [to Denver] on Friday...
            (reparandum)  (interregnum)  (repair)
- The parser-based language model generates the repaired string.
- The TAG transducer generates the reparandum from the repair, interleaving the two in a helical structure.
- The interregnum ("uh I mean") is generated by a specialized finite-state grammar in the TAG transducer.
Joshi (2002), ACL Lifetime Achievement Award talk.

TAG transducer models speech repairs
Noisy string u: "a flight to Boston uh I mean to Denver on Friday"; source language model string x: "a flight to Denver on Friday".
The TAG generates a string of u:x pairs, where u is a speech-stream word and x is either the empty string ∅ or a source word:
  a:a flight:flight to:∅ Boston:∅ uh:∅ I:∅ mean:∅ to:to Denver:Denver on:on Friday:Friday
- The TAG does not reflect grammatical structure (the language model does); it is a right-branching finite-state model of non-repairs and the interregnum.
- TAG adjunction is used to describe the copy dependencies in the repair.
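
The u:x pair representation can be illustrated directly. This sketch hand-specifies the reparandum/interregnum span (word positions 2-6) rather than inferring it, and uses None for the empty source symbol ∅.

```python
# Build the u:x pair string for the example utterance, with None standing
# in for the empty source symbol; the edited span is hand-specified.
utterance = "a flight to Boston uh I mean to Denver on Friday".split()
edited = set(range(2, 7))  # "to Boston" (reparandum) + "uh I mean" (interregnum)
pairs = [(w, None if i in edited else w) for i, w in enumerate(utterance)]

# Reading off the non-empty x side recovers the source string.
source = [x for _, x in pairs if x is not None]
print(" ".join(source))  # a flight to Denver on Friday
```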

TAG derivation of copy constructions
(Figures omitted: four slides stepping through a TAG derivation of a copy construction. Auxiliary trees α, β and γ each contribute an a/a, b/b or c/c letter pair, and successive adjunctions of β into α and γ into β build the derived tree for the copy string together with its derivation tree.)

Schematic TAG noisy channel derivation
...a flight to Boston uh I mean to Denver on Friday...
  a:a flight:flight to:∅ Boston:∅ uh:∅ I:∅ mean:∅ to:to Denver:Denver on:on Friday:Friday

Sample TAG derivation (simplified)
(I want) a flight to Boston uh I mean to Denver on Friday...
(Figures omitted: five slides stepping through the derivation. Starting in state N_want, TAG rule α1 generates a:a and moves to N_a; rule α2 generates flight:flight, opens the repair state R_{flight:flight} and starts the interregnum subtree I. Auxiliary tree β1 then adjoins to generate the copy to:∅ ... to:to, β2 generates the substitution Boston:∅ ... Denver:Denver, and β3 closes the repair into the non-repair state N_Denver, after which on:on and Friday:Friday are generated by non-repair states and the interregnum words uh:∅ I:∅ mean:∅ hang under I.)

Switchboard corpus data
...a flight [to Boston], [uh, I mean], [to Denver] on Friday... (reparandum / interregnum / repair)
- The TAG channel model is trained on the disfluency-POS-tagged Switchboard files sw[23]*.dps (1.3M words), which annotate reparandum, interregnum and repair.
- The language model is trained on the parsed Switchboard files sw[23]*.mrg with reparandum and interregnum removed.
- 31K repairs, average repair length 1.6 words.
- Number of training words: reparandum 50K (3.8%), interregnum 10K (0.8%), repair 53K (4%), overlapping repairs or otherwise unclassified 24K (1.8%).

Training data for the TAG channel model
...a flight [to Boston], [uh, I mean], [to Denver] on Friday... (reparandum / interregnum / repair)
- A minimum edit distance aligner is used to align reparandum and repair words; it prefers identities, POS identities, and similar-POS alignments.
- Of the 57K alignments in the training data: 35K (62%) are identities, 7K (12%) are insertions, 9K (16%) are deletions, and 5.6K (10%) are substitutions, 2.9K of which (5% of all alignments) are substitutions with the same POS.
- 148 of the 352 substitutions (42%) in held-out data were not seen in training.
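
A minimal version of such a minimum edit distance aligner can be sketched as follows. This toy assigns cost 0 to identities, 1 to insertions/deletions and 2 to substitutions; the talk's aligner additionally rewards POS identity and similar-POS pairs, which are omitted here.

```python
def align(reparandum, repair, sub_cost=lambda a, b: 0 if a == b else 2):
    """Minimum edit distance alignment of reparandum and repair word lists.
    Identity costs 0, insertion/deletion 1, substitution 2."""
    n, m = len(reparandum), len(repair)
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = i
    for j in range(1, m + 1):
        d[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1,
                          d[i - 1][j - 1] + sub_cost(reparandum[i - 1], repair[j - 1]))
    ops, i, j = [], n, m  # backtrace, preferring the diagonal (match) move
    while i > 0 or j > 0:
        if (i > 0 and j > 0 and
                d[i][j] == d[i - 1][j - 1] + sub_cost(reparandum[i - 1], repair[j - 1])):
            kind = "copy" if reparandum[i - 1] == repair[j - 1] else "substitute"
            ops.append((kind, reparandum[i - 1], repair[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and d[i][j] == d[i - 1][j] + 1:
            ops.append(("insert", reparandum[i - 1], None))  # reparandum-only word
            i -= 1
        else:
            ops.append(("delete", None, repair[j - 1]))      # repair-only word
            j -= 1
    return ops[::-1]

print(align("to Boston".split(), "to Denver".split()))
# [('copy', 'to', 'to'), ('substitute', 'Boston', 'Denver')]
```

Following the slide's terminology, "insert" labels a reparandum word with no repair counterpart and "delete" a repair word with no reparandum counterpart.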

Decoding using n-best rescoring
- We don't know of any efficient algorithm for decoding a TAG-based noisy channel combined with a parser-based language model...
- ...but the intersection of an n-gram language model and the TAG-based noisy channel is just another TAG.
- So we use the parser language model to rescore the 20-best bigram language model results:
  1. Use the bigram language model with a dynamic programming search to find the 20 best analyses of each string.
  2. Parse each of these using the parser-based language model.
  3. Select the overall highest-scoring analysis using the parser probabilities and the TAG-based noisy channel scores.
See Collins (2000), "Discriminative Reranking for Natural Language Parsing"; Collins and Koo (to appear), "Discriminative Reranking for Natural Language Parsing".
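
The rescoring step can be sketched as below. Each n-best analysis is represented as a (source string, channel log probability) pair from the bigram search; the parser log probability is a toy stand-in for the parser-based language model.

```python
# Rescoring sketch: pick the analysis maximizing channel score + parser LM score.
def rescore(nbest, parser_logprob):
    """nbest: list of (source_string, channel_logprob) pairs."""
    return max(nbest, key=lambda a: a[1] + parser_logprob(a[0]))[0]

# Toy 2-best list; the toy "parser LM" simply prefers shorter (fluent) strings.
nbest = [("a flight to Denver on Friday", -4.6),
         ("a flight to Boston uh I mean to Denver on Friday", -20.7)]
best = rescore(nbest, lambda s: -0.5 * len(s.split()))
print(best)  # a flight to Denver on Friday
```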

Modified labeled precision/recall evaluation
Goal: don't penalize misattachment of EDITED nodes.
- String positions on either side of EDITED nodes in the gold-standard corpus tree are equivalent (just like punctuation in parseval).
(Tree omitted: the parse of "and you get, you can get a system", with the reparandum "you get" under an EDITED node.)
Charniak and Johnson (2001), "Edit detection and parsing for transcribed speech".

Empirical results
- Training and testing data have partial words and punctuation removed.
- CJ01 is the Charniak and Johnson (2001) word-by-word classifier, retrained on the new training and testing data.
- Bigram is the Viterbi analysis using dynamic programming decoding with the bigram language model.
- Trigram and Parser are the results of 20-best reranking using trigram and parser language models.

            CJ01    Bigram  Trigram  Parser
Precision   0.951   0.776   0.774    0.820
Recall      0.631   0.736   0.763    0.778
F-score     0.759   0.756   0.768    0.797
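
As a sanity check, the F-scores in the table can be recomputed as the harmonic mean of precision and recall; they agree with the table to within rounding of the three-decimal published values.

```python
# Recompute each system's F-score from its precision and recall:
#   F = 2PR / (P + R)
def f_score(p, r):
    return 2 * p * r / (p + r)

results = {"CJ01": (0.951, 0.631), "Bigram": (0.776, 0.736),
           "Trigram": (0.774, 0.763), "Parser": (0.820, 0.778)}
for name, (p, r) in results.items():
    print(f"{name}: F = {f_score(p, r):.3f}")
```

Note how CJ01's very high precision but low recall yields a lower F-score than the Parser model's more balanced precision/recall, which is the table's main point.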

Conclusion and future work
- It is possible to detect and excise speech repairs with reasonable accuracy.
- We can incorporate the very different syntactic and repair structures in a single noisy channel model.
- Using a better language model improves overall performance.
- It might be interesting to make the channel model sensitive to syntactic structure, to capture the relationship between syntactic context and the location of repairs.
- A log-linear model should permit us to integrate a wide variety of interacting syntactic and repair features.
- There are lots of interesting ways of combining speech and parsing!

Estimating the model from data
...a flight [to Boston], [uh, I mean], [to Denver] on Friday... (reparandum / interregnum / repair)
- P_n(repair | flight): the probability of a repair beginning after "flight".
- P_r(m | Boston, Denver), where m ∈ {copy, substitute, insert, delete, nonrepair}: the probability of repair type m when the last reparandum word was "Boston" and the last repair word was "Denver".
- P_w(tomorrow | Boston, Denver): the probability that the next reparandum word is "tomorrow" when the last reparandum word was "Boston" and the last repair word was "Denver".
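
These conditional distributions can be estimated by relative frequency from the aligned training repairs. The sketch below is a hypothetical estimator for P_r: it assumes training items are (last reparandum word, last repair word, operation) triples and omits the smoothing a real model would need for unseen contexts (cf. the unseen held-out substitutions noted earlier).

```python
from collections import Counter, defaultdict

# Relative-frequency estimate of P_r(m | reparandum word, repair word).
def estimate_pr(alignments):
    counts = defaultdict(Counter)
    for m, n, op in alignments:
        counts[(m, n)][op] += 1
    return {ctx: {op: c / sum(ops.values()) for op, c in ops.items()}
            for ctx, ops in counts.items()}

# Tiny invented training sample of aligned operations.
train = [("Boston", "Denver", "copy"),
         ("Boston", "Denver", "copy"),
         ("Boston", "Denver", "substitute"),
         ("to", "to", "copy")]
pr = estimate_pr(train)
print(pr[("Boston", "Denver")]["copy"])  # 0.666... (2 of the 3 events are copies)
```

P_n and P_w would be estimated the same way, conditioning on the word before a potential repair and on the reparandum/repair word pair respectively.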

The TAG rules and their probabilities
P(N_want → a:a N_a) = 1 − P_n(repair | a)
P(N_a → flight:flight R_{flight:flight} I) = P_n(repair | flight)
These rules are just the TAG formulation of an HMM.

The TAG rules and their probabilities (cont.)
P(R_{flight:flight} → to:∅ R_{to:to} to:to) = P_r(copy | flight, flight)
P(R_{to:to} → Boston:∅ R_{Boston:Denver} Denver:Denver) = P_r(substitute | to, to) P_w(Boston | to, to)
Copies generally have higher probability than substitutions.

The TAG rules and their probabilities (cont.)
P(R_{Boston:Denver} → tomorrow:∅ R_{tomorrow:Denver}) = P_r(insert | Boston, Denver) P_w(tomorrow | Boston, Denver)
P(R_{Boston:Denver} → R_{Boston:tomorrow} tomorrow:tomorrow) = P_r(delete | Boston, Denver)
P(R_{Boston:Denver} → N_Denver) = P_r(nonrepair | Boston, Denver)

Decoding with a bigram language model
We could search for the most likely parses of each sentence... or alternatively interpret the dynamic programming table directly:
1. Compute the probability that each triple of adjacent substrings can be analysed as a reparandum/interregnum/repair.
2. Divide by the probability that the substrings do not contain a repair.
3. If these odds are greater than a fixed threshold, identify this reparandum as EDITED.
4. Find the most highly scoring combination of repairs.
Advantages of this more complex approach:
- It doesn't require parsing the whole sentence (rather, it only looks for repairs up to some maximum size).
- Adjusting the odds threshold trades precision for recall.
- It handles overlapping repairs (where the repair is itself repaired): "[ [What did + what does he] + what does she ] want?"
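
Step 3's odds test can be sketched as follows; the log probabilities below are placeholders for entries of the bigram dynamic programming table, and the thresholds are arbitrary.

```python
import math

# Flag a candidate reparandum as EDITED when the odds
#   P(analysis with repair) / P(analysis without repair)
# exceed a fixed threshold (compared in log space).
def is_edited(logp_repair, logp_norepair, log_threshold=0.0):
    return (logp_repair - logp_norepair) > log_threshold

print(is_edited(math.log(0.03), math.log(0.01)))               # True: odds 3 > 1
print(is_edited(math.log(0.03), math.log(0.01), math.log(5)))  # False: odds 3 < 5
```

Raising `log_threshold` flags fewer spans (higher precision, lower recall); lowering it does the opposite, which is exactly the precision/recall trade-off the slide describes.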

(Standard) labeled precision/recall
Precision = # correct nodes / # nodes in parse trees
Recall = # correct nodes / # nodes in corpus trees
A parse node p is correct iff there is a node c in the corpus tree such that:
- label(p) = label(c) (where ADVP and PRT count as equal), and
- left(p) ≈ left(c) and right(p) ≈ right(c), where ≈ is an equivalence relation on string positions.
Example: "I like, but Sandy hates, beans"
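
Treating each node as a (label, left, right) triple, the metric can be sketched as below. This toy assumes spans have already been mapped to canonical positions under the equivalence relation, and it ignores duplicate triples for simplicity (real parseval scoring counts multiplicity); the example trees are invented.

```python
# Labeled precision/recall over sets of (label, left, right) node triples.
def labeled_pr(parse_nodes, gold_nodes):
    correct = len(parse_nodes & gold_nodes)
    return correct / len(parse_nodes), correct / len(gold_nodes)

# Hypothetical trees over a 7-word sentence: 3 of the 4 parse nodes match gold.
parse = {("S", 0, 7), ("NP", 0, 1), ("VP", 1, 7), ("NP", 6, 7)}
gold  = {("S", 0, 7), ("NP", 0, 1), ("VP", 1, 7), ("PP", 4, 6)}
p, r = labeled_pr(parse, gold)
print(p, r)  # 0.75 0.75
```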