CS474 Natural Language Processing. Noisy channel model. Decoding algorithm. Pronunciation subproblem. Special case of Bayesian inference


CS474 Natural Language Processing

Last week
» SENSEVAL
» Pronunciation variation in speech recognition
Today
» Decoding algorithm
» Introduction to generative models of language: what are they? why are they important? issues for counting words; statistics of natural language

Noisy channel model
The channel introduces noise that makes it hard to recognize the true word. Goal: build a model of the channel so that we can figure out how it modified the true word, and thereby recover it.

Decoding algorithm
A special case of Bayesian inference, i.e. Bayesian classification:
» Given an observation, determine which of a set of classes it belongs to.
» Here the observation is a string of phones, which we classify as a word in the language.

Pronunciation subproblem
Given a string of phones O (e.g. [ni]), determine which word from the lexicon corresponds to it. Consider all words in the vocabulary V and select the single word w for which P(w | O) is highest:

    ŵ = argmax_{w ∈ V} P(w | O)

Bayesian approach
Use Bayes' rule to transform P(w | O) into a product of two probabilities, each of which is easier to compute than P(w | O) itself:

    P(x | y) = P(y | x) P(x) / P(y)

so that

    ŵ = argmax_{w ∈ V} P(O | w) P(w) / P(O)

where P(O | w) is the likelihood and P(w) is the prior.

Computing the prior
Use the relative frequency of the word in a large corpus (here, the Brown corpus and the Switchboard Treebank):

    w       freq(w)    P(w)
    knee         61    .000024
    the     114,834    .046
    neat        338    .00013
    need      1,417    .00056
    new       2,625    .001

Probabilistic rules for generating pronunciation likelihoods
Take the rules of pronunciation (see chapter 4 of J&M) and associate them with probabilities; a sample rule that accounts for [ni] is the nasal assimilation rule. Compute the probabilities from a large labeled corpus (like the transcribed portion of Switchboard), then run the rules over the lexicon to generate the different possible surface forms, each with its own probability.
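For concreteness, here is a minimal Python sketch of the noisy-channel choice for [ni], plugging in the priors and likelihoods from the tables on these slides; the numbers are taken as given, not computed from a corpus:

```python
# A minimal sketch of noisy-channel decoding for the [ni] example.
# The numbers below are the illustrative priors and likelihoods from
# the slides, not values computed here.

prior = {            # P(w): relative frequency of w in the corpus
    "new": 0.001, "neat": 0.00013, "need": 0.00056,
    "knee": 0.000024, "the": 0.046,
}
likelihood = {       # P(O | w): probability that w surfaces as [ni]
    "new": 0.36, "neat": 0.52, "need": 0.11,
    "knee": 1.00, "the": 0.0,
}

def decode(vocabulary):
    # argmax_w P(O | w) * P(w); P(O) is constant across words,
    # so it can be dropped from the maximization.
    return max(vocabulary, key=lambda w: likelihood[w] * prior[w])

print(decode(prior))  # -> 'new'
```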

Final results
new is the most likely word:

    w      P(O|w)    P(w)       P(O|w) P(w)
    new      .36     .001       .00036
    neat     .52     .00013     .000068
    need     .11     .00056     .000062
    knee    1.00     .000024    .000024
    the     0        .046       0

This turns out to be wrong: the utterance was "I [ni]", and "I new" is not English; the preceding word points to need instead. Getting this right requires looking at the surrounding words, which motivates the models below.

Motivation for generative models
Word prediction:
» Once upon a ___
» I'd like to make a collect ___
» Let's go outside and take a ___

The need for models of word prediction in NLP has been controversial:

    "But it must be recognized that the notion 'probability of a sentence' is an entirely useless one, under any known interpretation of this term." (Noam Chomsky, 1969)

    "Every time I fire a linguist the recognition rate improves." (Fred Jelinek, IBM speech group, 1988)

Why are word prediction models important?
» Augmentative communication systems: for the disabled, to predict the next words the user wants to speak
» Computer-aided education: systems that help kids learn to read (e.g. the system of Mostow et al.)
» Speech recognition: use preceding context to improve solutions to the subproblem of pronunciation variation
» Lexical tagging tasks

Why are word prediction models important? (continued)
Word prediction is closely related to the problem of computing the probability of a sequence of words. A model that can assign a probability to the next word in an incomplete sentence is useful for part-of-speech tagging and probabilistic parsing.

N-gram model
Uses the previous N-1 words to predict the next one:
» 2-gram: bigram
» 3-gram: trigram
In speech recognition, these statistical models of word sequences are referred to as a language model. (A minimal bigram sketch follows below.)

Counting words in corpora
OK, so how many words are in this sentence? It depends on whether or not we treat punctuation marks as words, and the answer matters for many NLP tasks:
» grammar checking, spelling error detection, author identification, part-of-speech tagging
Spoken language corpora: utterances don't usually have punctuation, but they do have other phenomena that we might or might not want to treat as words. For example, in
» I do uh main- mainly business data processing
we find fragments (main-) and filled pauses (um and uh); filled pauses behave more like words, so most speech recognition systems treat them as such.

Counting words in corpora (continued)
Capitalization: should They and they be treated as the same word?
» For most statistical NLP applications, they are.
» Sometimes capitalization information is maintained as a feature, e.g. for spelling error correction or part-of-speech tagging.
Inflected forms: should walks and walk be treated as the same word?
» No, for most n-gram based systems, which are based on the wordform (i.e. the inflected form as it appears in the corpus) rather than the lemma (i.e. the set of lexical forms that have the same stem).
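As referenced above, a minimal sketch of a bigram model with maximum-likelihood estimates; the toy corpus and whitespace tokenization are illustrative assumptions, not part of the slides:

```python
# A minimal bigram language model sketch with maximum-likelihood
# estimates: P(w_n | w_{n-1}) = count(w_{n-1} w_n) / count(w_{n-1}).
from collections import Counter

corpus = "all for one and one for all".split()  # toy corpus (assumption)

unigram_counts = Counter(corpus)
bigram_counts = Counter(zip(corpus, corpus[1:]))

def p(word, prev):
    # MLE estimate of P(word | prev); 0.0 for an unseen history or bigram.
    if unigram_counts[prev] == 0:
        return 0.0
    return bigram_counts[(prev, word)] / unigram_counts[prev]

print(p("one", "for"))  # 0.5: 'for' occurs twice, 'for one' once
```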

Counting words in corpora (continued)
Need to distinguish:
» word types: the number of distinct words
» word tokens: the number of running words
Example: All for one and one for all.
» 8 tokens (counting punctuation)
» 6 types (assuming capitalized and uncapitalized versions of the same token are treated separately)
(See the sketch after this section for the count in code.)

Topics for today
Introduction to generative models of language:
» What are they?
» Why are they important?
» Issues for counting words
» Statistics of natural language

How many words are there in English? How are they distributed?
Option 1: count the word entries in a dictionary (this actually counts lemmas, not wordforms).
» OED: 600,000
» American Heritage (3rd edition): 200,000
Option 2: estimate from a corpus.
» Switchboard (2.4 million wordform tokens): 20,000 wordform types
» Shakespeare's complete works: 884,647 wordform tokens; 29,066 wordform types
» Brown corpus (1 million tokens): 61,805 wordform types; 37,851 lemma types
» Brown et al. 1992: 583 million wordform tokens; 293,181 wordform types

[Figure: word frequency vs. rank in the frequency list; function words occupy the highest ranks, then content words, then rare words in the long tail.]
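The type/token count above, as a short check; the pre-tokenized list is an assumption, since real tokenization is exactly the issue the previous slides discuss:

```python
# Sketch: tokens vs. types for "All for one and one for all.",
# treating the period as a token and keeping case distinctions.
tokens = ["All", "for", "one", "and", "one", "for", "all", "."]

print("tokens:", len(tokens))      # 8
print("types:", len(set(tokens)))  # 6 ('All' and 'all' are distinct)
```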

Statistical Properties of Text

Zipf's Law (Tom Sawyer)
Zipf's law relates a term's frequency to its rank:

    frequency ∝ 1 / rank

i.e. there is a constant k such that freq × rank = k.
The most frequent words in one corpus may be rare words in another corpus (example: computer in CACM vs. National Geographic), so each corpus has a different, fairly small working vocabulary. These properties hold in a wide range of languages. (Manning and Schütze, SNLP)

Zipf's Law
Useful as a rough description of the frequency distribution of words in human languages. The same behavior occurs in a surprising variety of situations:
» English verb polysemy
» references to scientific papers
» web page in-degrees and out-degrees
» royalties to pop-music composers
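A quick way to eyeball Zipf's law on any corpus is to check whether freq × rank stays roughly constant down the frequency list; a minimal sketch, where the file name corpus.txt is a placeholder for any plain-text corpus:

```python
# Sketch: empirical check of Zipf's law (freq * rank ~ constant k).
from collections import Counter

with open("corpus.txt") as f:        # placeholder: any plain-text corpus
    tokens = f.read().lower().split()

counts = Counter(tokens)
for rank, (word, freq) in enumerate(counts.most_common(20), start=1):
    # Under Zipf's law, freq * rank should be roughly constant.
    print(f"{rank:>3}  {word:<15} {freq:>7}  {freq * rank:>8}")
```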