Unit 1: Sequence Models


CS 562: Empirical Methods in Natural Language Processing
Unit 1: Sequence Models
Lectures 11-13: Stochastic String Transformations (a.k.a. "channel models")
Weeks 5-6 -- Sep 29, Oct 1 & 6, 2009
Liang Huang (lhuang@isi.edu)

String Transformations
A general framework for many NLP problems. Examples:
- part-of-speech tagging
- spelling correction (edit distance)
- word segmentation
- transliteration, sound/spelling conversion, morphology
- chunking (shallow parsing)
- beyond finite-state models (i.e., tree transformations): summarization, translation, parsing, information retrieval, ...
Algorithms: Viterbi (both max and sum).

Review of the Noisy-Channel Model
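In symbols (the standard formulation being reviewed, with s the observed string and w the hidden source string):

```latex
\hat{w} \;=\; \arg\max_{w}\; p(w \mid s)
        \;=\; \arg\max_{w}\; \frac{p(w)\,p(s \mid w)}{p(s)}
        \;=\; \arg\max_{w}\; \underbrace{p(w)}_{\text{source model (LM)}}\;\underbrace{p(s \mid w)}_{\text{channel model}}
```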

Example 1: Part-of-Speech Tagging
- use a tag bigram as the language model
- the channel model is context-independent
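Together these two choices give the familiar HMM factorization (standard notation; the slide itself only names the two ingredients):

```latex
p(t_1 \dots t_n,\; w_1 \dots w_n) \;=\; \prod_{i=1}^{n} \underbrace{p(t_i \mid t_{i-1})}_{\text{tag-bigram LM}}\;\underbrace{p(w_i \mid t_i)}_{\text{context-indep.\ channel}}
```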

Work out the compositions if you want to implement Viterbi...
- case 1: the language model is a tag unigram model, p(t1...tn) = p(t1) p(t2) ... p(tn). How many states do you get?
- case 2: the language model is a tag bigram model, p(t1...tn) = p(t1) p(t2|t1) ... p(tn|tn-1). How many states do you get?
- case 3: the language model is a tag trigram model, ...

The Case of the Bigram Model
Context-dependence (from the LM) propagates both left and right!

In general... (n = input length, m = tag-set size)
- bigram LM with a context-independent CM: O(n·m) states after composition
- g-gram LM with a context-independent CM: O(n·m^(g-1)) states after composition
- the g-gram LM itself has O(m^(g-1)) states

HMM Representation
- the HMM representation is not explicit about the search: hidden states have choices over variables
- in FST composition, paths and states are drawn out explicitly

Viterbi for argmax
How about the unigram case?

Viterbi Tagging Example
- Q1: why is this table not normalized?
- Q2: is "fish" equally likely to be a V or an N?
- Q3: how do we train p(w|t)?
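A minimal Viterbi tagger for this setup, a sketch assuming toy probability tables (the numbers below are invented for illustration, not the lecture's):

```python
import math

# hypothetical toy tables (invented for illustration)
lm = {("<s>", "N"): 0.6, ("<s>", "V"): 0.4,          # p(t | t_prev)
      ("N", "N"): 0.5, ("N", "V"): 0.5,
      ("V", "N"): 0.7, ("V", "V"): 0.3}
cm = {("fish", "N"): 0.03, ("fish", "V"): 0.01,      # p(w | t)
      ("sleep", "N"): 0.001, ("sleep", "V"): 0.02}

def viterbi(words, tags=("N", "V")):
    """Best tag sequence under p(t|t_prev) * p(w|t); O(n * m^2) time."""
    best = {t: math.log(lm["<s>", t] * cm[words[0], t]) for t in tags}
    back = []
    for w in words[1:]:
        new, ptr = {}, {}
        for t in tags:
            prev = max(tags, key=lambda s: best[s] + math.log(lm[s, t]))
            new[t] = best[prev] + math.log(lm[prev, t] * cm[w, t])
            ptr[t] = prev
        back.append(ptr)
        best = new
    t = max(best, key=best.get)          # best final tag, then backtrace
    seq = [t]
    for ptr in reversed(back):
        t = ptr[t]
        seq.append(t)
    return seq[::-1]

print(viterbi(["fish", "sleep"]))        # -> ['N', 'V'] with these numbers
```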

A Side Note on Normalization
How do we compute the normalization factor?

Forward (sum instead of max)
Replacing Viterbi's max with a sum yields the forward probabilities α.

Forward vs. Argmax
- same complexity, different semirings: (+, ×) vs. (max, ×)
- for a g-gram LM with a context-independent CM:
  - time complexity: O(n·m^g)
  - space complexity: O(n·m^(g-1))

Viterbi for DAGs with a Semiring
1. topological sort
2. visit each vertex v in sorted order and do updates: for each incoming edge (u, v) in E, use d(u) to update d(v): d(v) ⊕= d(u) ⊗ w(u, v)
- key observation: d(u) is already fixed to its optimal value at this time
- time complexity: O(|V| + |E|)
- see the tutorial on DP from the course page
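A compact sketch of this algorithm, parameterized by the semiring so the same loop computes both Viterbi (max, ×) and Forward (+, ×); the graph encoding and function names are my own, and vertices are assumed to be numbered in topological order:

```python
from collections import defaultdict

def dag_dp(n_vertices, edges, plus, times, one, zero, source=0):
    """Generic DP over a DAG whose vertices 0..n-1 are already in
    topological order. edges: list of (u, v, weight).
    plus=max -> Viterbi score of the best path;
    plus=sum -> Forward (total probability over all paths)."""
    incoming = defaultdict(list)
    for u, v, w in edges:
        incoming[v].append((u, w))
    d = [zero] * n_vertices
    d[source] = one
    for v in range(n_vertices):                 # topological order
        for u, w in incoming[v]:
            d[v] = plus(d[v], times(d[u], w))   # d(v) ⊕= d(u) ⊗ w(u, v)
    return d

# tiny diamond DAG: 0 -> {1, 2} -> 3
edges = [(0, 1, 0.6), (0, 2, 0.4), (1, 3, 0.5), (2, 3, 0.9)]
viterbi = dag_dp(4, edges, max, lambda a, b: a * b, 1.0, 0.0)
forward = dag_dp(4, edges, lambda a, b: a + b, lambda a, b: a * b, 1.0, 0.0)
print(viterbi[3], forward[3])   # 0.36 (best path) vs 0.66 (sum of both paths)
```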

Example: Pronunciation
From spelling to sound.

Pronunciation Dictionary (hw3: eword-epron.data)
...
AARON      EH R AH N
AARONSON   AA R AH N S AH N
...
PEOPLE     P IY P AH L
VIDEO      V IH D IY OW
...
- you can train p(s1..sn | w) from this, but what about unseen words?
- you also need an alignment to train the channel models p(s|e) and p(e|s)
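A minimal sketch of the first point: a whole-word model p(s1..sn | w) estimated by relative frequency. The two-column file format is assumed from the excerpt above; since the model fails on unseen words, the letter-level channel model (and its alignments) is still needed.

```python
from collections import Counter, defaultdict

def train_whole_word(path="eword-epron.data"):
    """p(s1..sn | w) by relative frequency; assumed format: WORD PH1 PH2 ..."""
    counts = defaultdict(Counter)
    with open(path) as f:
        for line in f:
            fields = line.split()
            if len(fields) >= 2:
                counts[fields[0]][tuple(fields[1:])] += 1
    return {w: {pron: k / sum(c.values()) for pron, k in c.items()}
            for w, c in counts.items()}

model = train_whole_word()
print(model.get("AARON"))   # e.g. {('EH','R','AH','N'): 1.0} if listed once
# unseen words get no probability at all -- hence the channel model p(s|e)
```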

From Sound to Spelling
input:  HH EH L OW B EH R
output: H E L L O B E A R, or H E L O B A R E?

Candidate cascades (a source model generates a hidden string, channel models generate the sounds):
- p(e) => e => p(s|e) => s
- p(w) => w => p(e|w) => e => p(s|e) => s
- p(w) => w => p(s|w) => s
- p(w) => w => p(e|w) => e => p(s|e) => s => p(s)
Decoding runs the cascade in reverse:
- p(w) <= w <= p(w|e) <= e <= p(e|s) <= s <= p(s)
- w <= p(w|s) <= s <= p(s)
Can you further improve on these?

Example: Transliteration
KEVIN KNIGHT => K EH V IH N N AY T (English sounds) => K E B I N N A I T O (Japanese sounds)

Japanese 101 (writing systems)
The Japanese writing system has four components:
- Kanji (Chinese characters): nouns, verb/adjective stems, CJKV names (e.g., the words for Japan, Tokyo, train, eat [inf.])
- Syllabaries:
  - Hiragana: function words (e.g., particles) and suffixes: de ('at'), ka (question), 'ate'
  - Katakana: transliterated foreign words/names: koohii ('coffee')
- Romaji (Latin alphabet): auxiliary purposes

Why Japanese Uses Syllabaries
- all syllables are: [consonant] + vowel + [nasal n]
- 10 consonants × 5 vowels = 50 basic syllables, plus some variations
- other languages have far more syllables, so they use alphabets
- read the Writing Systems tutorial from the course page!

Katakana Transliteration Examples
- ko n pyu - ta -  => kompyuutaa (uu = û) => computer
- a i su ku ri - mu => aisukuriimu => ice cream
- andoryuu bitabi => Andrew Viterbi
- yo - gu ru to => yogurt

Katakana on the Streets of Tokyo (from Knight & Sproat 09)
- koohii koonaa => coffee corner
- saabisu => service
- bulendo koohii => blend coffee
- sutoreeto koohii => straight coffee
- juusu => juice
- aisukuriimu => ice cream
- toosuto => toast

Japanese <=> English: Cascades
- your job in HW3: decode Japanese katakana words (transcribed in romaji) back to English words, e.g., koohiikoonaa => coffee corner
- what about duplicate paths with the same string? n-best crunching, or weighted determinization (see extra reading)
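A sketch of the n-best crunching idea: merge entries of an n-best list that yield the same output string by summing their path probabilities (the list and numbers below are invented for illustration):

```python
from collections import defaultdict

def crunch(nbest):
    """Merge n-best paths with identical output strings, summing probabilities."""
    totals = defaultdict(float)
    for string, prob in nbest:
        totals[string] += prob
    return sorted(totals.items(), key=lambda kv: -kv[1])

# two derivations of "coffee corner" (different hidden paths) vs. one rival:
print(crunch([("coffee corner", 0.20), ("coffee corner", 0.15),
              ("copy corner", 0.30)]))
# after merging, "coffee corner" (0.35) overtakes "copy corner" (0.30)
```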

Example: Word Segmentation
- you noticed that Japanese (e.g., katakana) is written without spaces between words
- so to guess the English you must also segment, e.g., aisukuriimu => aisu kuriimu => ice cream
- this is an even more important issue in Chinese
- also in Korean, Thai, and other East Asian languages
- also in English: sounds => words (speech recognition)
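A minimal sketch of segmentation as dynamic programming under a unigram word model; the romaji vocabulary and probabilities below are hypothetical toy values:

```python
import math

def segment(text, logp):
    """Most probable segmentation under a unigram word model.
    best[i] = max over j < i of best[j] + logp(text[j:i]); O(n^2) with backpointers."""
    n = len(text)
    best = [0.0] + [-math.inf] * n
    back = [0] * (n + 1)
    for i in range(1, n + 1):
        for j in range(i):
            w = text[j:i]
            if w in logp and best[j] + logp[w] > best[i]:
                best[i], back[i] = best[j] + logp[w], j
    words, i = [], n
    while i > 0:
        words.append(text[back[i]:i])
        i = back[i]
    return words[::-1]

# toy vocabulary (hypothetical probabilities)
lm = {w: math.log(p) for w, p in
      {"aisu": 0.1, "kuriimu": 0.1, "ai": 0.05, "sukuriimu": 0.01}.items()}
print(segment("aisukuriimu", lm))   # -> ['aisu', 'kuriimu']
```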

Chinese Word Segmentation
- min-zhu (people-dominate) = 'democracy'
- jiang-ze-min zhu-xi = 'President Jiang Zemin', but a wrong segmentation reads it roughly as '... people dominate-podium'
- xia yu tian di mian ji shui ('on rainy days the ground accumulates water'): a classic ambiguous string
- (this was 5 years ago; now Google is good at segmentation!)
- segmentation can be attacked as graph search or as a tagging problem

Word Segmentation Cascades

Example: Edit Distance (courtesy of Jason Eisner)
A single-state transducer over a k-letter alphabet:
- O(k) deletion arcs: a:ε, b:ε, ...
- O(k) insertion arcs: ε:a, ε:b, ...
- O(k) identity arcs: a:a, b:b, ...
- substitution arcs: a:b, b:a, ...

a) given x, y: what is p(y|x)?
b) what is the most likely sequence of operations?
c) given x, what is the most likely output y?
d) given y, what is the most likely input x (with an LM)?
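A sketch answering a) and b) by dynamic programming over the edit lattice; the per-operation probabilities are hypothetical placeholders (a trained channel model would estimate them, e.g., with EM):

```python
# hypothetical operation probabilities (a real model would learn these)
P = {"copy": 0.85, "sub": 0.03, "ins": 0.04, "del": 0.04}

def edit_dp(x, y, combine):
    """DP over the edit lattice between x and y.
    combine=sum -> a) p(y|x), totaled over all operation sequences
    combine=max -> b) probability of the single most likely path"""
    n, m = len(x), len(y)
    d = [[0.0] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 1.0
    for i in range(n + 1):
        for j in range(m + 1):
            cands = []
            if i > 0:                       # delete x[i-1]
                cands.append(d[i - 1][j] * P["del"])
            if j > 0:                       # insert y[j-1]
                cands.append(d[i][j - 1] * P["ins"])
            if i > 0 and j > 0:             # copy or substitute
                op = "copy" if x[i - 1] == y[j - 1] else "sub"
                cands.append(d[i - 1][j - 1] * P[op])
            if cands:
                d[i][j] = combine(cands)
    return d[n][m]

print(edit_dp("clara", "caca", sum))   # a) sum over all alignment paths
print(edit_dp("clara", "caca", max))   # b) best single path (Viterbi)
```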

Given x and y...
a) what is p(y|x)? (the sum over all paths)
b) what is the most likely conversion path? (the best path, by Dijkstra's algorithm)
[figure: the edit lattice built by the composition clara .o. E .o. caca, where E is the edit transducer, with deletion (c:ε, l:ε, ...), insertion (ε:c, ε:a, ...), and match/substitution (c:c, l:c, a:a, ...) arcs]

Most Likely Corrupted Output
c) given the correct English x, what's the corrupted y with the highest score?

DP for the Most Likely Corrupted Output

d) Most Likely Original Input
- use an LM p(e) as the source model for spelling correction
- case 1: a letter-based language model p_l(e)
- case 2: a word-based language model p_w(e)
- how would dynamic programming work in cases 1 and 2?
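In either case the decoding rule is the same noisy-channel argmax, now run in the correction direction (standard formulation, with y the corrupted string and e the hypothesized original):

```latex
\hat{e} \;=\; \arg\max_{e}\; p(e \mid y)
        \;=\; \arg\max_{e}\; \underbrace{p(e)}_{p_l(e)\text{ or }p_w(e)}\;\underbrace{p(y \mid e)}_{\text{edit channel}}
```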

Dynamic Programming for d)

Summary of Edit Distance