Speech Recognition Lecture 6: Language Modeling Software Library


Eugene Weinstein
Google, NYU Courant Institute
eugenew@cs.nyu.edu
Slide Credit: Mehryar Mohri

Software Library

GRM Library (Grammar Library): a general software collection for constructing and modifying weighted automata and transducers representing grammars and statistical language models (Allauzen, Mohri, and Roark, 2005).
http://www.research.att.com/projects/mohri/grm

Software Libraries

OpenGRM Libraries: open-source libraries for constructing and using formal grammars in FST form, using OpenFst as the underlying representation.
- NGram Library: create and manipulate n-gram language models encoded as weighted FSTs (Roark et al., 2012).
- Thrax: compile regular expressions and context-dependent rewrite grammars into weighted FSTs (Tai, Skut, and Sproat, 2011).
http://opengrm.org

Overview

- Generality: to support the representation and use of the various grammars used in dynamic speech recognition.
- Efficiency: to support competitive large-vocabulary dynamic recognition using automata of several hundred million states and transitions.
- Reliability: to serve as a solid foundation for research in statistical language modeling.

Language Modeling Tools

- Counts: from automata (strings or lattices); merging of counts.
- Models: back-off or deleted-interpolation smoothing; Katz or absolute discounting; Kneser-Ney models.
- Shrinking: weighted difference or relative entropy.
- Class-based modeling: straightforward.

Corpus

Input: hello. bye. hello. bye bye.

Labels (labels.txt):
<s> 1
</s> 2
<unknown> 3
hello 4
bye 5

Program:
farcompilestrings --symbols=labels.txt corpus.txt > corpus.far
or, for lattice inputs:
cat lattice1.fst ... latticen.fst > foo.far
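As a quick sanity check, a compiled archive can be printed back out as text with OpenFst's FAR tools; a minimal sketch, assuming the corpus.txt and labels.txt files shown above:

# Compile the corpus; --keep_symbols attaches the symbol table to each FST:
farcompilestrings --symbols=labels.txt --keep_symbols corpus.txt > corpus.far
# Print the archived strings back out as text:
farprintstrings corpus.far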

This Lecture

- Counting
- Model creation, shrinking, and conversion
- Class-based models

Counting

- Weights: use fstpush to remove the initial weight and create a probabilistic automaton.
- Counting is done from FAR files; counts are produced in the log semiring.
- Algorithm: applies to all probabilistic automata; in particular, there must be no cycles with weight zero or less.
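A sketch of the weight-pushing step with the OpenFst fstpush binary (the lattice file name is hypothetical):

# Push weights toward the initial state (the default direction) and strip
# the total weight, yielding a normalized probabilistic automaton:
fstpush --push_weights --remove_total_weight lattice1.fst > lattice1.norm.fst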

Counting Transducers

[Figure: a two-state counting transducer with self-loops b:ε/1 and a:ε/1 on both states and a transition X:X/1 from state 0 to final state 1/1. For X = x = ab, applying the transducer to bbabaabba yields the outputs εεabεεεεε and εεεεεabεε, one per occurrence of ab.]

X is an automaton representing a string or any other regular expression. Alphabet Σ = {a, b}.

Counting

Program:
ngramcount --order=2 corpus.far > corpus.2.counts.fst
ngrammerge foo.counts.fst bar.counts.fst > foobar.counts.fst

[Figure: the bigram count automaton for the toy corpus, with arcs such as <s>/4, hello/2, bye/3, bye/1 (the bye bye bigram), </s>/2, and </s>/4.]
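The accumulated counts can be inspected in human-readable form with ngramprint from the same toolkit (a sketch; exact output formatting varies by version):

# List each n-gram with its count as text:
ngramprint corpus.2.counts.fst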

This Lecture

- Counting
- Model creation, shrinking, and conversion
- Class-based models

Creating Back-off Model

Program:
ngrammake corpus.2.counts.fst > corpus.2.lm.fst

[Figure: the resulting back-off bigram model as a weighted automaton, with word arcs such as hello/0.698 and bye/1.098, back-off arcs such as ε/3.500, ε/4.481, and ε/4.704, and final weights such as </s>/0.410 and </s>/0.005.]
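ngrammake applies a default smoothing method; others can be selected with its --method flag (method names are from the OpenGrm NGram tools; the output file names are hypothetical):

# Katz back-off smoothing:
ngrammake --method=katz corpus.2.counts.fst > corpus.2.katz.lm.fst
# Kneser-Ney smoothing:
ngrammake --method=kneser_ney corpus.2.counts.fst > corpus.2.kn.lm.fst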

Shrinking Back-off Model

Program:
ngramshrink --method=relative_entropy --theta=0.2 foo.2.lm.fst > foo.2.s.lm.fst

[Figure: the same model after relative-entropy pruning, with one fewer state; low-probability arcs such as bye/1.108 have been removed.]
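Other pruning criteria are available through the same flag (the method name is from the OpenGrm NGram tools; the threshold value here is illustrative only):

# Seymore-Rosenfeld pruning:
ngramshrink --method=seymore --theta=4 foo.2.lm.fst > foo.2.sy.lm.fst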

Back-off Smoothing

Definition: for a bigram model with k = c(w_{i-1} w_i),

Pr[w_i | w_{i-1}] = d_k · c(w_{i-1} w_i) / c(w_{i-1})   if k > 0;
Pr[w_i | w_{i-1}] = α · Pr[w_i]                          otherwise;

where d_k = 1 if k > 5, and d_k = (k+1) n_{k+1} / (k n_k) otherwise, with n_k the number of bigrams observed exactly k times.
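A quick numeric illustration with assumed frequency-of-frequency counts (the numbers are invented for this example): if n_1 = 100 bigrams occur exactly once and n_2 = 40 occur exactly twice, then

d_1 = (1+1) n_2 / (1 · n_1) = (2 × 40) / 100 = 0.8,

so every bigram seen once has its maximum-likelihood estimate discounted by 20%, and the freed probability mass is redistributed through the back-off term α Pr[w_i].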

Merging/Interpolation

Program:
ngrammerge --normalize --alpha=2 --beta=3 a.lm.fst b.lm.fst > merged.fst

The two language models are mixed with relative importance given by --alpha and --beta; --normalize renormalizes the output so the merged LM is a probability distribution.
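Read as a weighted interpolation, those flags plausibly correspond to (an interpretation of the flag values, not a statement of the documented internals):

Pr_merged[w | h] ≈ (2 · Pr_a[w | h] + 3 · Pr_b[w | h]) / (2 + 3),

i.e., relative weights 0.4 for a.lm.fst and 0.6 for b.lm.fst.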

This Lecture

- Counting
- Model creation, shrinking, and conversion
- Class-based models

Class-Based Models

Simple class-based models: Pr[w_i | h] = Pr[w_i | C_i] · Pr[C_i | h].

Methods in GRM: no special utility needed (see the sketch below):
- create a transducer mapping strings to classes;
- use fstcompose to map from the word corpus to classes;
- build and make the model over classes;
- use fstcompose to map from classes back to words.

Generality: classes defined by weighted automata.
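A minimal sketch of that recipe with OpenFst/OpenGrm binaries; the file names, and the exact way the class model is mapped back to words, are assumptions for illustration:

# 1. Map each word-corpus FST to class sequences with the word-to-class
#    transducer, keeping only the class side of the composition:
fstcompose sentence.fst word2class.fst | fstproject --project_type=output > class_sentence.fst
# (older OpenFst releases use: fstproject --project_output)
# Assemble the per-sentence FSTs into an archive:
farcreate class_sentence.fst class_corpus.far

# 2. Count and smooth an n-gram model over class sequences:
ngramcount --order=2 class_corpus.far > class.counts.fst
ngrammake class.counts.fst > class.lm.fst

# 3. Compose the mapping transducer with the class model and project to the
#    word side, yielding a word-level model:
fstcompose word2class.fst class.lm.fst | fstproject --project_type=input > word.lm.fst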

Class-Based Model - Example

Example: BYE = {bye, bye bye}.

[Figure: mapping from strings to classes, a two-state transducer with hello:hello/0 self-loops, a bye:BYE/0.693 arc from state 0 to final state 1, and a bye:ε/0 arc consuming the second token of "bye bye".]
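For concreteness, such a mapper could be written in OpenFst's text format and compiled with fstcompile. This is a plausible reconstruction of the figure, not a verbatim copy of it; classes.txt is a hypothetical class-side symbol table, and the weight 0.693 ≈ ln 2 is read as splitting the class weight between BYE's two members:

# Arcs: 0-(hello:hello)->0; 0-(bye:BYE/0.693)->1;
# 1-(bye:eps)->0 for the second token of "bye bye";
# 1-(hello:hello)->0 to continue after a one-word BYE.
# States 0 and 1 are both final.
cat > word2class.txt <<EOF
0 0 hello hello
0 1 bye BYE 0.693
1 0 bye <eps>
1 0 hello hello
0
1
EOF
fstcompile --isymbols=labels.txt --osymbols=classes.txt word2class.txt > word2class.fst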

Class-Based Model - Counts

[Figure: two count automata. Left ("Original counts"): the word-level bigram counts, with arcs <s>/4, hello/2, bye/2, bye/3, bye/1, </s>/2, and </s>/4. Right ("Class-based counts"): the same automaton with the bye arcs replaced by BYE/2 arcs.]

Models

[Figure: two back-off models. Left ("original model"): the word-level model, with arcs such as hello/0.698, bye/1.098, bye/1.108, back-off arcs ε/3.500, ε/4.481, ε/4.704, and final weights such as </s>/0.410. Right ("class-based model"): arcs such as BYE/0.698, BYE/1.386, hello/1.386, back-off arcs ε/4.605, and final weights </s>/0.005 and </s>/0.693.]

Final Class-Based Model

[Figure: the word-level model obtained from the class-based model, a six-state automaton with arcs such as bye/0, bye/1.391, bye/2.079, hello/0.698, hello/1.386, back-off arcs ε/4.605, and final weights </s>/0.005 and </s>/0.693.]

References

Cyril Allauzen, Mehryar Mohri, and Brian Roark. Generalized Algorithms for Constructing Statistical Language Models. In 41st Meeting of the Association for Computational Linguistics (ACL 2003), Proceedings of the Conference, Sapporo, Japan, July 2003.

Cyril Allauzen, Mehryar Mohri, and Brian Roark. The Design Principles and Algorithms of a Weighted Grammar Library. International Journal of Foundations of Computer Science, 16(3):403-421, 2005.

Peter F. Brown, Vincent J. Della Pietra, Peter V. de Souza, Jennifer C. Lai, and Robert L. Mercer. Class-based n-gram models of natural language. Computational Linguistics, 18(4):467-479, 1992.

Stanley Chen and Joshua Goodman. An empirical study of smoothing techniques for language modeling. Technical Report TR-10-98, Harvard University, 1998.

William Gale and Kenneth W. Church. What's wrong with adding one? In N. Oostdijk and P. de Haan, editors, Corpus-Based Research into Language. Rodopi, Amsterdam.

I. J. Good. The population frequencies of species and the estimation of population parameters. Biometrika, 40:237-264, 1953.

Frederick Jelinek and Robert L. Mercer. Interpolated estimation of Markov source parameters from sparse data. In Proceedings of the Workshop on Pattern Recognition in Practice, pages 381-397, 1980.

Slava Katz. Estimation of probabilities from sparse data for the language model component of a speech recognizer. IEEE Transactions on Acoustics, Speech and Signal Processing, 35:400-401, 1987.

Reinhard Kneser and Hermann Ney. Improved backing-off for m-gram language modeling. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, volume 1, pages 181-184, 1995.

David A. McAllester and Robert E. Schapire. On the Convergence Rate of Good-Turing Estimators. In Proceedings of the Conference on Learning Theory (COLT), pages 1-6, 2000.

Mehryar Mohri. Weighted Grammar Tools: the GRM Library. In Robustness in Language and Speech Technology, pages 165-186. Kluwer Academic Publishers, The Netherlands, 2001.

Hermann Ney, Ute Essen, and Reinhard Kneser. On structuring probabilistic dependences in stochastic language modeling. Computer Speech and Language, 8:1-38, 1994.

Brian Roark, Richard Sproat, Cyril Allauzen, Michael Riley, Jeffrey Sorensen, and Terry Tai. The OpenGrm Open-Source Finite-State Grammar Software Libraries. In ACL (System Demonstrations), pages 61-66, 2012.

Terry Tai, Wojciech Skut, and Richard Sproat. Thrax: An Open Source Grammar Compiler Built on OpenFst. In ASRU, 2011.

Kristie Seymore and Ronald Rosenfeld. Scalable backoff language models. In Proceedings of the International Conference on Spoken Language Processing (ICSLP), 1996.

Andreas Stolcke. Entropy-based pruning of back-off language models. In Proceedings of the DARPA Broadcast News Transcription and Understanding Workshop, pages 270-274, 1998.

Ian H. Witten and Timothy C. Bell. The zero-frequency problem: Estimating the probabilities of novel events in adaptive text compression. IEEE Transactions on Information Theory, 37(4):1085-1094, 1991.