Neural Network Language Models


Neural Network Language Models
Steve Renals
Automatic Speech Recognition (ASR), Lecture 12
6 March 2014

Neural networks for speech recognition

- Introduction to Neural Networks
- Training feed-forward networks
- Hybrid neural network / HMM acoustic models
- Neural network features: Tandem, posteriorgrams
- Deep neural network acoustic models
- Neural network language models

n-gram language modelling

The problem: estimate the probability of a sequence of T words, $P(w_1, w_2, \ldots, w_T) = P(w_1^T)$.

- Decompose as conditional probabilities: $P(w_1^T) = \prod_{t=1}^{T} P(w_t \mid w_1^{t-1})$
- n-gram approximation: only consider (n-1) words of context: $P(w_t \mid w_1^{t-1}) \approx P(w_t \mid w_{t-(n-1)}^{t-1})$
- Many possible word sequences: with vocab size V = 100,000 and a 4-gram, there are 100,000^4 possible 4-grams, i.e. 10^20 parameters
- Most n-grams are not in the training data: the zero-probability problem
- Smooth the n-gram model with models of smaller context size (interpolation)
- State of the art: modified Kneser-Ney smoothing
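To make the interpolation step concrete, here is a minimal sketch of maximum-likelihood n-gram estimates mixed with fixed weights. It is illustrative only: the function names are mine, and fixed-weight interpolation is a simpler scheme than the modified Kneser-Ney smoothing named above.

```python
from collections import Counter

def train_ngram_counts(tokens, n):
    """Count n-grams and their (n-1)-word contexts in a token list."""
    grams = Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    ctxs = Counter(tuple(tokens[i:i + n - 1]) for i in range(len(tokens) - n + 1))
    return grams, ctxs

def interp_prob(word, history, orders, lambdas):
    """P(word | history) as a fixed-weight mixture of 1..n-gram MLE estimates.
    orders[k] holds the counts for (k+1)-grams; lambdas should sum to 1."""
    p = 0.0
    for k, ((grams, ctxs), lam) in enumerate(zip(orders, lambdas)):
        if k > len(history):
            continue                           # not enough context for this order
        ctx = tuple(history[len(history) - k:]) if k else ()
        if ctxs[ctx] > 0:
            p += lam * grams[ctx + (word,)] / ctxs[ctx]
    return p

tokens = "the cat sat on the mat and the cat ate".split()
orders = [train_ngram_counts(tokens, n) for n in (1, 2, 3)]
print(interp_prob("cat", ("on", "the"), orders, (0.1, 0.3, 0.6)))
```

The lower-order models guarantee a non-zero estimate whenever the word has been seen at all, which is exactly the role smoothing plays for unseen n-grams.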

Problems with n-grams

1. Curse of dimensionality: model size (number of parameters) increases exponentially with context size
2. Probability estimation in a high-dimensional discrete space is not smooth: small changes in the discrete context may result in large changes in the probability estimate
3. Word similarity is not taken into account

Distributed representation for language modelling

- Each word is associated with a learned distributed representation (feature vector)
- A neural network estimates the conditional probability of the next word given the distributed representations of the context words
- The distributed representations and the weights of the conditional probability estimator are learned jointly, by maximising the log likelihood of the training data
- Distributionally similar words will have similar feature vectors: a small change in feature vector results in a small change in probability estimate (since the NN is a smooth function)
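A minimal numpy sketch of the forward pass of such a model (the dimensions echo the NPLM(100,60) configuration reported two slides below; the parameter names and initialisation are illustrative, not Bengio et al's implementation):

```python
import numpy as np

rng = np.random.default_rng(0)

V, d, H = 18_000, 100, 60      # vocab, embedding dim, hidden units (cf. NPLM(100,60))
ctx = 3                        # number of context words, i.e. an (n=4)-gram model

C   = rng.normal(0.0, 0.01, (V, d))        # word feature vectors (distributed representations)
W_h = rng.normal(0.0, 0.01, (ctx * d, H))  # concatenated context -> hidden
b_h = np.zeros(H)
W_o = rng.normal(0.0, 0.01, (H, V))        # hidden -> output: the expensive layer
b_o = np.zeros(V)

def next_word_probs(context_ids):
    """P(w_t | previous ctx words), as a distribution over the whole vocabulary."""
    x = C[context_ids].reshape(-1)         # look up and concatenate context embeddings
    h = np.tanh(x @ W_h + b_h)
    logits = h @ W_o + b_o
    e = np.exp(logits - logits.max())      # numerically stable softmax
    return e / e.sum()

p = next_word_probs([12, 7, 104])
loss = -np.log(p[42])                      # NLL of the true next word; training follows
                                           # the gradient of this quantity
```

Because the embedding table C is shared across all context positions, words that appear in similar contexts are pushed towards similar feature vectors, which is where the smoothness argument above comes from.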

Neural Probabilistic Language Model

[Figure: network architecture, Bengio et al (2006)]

Neural Probabilistic Language Model

- Train using stochastic gradient ascent to maximise the log likelihood
- The number of free parameters (weights) scales linearly with vocabulary size and linearly with context size
- Can be (linearly) interpolated with an n-gram model

Perplexity results on AP News (14M words training, V = 18k):

  model          n   perplexity
  NPLM(100,60)   6   109
  n-gram (KN)    3   127
  n-gram (KN)    4   119
  n-gram (KN)    5   117
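To see the linear scaling concretely, here is a back-of-the-envelope parameter count for the forward-pass sketch above (the layer decomposition is my assumption, matching that sketch):

```python
def nplm_params(V, d, H, ctx):
    """Rough weight count for the NPLM sketched earlier (illustrative layers only)."""
    embed  = V * d              # feature-vector table
    hidden = ctx * d * H + H    # context -> hidden, plus biases
    output = H * V + V          # hidden -> output, plus biases
    return embed + hidden + output

print(nplm_params(18_000, 100, 60, 5))   # ~2.9M weights
print(nplm_params(36_000, 100, 60, 5))   # ~5.8M: doubling V roughly doubles the count
print(nplm_params(18_000, 100, 60, 10))  # ~3.0M: doubling the context grows only the
                                         # small hidden term, unlike an n-gram model
```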

NPLM: shortlists

- The majority of the weights (and hence the majority of the computation) is in the output layer
- Reduce computation by including only the s most frequent words at the output: the shortlist S (the full vocabulary is still used for the context)
- Use an n-gram model to estimate the probabilities of words not in the shortlist
- The neural network thus redistributes probability over the words in the shortlist:

  $P_S(h_t) = \sum_{w \in S} P(w \mid h_t)$

  $P(w_t \mid h_t) = P_{NN}(w_t \mid h_t) \, P_S(h_t)$  if $w_t \in S$
  $P(w_t \mid h_t) = P_{KN}(w_t \mid h_t)$  otherwise

- In a V = 50k task, a 1024-word shortlist covers 89% of 4-grams; 4096 words cover 97%
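A sketch of the combination rule. The slide leaves implicit which model defines P_S; this follows the common choice of measuring the shortlist mass under the n-gram model, and p_nn / p_kn are hypothetical callables, not a real API:

```python
def shortlist_prob(word, history, shortlist, p_nn, p_kn):
    """Combine a shortlist NN LM with a Kneser-Ney n-gram model.

    p_nn(w, h): NN probability, normalised over the shortlist only.
    p_kn(w, h): smoothed n-gram probability over the full vocabulary.
    """
    # Probability mass the n-gram model assigns to shortlist words in this context.
    p_s = sum(p_kn(w, history) for w in shortlist)
    if word in shortlist:
        return p_nn(word, history) * p_s   # NN redistributes the shortlist mass
    return p_kn(word, history)             # back off outside the shortlist
```

Scaling the NN probabilities by P_S keeps the combined distribution normalised: the NN carves up exactly the mass that the n-gram model would have given to shortlist words.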

NPLM: ASR results

Speech recognition results on Switchboard: 7M / 12M / 27M words of in-domain data, plus 500M words of background (b/g) data (broadcast news). Vocab size V = 51k, shortlist size s = 12k.

  WER (%)           7M    12M   27M   (in-domain words)
  KN (in-domain)    25.3  23.0  20.0
  NN (in-domain)    24.5  22.2  19.1
  KN (+b/g)         24.1  22.3  19.3
  NN (+b/g)         23.7  21.8  18.9

Recurrent Neural Network (RNN) LM

- Rather than a fixed input context, recurrently connected hidden units provide a memory
- The model learns how to remember from the data
- The recurrent hidden layer allows clustering of variable-length histories
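A minimal numpy sketch of one step of such a model (an Elman-style recurrence; the sizes and parameter names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
V, H = 10_000, 200                        # vocabulary and hidden size (illustrative)

W_xh = rng.normal(0.0, 0.01, (V, H))      # input word -> hidden (row lookup = one-hot input)
W_hh = rng.normal(0.0, 0.01, (H, H))      # hidden -> hidden: the recurrence (the "memory")
W_ho = rng.normal(0.0, 0.01, (H, V))      # hidden -> output

def rnn_lm_step(word_id, h_prev):
    """Consume one word, update the state, and return
    P(next word | whole history so far) plus the new state."""
    h = np.tanh(W_xh[word_id] + h_prev @ W_hh)
    logits = h @ W_ho
    e = np.exp(logits - logits.max())     # stable softmax
    return e / e.sum(), h

h = np.zeros(H)                           # empty history
nll = 0.0
sentence = [4, 87, 902, 13]               # word ids
for w, w_next in zip(sentence, sentence[1:]):
    p, h = rnn_lm_step(w, h)              # h summarises the unbounded history
    nll -= np.log(p[w_next])
```

Unlike the NPLM, the conditioning context is not a fixed window: the hidden state h is a learned, fixed-size summary of everything seen so far.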

RNN LM

[Fig. 1: Simple recurrent neural network. Mikolov (2011)]

RNN training: back-propagation through time

[Figure: training of the RNNLM by back-propagation through time]
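A compact sketch of truncated back-propagation through time, reusing the parameters and rnn_lm_step from the RNN sketch above. This variant both applies the loss and truncates the recurrence within the last tau steps, which is one of several ways to truncate:

```python
def bptt(words, h0, tau=5):
    """Gradients of the NLL w.r.t. W_xh, W_hh, W_ho over a word-id window."""
    hs, ps = [h0], []
    for w in words[:-1]:                     # forward pass, caching states and predictions
        p, h = rnn_lm_step(w, hs[-1])
        ps.append(p)
        hs.append(h)

    dW_xh = np.zeros_like(W_xh)
    dW_hh = np.zeros_like(W_hh)
    dW_ho = np.zeros_like(W_ho)
    dh_next = np.zeros_like(h0)
    T = len(ps)
    for t in reversed(range(max(0, T - tau), T)):
        dlogit = ps[t].copy()
        dlogit[words[t + 1]] -= 1.0          # d(NLL)/d(logits) for softmax + NLL
        dW_ho += np.outer(hs[t + 1], dlogit)
        dh = W_ho @ dlogit + dh_next         # gradient flowing into h_t
        da = dh * (1.0 - hs[t + 1] ** 2)     # back through tanh
        dW_xh[words[t]] += da
        dW_hh += np.outer(hs[t], da)
        dh_next = W_hh @ da                  # carry gradient back one time step
    return dW_xh, dW_hh, dW_ho
```

The unrolled network is just a deep feed-forward net with weights tied across time steps, which is why ordinary back-propagation applies once the recurrence is unrolled.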

Factorised RNN LM

[Fig. 4: RNN with output layer factorised by a class layer. Mikolov (2011)]
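The factorisation replaces one softmax over all V words with a class softmax followed by a within-class softmax, so that P(w | h) = P(c(w) | h) * P(w | c(w), h). A hypothetical sketch (the round-robin class assignment is a toy stand-in; Mikolov et al assign classes by word frequency):

```python
import numpy as np

rng = np.random.default_rng(0)
V, H, n_classes = 10_000, 200, 100

word2class = np.arange(V) % n_classes          # toy class assignment
W_hc = rng.normal(0.0, 0.01, (H, n_classes))   # hidden -> class logits
W_hw = rng.normal(0.0, 0.01, (H, V))           # hidden -> word logits

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def factorised_prob(word_id, h):
    """P(w | h) = P(class(w) | h) * P(w | class(w), h): two small softmaxes
    (~n_classes + V/n_classes terms) instead of one softmax over all V words."""
    c = word2class[word_id]
    p_c = softmax(h @ W_hc)[c]
    members = np.flatnonzero(word2class == c)  # words sharing this class
    p_w = softmax(h @ W_hw[:, members])
    return p_c * p_w[int(np.searchsorted(members, word_id))]

p = factorised_prob(4242, rng.normal(0.0, 0.1, H))
```

With n_classes near sqrt(V), the per-word cost drops from O(V) to roughly O(sqrt(V)), which is the main point of factorising the output layer.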

Reading

Y Bengio et al (2006), "Neural probabilistic language models" (sections 6.1, 6.2, 6.3, 6.7, 6.8), Studies in Fuzziness and Soft Computing, Volume 194, Springer, chapter 6.

T Mikolov et al (2011), "Extensions of recurrent neural network language model", Proc IEEE ICASSP 2011.