Hidden Markov Models (HMMs) - Part 1


Hidden Markov Models (HMMs) - 1 Hidden Markov Models (HMMs) Part 1 May 21, 2013

Hidden Markov Models (HMMs) - 2 References
Lawrence R. Rabiner: A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition. Proceedings of the IEEE, vol. 77, no. 2, February 1989.
X. Huang, A. Acero, H.-W. Hon: Spoken Language Processing, Chapter 8, pp. 374-409, Prentice Hall, 2001.
Tapas Kanungo, University of Maryland: HMM Tutorial Slides (some of his slides have been reused here).

Hidden Markov Models (HMMs) - 3 Outline
Motivation: Problems with Pattern Matching
Markov Models
Hidden Markov Models: introduction, some properties, topologies
Three main problems of HMMs and their algorithmic solutions:
  The Evaluation Problem: Forward Algorithm
  The Decoding Problem: Viterbi Algorithm
  The Learning Problem: Forward-Backward Algorithm
Hidden Markov Models in Speech Recognition:
  Overview of Hidden Markov Models, training using (hand-)labeled data, k-means, training HMMs with Viterbi, components of an HMM recognizer
(The topics are split between Part 1 and Part 2.)

Hidden Markov Models (HMMs) - 4 What we have seen so far
Signal preprocessing, feature extraction. We model phonemes; however, we want to recognize whole words and sentences.
In this lecture: classification of phoneme sequences.
Problem: we can classify each single phoneme, but not every sequence of recognized phonemes makes sense. Furthermore, we want to use a priori information about the probability of phonemes and words.

Hidden Markov Models (HMMs) - 5 Dynamic Time Warping (DTW)
Goal: we want to find a distance between two utterances; the lower, the better.
Problem: we need to consider all possible alignment paths and find the best one.
Solution: for each time t, calculate the cumulative distances D(s,t), which describe the distance of the partial utterances up to the states q(s,t) (s = 1, ..., S). The distances for time t+1 are calculated from those of time t; at this point the minimization over the possible predecessors is applied.
This requires a distance measure d(s,t) between the observed frame o_t and the reference frame r_s (a high d(s,t) means a large distance), e.g. the Euclidean distance.
(Figure: DTW grid with the reference pattern r_s on one axis, the input pattern o_t on the other, and an alignment path through the states q(s,t).)
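A minimal sketch of this recurrence in Python; the function name, the use of NumPy, the Euclidean local distance, and the symmetric step pattern are illustrative assumptions, not taken from the slides:

```python
import numpy as np

def dtw_distance(reference, observation):
    """Cumulative DTW distance between a reference pattern (S frames)
    and an input pattern (T frames), using Euclidean local distances
    and the standard step pattern (diagonal, horizontal, vertical)."""
    S, T = len(reference), len(observation)
    # local distances d(s, t)
    d = np.array([[np.linalg.norm(np.asarray(r) - np.asarray(o)) for o in observation]
                  for r in reference])
    # cumulative distances D(s, t)
    D = np.full((S, T), np.inf)
    D[0, 0] = d[0, 0]
    for s in range(S):
        for t in range(T):
            if s == 0 and t == 0:
                continue
            best_prev = min(
                D[s - 1, t] if s > 0 else np.inf,                 # step in reference only
                D[s, t - 1] if t > 0 else np.inf,                 # step in input only
                D[s - 1, t - 1] if s > 0 and t > 0 else np.inf,   # diagonal step
            )
            D[s, t] = d[s, t] + best_prev                         # minimization at time t
    return D[-1, -1]
```

For recognition, this distance would be computed against the reference pattern of every candidate word, as described on the next slide.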

Hidden Markov Models (HMMs) - 7 Dynamic Time Warping Application
We can use DTW to recognize whole words: compute the DTW distance for each possible reference pattern; the word with the smallest distance is considered to be recognized.
This is still applied in practice for very small vocabularies. What are the problems?

Hidden Markov Models (HMMs) - 8 Problems with Pattern Matching
The DTW algorithm can be used to distinguish a small number of words, but:
It needs endpoint detection; if the input is split into smaller units, it needs segmentation into these units.
High computational effort (especially for large vocabularies), proportional to the vocabulary size.
A large vocabulary also means a huge amount of training data: collecting lots of reference patterns is inconvenient for the user, and it is difficult to train suitable references (or sets of references).
Poor performance when the environment changes.
It works well only for speaker-dependent recognition (variations); it is unsuitable where the speaker is unknown and no training is feasible.
Continuous speech is problematic (combinatorial explosion of patterns, coarticulation); it is impossible to recognize untrained words, and difficult to train/recognize subword units.
We need a different method that allows us to train and recognize smaller units (syllables, phonemes).

Hidden Markov Models (HMMs) - 11 Make a Wish
We would like to work with speech units shorter than words: each subword unit occurs often, training is easier, less data is needed.
We want to recognize speech from any speaker, without prior training: store "speaker-independent" references (examples from many speakers).
We want to recognize continuous rather than isolated speech: handle coarticulation effects, handle sequences of words.
We want to recognize words that did not occur in the training set: train subword units and compose any word out of these (vocabulary independence).
We would prefer a solid mathematical foundation.

Hidden Markov Models (HMMs) - 12 Speech Production seen as a Stochastic Process
The same word / phoneme sounds different every time it is uttered.
Regard words / phonemes as states of a speech production process. In a given state we can observe different acoustic sounds; not all sounds are possible / likely in every state. We say: in a given state the speech process "emits" sounds according to some probability distribution.
The production process makes transitions from one state to another. Not all transitions are possible, and they have different probabilities.
When we specify the probabilities for the sound emissions (emission probabilities) and for the state transitions, we call this a model.

Hidden Markov Models (HMMs) - 13 Speech Production seen as a Stochastic Process
Basic principle of our improved recognizer:
The speech process is in a state at any time (we cannot observe the state directly).
In each state certain sounds are emitted according to a certain probability distribution; these probabilities are called emission probabilities.
The transitions between the states also occur according to a certain probability distribution; these probabilities are called transition probabilities.
States: phonemes (sound units, min. 30 ms). Observations: frames of the acoustic signal (every 10 ms).

Hidden Markov Models (HMMs) - 14 What's different?
The reference is now given in terms of a state sequence of statistical models; the models consist of prototypical reference vectors. Hypothesis = recognized sentence.

Hidden Markov Models (HMMs) - 15 Markov Models (1)
(Figure: state transition diagram with states k, i, j at times t-2, t-1, t.)

Hidden Markov Models (HMMs) - 16 Markov Models (2)
(Figure: state transition diagram with states i, j at times t-1, t.)

Hidden Markov Models (HMMs) - 17 Markov Models - Example
(Figure: weather Markov model with three states R (rain), C (cloudy), S (sunny); the arcs are labeled with the transition probabilities 0.4, 0.2, 0.3, 0.3, 0.1, 0.6, 0.2, 0.1, 0.8.)

Hidden Markov Models (HMMs) - 18 Markov Models - Example
The probability of a state sequence factorizes via the chain rule, e.g. P(A, B) = P(B | A) P(A).

Hidden Markov Models (HMMs) - 19 Markov Models - Example
(Figure: the same weather Markov model with states R, C, S and its transition probabilities.) Today is S, so P(S) = 1.
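Continuing the example under the assumption that the self-transition probability of the sunny state is a_SS = 0.8 (the value that appears next to S in the figure): the probability that the next two days are also sunny would be P(S) · a_SS · a_SS = 1 · 0.8 · 0.8 = 0.64.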

Hidden Markov Models (HMMs) - 20 Markov Models and Speech Recognition
How does the process of speech production differ from weather modeling (as shown on the previous slides)?
For weather modeling, we compute the probability of a directly and exactly observable state sequence (either it is sunny or not); consequently, a state and its observation are exactly the same.
In speech, however, we have a continuum of possible tokens (typically frames of the speech signal, whose distribution follows Gaussians) which should be assigned to a limited number of states (phonemes). Each phoneme can (theoretically) be realized in infinitely many ways (but with different probabilities), and the boundaries of phonemes cannot be defined exactly. There is no one-to-one relation between the phonemes uttered by a speaker and their observable acoustics.
In speech, the states are hidden. Observations are only possible indirectly via sound emissions, and these observations are also probabilistic!

Hidden Markov Models (HMMs) - 21 Hidden Markov Models
Consequently, we need an extension of the Markov models. We can solve our problems with Hidden Markov Models (HMMs).
What are Hidden Markov Models? An HMM is a statistical Markov model in which the system being modeled is assumed to be a Markov process with unobserved (hidden) states.
The state sequence is probabilistic. We say: each state emits an observation (a frame of the speech signal); these emissions are also probabilistic. Observations are probabilistic functions of states.
The state sequences are hidden: the states are not observable.
HMMs are Markov models: the probability of entering the next state depends only on the current state, and state transitions are still probabilistic.

Hidden Markov Models (HMMs) - 22 Hidden Markov Models
The fact that the state sequence is not observable has some consequences:
Decoding with HMMs: based on the observations, we have to draw conclusions about a possible state sequence. We will never find an exact solution, only the one with the highest probability.
Training of HMMs: a related problem is the training of an HMM, where we know the traversed state sequence but not the times of the state transitions.
But these properties model the process of speech production/recognition well!

Hidden Markov Models (HMMs) - 23 Example for HMMs: The Urn and Ball Model
n urns containing colored balls, v distinct colors; each urn has a (possibly) different distribution of colors.
(Figure: three urns labeled 1, 2, 3.)
Observation sequence generation algorithm:
1. (Behind the curtain) Pick an initial urn according to some random process.
2. (Behind the curtain) Randomly pick a ball from the urn.
3. Show it to the audience and put it back.
4. (Behind the curtain) Select another urn according to the random selection process associated with the urn.
5. Repeat steps 2 to 4.

Hidden Markov Models (HMMs) - 24 Example for HMMs: The Urn and Ball Model
Why is this an HMM?
Current urn: not observable (state).
Current ball / the sequence of balls: observation sequence.
Distribution of balls in each urn: emission probabilities.
Jumps from urn to urn: transition probabilities.
(Figure: the three urns drawn as states 1, 2, 3 with transition probabilities on the arcs, generating an observation sequence.)
The term "hidden" refers to seeing the observations and drawing conclusions without knowing the hidden sequence of states (urns).
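A minimal sketch of this generation procedure in Python; the three-urn layout, the color names, and all probability values below are illustrative assumptions rather than the values from the figure:

```python
import random

# Hypothetical urn-and-ball HMM: 3 urns (states), 3 ball colors (observations).
initial = [1.0, 0.0, 0.0]                     # pi: always start at urn 1
transitions = [[0.6, 0.3, 0.1],               # A: a_ij = P(next urn j | current urn i)
               [0.2, 0.6, 0.2],
               [0.1, 0.3, 0.6]]
emissions = [[0.7, 0.2, 0.1],                 # B: b_i(color) for each urn i
             [0.1, 0.8, 0.1],
             [0.2, 0.2, 0.6]]
colors = ["red", "green", "blue"]

def generate(length):
    """Generate an observation sequence by walking the hidden urns."""
    state = random.choices(range(3), weights=initial)[0]                  # step 1: pick initial urn
    observations = []
    for _ in range(length):
        ball = random.choices(range(3), weights=emissions[state])[0]      # steps 2-3: draw and show a ball
        observations.append(colors[ball])
        state = random.choices(range(3), weights=transitions[state])[0]   # step 4: move to the next urn
    return observations

print(generate(10))  # e.g. ['red', 'red', 'green', ...]; the urn sequence itself stays hidden
```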

Hidden Markov Models (HMMs) - 25 Formal Definition of Hidden Markov Models
A Hidden Markov Model λ = (A, B, π) is a five-tuple consisting of:
S: the set of states, S = {s_1, s_2, ..., s_n}; n is the number of states.
π: the initial probability distribution, π(s_i) = P(q_1 = s_i), the probability of s_i being the first state of a sequence.
A: the matrix of state transition probabilities, A = (a_ij) with a_ij = P(q_{t+1} = s_j | q_t = s_i), 1 <= i, j <= n, for going from state s_i to state s_j.
B: the set of emission probability distributions/densities, B = {b_1, b_2, ..., b_n}, where b_i(x) = p(o_t = x | q_t = s_i) is the probability of observing x when the system is in state s_i.
V: the set of symbols; v is the number of distinct symbols. The observable feature space can be discrete, V = {x_1, x_2, ..., x_v}, or continuous, V = R^d.

Hidden Markov Models (HMMs) - 26 Some Properties of Hidden Markov Models
For the initial probabilities we have: Σ_i π(s_i) = 1. Often things are simplified by setting π(s_1) = 1 and π(s_i) = 0 for i > 1.
Obviously: Σ_j a_ij = 1 for all i. Often a_ij = 0 for most j, except for a few states.
When V = {x_1, x_2, ..., x_v}, the b_i are discrete probability distributions and the HMMs are called discrete HMMs.
When V = R^d, the b_i are continuous probability density functions and the HMMs are called continuous (density) HMMs.
In ASR we mostly use continuous HMMs; often the emission probabilities are given by Gaussians. Basically, every classifier that provides probabilities or densities can be combined with an HMM. For simplicity, most upcoming examples show discrete HMMs.
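To illustrate the discrete vs. continuous distinction, here is a small sketch; the states, symbols, means, and variances are arbitrary example values. In the discrete case b_i is a probability table over V, in the continuous case b_i(x) is a density, here a single 1-D Gaussian per state:

```python
import math

# Discrete HMM: b_i is a probability table over the symbol set V.
V = ["x1", "x2", "x3"]
b_discrete = [{"x1": 0.7, "x2": 0.2, "x3": 0.1},   # state s1
              {"x1": 0.1, "x2": 0.1, "x3": 0.8}]   # state s2

def emission_discrete(state, symbol):
    return b_discrete[state][symbol]

# Continuous (density) HMM: b_i(x) is a density, here a single Gaussian per state.
means = [0.0, 3.0]
variances = [1.0, 0.5]

def emission_gaussian(state, x):
    m, v = means[state], variances[state]
    return math.exp(-(x - m) ** 2 / (2 * v)) / math.sqrt(2 * math.pi * v)

print(emission_discrete(0, "x2"))   # probability of symbol x2 in state s1
print(emission_gaussian(1, 2.5))    # density value for observation 2.5 in state s2
```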

Hidden Markov Models (HMMs) - 27 Some HMM Terminology
The most ambiguously used term is the "model", which can be one of:
A Hidden Markov Model = the five-tuple defined above.
The model of a state = the combination of HMM parameters that describe the properties of an HMM state (different states can have the same model).
The acoustic model = the combination of all parameters of the recognizer describing all acoustic features (e.g. the parameters of the Gaussians in the continuous case).
An (acoustic) model = the combination of the parameters that describe the acoustic features of a specific unit of speech (e.g. of a sub-phoneme).
The language model = the combination of all parameters describing the probabilities of word sequences.

Hidden Markov Models (HMMs) - 28 The Trellis

Hidden Markov Models (HMMs) - 29 Some Typical HMM Topologies
Linear model.
Bakis model: every state has a transition to itself, its successor, or the successor of its successor.
Left-to-right model.
Alternative paths.
Ergodic model: every state has transitions to every other state.
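As a sketch of how such topologies constrain the transition matrix A (the 4-state size and all probability values are arbitrary examples, not from the slides), zero entries a_ij encode arcs that the topology forbids:

```python
import numpy as np

# Linear topology (example): only self-loops and transitions to the next state.
A_linear = np.array([[0.5, 0.5, 0.0, 0.0],
                     [0.0, 0.5, 0.5, 0.0],
                     [0.0, 0.0, 0.5, 0.5],
                     [0.0, 0.0, 0.0, 1.0]])

# Bakis topology (example): self-loop, successor, and skip to the successor's successor.
A_bakis = np.array([[0.4, 0.4, 0.2, 0.0],
                    [0.0, 0.4, 0.4, 0.2],
                    [0.0, 0.0, 0.5, 0.5],
                    [0.0, 0.0, 0.0, 1.0]])

# Ergodic topology (example): every state can reach every other state.
A_ergodic = np.full((4, 4), 0.25)

# Each row still has to sum to 1, regardless of topology.
for A in (A_linear, A_bakis, A_ergodic):
    assert np.allclose(A.sum(axis=1), 1.0)
```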

Hidden Markov Models (HMMs) - 30 Some Example Applications for HMMs (and their topologies): simulation and analysis of complex stochastic systems (weather, traffic, queues); recognition of dynamic patterns (speech, handwriting, video).

Hidden Markov Models (HMMs) - 31 Typical Questions
A magician draws balls from urns behind a curtain; the audience sees the observation sequence O = (o_1, o_2, ..., o_T).
Your friend told you about two sets of urns and drawing patterns, i.e. two models λ_1 = (A, B, π) and λ_2 = (A, B, π), that the magician usually uses.
(Figure: one of the candidate urn models, drawn as three states with transition probabilities on the arcs.)
Assume you have an efficient algorithm to compute P(O | λ).
1. Compute P(O | λ) for both models: which of the models λ_1 or λ_2 was more likely to be used by the magician?
2. Given one model, find the optimal, i.e. most likely, state sequence that would produce the observation.
3. Find a new model λ' such that P(O | λ') > P(O | λ).
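Question 1 presupposes such an efficient algorithm; the outline names the Forward Algorithm for this (covered in detail later). A minimal discrete-HMM sketch in Python, with made-up model parameters and observation indices:

```python
import numpy as np

def forward_probability(pi, A, B, observations):
    """P(O | lambda) for a discrete HMM via the forward algorithm.
    pi: (n,) initial probabilities, A: (n, n) transition matrix,
    B: (n, v) emission table, observations: list of symbol indices."""
    alpha = pi * B[:, observations[0]]          # initialisation: alpha_1(i) = pi_i * b_i(o_1)
    for o in observations[1:]:
        alpha = (alpha @ A) * B[:, o]           # induction: sum over predecessor states, then emit
    return alpha.sum()                          # termination: sum over final states

# Comparing two hypothetical models lambda_1 and lambda_2 on the same observation sequence:
pi = np.array([1.0, 0.0])
A1 = np.array([[0.7, 0.3], [0.4, 0.6]])
A2 = np.array([[0.5, 0.5], [0.5, 0.5]])
B  = np.array([[0.8, 0.2], [0.3, 0.7]])
O  = [0, 0, 1, 0]

p1, p2 = forward_probability(pi, A1, B, O), forward_probability(pi, A2, B, O)
print("more likely model:", "lambda_1" if p1 > p2 else "lambda_2")
```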

Hidden Markov Models (HMMs) - 32 Thanks for your interest!