Providing Sublexical Constraints for Word Spotting within the ANGIE Framework


Raymond Lau and Stephanie Seneff
{raylau, seneff}@sls.lcs.mit.edu
http://www.sls.lcs.mit.edu
Spoken Language Systems Group, MIT Laboratory for Computer Science
Cambridge, Massachusetts, United States of America
Copyright 1997, Spoken Language Systems. All rights reserved.

Outline
- ANGIE
- Wordspotter
- Filler models
- Results
- Conclusions

What is ANGIE?
- Flexible, multipurpose system for speech processing
- Framework introduced in Seneff, Lau & Meng (ICSLP 96)
- Word substructures characterized jointly by:
  - A context-free grammar
  - A probabilistic model
- Possible applications include:
  - Flexible/extensible speech recognition tasks
  - Bidirectional letter/sound generation
  - Prosodic modeling
- Benefits include:
  - Pooling of data due to the hierarchical structure
  - Generalization of knowledge to new words
  - Easy experimentation with subword representations

Example Parse Tree
[Figure: parse tree for the word "interested", with layers from SENTENCE and WORD down through morphology (SROOT, UROOT2, DSUF, ISUF), syllabification (NUCLAX+, CODA, NUC, DNUC, UCODA, PAST), phonemics (ih+ n t er eh s t d*ed), and phonetics (ih n axr ix s t ix dx).]
- Very regular layered structure
  - Imposed by context-free rules whose left-hand side and right-hand side lie on adjacent layers
  - Layers are sentence, word, morphology, syllabification, phonemics, phonetics
  - No stress layer; stress information is instead distributed among the layers
- Parsing proceeds left to right, with each column built bottom-up
- The last two layers capture phonological variation
  - Context dependencies typical in phonology are learned by the probability model
- Probabilities:
  - Terminal advancement
  - Bottom-up trigram
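The column-by-column scoring can be illustrated with a toy sketch. The category names follow the example tree above, but the probability tables and the exact conditioning are invented for illustration; the real ANGIE model (terminal advancement plus a bottom-up trigram) is trained on parsed data.

```python
import math

# Toy sketch of ANGIE-style column scoring. A "column" is the chain of
# categories above one phone, listed bottom-up:
# phonetics -> phonemics -> syllabification -> morphology -> word.
column      = ["ix", "ih+", "NUC",  "SROOT", "WORD"]
prev_column = ["n",  "n",   "CODA", "SROOT", "WORD"]

# Invented terminal-advancement table: P(new phone | previous phone).
advance = {("ix", "n"): 0.3}

# Invented "bottom-up trigram": P(label | label directly below it,
# label at the same layer in the previous column).
trigram = {
    ("ih+", "ix", "n"):        0.4,
    ("NUC", "ih+", "CODA"):    0.5,
    ("SROOT", "NUC", "SROOT"): 0.9,
    ("WORD", "SROOT", "WORD"): 0.95,
}

def column_log_prob(col, prev):
    # Advance to the new terminal, then build the column bottom-up.
    logp = math.log(advance[(col[0], prev[0])])
    for layer in range(1, len(col)):
        logp += math.log(trigram[(col[layer], col[layer - 1], prev[layer])])
    return logp

print(round(column_log_prob(column, prev_column), 3))  # -2.97
```

Because every column is scored the same way, partial theories accumulate a running log probability as the parse moves left to right, which is what the wordspotter later uses as its linguistic score.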

Current Task: Wordspotting
- Task: spot 39 city names in ATIS
- Training: 5000 utterances; testing: the December 93 test set
- Similar task to Manos and Zue (ICASSP 97)
- Objectives:
  - Explore the effects of varying the subword lexical model (easy to do within the ANGIE framework)
  - Further establish empirically the feasibility of using ANGIE for speech recognition tasks
  - Serve as a natural foundation for building a full ANGIE speech recognizer

Wordspotter
- Start with a segment-based graph as in MIT's SUMMIT
- Use mixture diagonal Gaussian acoustic models for context-independent phones:
  - MFCC means averaged over thirds of segments
  - MFCC derivatives across segment boundaries
- Perform a left-to-right search of the phone graph
- Partial ANGIE parses are computed for partial theories
  - Well supported by ANGIE's left-to-right, bottom-up parsing strategy
- The best ANGIE parse score is used as the linguistic score
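A rough sketch of the segment features and their Gaussian scoring (not the actual SUMMIT/ANGIE code; the frame counts, MFCC dimensionality, and model parameters here are made up):

```python
import numpy as np

def segment_features(mfcc_frames):
    """Average the MFCC vectors over the first, middle, and last third
    of a segment, then concatenate the three averages."""
    thirds = np.array_split(mfcc_frames, 3)
    return np.concatenate([t.mean(axis=0) for t in thirds])

def diag_gaussian_logpdf(x, mean, var):
    """Log density of a diagonal-covariance Gaussian, i.e. one mixture
    component of a phone's acoustic model."""
    return -0.5 * np.sum(np.log(2 * np.pi * var) + (x - mean) ** 2 / var)

rng = np.random.default_rng(0)
frames = rng.normal(size=(12, 14))   # a 12-frame segment of 14-dim MFCCs
feat = segment_features(frames)      # 3 x 14 = 42-dim segment feature
score = diag_gaussian_logpdf(feat, np.zeros(42), np.ones(42))
print(feat.shape)                    # (42,)
```

In a mixture model, the acoustic score for a phone would be the log-sum of several such weighted components; boundary-derivative features would be appended to the segment vector in the same way.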

Search Strategy
- Previous work with ANGIE used a best-first strategy
  - Proved inadequate empirically for wordspotting
  - Possible reason: difficulty normalizing short vs. long theories for comparison
- Current strategy: a variant of the stack decoder (cf. Jelinek (IEEE 76), Paul (ICASSP 91))
  - Extend all paths at the earliest unexplored time boundary, based on score
  - Prune based on a maximum number of paths permitted at any boundary
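The two bullets of the current strategy can be sketched as a small boundary-synchronous search; the graph encoding, labels, and scores below are invented for illustration:

```python
import heapq
from collections import defaultdict

def stack_search(arcs, start, end, max_paths=3):
    """arcs maps a time boundary to [(next_boundary, label, log_prob)];
    boundaries are assumed to advance in time (next_boundary > boundary)."""
    stacks = defaultdict(list)            # boundary -> [(score, path)]
    stacks[start].append((0.0, ()))
    frontier, scheduled = [start], {start}
    while frontier:
        b = heapq.heappop(frontier)       # earliest unexplored boundary
        if b == end:
            return max(stacks[b])         # best-scoring complete theory
        for score, path in stacks[b]:     # extend ALL paths at boundary b
            for nb, label, lp in arcs.get(b, []):
                stacks[nb].append((score + lp, path + (label,)))
                # prune: keep only the best max_paths theories per boundary
                stacks[nb] = heapq.nlargest(max_paths, stacks[nb])
                if nb not in scheduled:
                    scheduled.add(nb)
                    heapq.heappush(frontier, nb)
    return None

arcs = {
    0: [(1, "a", -1.0), (2, "ab", -2.5)],
    1: [(2, "b", -1.0)],
    2: [(3, "c", -0.5)],
}
print(stack_search(arcs, 0, 3))  # (-2.5, ('a', 'b', 'c'))
```

Because all theories at a boundary end at the same point in time, their scores are directly comparable, which sidesteps the short-vs-long normalization problem of the best-first search.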

Filler Models
- ANGIE provides the subword lexical model for the filler space
- Different ANGIE configurations give a range of models:
  - Least constraint: phone bigram
  - Most constraint: the full ANGIE layered model with a 1200-word lexicon
- In all cases, no cross-word constraints (e.g., word n-gram) are used

Range of Filler Models
- Phones: only a phone bigram is used
- Pseudo-words (e.g., flid: f l ih dcl d): possible pseudo-words are invented bottom-up
- Syllables (e.g., ciscofran: s ih s kcl k uh f axr n): the syllable is the highest unit; syllable ordering is not enforced
- Morphs (e.g., conflighting: kcl k aa n f l ay tcl t iy ng): syllables with ordering enforced
- Known words plus pseudo-words: the 1200-word lexicon, plus invention of pseudo-words is allowed
- Known words only: the 1200-word lexicon
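The least-constrained end of this range, the phone bigram, can be sketched as follows; the phone pairs and probabilities are invented, and a real model would be estimated from training data:

```python
import math

# Invented phone-bigram probabilities; <s> and </s> mark filler edges.
bigram = {
    ("<s>", "s"): 0.2, ("s", "ih"): 0.3,
    ("ih", "s"): 0.25, ("s", "</s>"): 0.1,
}

def filler_log_prob(phones, backoff=1e-4):
    """Score a filler phone sequence with the bigram; unseen pairs
    fall back to a small floor probability."""
    seq = ["<s>"] + phones + ["</s>"]
    return sum(math.log(bigram.get(pair, backoff))
               for pair in zip(seq, seq[1:]))

print(round(filler_log_prob(["s", "ih", "s"]), 2))  # -6.5
```

Each step up the range replaces this flat bigram with progressively more of ANGIE's layered structure, until the filler is scored by the full model over the 1200-word lexicon.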

Results

  Filler Model            FOM    Rel. Time
  Phone bigram           85.3    -
  Pseudo-words           86.3    1.00
  Syllables              87.7    0.56
  Morphs                 88.4    0.79
  Words + pseudo-words   88.6    0.79
  Words                  89.3    0.74

- More constraint leads to a higher figure of merit (FOM)
- Speed increases with constraint
  - Possible explanation: lower branchout
  - Exception: syllables are very fast
- For reference, a word bigram achieves 93.9 FOM

Other Points
- Increasing subword lexical constraints on the filler model improves performance
  - Another example of "full recognition is best"
- Permitting pseudo-words in addition to known words did not help, even when the vocabulary was lowered to 400 words
  - Test set coverage is 92% with a vocabulary of 400 words and 86% with 200 words (where the relative ordering of performance starts to swap)
- Integration of Chung's ANGIE-based duration model improves performance further (up to 91.6 FOM)
  - To be presented: W2C.3 (Wed, 12:30, Delphi)

Future Work
- ANGIE is a workable framework for speech processing
  - Especially for research in subword lexical modeling
  - But it can also leverage the parse tree structure for acoustic modeling
- The natural next step is full speech recognition
  - Dynamic vocabulary updates are easy to do
- Other tasks:
  - Pronunciation server (integrates well with a dynamic-vocabulary recognizer)