Speech Recognition for Dialects & Spoken Tutorials

M.Tech. 1 Seminar Topics
Preethi Jyothi
Department of CSE, IIT Bombay

Automatic Speech Recognition
- Automatic Speech Recognition (ASR) is one of the oldest (early 1900s) and most complex sequence prediction tasks.
- Modern ASR systems are dominated by statistical methods pioneered by [Jelinek 76].
- Noisy channel model: given an input speech utterance, what is the most likely text sequence?
- The current state of the art in ASR involves a complex pipeline with many machine learning components.
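
The noisy channel question above is usually written as a Bayes decision rule. As a sketch, with X standing for the acoustic observations and W for a candidate word sequence (both symbols are introduced here for illustration and do not appear on the slides):

```latex
\[
W^{*} \;=\; \arg\max_{W} P(W \mid X)
      \;=\; \arg\max_{W} \frac{P(X \mid W)\, P(W)}{P(X)}
      \;=\; \arg\max_{W} P(X \mid W)\, P(W)
\]
```

In this form, P(X | W) is commonly called the acoustic model and P(W) the language model; P(X) drops out of the maximisation because it does not depend on W.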

Languages in the Indian subcontinent
- 30 Indian languages are spoken by more than 1M native speakers.
- Hindi and Bengali are among the world's most widely spoken languages.
- Despite this, Indian languages (barring Hindi) are considered low-resource for ASR.
PIC SOURCE: http://titus.fkidg1.uni-frankfurt.de/didact/karten/indi/indicm.htm

Challenges
- Rich language diversity (more than 150 languages and more than 1500 dialects!)
- Morphological complexity
  - Dravidian languages pose an extra challenge, being agglutinative
  - Lack of standard lexicons/morphological analysers
- Syntactic complexity
  - E.g. free word order
- Limited prior work
  - Lack of diversity in ASR tasks
  - Lack of annotated corpora in many Indian languages

Seminar Topics
- Speech recognition of Indian dialects
  - Topic 1: Acoustic model adaptation using dialectal speech
  - Topic 2: Discriminative pronunciation and language modelling for dialectal speech
- Automatic transcription of spoken tutorials in Indian languages
  - Topic 3: Leveraging side information for automatic transcription of spoken tutorials

Acoustic model adaptation using dialect speech
- To handle dialects, either build a) an ensemble of dialect-specific recognizers or b) a common language-specific recognizer.
- E.g., the strategy adopted by Google VoiceSearch: route a spoken query to a specific dialectal recognizer based on location information.
- There is potential for large improvements over current strategies, and this is a largely unexplored area for dialects of Indian languages.
- Reading: papers on acoustic model adaptation using both Hidden Markov Model and Deep Neural Network based systems.
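
The routing strategy above can be sketched in a few lines of Python. This is only an illustration of options (a) and (b) working together, not a description of how any production system is built: the class `DialectRouter`, the helper `make_stub_recognizer`, and the region keys are all hypothetical, and the recognizers are stubs that return a placeholder string instead of decoding audio.

```python
from typing import Callable, Dict, Optional

# A recognizer is modelled as any function from audio bytes to text.
Recognizer = Callable[[bytes], str]


def make_stub_recognizer(name: str) -> Recognizer:
    """Return a hypothetical stand-in recognizer that only reports its own name."""
    def recognize(audio: bytes) -> str:
        return f"<{name} transcript of {len(audio)} bytes>"
    return recognize


class DialectRouter:
    """Option (a): dialect-specific recognizers, with option (b) as the fallback."""

    def __init__(self, by_region: Dict[str, Recognizer], common: Recognizer) -> None:
        self.by_region = by_region
        self.common = common

    def pick(self, region: Optional[str]) -> Recognizer:
        # A missing or unknown location falls back to the common recognizer.
        if region is None:
            return self.common
        return self.by_region.get(region, self.common)

    def transcribe(self, audio: bytes, region: Optional[str] = None) -> str:
        return self.pick(region)(audio)


# Hypothetical region keys; a real system would derive them from location data.
router = DialectRouter(
    by_region={
        "region_a": make_stub_recognizer("dialect-A"),
        "region_b": make_stub_recognizer("dialect-B"),
    },
    common=make_stub_recognizer("common"),
)
```

Calling `router.transcribe(audio, "region_a")` uses the dialect-specific stub, while a call with no region, or with a region that has no dedicated recognizer, uses the common one. Hard routing of this kind depends entirely on location being a good proxy for dialect, which is one reason adaptation of a shared acoustic model is worth studying as an alternative.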

Pronunciation/language modelling of dialectal speech
- To extend an ASR system to a new dialect, pronunciation/language models are enhanced by adding pronunciation variants/new words.
- Can we automatically learn phonological rules governing pronunciation differences in dialectal speech (compared to the standard dialect)?
- How can we devise good discriminative models to learn weights for these rules?
- How about for language models?
- Reading: papers on discriminative pronunciation and language modelling.
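
To make the idea of weighted phonological rules concrete, here is a minimal Python sketch that expands one canonical pronunciation into weighted dialectal variants. It makes several simplifying assumptions that are not from the slides: rules are context-independent single-phone substitutions, a variant's weight is the product of the weights of the rules applied, and the weights are hand-set placeholders rather than discriminatively learned values. The function name `expand_pronunciations` and the toy phone symbols are hypothetical.

```python
from typing import Dict, List, Tuple

Phones = Tuple[str, ...]
# Each rule: (source phone, replacement phone, weight in (0, 1]).
Rule = Tuple[str, str, float]


def expand_pronunciations(base: Phones, rules: List[Rule]) -> Dict[Phones, float]:
    """Generate weighted variants by optionally applying rules at each position.

    The weight of a variant is the product of the weights of the rules applied;
    the canonical pronunciation keeps weight 1.0.
    """
    variants: Dict[Phones, float] = {(): 1.0}
    for phone in base:
        extended: Dict[Phones, float] = {}
        for prefix, weight in variants.items():
            # Option 1: keep the canonical phone unchanged.
            options = [(phone, 1.0)]
            # Option 2: apply any rule whose source matches this phone.
            options += [(dst, w) for src, dst, w in rules if src == phone]
            for out_phone, w in options:
                key = prefix + (out_phone,)
                # If two paths yield the same variant, keep the higher weight.
                extended[key] = max(extended.get(key, 0.0), weight * w)
        variants = extended
    return variants


# Toy example with made-up phones and placeholder weights.
toy_rules: List[Rule] = [("z", "j", 0.5), ("v", "b", 0.25)]
toy_variants = expand_pronunciations(("z", "i", "v"), toy_rules)
```

With these toy rules the canonical form yields four entries: the unchanged pronunciation, one variant per rule, and one variant with both rules applied. In the research question posed above, both the rule set and the weights would be learned from dialectal data instead of being written by hand.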

Transcribing Lectures using S(l)ide Information
- Transcribing lectures and leveraging information on slides to build contextual language models
- Will be using data from spokentutorial.org
- Produce subtitles. Could we add sentence markers? This will require an augmented language model and detecting informative cues in the speech signal.
- Reading: papers on language modelling and prosodic analysis of speech.
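
One simple way to picture a contextual language model built from slide text is linear interpolation between a background model and a model estimated from the slides. The Python sketch below is an assumption-laden illustration and not the method proposed in the seminar: it uses unigram models only (a real system would more likely use n-gram or neural models), and the function names, the interpolation weight `lam`, and the toy sentences are all hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable


def unigram_probs(tokens: Iterable[str]) -> Dict[str, float]:
    """Maximum-likelihood unigram estimates from a token stream."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}


def interpolate(
    background: Dict[str, float], slide: Dict[str, float], lam: float
) -> Dict[str, float]:
    """Linear interpolation: P(w) = lam * P_slide(w) + (1 - lam) * P_background(w)."""
    vocab = set(background) | set(slide)
    return {
        w: lam * slide.get(w, 0.0) + (1.0 - lam) * background.get(w, 0.0)
        for w in vocab
    }


# Toy data: a generic background text and the words found on one slide.
background_lm = unigram_probs("the file is saved in the folder".split())
slide_lm = unigram_probs("kaldi decoding graph".split())
contextual_lm = interpolate(background_lm, slide_lm, lam=0.3)
```

After interpolation, slide-specific terms such as "kaldi" receive non-zero probability even though the background text never contains them, while common background words lose a little mass. Choosing `lam`, and deciding how much slide text to include for each stretch of audio, are exactly the kinds of design questions the topic leaves open.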

Interested?
- Requires a strong grasp of probability and statistics.
- Coding component that all seminar topics will require: building an ASR system for an Indian language of your choice, using the open-source ASR toolkit Kaldi.
- Subsequent MTPs: develop new techniques addressing your research problem and incorporate them into the above ASR system.