Speech Processing: Current Topics and Future Challenges, Commercial and Research

Speech Processing 11-492/18-492

Current and Future: What are the hot topics in speech? What currently works? What could work soon (5-10 years)? What are the industry hot topics? What are the research challenges?

Spoken Dialog, Now (Industry): location-based querying; on the phone: Apple (Siri); in the home: Amazon (Echo); smartphones and tablets (their owners have money). How do you make money out of this?

Spoken Dialog, Now (Research): error recovery; adaptive systems; rapid deployment; learning dialog structure from data; non-task-oriented dialog.
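Error recovery in spoken dialog is often driven by the recognizer's confidence score. A minimal sketch of the common accept / explicit-confirm / re-prompt strategy, assuming a calibrated confidence in [0, 1]; the two thresholds and the `AsrHypothesis` type are hypothetical, and real systems tune such values on held-out data:

```python
from dataclasses import dataclass

# Hypothetical thresholds; in practice these are tuned per application.
ACCEPT_THRESHOLD = 0.8
CONFIRM_THRESHOLD = 0.4


@dataclass
class AsrHypothesis:
    text: str
    confidence: float  # assumed to be a calibrated score in [0, 1]


def choose_recovery_action(hyp: AsrHypothesis) -> str:
    """Pick a dialog move based on how much we trust the recognizer."""
    if hyp.confidence >= ACCEPT_THRESHOLD:
        return "accept"    # act on the hypothesis directly
    if hyp.confidence >= CONFIRM_THRESHOLD:
        return "confirm"   # e.g. "Did you say ...?"
    return "reprompt"      # e.g. "Sorry, could you repeat that?"
```

Adaptive systems can go further by adjusting the thresholds per user or per dialog state instead of fixing them globally.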

ASR, Now (Industry): adapting cloud ASR per app; broadcast news transcription; robust speech recognition (in the car, outside, in a noisy office, far field); LM adaptation from other sources, using click-through and search queries; pronunciation variants (wrong ones too); medical transcription.
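One simple way to adapt a language model using other sources, such as search queries, is linear interpolation of a general background model with an in-domain model. A minimal sketch using unigram models only; real systems interpolate n-gram or neural LMs, and the weight (here a hypothetical `lam=0.3`) is normally tuned on held-out in-domain text:

```python
from collections import Counter


def unigram_probs(tokens):
    """Maximum-likelihood unigram probabilities from a token list."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}


def interpolate(p_background, p_domain, lam):
    """Linear interpolation: P(w) = lam * P_domain(w) + (1 - lam) * P_background(w)."""
    vocab = set(p_background) | set(p_domain)
    return {
        w: lam * p_domain.get(w, 0.0) + (1.0 - lam) * p_background.get(w, 0.0)
        for w in vocab
    }


# Toy data: a general-text background model and a model built from search queries.
background = unigram_probs("the weather is nice the news is on".split())
queries = unigram_probs("weather pittsburgh weather tomorrow".split())
adapted = interpolate(background, queries, lam=0.3)
```

Because both inputs are proper distributions and the weights sum to 1, the adapted model is also a proper distribution, and query words such as "pittsburgh" that never occur in the background text receive non-zero probability.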

ASR, Now (Research): discriminative training (acoustic parameter projections that discriminate between the correct answer and its competitors); robust recognition (far-field microphones, blind source separation); out-of-vocabulary words; unsupervised training; deep learning (neural nets); zero-resource ASR.
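Maximum mutual information (MMI) is one widely used discriminative training criterion; the slide does not name a specific one. For each training utterance, MMI maximizes the log posterior of the correct transcription relative to competing hypotheses. A minimal sketch of the per-utterance objective, assuming the competitors come from an N-best list or lattice and that scores are already available as (acoustic log-likelihood, LM log-probability) pairs:

```python
import math


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def mmi_utterance_objective(ref, competitors, acoustic_scale=1.0):
    """
    Log posterior of the reference transcription for one utterance.

    ref and each competitor are (acoustic_loglik, lm_logprob) pairs.
    The denominator sums over the reference plus its competitors.
    """
    def joint(hyp):
        ac, lm = hyp
        return acoustic_scale * ac + lm

    numerator = joint(ref)
    denominator = log_sum_exp([joint(h) for h in [ref, *competitors]])
    return numerator - denominator  # always <= 0; training pushes it toward 0
```

Training then adjusts the acoustic model parameters to raise this value summed over all utterances, which lowers the scores of the competitors relative to the correct answer.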

TTS, Now (Industry): building custom voices (including your own voice); multilingual synthesis on small devices, e.g. for GPS navigation across Europe; easy methods for building new languages; conversational speech.

TTS, Now (Research): improving neural synthesis; rapid support for new languages; emotional speech synthesis; automatic building of voices from data without any human intervention; languages without an orthography; synthesis beyond the sentence; synthesis with more text analysis.

Speech-to-Speech Translation. Industry: one-way, domain-limited systems; simple targeted cell phone systems; YouTube/broadcast translation; Skype translation. Research: two-way, large-domain systems; one-way lecture/broadcast news translation.

VC and SID, Now. Voice conversion: cross-lingual voice conversion; emotion/style conversion; conversion without training data. Speaker ID: accuracy on large data sets (> 1000 speakers); cross-channel/cross-language ID; more information in ID (prosody, vocabulary).
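Many current speaker ID systems map each recording to a fixed-length embedding (for example an i-vector or a neural x-vector) and score it against enrolled speakers. A minimal sketch of identification by cosine similarity with a rejection threshold, assuming the embeddings are already extracted and non-zero; the `threshold=0.5` value and the toy vectors are hypothetical:

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def identify_speaker(test_embedding, enrolled, threshold=0.5):
    """Return the best-matching enrolled speaker, or None if no score clears the threshold."""
    best_name, best_score = None, -1.0
    for name, emb in enrolled.items():
        score = cosine(test_embedding, emb)
        if score > best_score:
            best_name, best_score = name, score
    return best_name if best_score >= threshold else None


enrolled = {"alice": [1.0, 0.0, 0.0], "bob": [0.0, 1.0, 0.0]}
```

This linear scan costs one comparison per enrolled speaker, which is part of why accuracy and speed on sets of more than 1000 speakers remain research questions.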

CALL, Now. Industry: pronunciation training; scenario practice. Research: game-based tools; measuring educational contribution.

Speech Processing Future: hard challenges (PhD topics and beyond). All on the research side, but maybe in research labs.

Speech Reco without Speech: using other modalities (lip movement, muscle movement); silent speech with no generated audio; just thinking about the words; gesture recognition; brain-computer interfaces. ASR without text: Find. in all this audio.

Beyond the Words: recognition of more than words (intent, style, emotion). Human-machine: frustration, confidence, agreement. Human-human: rapport, relationships, persuasion. Truth and lies. Sentiment.

Conversational Systems: a participant in a meeting; true conversational speech; appropriate non-word speech generation; knowing when to speak, when to laugh, and when to listen; appropriate conversational timing; able to interrupt when it has something to say; having something to say.
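Knowing when to speak is commonly approximated today by end-of-turn detection. A minimal sketch of the simplest baseline, a fixed pause threshold on per-frame energy, assuming energies are already computed per frame; the threshold and pause length are hypothetical, and conversational systems of the kind described above would also need prosodic and lexical cues rather than silence alone:

```python
def end_of_turn_frame(frame_energies, silence_threshold, min_silence_frames):
    """
    Return the index of the frame where the user's turn is judged to end,
    or None if no long-enough pause follows speech.
    """
    heard_speech = False
    silent_run = 0
    for i, energy in enumerate(frame_energies):
        if energy >= silence_threshold:
            heard_speech = True   # user is (still) talking
            silent_run = 0
        elif heard_speech:
            silent_run += 1       # count consecutive quiet frames after speech
            if silent_run >= min_silence_frames:
                return i
    return None
```

A fixed pause threshold is exactly what makes such systems feel slow or interruptive: too short and they barge in on hesitations, too long and they leave awkward gaps.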

Summaries and Discussions: describe a paper/movie/event with an appropriate summary; allow questions; know when to use style/emotion. Not just speech<->text: understand more of the text content; answer complex questions; engage the user and discuss the topic.

Homework 4: Imagine you work for a social media company, Facespace. They've asked you to give a report on how speech technology could improve their users' experience.

Homework 4: What parts could use speech? How could ASR, TTS, SDS, voice conversion, speaker ID, and any other speech technology help? How could you use their existing data to help build and tune systems? How would you evaluate what you propose? The company is international: could speech translation be used?
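For the ASR part of such a proposal, word error rate (WER) is the usual starting point for evaluation; TTS, dialog, and speaker ID components need their own measures. A minimal sketch of WER computed as word-level edit distance, assuming whitespace tokenization and a non-empty reference:

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    # d[i][j] = edit distance between ref[:i] and hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + sub)  # substitution or match
    return d[len(ref)][len(hyp)] / len(ref)
```

Note that WER can exceed 1.0 when the hypothesis contains many insertions, so it is a rate relative to the reference length rather than a bounded percentage.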

Note: Don't forget to fill in the Faculty Course Evaluation. Final homework due Friday 8th, 3:30pm. Final exam Monday 12th, 8:30-11:30pm, GHC 4301. 18-492 is in the same room!