Signal Processing. Speech Signal Processing. Speech Information Processing



Role of language

- Semantics: the meanings of words and the relations among them.
- Syntax: the order of words and the role of function words.
- Phonology: individual phonemic segments, features, stressed and unstressed vowels.

For example:

- What is the phonemic inventory of English? How does it function?
- The concept of contrast (e.g., pat vs. bat). Why do we believe that it is psychologically real?
- Why does the same phoneme give rise to different acoustic realizations in different utterances? (e.g., in fluent speech, "Joe ate his soup" loses the /h/ of "his", and the /t/ of "ate" does not look like the /t/ in "Tom".)
- What are the principles that lead to modifications of segments in different environments?
- How are phonemes, which are usually described in terms of features, translated into phonetic representations? (e.g., /z/ is +voiced, /s/ is -voiced; the same relation holds for many pairs, like f-v; the patterning of sounds is beautifully captured by the feature concept.)
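The feature idea mentioned above (/z/ is +voiced, /s/ is -voiced, and the same relation holds for f-v) can be illustrated with a small sketch. The feature table below is a hypothetical, heavily simplified one written for this example: the three feature names (`voiced`, `place`, `manner`) and the six consonants are assumptions, not a complete or authoritative feature system for English.

```python
# Hypothetical, simplified feature table for six English consonants.
# Real feature systems use many more features; this is only an illustration.
FEATURES = {
    "s": {"voiced": False, "place": "alveolar", "manner": "fricative"},
    "z": {"voiced": True, "place": "alveolar", "manner": "fricative"},
    "f": {"voiced": False, "place": "labiodental", "manner": "fricative"},
    "v": {"voiced": True, "place": "labiodental", "manner": "fricative"},
    "p": {"voiced": False, "place": "bilabial", "manner": "stop"},
    "b": {"voiced": True, "place": "bilabial", "manner": "stop"},
}


def differing_features(a, b):
    """Return the sorted names of the features on which phonemes a and b differ."""
    fa, fb = FEATURES[a], FEATURES[b]
    return sorted(name for name in fa if fa[name] != fb[name])


def voicing_pairs():
    """Return (unvoiced, voiced) pairs that differ only in the voicing feature."""
    pairs = []
    for a in sorted(FEATURES):
        for b in sorted(FEATURES):
            if (not FEATURES[a]["voiced"] and FEATURES[b]["voiced"]
                    and differing_features(a, b) == ["voiced"]):
                pairs.append((a, b))
    return pairs


if __name__ == "__main__":
    print(differing_features("s", "z"))
    print(voicing_pairs())
```

In this toy table the pairs s-z, f-v, and p-b each differ in a single feature, which is one way to see why "pat" and "bat" contrast: the initial segments are the same except for voicing.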

New terms that you will know: voiced, unvoiced, pitch, intensity, timbre, formants, speech production, vocal tract, vocal cords, phonemes, manner and place of articulation, coarticulation, linear prediction, homomorphic filtering, spectrograms, speech coding, speech enhancement, speech recognition, acoustic modelling, time/pitch scale modification, speech synthesis, human auditory system, speech perception, speech quality measures (MOS, PESQ), ...

Speech Information Processing

Speech Information: Understanding and Modeling

What is the information in speech, and how is it encoded? Let's give it a try.

Speech Research

Speech Science
- Linguistics
- Physiology of Speech Production
- Acoustics
- Auditory Nervous System
- Psychophysics of Auditory System
- Cognitive Psychology
- Computer-based Algorithms

Speech Technology
- Speech Recognition
- Speaker Recognition
- Speech Synthesis

- Speech Coding
- Speech Synthesis
- Speech Recognition
- Speech Understanding
- Speaker Recognition
- Language Recognition

Challenges to machine speech processing:

- Definition of information content
- Multiple levels of information
- Subjectivity of the listener
- Robustness to interfering signals
- Partial information
- Algorithmic complexity

Recommended Readings

- J. L. Flanagan, Speech Analysis, Synthesis, and Perception, 2nd Edition, Springer-Verlag, Berlin, 1972
- J. D. Markel and A. H. Gray, Jr., Linear Prediction of Speech, Springer-Verlag, Berlin, 1976
- B. Gold and N. Morgan, Speech and Audio Signal Processing, J. Wiley and Sons, 2000
- J. Deller, Jr., J. G. Proakis, and J. Hansen, Discrete Time Processing of Speech Signals, Macmillan Publishing, 1993
- D. O'Shaughnessy, Speech Communication, Human and Machine, Addison-Wesley, 1987
- S. Furui and M. Sondhi, Advances in Speech Signal Processing, Marcel Dekker Inc, NY, 1991
- R. W. Schafer and J. D. Markel, Editors, Speech Analysis, IEEE Press Selected Reprint Series, 1979
- D. G. Childers, Speech Processing and Synthesis Toolboxes, John Wiley and Sons, 1999
- K. Stevens, Acoustic Phonetics, MIT Press, 1998
- J. Benesty, M. M. Sondhi, and Y. Huang, Editors, Springer Handbook of Speech Processing and Speech Communication, Springer, 2008

Recommended Readings: Speech Coding

- A. M. Kondoz, Digital Speech: Coding for Low Bit Rate Communication Systems, 2nd Edition, John Wiley and Sons, 2004
- W. B. Kleijn and K. K. Paliwal, Editors, Speech Coding and Synthesis, Elsevier, 1995
- P. E. Papamichalis, Practical Approaches to Speech Coding, Prentice Hall Inc, 1987
- N. S. Jayant and P. Noll, Digital Coding of Waveforms, Prentice Hall Inc, 1984

Recommended Readings: Speech Synthesis

- T. Dutoit, An Introduction to Text-To-Speech Synthesis, Kluwer Academic Publishers, 1997
- P. Taylor, Text-to-Speech Synthesis, Cambridge University Press, 2008
- J. Allen, S. Hunnicutt, and D. Klatt, From Text to Speech, Cambridge University Press, 1987
- Y. Sagisaka, N. Campbell, and N. Higuchi, Computing Prosody, Springer Verlag, 1996
- J. van Santen, R. W. Sproat, J. P. Olive, and J. Hirschberg, Editors, Progress in Speech Synthesis, Springer Verlag, 1996
- J. P. Olive, A. Greenwood, and J. Coleman, Acoustics of American English, Springer Verlag, 1993

Recommended Readings: Speech Recognition

- L. R. Rabiner and B. H. Juang, Fundamentals of Speech Recognition, Prentice Hall Inc, 1993
- X. Huang, A. Acero, and H.-W. Hon, Spoken Language Processing, Prentice Hall Inc, 2000
- F. Jelinek, Statistical Methods for Speech Recognition, MIT Press, 1998
- H. A. Bourlard and N. Morgan, Connectionist Speech Recognition: A Hybrid Approach, Kluwer Academic Publishers, 1994
- C. H. Lee, F. K. Soong, and K. K. Paliwal, Editors, Automatic Speech and Speaker Recognition, Kluwer Academic Publishers, 1996