13/Nov/2008 Introduction to Speech Technology Presented by Andriy Temko Department of Electrical and Electronic Engineering
Page 2 of 30 Outline
- Introduction & Applications
- Analysis of Speech
- Speech Recognition Problem
Page 3 of 30 Speech Signal
- A microphone converts the speech signal to an electrical waveform.
- The ability to convert an acoustic waveform to an electrical waveform and back again is the basis of Bell's telephone invention.
Page 4 of 30 Speech Chain
Page 5 of 30 Applications: Speech Coding
[Figure: speech coding block diagram, encoder and decoder]
Page 6 of 30 Applications: Text-to-Speech Synthesis
- Simulates the entire upper part of the Speech Chain.
- A set of linguistic rules determines the appropriate set of sounds.
- More than simply looking up words in a pronouncing dictionary: abbreviations, ambiguous words, acronyms, proper names, special terms, intonation, etc.
- Most popular method: Unit Selection & Concatenation.
Page 7 of 30 Applications: Speech Recognition
- Feature Analysis: converts a digital speech signal to a set of feature vectors.
- Pattern Matching: finds the closest match of the dynamically time-aligned set of feature vectors with a set of stored patterns.
- Speech Recognition: extracting a message from a signal.
- Command and control of computer software
- Voice dictation
- Dialog with machines: help desks and call centers
Page 8 of 30 Applications: Others
- Speaker Recognition: who is speaking
- Speaker Verification: verify the claimed identity
- Speaker Diarization: who spoke when
- Word Spotting: monitoring the signal for a special word
- Speech/Audio Indexing: identifying the audio class (broadcast news transcription)
- Audio Recognition: identifying acoustic events (audio-based surveillance, smart rooms)
- Speech Enhancement: make speech more intelligible
Page 9 of 30 Interesting Facts: Perception of Loudness
- Greatest sensitivity at around 3 to 4 kHz.
- Almost precisely the range of frequencies occupied by most of the sounds of speech!
- Non-uniform filter-bank analysis
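The slide does not say which non-uniform spacing it has in mind. As a rough illustration only, the sketch below assumes the mel scale (one widely used perceptual frequency scale) and places filter centre frequencies uniformly in mel, which makes them progressively wider apart in Hz. The function names, the number of filters and the 0 to 4000 Hz range are hypothetical choices, not values from the slides.

```python
import math

def hz_to_mel(f_hz):
    """Map Hz to mel using one common formula (an assumption, not from the slides)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def filterbank_centers(n_filters, f_low, f_high):
    """Centre frequencies in Hz, spaced uniformly on the mel scale."""
    lo, hi = hz_to_mel(f_low), hz_to_mel(f_high)
    step = (hi - lo) / (n_filters + 1)
    return [mel_to_hz(lo + step * (k + 1)) for k in range(n_filters)]

# Hypothetical setup: 10 filters between 0 and 4000 Hz.
centers = filterbank_centers(10, 0.0, 4000.0)
spacings = [b - a for a, b in zip(centers, centers[1:])]
print([round(c) for c in centers])
print([round(s) for s in spacings])  # gaps grow with frequency
```

The filters end up densely packed at low frequencies and sparse at high frequencies, which is the sense in which the analysis is non-uniform.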
Page 10 of 30 Interesting Facts: Auditory Masking
- Critical-band phenomena
- Widely used in speech coding (perceptually lossless coding)
Page 12 of 30 Short-Time Analysis of Speech. Windowing
- Windowing: small portions of the signal are assumed to be pseudostationary.
- Windowing yields a set of speech samples x(n) weighted by the shape of the window w(n).
- Generally, successive windows overlap, because w(n) tends to have a shape that de-emphasises samples near its edges.
- This breaks the speech down into a sequence of frames.
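The framing step described above can be sketched as follows. This is a minimal illustration, not the presenter's implementation: the Hamming window, the 8 kHz sampling rate, the 25 ms frame length and the 10 ms hop are all assumed values, and `hamming` and `frame_signal` are hypothetical helper names.

```python
import math

def hamming(n_samples):
    """Hamming window w(n); small near the edges, close to 1 in the middle."""
    if n_samples == 1:
        return [1.0]
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * n / (n_samples - 1))
            for n in range(n_samples)]

def frame_signal(x, frame_len, hop):
    """Split x(n) into overlapping frames, each weighted by w(n)."""
    w = hamming(frame_len)
    frames = []
    for start in range(0, len(x) - frame_len + 1, hop):
        frames.append([x[start + n] * w[n] for n in range(frame_len)])
    return frames

# Assumed illustration values (not from the slides).
fs = 8000                      # sampling rate in Hz
frame_len = fs * 25 // 1000    # 25 ms -> 200 samples
hop = fs * 10 // 1000          # 10 ms -> 80 samples, so frames overlap
x = [math.sin(2.0 * math.pi * 200.0 * n / fs) for n in range(fs)]  # 1 s test tone

frames = frame_signal(x, frame_len, hop)
print(len(frames), "frames of", len(frames[0]), "samples")
```

Because the hop is shorter than the frame, samples that one window de-emphasises near its edge fall near the middle of a neighbouring window.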
Page 13 of 30 Short-Time Analysis of Speech. FFT
[Figure panels: Wide band, Narrow band]
Page 14 of 30 Short-Time Analysis of Speech. Spectral Envelope
[Figure panels: Wide band, Narrow band]
- You cannot get good time resolution and good frequency resolution from the same spectrogram: the Uncertainty Principle.
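The trade-off can be made concrete with a little arithmetic. In the sketch below, a window of N samples at sampling rate fs spans N/fs seconds, and its DFT bins are fs/N Hz apart, so the product of the two is always 1. The 16 kHz rate and the two window lengths are assumed values picked to stand in for a wide-band and a narrow-band analysis.

```python
def resolution(window_len, fs):
    """Return (time span in seconds, DFT bin spacing in Hz) of one analysis window."""
    return window_len / fs, fs / window_len

fs = 16000  # assumed sampling rate in Hz
settings = {"wide band": 64, "narrow band": 512}  # assumed window lengths in samples

for name, n in settings.items():
    dt, df = resolution(n, fs)
    print(f"{name}: window spans {dt * 1000:.1f} ms, bins are {df:.2f} Hz apart")
```

A short window localises events well in time but smears frequency detail; a long window does the opposite. No choice of N improves both at once.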
Page 15 of 30 Phoneme
- Speakers and listeners divide words into component sounds called phonemes.
- Native speakers agree on the phonemes that make up a particular word.
- There are about 42 phonemes in English.
- The actual sound that corresponds to a particular phoneme depends on:
  - the adjacent phonemes in the word or sentence
  - the accent of the speaker
  - the talking speed
  - whether it is a formal or informal occasion
Page 16 of 30 Voiced / Unvoiced Phoneme
- Vowel/consonant discrimination with Zero Crossing Rate and Short-Time Energy
- Determination of pitch (fundamental frequency) with autocorrelation
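A minimal sketch of the three measurements named above, applied to two synthetic frames. The test signals, the 8 kHz sampling rate and the 60 to 400 Hz pitch search range are assumptions for illustration; the function names are hypothetical, and real systems add thresholds and smoothing that are not shown here.

```python
import math
import random

def short_time_energy(frame):
    """Mean squared amplitude of the frame."""
    return sum(s * s for s in frame) / len(frame)

def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)

def pitch_autocorr(frame, fs, f_min=60.0, f_max=400.0):
    """Estimate the fundamental frequency from the lag of the autocorrelation peak."""
    lag_min = int(fs / f_max)
    lag_max = min(int(fs / f_min), len(frame) - 1)
    best_lag, best_val = None, 0.0
    for lag in range(lag_min, lag_max + 1):
        r = sum(frame[n] * frame[n + lag] for n in range(len(frame) - lag))
        if r > best_val:
            best_lag, best_val = lag, r
    return fs / best_lag if best_lag else None

fs = 8000   # assumed sampling rate in Hz
n = 400     # 50 ms frame
# Voiced-like frame: 100 Hz fundamental plus a weaker second harmonic.
voiced = [math.sin(2 * math.pi * 100 * t / fs) + 0.5 * math.sin(2 * math.pi * 200 * t / fs)
          for t in range(n)]
# Unvoiced-like frame: low-level noise from a seeded generator.
rng = random.Random(0)
unvoiced = [0.1 * rng.uniform(-1.0, 1.0) for _ in range(n)]

for name, frame in [("voiced", voiced), ("unvoiced", unvoiced)]:
    print(name, "energy:", short_time_energy(frame), "ZCR:", zero_crossing_rate(frame))
print("pitch estimate (Hz):", pitch_autocorr(voiced, fs))
```

The voiced frame combines high energy with a low zero crossing rate, while the noise-like frame shows the reverse, which is the cue used for the discrimination.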
Page 18 of 30 Speech Recognition
[Block diagram]
- Processing chain: utterance, Acoustic front-end, Recognition algorithm, Understanding algorithm, meaning
- Knowledge sources: Phonologic rules, Phonetic models, Dictionary and grammar, Task model
Page 19 of 30 Speech Recognition
[Same block diagram as page 18; additional figure label: Hz]
Page 21 of 30 Speech Recognition
[Same block diagram as page 18; additional figure labels: Markov Model; Phoneme k-1, Phoneme k, Phoneme k+1]
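As a rough illustration of how a Markov model over consecutive phoneme-like units can be aligned to an observation sequence, the sketch below runs the Viterbi algorithm on a three-state left-to-right model. Everything specific here is made up for the example: the three states, the transition and emission probabilities, and the discrete observation symbols 0, 1, 2. Real recognizers use continuous feature vectors and far larger models.

```python
import math

def safe_log(p):
    """log(p), with log(0) mapped to minus infinity."""
    return math.log(p) if p > 0 else float("-inf")

def viterbi(obs, log_init, log_trans, log_emit):
    """Return (most likely state path, its log probability)."""
    n_states = len(log_init)
    score = [log_init[s] + log_emit[s][obs[0]] for s in range(n_states)]
    back = []  # back[t][s] = best predecessor of state s at step t + 1
    for o in obs[1:]:
        prev = score
        score, ptr = [], []
        for s in range(n_states):
            best = max(range(n_states), key=lambda p: prev[p] + log_trans[p][s])
            score.append(prev[best] + log_trans[best][s] + log_emit[s][o])
            ptr.append(best)
        back.append(ptr)
    last = max(range(n_states), key=lambda s: score[s])
    path = [last]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    path.reverse()
    return path, score[last]

# Hypothetical left-to-right model: a state can only repeat or move to the next one.
init = [1.0, 0.0, 0.0]
trans = [[0.6, 0.4, 0.0],
         [0.0, 0.6, 0.4],
         [0.0, 0.0, 1.0]]
emit = [[0.8, 0.1, 0.1],   # state 0 mostly emits symbol 0
        [0.1, 0.8, 0.1],   # state 1 mostly emits symbol 1
        [0.1, 0.1, 0.8]]   # state 2 mostly emits symbol 2

log_init = [safe_log(p) for p in init]
log_trans = [[safe_log(p) for p in row] for row in trans]
log_emit = [[safe_log(p) for p in row] for row in emit]

observations = [0, 0, 1, 1, 2, 2]
path, log_prob = viterbi(observations, log_init, log_trans, log_emit)
print(path, math.exp(log_prob))
```

Working in the log domain turns products of small probabilities into sums, which avoids numerical underflow on long sequences.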
Page 23 of 30 Speech Recognition
[Same block diagram as page 18]
Dictionary and grammar: Trigram
Pr{the door was not opened}
  = Pr{the} Pr{door / the} Pr{was / the door} Pr{not / the door was} Pr{opened / the door was not}
  ≈ Pr{the} Pr{door / the} Pr{was / the door} Pr{not / door was} Pr{opened / was not}
The first line is the exact chain rule; the second is the trigram approximation, in which each word is conditioned on at most the two preceding words.
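The trigram factorization can be sketched with plain maximum-likelihood counts. The three-sentence corpus below is invented for the example, the helper names are hypothetical, and no smoothing is applied, so any unseen word sequence gets probability zero (practical language models avoid that).

```python
from collections import Counter

def train(sentences, max_order=3):
    """Count n-grams up to max_order, and how often each history is followed by a word."""
    ngrams, contexts = Counter(), Counter()
    for words in sentences:
        for n in range(1, max_order + 1):
            for i in range(len(words) - n + 1):
                gram = tuple(words[i:i + n])
                ngrams[gram] += 1
                contexts[gram[:-1]] += 1
    return ngrams, contexts

def cond_prob(word, history, ngrams, contexts):
    """Maximum-likelihood estimate of Pr{word / history}."""
    history = tuple(history)
    if contexts[history] == 0:
        return 0.0
    return ngrams[history + (word,)] / contexts[history]

def sentence_prob(words, ngrams, contexts):
    """Pr{w1} Pr{w2 / w1} times the product of Pr{wi / wi-2 wi-1}."""
    p = 1.0
    for i, word in enumerate(words):
        history = words[max(0, i - 2):i]  # at most the two preceding words
        p *= cond_prob(word, history, ngrams, contexts)
    return p

# Invented toy corpus.
corpus = [
    "the door was not opened".split(),
    "the door was closed".split(),
    "the window was not opened".split(),
]
ngrams, contexts = train(corpus)
p = sentence_prob("the door was not opened".split(), ngrams, contexts)
print(p)
```

On this toy corpus the factors are 3/14, 2/3, 1, 1/2 and 1, so the sentence probability works out to 1/14.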
Page 24 of 30 Speech Recognition
[Same block diagram as page 18, with additional training blocks]
- DATABASE: voice, text
- TRAINING: Acoustic front-end, Phonetic modeling, Language modeling
Page 25 of 30 A Snapshot of Acoustic Front-End No standard set of features for speech recognition: acoustic/articulatory/auditory
Page 26 of 30 A Snapshot of Recognition Algorithm (I)
- Viterbi/Baum-Welch alignment
- Dynamic Time Warping
- Weighted Finite State Transducers (WFST)
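Of the techniques listed, Dynamic Time Warping is the simplest to show in a few lines. The sketch below is a textbook-style version under simplifying assumptions: one-dimensional features, absolute difference as the local distance, and no path constraints. The sequences are made up; a recognizer would compare sequences of feature vectors instead.

```python
def dtw_distance(a, b, dist=lambda x, y: abs(x - y)):
    """Minimum accumulated distance over all monotonic alignments of a and b."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = best accumulated cost aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = dist(a[i - 1], b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # a advances, b repeats
                                 cost[i][j - 1],      # b advances, a repeats
                                 cost[i - 1][j - 1])  # both advance
    return cost[n][m]

# Made-up one-dimensional "feature" sequences.
template = [1, 2, 3, 2, 1]
slow = [1, 1, 2, 2, 3, 3, 2, 2, 1, 1]   # same shape, spoken at half speed
other = [3, 2, 1, 2, 3]                 # different shape

print(dtw_distance(template, slow))
print(dtw_distance(template, other))
```

The time-stretched copy aligns with the template at zero cost, while the differently shaped sequence does not, which is why DTW suits matching utterances spoken at different speeds.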
Page 27 of 30 A Snapshot of Recognition (II) A simple example of the whole decoding network
Page 28 of 30 A Snapshot of Recognition (III)
Page 29 of 30 State of the Art

CORPUS                                    STYLE                      VOCABULARY SIZE  % WORD ERRORS
Digit strings                             spontaneous                11               2.0
Digit strings                             conversational             11               5.0
Resource Management                       read                       1,000            2.0
Airline Travel Information System (ATIS)  spontaneous                2,500            2.5
North American Business News (NAB)        read                       64,000           6.6
Call Home                                 conversational telephonic  28,000           40.0
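The slides do not define how the word error percentage is computed. A common definition, assumed here, is the word-level edit distance (substitutions, deletions and insertions) between the recognizer output and a reference transcript, divided by the number of reference words. The sentences below are invented, and `word_error_rate` is a hypothetical helper name.

```python
def word_error_rate(reference, hypothesis):
    """Word error rate in percent; the reference must contain at least one word."""
    ref, hyp = reference.split(), hypothesis.split()
    # prev[j] = edit distance between the reference words seen so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i]
        for j, h in enumerate(hyp, start=1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (r != h)))  # substitution or match
        prev = cur
    return 100.0 * prev[-1] / len(ref)

wer = word_error_rate("the door was not opened", "the door is not open")
print(wer)
```

In this example two of the five reference words are substituted, giving 40 percent. Because insertions count as errors, the measure can exceed 100 percent.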
Page 30 of 30 Literature
- L. R. Rabiner, R. W. Schafer, Introduction to Digital Speech Processing, Foundations and Trends in Signal Processing, Vol. 1, Nos. 1-2, 2007
- X. Huang, A. Acero, H. Hon, R. Reddy, Spoken Language Processing: A Guide to Theory, Algorithm and System Development, Prentice Hall, 2001
- D. Jurafsky, J. H. Martin, Speech and Language Processing, Prentice Hall, 2001