Introduction to Speech Technology

Presented by Andriy Temko, Department of Electrical and Electronic Engineering, 13 Nov 2008

Outline
- Introduction & Applications
- Analysis of Speech
- Speech Recognition Problem

Speech Signal
A microphone converts the speech signal into an electrical waveform. The possibility of converting speech into an electrical waveform and then back into an acoustic waveform is the basis of Bell's invention of the telephone.

Speech Chain
[Figure: the speech chain]

Applications: Speech Coding
[Figure: speech coding block diagram showing the encoder and the decoder]
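The slide shows only a generic encoder/decoder diagram. As a minimal sketch of one classic speech-coding pair (swapped in here as an example, not taken from the slides), the encoder below compresses each sample with the continuous mu-law curve that ITU-T G.711 approximates and then quantizes it uniformly to 8 bits; the decoder inverts both steps. It assumes samples normalised to [-1, 1] and mu = 255; the function names are hypothetical.

```python
import numpy as np

MU = 255.0  # assumed mu-law parameter (the value used in North American/Japanese telephony)


def mulaw_encode(x: np.ndarray) -> np.ndarray:
    """Compress samples in [-1, 1] with the mu-law curve, then quantize to 8-bit codes."""
    x = np.clip(x, -1.0, 1.0)
    y = np.sign(x) * np.log1p(MU * np.abs(x)) / np.log1p(MU)  # y stays in [-1, 1]
    # Map [-1, 1] uniformly onto the integer codes 0..255.
    return np.round((y + 1.0) / 2.0 * 255.0).astype(np.uint8)


def mulaw_decode(codes: np.ndarray) -> np.ndarray:
    """Undo the uniform quantization, then invert the mu-law curve."""
    y = codes.astype(np.float64) / 255.0 * 2.0 - 1.0
    return np.sign(y) * np.expm1(np.abs(y) * np.log1p(MU)) / MU


x = np.linspace(-1.0, 1.0, 1001)
codes = mulaw_encode(x)
x_hat = mulaw_decode(codes)
print(codes.dtype, np.max(np.abs(x - x_hat)))
```

Because the quantization steps are uniform in the compressed domain, quiet samples are reconstructed with much smaller absolute error than loud ones, which suits the large dynamic range of speech.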

Applications: Text-to-Speech Synthesis
- Simulates the entire upper part of the speech chain.
- A set of linguistic rules determines the appropriate sequence of sounds.
- This is not simply a matter of looking up words in a pronouncing dictionary: abbreviations, ambiguous words, acronyms, proper names, special terms, intonation, etc. must all be handled.
- Most popular method: unit selection and concatenation.
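Unit selection can be framed as a search: for each target sound, choose one recorded candidate unit so that the sum of target costs (how well a unit matches the desired sound and prosody) and join costs (how smoothly adjacent units concatenate) is minimal. The sketch below solves that search with dynamic programming. The unit representation (a phone label plus a pitch in Hz), the candidate lists and both cost functions are hypothetical simplifications for illustration.

```python
from typing import Callable, List, Sequence, Tuple

Unit = Tuple[str, float]  # hypothetical unit: (phone label, pitch in Hz)


def select_units(
    candidates: Sequence[Sequence[Unit]],
    target_cost: Callable[[int, Unit], float],
    join_cost: Callable[[Unit, Unit], float],
) -> Tuple[List[Unit], float]:
    """Pick one candidate per target position, minimising total target + join cost."""
    # cost[i][j]: cheapest total for a sequence ending in candidates[i][j]
    cost = [[target_cost(0, u) for u in candidates[0]]]
    back = [[-1] * len(candidates[0])]
    for i in range(1, len(candidates)):
        prev = candidates[i - 1]
        row_cost, row_back = [], []
        for u in candidates[i]:
            # Best predecessor for u, including the cost of joining it to u.
            k = min(range(len(prev)), key=lambda p: cost[i - 1][p] + join_cost(prev[p], u))
            row_cost.append(cost[i - 1][k] + join_cost(prev[k], u) + target_cost(i, u))
            row_back.append(k)
        cost.append(row_cost)
        back.append(row_back)
    # Trace back from the cheapest final unit.
    j = min(range(len(candidates[-1])), key=lambda p: cost[-1][p])
    total = cost[-1][j]
    path = []
    for i in range(len(candidates) - 1, -1, -1):
        path.append(candidates[i][j])
        j = back[i][j]
    return path[::-1], total


targets = [("h", 120.0), ("e", 125.0), ("l", 130.0)]  # desired phone and pitch (made up)
candidates = [
    [("h", 100.0), ("h", 121.0)],
    [("e", 124.0), ("e", 150.0)],
    [("l", 131.0), ("l", 90.0)],
]
path, total = select_units(
    candidates,
    target_cost=lambda i, u: abs(u[1] - targets[i][1]),  # pitch mismatch with the target
    join_cost=lambda a, b: 0.5 * abs(a[1] - b[1]),  # pitch jump at the concatenation point
)
print(path, total)
```

Real systems use much richer target and join costs (spectral distance at the join, duration, context), but the search has the same shape.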

Applications: Speech Recognition
Speech recognition extracts a message from a speech signal in two stages:
- Feature analysis: converts the digital speech signal into a sequence of feature vectors.
- Pattern matching: finds the closest match between the dynamically time-aligned feature vectors and a set of stored patterns.
Typical applications: command and control of computer software, voice dictation, and dialogue with machines (help desks and call centers).
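The pattern-matching stage can be sketched as template matching with dynamic time warping (DTW): each stored word pattern is a sequence of feature vectors, and the input is assigned to the template with the smallest time-aligned distance. This is a minimal sketch assuming Euclidean frame distances and the basic symmetric step pattern; the template words and one-dimensional features are made up for illustration.

```python
import numpy as np


def dtw_distance(a: np.ndarray, b: np.ndarray) -> float:
    """DTW distance between feature sequences a (n x d) and b (m x d)."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = np.linalg.norm(a[i - 1] - b[j - 1])  # Euclidean frame distance
            # Allowed steps: reuse a frame of b, reuse a frame of a, or advance both.
            D[i, j] = local + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return float(D[n, m])


def recognize(features: np.ndarray, templates: dict) -> str:
    """Return the label of the stored template closest to the input under DTW."""
    return min(templates, key=lambda word: dtw_distance(features, templates[word]))


templates = {
    "yes": np.array([[0.0], [1.0], [2.0]]),
    "no": np.array([[2.0], [1.0], [0.0]]),
}
utterance = np.array([[0.0], [0.0], [1.0], [1.0], [2.0]])  # a slower "yes"
print(recognize(utterance, templates))  # yes
```

The time warping lets the slower utterance align to the three-frame "yes" template at zero cost, which a frame-by-frame comparison could not do.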

Applications: Others
- Speaker recognition: who is speaking
- Speaker verification: verifying a claimed identity
- Speaker diarization: who spoke when
- Word spotting: monitoring the signal for a particular word
- Speech/audio indexing: identifying the audio class (e.g., broadcast news transcription)
- Audio recognition: identifying acoustic events (e.g., audio-based surveillance, smart rooms)
- Speech enhancement: making speech more intelligible


Interesting Facts: Perception of Loudness
Hearing is most sensitive at around 3 to 4 kHz, almost precisely the range of frequencies occupied by most speech sounds. This motivates non-uniform filter-bank analysis.
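One widely used way to build a non-uniform filter bank (an illustrative choice; the slide does not name a scale) is to space the filter centre frequencies uniformly on the mel scale, m = 2595 log10(1 + f/700), which packs filters densely at low frequencies and sparsely at high ones. The sketch assumes 10 filters up to 4 kHz; the helper names are hypothetical.

```python
import numpy as np


def hz_to_mel(f):
    """Hz to mel, using the common 2595*log10(1 + f/700) formula."""
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)


def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)


def mel_centre_frequencies(n_filters: int, f_max: float) -> np.ndarray:
    """Centre frequencies (Hz) of n_filters bands spaced uniformly in mel between 0 and f_max."""
    edges = np.linspace(0.0, hz_to_mel(f_max), n_filters + 2)  # includes the two outer edges
    return mel_to_hz(edges[1:-1])


centres = mel_centre_frequencies(10, 4000.0)
print(np.round(centres))
print(np.round(np.diff(centres)))  # spacing in Hz grows with frequency
```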

Interesting Facts: Auditory Masking
A loud sound can mask a weaker sound at nearby frequencies, within the same critical band (the critical-band phenomenon). Masking is widely used in speech coding (perceptually lossless coding).
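A rough way to check whether two tones fall in the same critical band is to convert both frequencies to the Bark (critical-band rate) scale and compare them. The sketch assumes Zwicker and Terhardt's analytic approximation and treats tones less than 1 Bark apart as sharing a critical band; the helper `same_critical_band` is hypothetical and simplified, since real masking also depends on level and on which tone is louder.

```python
import math


def hz_to_bark(f: float) -> float:
    """Critical-band rate in Bark (Zwicker & Terhardt approximation)."""
    return 13.0 * math.atan(0.00076 * f) + 3.5 * math.atan((f / 7500.0) ** 2)


def same_critical_band(f1: float, f2: float) -> bool:
    """Hypothetical helper: True if the two frequencies are less than 1 Bark apart."""
    return abs(hz_to_bark(f1) - hz_to_bark(f2)) < 1.0


print(round(hz_to_bark(1000.0), 2))
print(same_critical_band(1000.0, 1050.0), same_critical_band(1000.0, 3000.0))
```

A coder exploiting masking could, for instance, spend fewer bits on a weak component lying in the same critical band as a strong one.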

Analysis of Speech

Short-Time Analysis of Speech: Windowing
Speech is analysed in small portions (windows), within each of which the signal is assumed to be pseudo-stationary. Windowing yields a set of speech samples x(n) weighted by the shape of the window w(n). Successive windows generally overlap, because w(n) tends to have a shape that de-emphasises samples near its edges. This breaks the speech down into a sequence of frames.
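The framing step can be sketched as follows. The frame length (25 ms), frame shift (10 ms) and Hamming window are common choices assumed here, not values given on the slide; each output row is the windowed product x(n) w(n) for one window position.

```python
import numpy as np


def frame_signal(x: np.ndarray, fs: int, frame_ms: float = 25.0, shift_ms: float = 10.0) -> np.ndarray:
    """Split x into overlapping frames, each multiplied by a Hamming window w(n)."""
    frame_len = int(round(fs * frame_ms / 1000.0))
    hop = int(round(fs * shift_ms / 1000.0))
    if len(x) < frame_len:
        raise ValueError("signal shorter than one frame")
    n_frames = 1 + (len(x) - frame_len) // hop  # trailing samples that do not fill a frame are dropped
    w = np.hamming(frame_len)  # tapers samples near the frame edges
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx] * w


fs = 16000
x = np.random.default_rng(0).standard_normal(fs)  # 1 s of noise standing in for speech
frames = frame_signal(x, fs)
print(frames.shape)  # (98, 400): 400-sample frames every 160 samples
```

With a 10 ms shift and a 25 ms window, each sample appears in two or three consecutive frames, which compensates for the taper at the frame edges.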

Short-Time Analysis of Speech: FFT
[Figure: wide-band and narrow-band spectrograms]

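Short-Time Analysis of Speech: Spectral Envelope
[Figure: wide-band and narrow-band analysis]
You cannot get good time resolution and good frequency resolution from the same spectrogram (the uncertainty principle): a short window (wide-band analysis) gives good time resolution but poor frequency resolution, while a long window (narrow-band analysis) gives the reverse.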
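The trade-off can be made concrete: with an N-sample window at sampling rate fs, FFT bins are fs/N Hz apart while each frame spans N/fs seconds, so improving one resolution worsens the other. The sketch assumes fs = 8 kHz, a 4 ms window for the wide-band case and a 32 ms window for the narrow-band case; these are illustrative values, not ones from the slides, and the test signal is a made-up stand-in for a voiced sound.

```python
import numpy as np


def spectrogram(x: np.ndarray, fs: int, win_ms: float, hop_ms: float):
    """Hamming-windowed magnitude spectrogram (frames x bins), bin spacing in Hz, frame span in s."""
    n = int(round(fs * win_ms / 1000.0))
    hop = int(round(fs * hop_ms / 1000.0))
    w = np.hamming(n)
    starts = range(0, len(x) - n + 1, hop)
    S = np.array([np.abs(np.fft.rfft(x[s:s + n] * w)) for s in starts])
    return S, fs / n, n / fs


fs = 8000
t = np.arange(fs) / fs
x = np.sin(2 * np.pi * 150 * t) + 0.5 * np.sin(2 * np.pi * 300 * t)  # toy voiced signal, f0 = 150 Hz

S_wide, df_wide, dt_wide = spectrogram(x, fs, win_ms=4.0, hop_ms=1.0)
S_narrow, df_narrow, dt_narrow = spectrogram(x, fs, win_ms=32.0, hop_ms=1.0)
print(df_wide, dt_wide)      # 250 Hz bins, 4 ms frames: harmonics 150 Hz apart are not resolved
print(df_narrow, dt_narrow)  # 31.25 Hz bins, 32 ms frames: individual harmonics are resolved
```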

Phoneme
Speakers and listeners divide words into component sounds called phonemes, and native speakers agree on the phonemes that make up a particular word. There are about 42 phonemes in English. The actual sound that corresponds to a particular phoneme depends on:
- the adjacent phonemes in the word or sentence
- the accent of the speaker
- the talking speed
- whether the occasion is formal or informal

Voiced / Unvoiced Phonemes
- Vowel/consonant discrimination using the zero-crossing rate and the short-time energy
- Determination of pitch (fundamental frequency) using autocorrelation
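The three measures listed above can be sketched per frame as follows. Unvoiced sounds tend to have a high zero-crossing rate and low energy, voiced sounds the reverse; the pitch is taken as fs divided by the lag of the largest autocorrelation value in a plausible range. The 60 to 400 Hz search range and the two test frames are assumptions for illustration, and a practical detector would add thresholds and smoothing across frames.

```python
import numpy as np


def short_time_energy(frame: np.ndarray) -> float:
    """Mean squared amplitude of the frame."""
    return float(np.mean(frame ** 2))


def zero_crossing_rate(frame: np.ndarray) -> float:
    """Fraction of adjacent sample pairs whose signs differ."""
    s = frame >= 0
    return float(np.mean(s[1:] != s[:-1]))


def autocorr_pitch(frame: np.ndarray, fs: int, f_min: float = 60.0, f_max: float = 400.0) -> float:
    """Estimate f0 (Hz) as fs / lag of the autocorrelation maximum within [fs/f_max, fs/f_min]."""
    frame = frame - np.mean(frame)
    r = np.correlate(frame, frame, mode="full")[len(frame) - 1:]  # r[k] for lags k >= 0
    lag_min = int(fs / f_max)
    lag_max = min(int(fs / f_min), len(frame) - 1)
    lag = lag_min + int(np.argmax(r[lag_min:lag_max + 1]))
    return fs / lag


fs = 16000
n = np.arange(int(0.04 * fs))  # one 40 ms frame
voiced = np.sin(2 * np.pi * 200 * n / fs)  # periodic stand-in for a vowel, f0 = 200 Hz
unvoiced = 0.1 * np.random.default_rng(0).standard_normal(len(n))  # noise stand-in for a fricative

print(zero_crossing_rate(voiced), zero_crossing_rate(unvoiced))
print(short_time_energy(voiced), short_time_energy(unvoiced))
print(autocorr_pitch(voiced, fs))
```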

Speech Recognition Problem

Speech Recognition
[Figure: block diagram. Processing chain: utterance -> acoustic front-end -> recognition algorithm -> understanding algorithm -> meaning. Knowledge sources: phonologic rules, phonetic models, dictionary and grammar, task model.]



Speech Recognition: Phonetic Models
[Figure: speech recognition block diagram, with a Markov model over phoneme k-1, phoneme k and phoneme k+1]
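In HMM-based recognisers each phoneme is commonly modelled as a small left-to-right Markov chain whose states emit observations, and the likelihood of an observation sequence under such a model is computed with the forward algorithm. The sketch below assumes discrete (vector-quantised) observation symbols and made-up probabilities for a three-state model; real systems usually use continuous densities such as Gaussian mixtures.

```python
import numpy as np


def forward_likelihood(pi: np.ndarray, A: np.ndarray, B: np.ndarray, obs) -> float:
    """P(obs | model) for an HMM with initial probs pi, transitions A[i, j], emissions B[state, symbol]."""
    alpha = pi * B[:, obs[0]]  # alpha[i] = P(o_1, state_1 = i)
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]  # sum over previous states, then emit o
    return float(alpha.sum())


# Hypothetical left-to-right model for one phoneme: states can only stay or move forward.
pi = np.array([1.0, 0.0, 0.0])
A = np.array([[0.6, 0.4, 0.0],
              [0.0, 0.7, 0.3],
              [0.0, 0.0, 1.0]])
B = np.array([[0.9, 0.1],   # state 0 mostly emits symbol 0
              [0.5, 0.5],
              [0.1, 0.9]])  # state 2 mostly emits symbol 1
obs = [0, 0, 1, 1]
print(forward_likelihood(pi, A, B, obs))
```

The recursion sums over all state sequences in O(T N^2) time instead of enumerating the N^T sequences explicitly.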


Speech Recognition: Trigram Language Model
$$\Pr\{\text{the door was not opened}\} = \Pr\{\text{the}\}\,\Pr\{\text{door}\mid\text{the}\}\,\Pr\{\text{was}\mid\text{the door}\}\,\Pr\{\text{not}\mid\text{the door was}\}\,\Pr\{\text{opened}\mid\text{the door was not}\}$$
Under the trigram assumption, each word depends only on the two preceding words:
$$\Pr\{\text{the door was not opened}\} \approx \Pr\{\text{the}\}\,\Pr\{\text{door}\mid\text{the}\}\,\Pr\{\text{was}\mid\text{the door}\}\,\Pr\{\text{not}\mid\text{door was}\}\,\Pr\{\text{opened}\mid\text{was not}\}$$
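The trigram probabilities can be estimated by maximum likelihood from counts, P(w3 | w1 w2) = C(w1 w2 w3) / C(w1 w2), and the sentence probability is then the product in the trigram factorisation. The three-sentence training corpus is made up for illustration, and the sketch applies no smoothing, so any unseen n-gram gets probability zero.

```python
from collections import Counter
from fractions import Fraction

corpus = [  # hypothetical training sentences
    "the door was not opened",
    "the door was opened",
    "the window was not opened",
]

unigrams, bigrams, trigrams = Counter(), Counter(), Counter()
for sentence in corpus:
    w = sentence.split()
    unigrams.update(w)
    bigrams.update(zip(w, w[1:]))
    trigrams.update(zip(w, w[1:], w[2:]))
total_words = sum(unigrams.values())


def sentence_probability(sentence: str) -> Fraction:
    """Unsmoothed MLE: P(w1) P(w2|w1) prod_k P(wk | w(k-2) w(k-1)), as exact fractions."""
    w = sentence.split()
    p = Fraction(unigrams[w[0]], total_words)
    if p == 0:
        return p  # unseen first word
    if len(w) > 1:
        p *= Fraction(bigrams[(w[0], w[1])], unigrams[w[0]])
    for k in range(2, len(w)):
        history = bigrams[(w[k - 2], w[k - 1])]
        if history == 0:
            return Fraction(0)  # unseen history: no estimate without smoothing
        p *= Fraction(trigrams[(w[k - 2], w[k - 1], w[k])], history)
    return p


print(sentence_probability("the door was not opened"))  # 1/14
```

Here P(the) = 3/14, P(door | the) = 2/3, P(was | the door) = 1, P(not | door was) = 1/2 and P(opened | was not) = 1, giving 1/14.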

Speech Recognition: Training
[Figure: training from a database of voice recordings and their text, comprising the acoustic front-end, phonetic modeling and language modeling, shown alongside the recognition block diagram]

A Snapshot of the Acoustic Front-End
There is no standard set of features for speech recognition; features may be acoustic, articulatory or auditory.
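Since there is no single standard feature set, the following is only one common, auditory-motivated choice, shown as a sketch: mel-frequency cepstral coefficients (MFCCs), computed per frame from the power spectrum, a triangular mel filter bank, the logarithm and a DCT. The FFT size (512), number of filters (26) and number of coefficients (13) are assumed typical values; production front-ends add details such as pre-emphasis, liftering and delta features.

```python
import numpy as np


def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)


def mel_filterbank(n_filters: int, n_fft: int, fs: int) -> np.ndarray:
    """Triangular filters spaced uniformly in mel, shape (n_filters, n_fft // 2 + 1)."""
    mel_points = np.linspace(0.0, hz_to_mel(fs / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / fs).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, centre, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, centre):  # rising slope
            fb[i - 1, k] = (k - left) / (centre - left)
        for k in range(centre, right):  # falling slope
            fb[i - 1, k] = (right - k) / (right - centre)
    return fb


def mfcc(frame: np.ndarray, fs: int, n_fft: int = 512, n_filters: int = 26, n_ceps: int = 13) -> np.ndarray:
    """MFCCs of one (already windowed) frame."""
    power = np.abs(np.fft.rfft(frame, n_fft)) ** 2 / n_fft
    energies = mel_filterbank(n_filters, n_fft, fs) @ power
    log_e = np.log(energies + 1e-10)  # small floor avoids log(0)
    # Orthonormal DCT-II of the log filter-bank energies, keeping the first n_ceps coefficients.
    n = np.arange(n_filters)
    k = np.arange(n_ceps)[:, None]
    dct = np.sqrt(2.0 / n_filters) * np.cos(np.pi * k * (2 * n + 1) / (2 * n_filters))
    dct[0] /= np.sqrt(2.0)
    return dct @ log_e


fs = 16000
frame = np.hamming(400) * np.random.default_rng(0).standard_normal(400)  # one 25 ms windowed frame
print(mfcc(frame, fs).shape)  # (13,)
```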

A Snapshot of the Recognition Algorithm (I)
- Viterbi/Baum-Welch alignment
- Dynamic Time Warping
- Weighted Finite State Transducers (WFST)
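Of the techniques listed, Viterbi alignment is central to HMM decoding: it finds the single most likely state sequence for an observation sequence. The sketch works in the log domain to avoid numerical underflow and uses a made-up three-state left-to-right model with discrete observations; it illustrates the algorithm and is not the decoder of any particular system.

```python
import numpy as np


def viterbi(pi: np.ndarray, A: np.ndarray, B: np.ndarray, obs):
    """Most likely state path and its log-probability for an HMM (pi, A, B) and symbols obs."""
    with np.errstate(divide="ignore"):  # log(0) -> -inf marks forbidden transitions
        log_pi, log_A, log_B = np.log(pi), np.log(A), np.log(B)
    T, N = len(obs), len(pi)
    delta = log_pi + log_B[:, obs[0]]  # best log-prob of any path ending in each state
    psi = np.zeros((T, N), dtype=int)  # back-pointers
    for t in range(1, T):
        scores = delta[:, None] + log_A  # scores[i, j]: arrive in state j from state i
        psi[t] = np.argmax(scores, axis=0)
        delta = scores[psi[t], np.arange(N)] + log_B[:, obs[t]]
    path = [int(np.argmax(delta))]
    for t in range(T - 1, 0, -1):
        path.append(int(psi[t][path[-1]]))
    return path[::-1], float(np.max(delta))


pi = np.array([1.0, 0.0, 0.0])
A = np.array([[0.6, 0.4, 0.0],
              [0.0, 0.7, 0.3],
              [0.0, 0.0, 1.0]])
B = np.array([[0.9, 0.1],
              [0.5, 0.5],
              [0.1, 0.9]])
path, logp = viterbi(pi, A, B, [0, 0, 1, 1])
print(path, np.exp(logp))
```

Replacing the sum over predecessor states in the forward algorithm with a maximum (plus back-pointers) is what turns likelihood computation into alignment.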

A Snapshot of Recognition (II)
[Figure: a simple example of the whole decoding network]

A Snapshot of Recognition (III)
[Figure]

State of the Art
Corpus | Style | Vocabulary size | % word errors
Digit strings | spontaneous | |
Digit strings | conversational | |
Resource Management | read | |
Airline Travel Information System (ATIS) | spontaneous | |
North American Business News (NAB) | read | |
Call Home | conversational telephonic | |

