2. Introduction to Speech Processing

Amberlynn Andrews
5 years ago

1 2. Introduction to Speech Processing

2 The Speech Processing Stack
- Speech applications: coding, synthesis, recognition, understanding, speaker verification, language translation, speed-up/slow-down
- Speech measurements: energy, zero crossings, autocorrelations
- Speech properties: speech-silence, voiced-unvoiced, pitch, formants
- Speech representations: temporal, spectral, homomorphic, LPC
- Fundamentals: acoustics, linguistics, pragmatics, speech perception
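
The measurements named in the stack (energy, zero crossings, autocorrelation) can be sketched on short frames of a signal. This is a minimal illustration, not the course's own code: the 8 kHz sampling rate, the 25 ms frame, the 10 ms hop, and the synthetic 100 Hz tone are all assumptions chosen to keep the example small.

```python
import numpy as np

def frame_signal(x, frame_len, hop):
    """Split a 1-D signal into overlapping frames, one frame per row."""
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]

def short_time_energy(frames):
    """Sum of squared samples in each frame."""
    return np.sum(frames ** 2, axis=1)

def zero_crossing_rate(frames):
    """Fraction of adjacent sample pairs whose signs differ, per frame."""
    signs = np.sign(frames)
    signs[signs == 0] = 1  # treat exact zeros as positive
    return np.mean(signs[:, 1:] != signs[:, :-1], axis=1)

def autocorrelation(frame, max_lag):
    """Unnormalized short-time autocorrelation for lags 0..max_lag."""
    return np.array([np.dot(frame[:len(frame) - k], frame[k:])
                     for k in range(max_lag + 1)])

fs = 8000                                  # assumed sampling rate (Hz)
t = np.arange(fs) / fs
tone = np.sin(2 * np.pi * 100 * t)         # synthetic "voiced" signal at 100 Hz
noise = np.random.default_rng(0).standard_normal(fs)  # synthetic "unvoiced" signal

tone_frames = frame_signal(tone, frame_len=200, hop=80)
noise_frames = frame_signal(noise, frame_len=200, hop=80)

energy = short_time_energy(tone_frames)
zcr_tone = zero_crossing_rate(tone_frames)
zcr_noise = zero_crossing_rate(noise_frames)

# Pitch estimate: strongest autocorrelation peak away from lag 0.
r = autocorrelation(tone_frames[0], max_lag=120)
pitch_lag = 20 + int(np.argmax(r[20:]))
pitch_hz = fs / pitch_lag
```

The periodic tone shows few zero crossings and an autocorrelation peak at its period, while the noise crosses zero far more often; this contrast is the basis of simple voiced-unvoiced decisions and pitch estimation.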

3 SPEECH GENERATION AND TRANSMISSION

4 Speech Chain (Denes and Pinson)
[Figure; labels: SPEAKER, HEARER]
Tractament Digital de la Parla 4

5 Speech Production/Perception (after Flanagan)

6 Speech Processing Diagram

7 Applications of digital speech processing
- Speech coding
- Speech synthesis (from text to speech)
- Speech recognition
- Speaker/language recognition
- Many others

8 SPEECH CODING

9 Speech Coding
The aim of speech coding is to compress (and then decompress) the speech waveform without any loss of listenability or intelligibility. Various standards exist for speech coding. The desired bit rate and the associated speech quality are highly application dependent.
- Low bit rate: coders operating between 75 and 2400 bps (bits per second).
- Medium-to-high bit rate: coders operating above 2400 bps.
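
To put the bit-rate classes in perspective, the arithmetic below compares them with uncompressed telephone-quality PCM. The reference of 8000 samples per second at 8 bits per sample is an assumption added here (it is not stated on the slide), and `classify_coder` is a hypothetical helper that simply encodes the two ranges given above.

```python
def pcm_bit_rate(sample_rate_hz, bits_per_sample):
    """Bit rate of uncompressed single-channel PCM, in bits per second."""
    return sample_rate_hz * bits_per_sample

def classify_coder(bit_rate_bps):
    """Map a bit rate onto the classes used on the slide."""
    if 75 <= bit_rate_bps <= 2400:
        return "low"
    if bit_rate_bps > 2400:
        return "medium-to-high"
    return "below the range given here"

# Assumed reference: 8 kHz sampling, 8 bits per sample.
reference_bps = pcm_bit_rate(8000, 8)

# Compression ratio a coder must reach to fit the top of the low-rate class.
ratio_at_2400 = reference_bps / 2400

print(reference_bps, classify_coder(reference_bps))
print(round(ratio_at_2400, 1), classify_coder(2400))
```

Under that assumption, a 2400 bps coder has to squeeze the signal by a factor of roughly 27, which is why the achievable quality depends so strongly on the target rate.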

10 Applications of Speech Coding
- Reduction in bit rate for transmission/storage
- Speech enhancement (removal of noise)
- Allows the development of applications for:
  - Security
  - High-definition TV
  - Teleconferencing
  - Etc.

11 SPEECH SYNTHESIS

12 Speech Synthesis
The aim of speech synthesis is to take a word sequence and produce human-like speech.
- Linguistic analysis stage: maps the input text into a standard form, determines the structure of the input, and finally decides how to pronounce it.
- Synthesis stage: converts the symbolic representation of what to say into an actual speech waveform.
Pipeline: Text -> Linguistic Analysis -> Prosody / Phone sequence -> Speech Synthesis -> Sound Wave
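
The linguistic analysis stage can be illustrated with a toy front end that normalizes text and looks up pronunciations. Everything here is hypothetical: the four-word lexicon, the ARPAbet-style phone symbols, and the digit table are made up for the example, and the synthesis stage (phones to waveform) is left out entirely.

```python
import re

# Hypothetical digit expansion table (text normalization).
NUMBER_WORDS = {"0": "zero", "1": "one", "2": "two", "3": "three"}

# Hypothetical pronunciation lexicon with ARPAbet-style symbols.
TOY_LEXICON = {
    "flight": ["F", "L", "AY", "T"],
    "two": ["T", "UW"],
    "is": ["IH", "Z"],
    "late": ["L", "EY", "T"],
}

def normalize(text):
    """Map raw text to a standard form: lowercase words, digits spelled out."""
    tokens = re.findall(r"[a-z]+|\d", text.lower())
    return [NUMBER_WORDS.get(tok, tok) for tok in tokens]

def to_phones(words):
    """Decide how to pronounce each word; unknown words become <unk>."""
    phones = []
    for word in words:
        phones.extend(TOY_LEXICON.get(word, ["<unk>"]))
    return phones

words = normalize("Flight 2 is late.")
phones = to_phones(words)
print(words)
print(phones)
```

A real system would also predict prosody (durations, pitch contour) and handle words outside the lexicon with letter-to-sound rules; the `<unk>` fallback only marks where that would be needed.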

13 Applications of Speech Synthesis/Text-to-Speech (TTS)
- Games
- Telephone-based information: directions, air travel, banking, etc.
- Accessing variable information
- Machine-human interfaces
- Eyes-free (in car)
- Reading/speaking for disabled people
- Reading of texts, access to books
- Education (reading tutors)
- Alarm systems
- ...

14 AUTOMATIC SPEECH RECOGNITION (ASR)

15 Automatic Speech Recognition
Automatic Speech Recognition (ASR) is the process of converting an unknown speech waveform into the corresponding orthographic transcription.
[Diagram; surviving label: "& language model"]
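
The surviving diagram label mentions a language model. Assuming the usual arrangement in which it is paired with an acoustic model, a recognizer picks the transcription whose combined score is highest. The sketch below uses made-up log scores for two candidate transcriptions; the numbers, the candidate strings, and the `lm_weight` parameter are illustrative assumptions, not values from the course.

```python
import math

# Hypothetical scores: "acoustic" is log P(audio | words), "lm" is log P(words).
hypotheses = {
    "recognize speech": {"acoustic": -12.0, "lm": math.log(0.01)},
    "wreck a nice beach": {"acoustic": -11.5, "lm": math.log(0.0001)},
}

def decode(hyps, lm_weight=1.0):
    """Return the word sequence with the best combined log score."""
    return max(hyps, key=lambda w: hyps[w]["acoustic"] + lm_weight * hyps[w]["lm"])

with_lm = decode(hypotheses)                   # language model included
acoustic_only = decode(hypotheses, lm_weight=0.0)  # language model ignored
print(with_lm)
print(acoustic_only)
```

In this toy case the acoustically better candidate loses once the language model's prior is added, which is the role such a model plays in resolving similar-sounding word sequences.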

16 Extraction of feature vectors
- Speech signal: analyzed with a 25 ms window, usually every 10 ms
- Acoustic features: typically around 39 per frame
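
The framing scheme can be sketched as follows. The 25 ms window and 10 ms step come from the slide; the 16 kHz sampling rate and the Hamming window are assumptions. The figure of 39 is reproduced here the way it commonly arises, as 13 static coefficients plus their first and second time derivatives, but the static features below are a crude stand-in (log energy in 13 equal-width FFT bands), not real MFCCs.

```python
import numpy as np

def frame_signal(x, fs, win_ms=25, hop_ms=10):
    """Cut the signal into windowed frames: 25 ms long, one every 10 ms."""
    win = int(fs * win_ms / 1000)
    hop = int(fs * hop_ms / 1000)
    n_frames = 1 + (len(x) - win) // hop
    idx = np.arange(win)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx] * np.hamming(win)

def static_features(frames, n_bands=13):
    """Stand-in for cepstral features: log energy in equal-width FFT bands."""
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    bands = np.array_split(power, n_bands, axis=1)
    return np.log(np.stack([b.sum(axis=1) for b in bands], axis=1) + 1e-10)

def add_deltas(static):
    """Append first and second time differences to the static features."""
    delta = np.gradient(static, axis=0)
    delta2 = np.gradient(delta, axis=0)
    return np.hstack([static, delta, delta2])

fs = 16000                                          # assumed sampling rate (Hz)
x = np.random.default_rng(0).standard_normal(fs)    # one second of placeholder audio
frames = frame_signal(x, fs)
features = add_deltas(static_features(frames))
print(frames.shape, features.shape)
```

One second of audio yields about a hundred feature vectors, each with 39 components; the recognizer works on this sequence rather than on the raw waveform.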

17 Current issues in ASR
A steady reduction in error rates has been achieved over the last 20 years in many domains. Still, more research is needed to:
- Increase robustness to new acoustic environments
- Increase vocabulary size and topic independence
- Improve OOV (out-of-vocabulary) recognition

18 Applications of Speech Recognition/Understanding (ASR/ASU)
- Dictation
- Telephone-based information: directions, air travel, banking, etc.
- Polls, online shopping
- Call routing
- Hands-free control: in car, computer, home (domotics), controlling tools
- Second language (accent reduction)
- Audio archive searching
- Help for disabled people

19 SPEAKER/LANGUAGE RECOGNITION

20 Speaker/Language identification
Pipeline: Audio -> Feature extraction -> Feature vectors -> Speaker/language models -> Selected speaker/language
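
The pipeline can be sketched by scoring a set of feature vectors against one model per speaker and selecting the best match. The model type is an assumption: a single diagonal-covariance Gaussian per speaker is used here for brevity, and the speaker names and synthetic 4-dimensional features are hypothetical.

```python
import numpy as np

def fit_model(vectors):
    """Train a diagonal Gaussian: per-dimension mean and variance."""
    return vectors.mean(axis=0), vectors.var(axis=0) + 1e-6

def avg_log_likelihood(vectors, model):
    """Average log-likelihood of the feature vectors under one model."""
    mean, var = model
    ll = -0.5 * (np.log(2 * np.pi * var) + (vectors - mean) ** 2 / var)
    return ll.sum(axis=1).mean()

def identify(vectors, models):
    """Score against every model and return the best name plus all scores."""
    scores = {name: avg_log_likelihood(vectors, m) for name, m in models.items()}
    return max(scores, key=scores.get), scores

rng = np.random.default_rng(0)
# Synthetic training features for two hypothetical speakers.
train = {
    "alice": rng.normal(0.0, 1.0, (200, 4)),
    "bob": rng.normal(3.0, 1.0, (200, 4)),
}
models = {name: fit_model(v) for name, v in train.items()}

# Unknown audio whose features resemble the second speaker.
test_vectors = rng.normal(3.0, 1.0, (50, 4))
best, scores = identify(test_vectors, models)
print(best)
```

The same structure applies to language identification by training one model per language instead of one per speaker.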

21 Applications of Speaker/Language Recognition
- Language recognition for call routing
- Speaker recognition:
  - Speaker verification (binary decision): voice password, telephone assistant
  - Speaker identification (one of N; open set or closed set): criminal investigation
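
The difference between the decision types can be made concrete with a few lines of decision logic. The scores and the threshold below are made-up numbers; in practice they would come from comparing the audio with each speaker's model.

```python
def verify(score, threshold):
    """Speaker verification: accept or reject a claimed identity."""
    return score >= threshold

def identify_closed_set(scores):
    """Closed-set identification: always return one of the N known speakers."""
    return max(scores, key=scores.get)

def identify_open_set(scores, threshold):
    """Open-set identification: the best speaker, or None if nobody fits."""
    best = identify_closed_set(scores)
    return best if scores[best] >= threshold else None

# Hypothetical match scores for three enrolled speakers (higher is better).
scores = {"speaker_a": -7.2, "speaker_b": -5.1, "speaker_c": -9.8}
threshold = -4.0

claim_accepted = verify(scores["speaker_b"], threshold)
closed = identify_closed_set(scores)
opened = identify_open_set(scores, threshold)
print(claim_accepted, closed, opened)
```

With these numbers the closed-set decision names the best-scoring speaker, while the verification and open-set decisions both reject because no score reaches the threshold.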
