Springer Handbook of Speech Processing. Jacob Benesty, M. Mohan Sondhi, Yiteng Huang (Eds.) With DVD-ROM, 456 Figures and 113 Tables.


Springer

Contents

List of Abbreviations

1 Introduction to Speech Processing
J. Benesty, M. M. Sondhi, Y. Huang
  A Brief History of Speech Processing
  Applications of Speech Processing
  Organization of the Handbook
  References

Part A Production, Perception, and Modeling of Speech

2 Physiological Processes of Speech Production
K. Honda
  Overview of Speech Apparatus
  Voice Production Mechanisms
  Articulatory Mechanisms
  Summary
  References

3 Nonlinear Cochlear Signal Processing and Masking in Speech Perception
J. B. Allen
  Basics
  The Nonlinear Cochlea
  Neural Masking
  Discussion and Summary
  References

4 Perception of Speech and Sound
B. Kollmeier, T. Brand, B. Meyer
  Basic Psychoacoustic Quantities
  Acoustical Information Required for Speech Perception
  Speech Feature Perception
  References

5 Speech Quality Assessment
V. Grancharov, W. B. Kleijn
  Degradation Factors Affecting Speech Quality
  Subjective Tests
  Objective Measures
  Conclusions
  References

Part B Signal Processing for Speech

6 Wiener and Adaptive Filters
J. Benesty, Y. Huang, J. Chen
  Overview
  Signal Models
  Derivation of the Wiener Filter
  Impulse Response Tail Effect
  Condition Number
  Adaptive Algorithms
  MIMO Wiener Filter
  Conclusions
  References

7 Linear Prediction
J. Benesty, J. Chen, Y. Huang
  Fundamentals
  Forward Linear Prediction
  Backward Linear Prediction
  Levinson-Durbin Algorithm
  Lattice Predictor
  Spectral Representation
  Linear Interpolation
  Line Spectrum Pair Representation
  Multichannel Linear Prediction
  Conclusions
  References

8 The Kalman Filter
S. Gannot, A. Yeredor
  Derivation of the Kalman Filter
  Examples: Estimation of Parametric Stochastic Process from Noisy Observations
  Extensions of the Kalman Filter
  The Application of the Kalman Filter to Speech Processing
  Summary
  References

9 Homomorphic Systems and Cepstrum Analysis of Speech
R. W. Schafer
  Definitions
  Z-Transform Analysis
  Discrete-Time Model for Speech Production
  The Cepstrum of Speech
  Relation to LPC
  Application to Pitch Detection

  Applications to Analysis/Synthesis Coding
  Applications to Speech Pattern Recognition
  Summary
  References

10 Pitch and Voicing Determination of Speech with an Extension Toward Music Signals
W. J. Hess
  Pitch in Time-Variant Quasiperiodic Acoustic Signals
  Short-Term Analysis PDAs
  Selected Time-Domain Methods
  A Short Look into Voicing Determination
  Evaluation and Postprocessing
  Applications in Speech and Music
  Some New Challenges and Developments
  Concluding Remarks
  References

11 Formant Estimation and Tracking
D. O'Shaughnessy
  Historical
  Vocal Tract Resonances
  Speech Production
  Acoustics of the Vocal Tract
  Short-Time Speech Analysis
  Formant Estimation
  Summary
  References

12 The STFT, Sinusoidal Models, and Speech Modification
M. M. Goodwin
  The Short-Time Fourier Transform
  Sinusoidal Models
  Speech Modification
  References

13 Adaptive Blind Multichannel Identification
Y. Huang, J. Benesty, J. Chen
  Overview
  Signal Model and Problem Formulation
  Identifiability and Principle
  Constrained Time-Domain Multichannel LMS and Newton Algorithms
  Unconstrained Multichannel LMS Algorithm with Optimal Step-Size Control
  Frequency-Domain Blind Multichannel Identification Algorithms
  Adaptive Multichannel Exponentiated Gradient Algorithm

  Summary
  References

Part C Speech Coding

14 Principles of Speech Coding
W. B. Kleijn
  The Objective of Speech Coding
  Speech Coder Attributes
  A Universal Coder for Speech
  Coding with Autoregressive Models
  Distortion Measures and Coding Architecture
  Summary
  References

15 Voice over IP: Speech Transmission over Packet Networks
J. Skoglund, E. Kozica, J. Linden, R. Hagen, W. B. Kleijn
  Voice Communication
  Properties of the Network
  Outline of a VoIP System
  Robust Encoding
  Packet Loss Concealment
  Conclusion
  References

16 Low-Bit-Rate Speech Coding
A. V. McCree
  Speech Coding
  Fundamentals: Parametric Modeling of Speech Signals
  Flexible Parametric Models
  Efficient Quantization of Model Parameters
  Low-Rate Speech Coding Standards
  Summary
  References

17 Analysis-by-Synthesis Speech Coding
J.-H. Chen, J. Thyssen
  Overview
  Basic Concepts of Analysis-by-Synthesis Coding
  Overview of Prominent Analysis-by-Synthesis Speech Coders
  Multipulse Linear Predictive Coding (MPLPC)
  Regular-Pulse Excitation with Long-Term Prediction (RPE-LTP)
  The Original Code Excited Linear Prediction (CELP) Coder
  US Federal Standard FS1016 CELP
  Vector Sum Excited Linear Prediction (VSELP)
  Low-Delay CELP (LD-CELP)

  Pitch Synchronous Innovation CELP (PSI-CELP)
  Algebraic CELP (ACELP)
  Conjugate Structure CELP (CS-CELP) and CS-ACELP
  Relaxed CELP (RCELP) - Generalized Analysis by Synthesis
  eX-CELP
  iLBC
  TSNFC
  Embedded CELP
  Summary of Analysis-by-Synthesis Speech Coders
  Conclusion
  References

18 Perceptual Audio Coding of Speech Signals
J. Herre, M. Lutzky
  History of Audio Coding
  Fundamentals of Perceptual Audio Coding
  Some Successful Standardized Audio Coders
  Perceptual Audio Coding for Real-Time Communication
  Hybrid/Crossover Coders
  Summary
  References

Part D Text-to-Speech Synthesis

19 Basic Principles of Speech Synthesis
J. Schroeter
  The Basic Components of a TTS System
  Speech Representations and Signal Processing for Concatenative Synthesis
  Speech Signal Transformation Principles
  Speech Synthesis Evaluation
  Conclusions
  References

20 Rule-Based Speech Synthesis
R. Carlson, B. Granström
  Background
  Terminal Analog
  Controlling the Synthesizer
  Special Applications of Rule-Based Parametric Synthesis
  Concluding Remarks
  References

21 Corpus-Based Speech Synthesis
T. Dutoit
  Basics

  Concatenative Synthesis with a Fixed Inventory
  Unit-Selection-Based Synthesis
  Statistical Parametric Synthesis
  Conclusion
  References

22 Linguistic Processing for Speech Synthesis
R. Sproat
  Why Linguistic Processing is Hard
  Fundamentals: Writing Systems and the Graphical Representation of Language
  Problems to be Solved and Methods to Solve Them
  Architectures for Multilingual Linguistic Processing
  Document-Level Processing
  Future Prospects
  References

23 Prosodic Processing
J. van Santen, T. Mishra, E. Klabbers
  Overview
  Historical Overview
  Fundamental Challenges
  A Survey of Current Approaches
  Future Approaches
  Conclusions
  References

24 Voice Transformation
Y. Stylianou
  Background
  Source-Filter Theory and Harmonic Models
  Definitions
  Source Modifications
  Filter Modifications
  Conversion Functions
  Voice Conversion
  Quality Issues in Voice Transformations
  Summary
  References

25 Expressive/Affective Speech Synthesis
N. Campbell
  Overview
  Characteristics of Affective Speech
  The Communicative Functionality of Speech
  Approaches to Synthesizing Expressive Speech
  Modeling Human Speech

  Conclusion
  References

Part E Speech Recognition

26 Historical Perspective of the Field of ASR/NLU
L. Rabiner, B.-H. Juang
  ASR Methodologies
  Important Milestones in Speech Recognition History
  Generation 1 - The Early History of Speech Recognition
  Generation 2 - The First Working Systems for Speech Recognition
  Generation 3 - The Pattern Recognition Approach to Speech Recognition
  Generation 4 - The Era of the Statistical Model
  Generation 5 - The Future
  Summary
  References

27 HMMs and Related Speech Recognition Technologies
S. Young
  Basic Framework
  Architecture of an HMM-Based Recognizer
  HMM-Based Acoustic Modeling
  Normalization
  Adaptation
  Multipass Recognition Architectures
  Conclusions
  References

28 Speech Recognition with Weighted Finite-State Transducers
M. Mohri, F. Pereira, M. Riley
  Definitions
  Overview
  Algorithms
  Applications to Speech Recognition
  Conclusion
  References

29 A Machine Learning Framework for Spoken-Dialog Classification
C. Cortes, P. Haffner, M. Mohri
  Motivation
  Introduction to Kernel Methods
  Rational Kernels
  Algorithms
  Experiments
  Theoretical Results for Rational Kernels

  Conclusion
  References

30 Towards Superhuman Speech Recognition
M. Picheny, D. Nahamoo
  Current Status
  A Multidomain Conversational Test Set
  Listening Experiments
  Recognition Experiments
  Speculation
  References

31 Natural Language Understanding
S. Roukos
  Overview of NLU Applications
  Natural Language Parsing
  Practical Implementation
  Speech Mining
  Conclusion
  References

32 Transcription and Distillation of Spontaneous Speech
S. Furui, T. Kawahara
  Background
  Overview of Research Activities on Spontaneous Speech
  Analysis for Spontaneous Speech Recognition
  Approaches to Spontaneous Speech Recognition
  Metadata and Structure Extraction of Spontaneous Speech
  Speech Summarization
  Conclusions
  References

33 Environmental Robustness
J. Droppo, A. Acero
  Noise Robust Speech Recognition
  Model Retraining and Adaptation
  Feature Transformation and Normalization
  A Model of the Environment
  Structured Model Adaptation
  Structured Feature Enhancement
  Unifying Model and Feature Techniques
  Conclusion
  References

34 The Business of Speech Technologies
J. Wilpon, M. E. Gilbert, J. Cohen
  Introduction

  Network-Based Speech Services
  Device-Based Speech Applications
  Vision/Predications of Future Services - Fueling the Trends
  Conclusion
  References

35 Spoken Dialogue Systems
V. Zue, S. Seneff
  Technology Components and System Development
  Development Issues
  Historical Perspectives
  New Directions
  Concluding Remarks
  References

Part F Speaker Recognition

36 Overview of Speaker Recognition
A. E. Rosenberg, F. Bimbot, S. Parthasarathy
  Speaker Recognition
  Measuring Speaker Features
  Constructing Speaker Models
  Adaptation
  Decision and Performance
  Selected Applications for Automatic Speaker Recognition
  Summary
  References

37 Text-Dependent Speaker Recognition
M. Hebert
  Brief Overview
  Text-Dependent Challenges
  Selected Results
  Concluding Remarks
  References

38 Text-Independent Speaker Recognition
D. A. Reynolds, W. M. Campbell
  Introduction
  Likelihood Ratio Detector
  Features
  Classifiers
  Performance Assessment
  Summary
  References

Part G Language Recognition

39 Principles of Spoken Language Recognition
C.-H. Lee
  Spoken Language
  Language Recognition Principles
  Phone Recognition Followed by Language Modeling (PRLM)
  Vector-Space Characterization (VSC)
  Spoken Language Verification
  Discriminative Classifier Design
  Summary
  References

40 Spoken Language Characterization
M. P. Harper, M. Maxwell
  Language versus Dialect
  Spoken Language Collections
  Spoken Language Characteristics
  Human Language Identification
  Text as a Source of Information on Spoken Languages
  Summary
  References

41 Automatic Language Recognition Via Spectral and Token Based Approaches
D. A. Reynolds, W. M. Campbell, W. Shen, E. Singer
  Automatic Language Recognition
  Spectral Based Methods
  Token-Based Methods
  System Fusion
  Performance Assessment
  Summary
  References

42 Vector-Based Spoken Language Classification
H. Li, B. Ma, C.-H. Lee
  Vector Space Characterization
  Unit Selection and Modeling
  Front-End: Voice Tokenization and Spoken Document Vectorization
  Back-End: Vector-Based Classifier Design
  Language Classification Experiments and Discussion
  Summary
  References

Part H Speech Enhancement

43 Fundamentals of Noise Reduction
J. Chen, J. Benesty, Y. Huang, E. J. Diethorn
  Noise
  Signal Model and Problem Formulation
  Evaluation of Noise Reduction
  Noise Reduction via Filtering Techniques
  Noise Reduction via Spectral Restoration
  Speech-Model-Based Noise Reduction
  Summary
  References

44 Spectral Enhancement Methods
I. Cohen, S. Gannot
  Spectral Enhancement
  Problem Formulation
  Statistical Models
  Signal Estimation
  Signal Presence Probability Estimation
  A Priori SNR Estimation
  Noise Spectrum Estimation
  Summary of a Spectral Enhancement Algorithm
  Selection of Spectral Enhancement Algorithms
  Conclusions
  References

45 Adaptive Echo Cancelation for Voice Signals
M. M. Sondhi
  Network Echoes
  Single-Channel Acoustic Echo Cancelation
  Multichannel Acoustic Echo Cancelation
  Summary
  References

46 Dereverberation
Y. Huang, J. Benesty, J. Chen
  Background and Overview
  Signal Model and Problem Formulation
  Source Model-Based Speech Dereverberation
  Separation of Speech and Reverberation via Homomorphic Transformation
  Channel Inversion and Equalization
  Summary
  References

47 Adaptive Beamforming and Postfiltering
S. Gannot, I. Cohen
  Problem Formulation
  Adaptive Beamforming
  Fixed Beamformer and Blocking Matrix
  Identification of the Acoustical Transfer Function
  Robustness and Distortion Weighting
  Multichannel Postfiltering
  Performance Analysis
  Experimental Results
  Summary
  Appendix: Derivation of the Expected Noise Reduction for a Coherent Noise Field
  Appendix: Equivalence Between Maximum SNR and LCMV Beamformers
  References

48 Feedback Control in Hearing Aids
A. Spriet, S. Doclo, M. Moonen, J. Wouters
  Problem Statement
  Standard Adaptive Feedback Canceller
  Feedback Cancellation Based on Prior Knowledge of the Acoustic Feedback Path
  Feedback Cancellation Based on Closed-Loop System Identification
  Comparison
  Conclusions
  References

49 Active Noise Control
S. M. Kuo, D. R. Morgan
  Broadband Feedforward Active Noise Control
  Narrowband Feedforward Active Noise Control
  Feedback Active Noise Control
  Multichannel ANC
  Summary
  References

Part I Multichannel Speech Processing

50 Microphone Arrays
G. W. Elko, J. Meyer
  Microphone Array Beamforming
  Constant-Beamwidth Microphone Array System
  Constrained Optimization of the Directional Gain
  Differential Microphone Arrays
  Eigenbeamforming Arrays

  Adaptive Array Systems
  Conclusions
  References

51 Time Delay Estimation and Source Localization
Y. Huang, J. Benesty, J. Chen
  Technology Taxonomy
  Time Delay Estimation
  Source Localization
  Summary
  References

52 Convolutive Blind Source Separation Methods
M. S. Pedersen, J. Larsen, U. Kjems, L. C. Parra
  The Mixing Model
  The Separation Model
  Identification
  Separation Principle
  Time Versus Frequency Domain
  The Permutation Ambiguity
  Results
  Conclusion
  References

53 Sound Field Reproduction
R. Rabenstein, S. Spors
  Sound Field Synthesis
  Mathematical Representation of Sound Fields
  Stereophony
  Vector-Based Amplitude Panning
  Ambisonics
  Wave Field Synthesis
  References

Acknowledgements
About the Authors
Detailed Contents
Subject Index

Noise-Adaptive Perceptual Weighting in the AMR-WB Encoder for Increased Speech Loudness in Adverse Far-End Noise Conditions

Emma Jokinen. 24th European Signal Processing Conference (EUSIPCO).