Speech Processing in Embedded Systems

Priyabrata Sinha

Priyabrata Sinha
Microchip Technology, Inc., Chandler, AZ, USA

Certain materials contained herein are reprinted with permission of Microchip Technology Incorporated. No further reprints or reproductions may be made of said materials without Microchip Inc.'s prior written consent.

Springer New York Dordrecht Heidelberg London

© Springer Science+Business Media, LLC 2010

All rights reserved. This work may not be translated or copied in whole or in part without the written permission of the publisher (Springer Science+Business Media, LLC, 233 Spring Street, New York, NY 10013, USA), except for brief excerpts in connection with reviews or scholarly analysis. Use in connection with any form of information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed is forbidden.

The use in this publication of trade names, trademarks, service marks, and similar terms, even if they are not identified as such, is not to be taken as an expression of opinion as to whether or not they are subject to proprietary rights.

Printed on acid-free paper

Springer is part of Springer Science+Business Media

Preface

Speech Processing has rapidly emerged as one of the most widespread and well-understood application areas in the broader discipline of Digital Signal Processing. Besides the telecommunications applications that have hitherto been the largest users of speech processing algorithms, several nontraditional embedded processor applications are enhancing their functionality and user interfaces by utilizing various aspects of speech processing. At the same time, embedded systems, especially those based on high-performance microcontrollers and digital signal processors, are rapidly becoming ubiquitous in everyday life. Communications equipment, consumer appliances, medical, military, security, and industrial control are some of the many segments that can potentially exploit speech processing algorithms to add more value for their users. With new embedded processor families providing powerful and flexible CPU and peripheral capabilities, the range of embedded applications that employ speech processing techniques is becoming wider than ever before.

While working as an Applications Engineer at Microchip Technology and helping customers incorporate speech processing functionality into mainstream embedded applications, I realized that there was an acute need for literature that addresses the embedded application and computational aspects of speech processing. This need is not effectively met by the existing speech processing texts, most of which are overwhelmingly mathematics intensive and focus only on theoretical concepts and derivations. Most speech processing books discuss only the building blocks of speech processing and do not provide much insight into what applications and end-systems can utilize these building blocks. I sincerely hope my book is a step in the right direction of providing the bridge between speech processing theory and its implementation in real-life applications.
Moreover, the bulk of existing speech processing books is primarily targeted toward audiences who have significant prior exposure to signal processing fundamentals. Increasingly, the system software and hardware developers who are involved in integrating speech processing algorithms in embedded end-applications are not DSP experts but general-purpose embedded system developers (often coming from the microcontroller world) who do not have a substantive theoretical background in DSP or much experience in developing complex speech processing algorithms. This large and growing base of engineers requires books and other sources of information that bring speech processing algorithms and concepts into the practical domain and also help them understand the CPU and peripheral needs for accomplishing such tasks. It is primarily this audience that this book is designed for, though I believe theoretical DSP engineers and researchers would also benefit from referring to this book, as it would provide a real-world, implementation-oriented perspective that would help fine-tune the design of future algorithms for practical implementability.

This book starts with Chap. 1 providing a general overview of the historical and emerging trends in embedded systems, the general signal chain used in speech processing applications, several applications of speech processing in our daily life, and a listing of some key speech processing tasks. Chapter 2 provides a detailed analysis of several key signal processing concepts, and Chap. 3 builds on this foundation by explaining many additional concepts and techniques that need to be understood by anyone implementing speech processing applications. Chapter 4 describes the various types of processor architectures that can be utilized by embedded speech processing applications, with special focus on those characteristic features that enable efficient and effective execution of signal processing algorithms. Chapter 5 provides readers with a description of some of the most important peripheral features that form an important criterion for the selection of a suitable processing platform for any application. Chapters 6–8 describe the operation and usage of a wide variety of Speech Compression algorithms, perhaps the most widely used class of speech processing operations in embedded systems. Chapter 9 describes techniques for Noise and Echo Cancellation, another important class of algorithms for several practical embedded applications. Chapter 10 provides an overview of Speech Recognition algorithms, while Chap. 11 explains Speech Synthesis. Finally, Chap. 12 concludes the book and tries to provide some pointers to future trends in embedded speech processing applications and related algorithms.

While writing this book I have been helped by several individuals in small but vital ways. First, this book would not have been possible without the constant encouragement and motivation provided by my wife Hoimonti and other members of our family. I would also like to thank my colleagues at Microchip Technology, including Sunil Fernandes, Jayanth Madapura, Veena Kudva, and others, for helping with some of the block diagrams and illustrations used in this book, and especially Sunil for lending me some of his books for reference. I sincerely hope that the effort that has gone into developing this book helps embedded hardware and software developers to provide optimal, high-quality, and cost-effective solutions for their end customers and to society at large.

Chandler, AZ
Priyabrata Sinha

Contents

1 Introduction
    Digital vs. Analog Systems
    Embedded Systems Overview
    Speech Processing in Everyday Life
    Common Speech Processing Tasks
    Summary
    References

2 Signal Processing Fundamentals
    Signals and Systems
    Sampling and Quantization
        Sampling of an Analog Signal
        Quantization of a Sampled Signal
    Convolution and Correlation
        The Convolution Operation
        Cross-correlation
        Autocorrelation
    Frequency Transformations and FFT
        Discrete Fourier Transform
        Fast Fourier Transform
        Benefits of Windowing
    Introduction to Filters
        Low-Pass, High-Pass, Band-Pass and Band-Stop Filters
        Analog and Digital Filters
    FIR and IIR Filters
        FIR Filters
        IIR Filters
    Interpolation and Decimation
    Summary
    References

3 Basic Speech Processing Concepts
    Mechanism of Human Speech Production
    Types of Speech Signals
        Voiced Sounds
        Unvoiced Sounds
        Voiced and Unvoiced Fricatives
        Voiced and Unvoiced Stops
        Nasal Sounds
    Digital Models for the Speech Production System
    Alternative Filtering Methodologies Used in Speech Processing
        Lattice Realization of a Digital Filter
        Zero-Input Zero-State Filtering
    Some Basic Speech Processing Operations
        Short-Time Energy
        Average Magnitude
        Short-Time Average Zero-Crossing Rate
        Pitch Period Estimation Using Autocorrelation
        Pitch Period Estimation Using Magnitude Difference Function
    Key Characteristics of the Human Auditory System
        Basic Structure of the Human Auditory System
        Absolute Threshold
        Masking
        Phase Perception (or Lack Thereof)
    Evaluation of Speech Quality
        Signal-to-Noise Ratio
        Segmental Signal-to-Noise Ratio
        Mean Opinion Score
    Summary
    References

4 CPU Architectures for Speech Processing
    The Microprocessor Concept
    Microcontroller Units Architecture Overview
    Digital Signal Processor Architecture Overview
    Digital Signal Controller Architecture Overview
    Fixed-Point and Floating-Point Processors
    Accumulators and MAC Operations
    Multiplication, Division, and 32-Bit Operations
    Program Flow Control
    Special Addressing Modes
        Modulo Addressing
        Bit-Reversed Addressing
    Data Scaling, Normalization, and Bit Manipulation Support
    Other Architectural Considerations
        Pipelining
        Memory Caches
        Floating Point Support
        Exception Processing
    Summary
    References

5 Peripherals for Speech Processing
    Speech Sampling Using Analog-to-Digital Converters
        Types of ADC
        ADC Accuracy Specifications
        Other Desirable ADC Features
        ADC Signal Conditioning Considerations
    Speech Playback Using Digital-to-Analog Converters
    Speech Playback Using Pulse Width Modulation
    Interfacing with Audio Codec Devices
    Communication Peripherals
        Universal Asynchronous Receiver/Transmitter
        Serial Peripheral Interface
        Inter-Integrated Circuit
        Controller Area Network
    Other Peripheral Features
        External Memory and Storage Devices
        Direct Memory Access
    Summary
    References

6 Speech Compression Overview
    Speech Compression and Embedded Applications
        Full-Duplex Systems
        Half-Duplex Systems
        Simplex Systems
    Types of Speech Compression Techniques
        Choice of Input Sampling Rate
        Choice of Output Data Rate
        Lossless and Lossy Compression Techniques
        Direct and Parametric Quantization
        Waveform and Voice Coders
        Scalar and Vector Quantization
    Comparison of Speech Coders
    Summary
    References

7 Waveform Coders
    Introduction to Scalar Quantization
        Uniform Quantization
        Logarithmic Quantization
    ITU-T G.711 Speech Coder
    ITU-T G.726 and G.726A Speech Coders
        Encoder
        Decoder
    ITU-T G.722 Speech Coder
        Encoder
        Decoder
    Summary
    References

8 Voice Coders
    Linear Predictive Coding
        Levinson-Durbin Recursive Solution
        Short-Term and Long-Term Prediction
        Other Practical Considerations for LPC
    Vector Quantization
    Speex Speech Coder
    ITU-T G.728 Speech Coder
    ITU-T G.729 Speech Coder
    ITU-T G Speech Coder
    Summary
    References

9 Noise and Echo Cancellation
    Benefits and Applications of Noise Suppression
    Noise Cancellation Algorithms for 2-Microphone Systems
        Spectral Subtraction Using FFT
        Adaptive Noise Cancellation
    Noise Suppression Algorithms for 1-Microphone Systems
    Active Noise Cancellation Systems
    Benefits and Applications of Echo Cancellation
    Acoustic Echo Cancellation Algorithms
    Line Echo Cancellation Algorithms
    Computational Resource Requirements
        Noise Suppression
        Acoustic Echo Cancellation
        Line Echo Cancellation
    Summary
    References

10 Speech Recognition
    Benefits and Applications of Speech Recognition
    Speech Recognition Using Template Matching
    Speech Recognition Using Hidden Markov Models
    Viterbi Algorithm
    Front-End Analysis
    Other Practical Considerations
    Performance Assessment of Speech Recognizers
    Computational Resource Requirements
    Summary
    References

11 Speech Synthesis
    Benefits and Applications of Concatenative Speech Synthesis
    Benefits and Applications of Text-to-Speech Systems
    Speech Synthesis by Concatenation of Words and Subwords
    Speech Synthesis by Concatenating Waveform Segments
    Speech Synthesis by Conversion from Text (TTS)
        Preprocessing
        Morphological Analysis
        Phonetic Transcription
        Syntactic Analysis and Prosodic Phrasing
        Assignment of Stresses
        Timing Pattern
        Fundamental Frequency
    Computational Resource Requirements
    Summary
    References

12 Conclusion
    References

Index
