L1: Course introduction

Introduction to Speech Processing
Ricardo Gutierrez-Osuna, CSE@TAMU

Outline
- Course introduction
- Course logistics
- Course contents

Course introduction

What is speech processing?
- The study of speech signals and their processing methods
- Speech processing encompasses a number of related areas
  - Speech recognition: extracting the linguistic content of the speech signal
  - Speaker recognition: recognizing the identity of speakers by their voice
  - Speech coding: compression of speech signals for telecommunication
  - Speech synthesis: computer-generated speech (e.g., from text)
  - Speech enhancement: improving intelligibility or perceptual quality of speech signals

Example sentence with its phonetic transcription:
  The music carried on until after midnight and then the drummers became tired and the dancers became cold.
  ðə mju:zɪk kær[i,ɪ]d ɒn ʌntɪl ɑ:ftə mɪdnaɪt[, ]ən[d] ðen[, ]ðə drʌməz b[ɪ,ə]keɪm taɪəd[, ]ən[d] ðə dɑ:nsəz b[ɪ,ə]keɪm kəʊld

Applications of speech processing
- Human-computer interfaces (e.g., speech I/O, affective)
- Telecommunication (e.g., speech enhancement, translation)
- Assistive technologies (e.g., blindness/deafness, language learning)
- Audio mining (e.g., diarization, tagging)
- Security (e.g., biometrics, forensics)

Related disciplines
- Digital signal processing
- Natural language processing
- Machine learning
- Phonetics
- Human-computer interaction
- Perceptual psychology

The course objectives are to familiarize students with
- Fundamental concepts of speech production and speech perception
- Mathematical foundations of signal processing and pattern recognition
- Computational methods for speech analysis, recognition, synthesis, and modification

As outcomes, students will be able to
- Manipulate, visualize, and analyze speech signals
- Perform various decompositions, codifications, and modifications of speech signals
- Build a complete speech recognition system using state-of-the-art tools

Course logistics

Class meetings
- MWF 11:30am-12:20pm, HRBB 204

Course prerequisites
- ECEN 314 or equivalent, or permission of the instructor
- Basic knowledge of signals and systems, linear algebra, and probability and statistics
- Programming experience in a high-level language is required

Textbook
- The course will not have an official textbook; instead, it will be based on lecture slides developed by the instructor from several sources
- Additional course materials may be found on the course website: http://courses.cs.tamu.edu/rgutier/csce630_f14/

Recommended references
- B. Gold, N. Morgan and D. Ellis, Speech and Audio Signal Processing: Processing and Perception of Speech and Music, 2nd Ed., Wiley, 2011
- J. Holmes and W. Holmes, Speech Synthesis and Recognition, 2nd Ed., CRC Press, 2001 (available online at TAMU libraries)
- P. Taylor, Text-to-Speech Synthesis, Cambridge University Press, 2009
- L. R. Rabiner and R. W. Schafer, Introduction to Digital Speech Processing, Foundations and Trends in Signal Processing 1(1-2), 2007
- T. Dutoit and F. Marques, Applied Signal Processing: A Matlab-Based Proof-of-Concept, Springer, 2009
- J. Benesty, M. M. Sondhi, and Y. Huang (Eds.), Springer Handbook of Speech Processing, 2008 (available online at TAMU libraries)
- X. Huang, A. Acero and H.-W. Hon, Spoken Language Processing, Prentice Hall, 2001

Grading

Homework assignments
- Three assignments, roughly every 2-3 weeks
- Emphasis on implementation of material presented in class
- Must be done individually

Tests
- Midterm and final exam
- Closed-book, closed-notes (cheat sheet allowed)

Project
- Team-based, in groups of up to 3 people
- Three types: application of existing tools, development of new tools, design of new algorithms

Component     Weight (%)
Homework      40
Project       30
Midterm       15
Final exam    15
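As a rough illustration of how the weights above combine, the sketch below computes a weighted average of per-component scores. It assumes each component is scored on a 0-100 scale and that the overall score is a plain weighted average; the slides do not state this, nor how letter grades are assigned. The function name `course_score` and the example scores are hypothetical.

```python
# Weights (%) come from the grading table; everything else is an assumption.
WEIGHTS = {"homework": 40, "project": 30, "midterm": 15, "final": 15}


def course_score(scores: dict[str, float]) -> float:
    """Weighted average of component scores, each assumed to be on a 0-100 scale."""
    missing = WEIGHTS.keys() - scores.keys()
    if missing:
        raise ValueError(f"missing scores for: {sorted(missing)}")
    total_weight = sum(WEIGHTS.values())  # 100 for the table above
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS) / total_weight


# Made-up scores for illustration only.
example = {"homework": 90, "project": 85, "midterm": 70, "final": 80}
print(course_score(example))  # 84.0
```

Because homework and the project together carry 70% of the weight, a strong showing there can outweigh a weaker exam in this simple model.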

Course contents

Introduction (3 lectures)
- Course introduction
- Speech production and perception
- Organization of speech sounds

Mathematical foundations (4 lectures)
- Signals and transforms
- Digital filters
- Probability, statistics and estimation theory
- Pattern recognition principles

Speech analysis and coding (4 lectures)
- Short-time Fourier analysis and synthesis
- Linear prediction of speech
- Source estimation
- Cepstral analysis

Speech and speaker recognition (6 lectures)
- Template matching
- Hidden Markov models
- Refinements for HMMs
- Large vocabulary continuous speech recognition
- The HTK speech recognition system
- Speaker recognition

Speech synthesis and modification (4 lectures)
- Text-to-speech front-end
- Text-to-speech back-end
- Prosodic modification of speech
- Voice conversion

Tentative schedule

Week  Date   Classroom meeting                                 Materials due
1     9/1    Course introduction
      9/3    Speech production and perception
      9/5    Organization of speech sounds
2     9/8    Signals and transforms
      9/10   Signals and transforms
      9/12   Digital filters
3     9/15   Digital filters
      9/17   Short-time Fourier analysis and synthesis
      9/19   Short-time Fourier analysis and synthesis         HW1 assigned
4     9/22   Linear prediction of speech
      9/24   Linear prediction of speech
      9/26   Source estimation
5     9/29   Source estimation
      10/1   Cepstral analysis
      10/3   Cepstral analysis                                 HW1 due
6     10/6   Probability, statistics, and estimation theory    HW2 assigned
      10/8   Probability, statistics, and estimation theory
      10/10  Pattern recognition principles
7     10/13  Pattern recognition principles
      10/15  Template matching
      10/17  Hidden Markov models
8     10/20  Hidden Markov models
      10/22  Review/catch-up day                               HW2 due
      10/24  Midterm exam
9     10/27  Refinements for HMMs
      10/29  Refinements for HMMs
      10/31  HTK speech recognition system                     HW3 assigned
10    11/3   HTK speech recognition system
      11/5   Large vocabulary continuous speech recognition
      11/7   Large vocabulary continuous speech recognition
11    11/10  Speaker recognition
      11/12  Speaker recognition
      11/14  Speech synthesis
12    11/17  Speech synthesis                                  HW3 due
      11/19  Speech synthesis
      11/21  Speech modification
13    11/24  Proposal presentations                            Project proposal
      11/26  Proposal presentations
      11/28  Thanksgiving holiday
14    12/1   Speech modification
      12/3   Speech modification
      12/5   Review/catch-up day
15    12/8   Final exam
      12/10  Reading day
      12/12  No class
16    12/17  Project presentations: 10:30am-12:30pm            Project report