ELEC9723 Speech Processing

COURSE INTRODUCTION
Session 1, 2008

Course Staff

Course conveners:
Prof. E. Ambikairajah, room EEG6, ambi@ee.unsw.edu.au
Dr Julien Epps, room EE337, j.epps@unsw.edu.au

Laboratory demonstrator: Vidhyasaharan Sethu, vidhyasaharan@gmail.com

Consultations: You are encouraged to ask questions on the course material before the regular class times (e.g. from 5:45pm) in EEG3 in the first instance, rather than by email.

Course details

Credits: The course is a 6 UoC course; expected workload is 9-10 hours per week throughout the 12 week session.

Contact hours: The course consists of 3 hours per week, comprising lectures and/or laboratory (a typical class might be 1½ hours of lecture followed by 1½ hours of lab):
Lectures: Tuesdays, 6pm-9pm, room EEG3
Lab sessions: Tuesdays, 6pm-9pm, room EE214
Laboratory classes start in week 0 (Introductory MATLAB).

Course Information

Context and aims

ELEC9723 Speech Processing builds directly on students' skills and knowledge in digital signal processing gained during ELEC3104 Signal Processing and ELEC4621 Advanced Digital Signal Processing. Speech processing has been one of the main application areas of digital signal processing for several decades now, and as new technologies like voice over IP, automated call centres, voice browsing and biometrics find commercial markets, speech seems set to drive a range of new digital signal processing techniques for some time to come. This course provides not only the technical details of ubiquitous techniques like linear predictive coding, Mel frequency cepstral coefficients, Gaussian mixture models and hidden Markov models, but also the rationale behind their application to speech and an understanding of speech as a signal. Contemporary signal processing is almost entirely digital, hence only discrete-time theory is presented in this course.

Aims: This course aims to:
a. Familiarise you with modeling the vocal tract as a digital, linear time-invariant system.
b. Convey details of a range of commonly used speech feature extraction techniques.
c. Provide a basic understanding of multidimensional techniques for speech representation and classification methods.
d. Familiarise you with the practical aspects of speech processing, including robustness, and applications of speech processing, including speech enhancement, speaker recognition and speech recognition.
e. Give you practical experience with the implementation of several components of speech processing systems.

Relation to other courses

ELEC9723 Speech Processing is the most advanced course offered by the university on this topic, and serves as an excellent basis from which to commence research in the area. Various aspects of the course bring students up to date with the very latest developments in the field, as seen in recent international conferences and journals, and some of the laboratory work is designed in the style of an empirical research investigation.

ELEC9723 is well complemented by ELEC9724 Audio and Electroacoustics, which deals with many other signal processing methods, gives an understanding of human auditory perception (also a key part of speech processing) and of audio signals, and discusses compression techniques (many related to speech coding). ELEC9723 is also well complemented by ELEC9722 Digital Image Processing, which gives an insight into two-dimensional signal processing and image signals. ELEC9721 Digital Signal Processing Theory and Applications provides an excellent basis for Speech Processing; for students who have not already completed it (or ELEC4621), it is recommended for future study.

Pre-requisites: The minimum pre-requisite for the course is ELEC3104 Signal Processing (or equivalent). Knowledge from either ELEC4621 or ELEC9721 is highly desirable.
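To give a feel for the background expected from the pre-requisite signal processing courses, here is a minimal sketch of frame-by-frame processing with a short-time autocorrelation. It is an illustration only: the course laboratories use MATLAB, whereas Python is used here as an assumption of convenience, the function names (`split_into_frames`, `short_time_autocorrelation`, `dominant_period`) are hypothetical, and a synthetic 100 Hz sine wave stands in for a voiced speech segment.

```python
import math

def split_into_frames(signal, frame_len, hop):
    """Return overlapping frames of frame_len samples, advancing hop samples
    each time. A trailing partial frame is dropped."""
    return [signal[start:start + frame_len]
            for start in range(0, len(signal) - frame_len + 1, hop)]

def short_time_autocorrelation(frame, max_lag):
    """r[k] = sum over n of frame[n] * frame[n + k], for k = 0..max_lag."""
    n = len(frame)
    return [sum(frame[i] * frame[i + k] for i in range(n - k))
            for k in range(max_lag + 1)]

def dominant_period(frame, min_lag, max_lag):
    """Lag (in samples) of the largest autocorrelation value in the search range."""
    r = short_time_autocorrelation(frame, max_lag)
    return max(range(min_lag, max_lag + 1), key=lambda k: r[k])

# Illustrative values (assumptions, not course-specified settings).
fs = 8000   # sampling rate in Hz
f0 = 100    # frequency of the synthetic test tone in Hz
signal = [math.sin(2 * math.pi * f0 * n / fs) for n in range(800)]

frames = split_into_frames(signal, frame_len=240, hop=80)        # 30 ms frames, 10 ms hop
periods = [dominant_period(f, min_lag=20, max_lag=160) for f in frames]
print(periods)  # each frame reports a period of 80 samples, i.e. fs / f0
```

Restricting the search to lags between `min_lag` and `max_lag` avoids the trivial peak at lag 0; the specific limits here are arbitrary choices for the test tone.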
Assumed knowledge: It is essential that you are familiar with the sampling theorem, digital filter design, the discrete Fourier transform, random signals and autocorrelation, and frame-by-frame processing. Students who are not confident in their knowledge from previous signal processing courses (especially the topics mentioned) are strongly advised to revise their previous course materials as quickly as possible to avoid difficulties in this course.

Previous course code: The course replaces previous course ELEC9344 Speech and Audio Processing.

Learning outcomes

On successful completion you should be able to:
1. Express the speech signal in terms of its time domain and frequency domain representations and the different ways in which it can be modelled;
2. Derive expressions for simple features used in speech classification applications;
3. Explain the operation of example algorithms covered in lectures, and discuss the effects of varying parameter values within these;
4. Synthesise block diagrams for speech applications, explain the purpose of the various blocks, and describe in detail algorithms that could be used to implement them;
5. Implement components of speech processing systems, including speech recognition and speaker recognition, in MATLAB;
6. Deduce the behaviour of previously unseen speech processing systems and hypothesise about their merits.

The course delivery methods and course content address a number of core UNSW graduate attributes; these include:
a. The capacity for analytical and critical thinking and for creative problem-solving, which is addressed by the tutorial exercises and laboratory work.
b. The ability to engage in independent and reflective learning, which is addressed by tutorial exercises together with self-directed study.
c. The skills of effective communication, which are addressed by the viva-style verbal assessment in the laboratory.
d. Information literacy, which is addressed by the homework.

Please refer to http://www.ltu.unsw.edu.au/content/userdocs/gradattreng.pdf for more information about graduate attributes.

Teaching strategies

The course consists of the following elements: lectures, laboratory work, and homework comprising self-guided study and a problem sheet.

Lectures

Selected lectures will be DVD-based. These classes will be presented as normal; however, a DVD recording of the live lecture will be distributed for your own self-directed study at the conclusion of the class. During the lectures, techniques for the analysis, modeling and processing of the digital speech signal will be presented. The lectures provide you with a focus on the core material in the course, together with qualitative, alternative explanations to aid your understanding.
Various examples will be given to enrich the analytical course content. The lecture materials distributed in class (or via the course web site) will give a good guide to the course syllabus, but you will need to supplement them with additional reading of the recommended textbook and/or other materials recommended by the lecturing staff. In particular, you should not assume that attendance at all lectures (even with a glance or two through the notes), on its own, is sufficient to pass the course.

Laboratory work

The lecture schedule is deliberately designed to give you practical, hands-on exposure to concepts soon after they are conveyed in lectures. Generally there will be around one week between the introduction of a topic in lectures and a laboratory exercise on the same topic, sufficient time in which to revise the lecture, attempt related problems and prepare for the laboratory. The laboratory work provides you with hands-on design experience and exposure to simulation tools and algorithms used widely in speech processing. You must prepare in advance for the laboratory sessions: the sessions are short, so this is the only possible way to complete the given tasks. Laboratory classes will start in week 0 of session, with the compulsory Introductory MATLAB laboratory. Regular laboratory classes will start in week 1.

You will need to bring to the laboratories:
- A USB drive for storing MATLAB script files
- A laboratory notebook for recording your work
- Your lecture notes, laboratory preparation and/or any other relevant course materials

Homework and problem sheets

The lectures can only cover the course material to a certain depth; you must read the textbook(s) and reflect on their content as preparation for the lectures to fully appreciate the course material. Home preparation for laboratory work provides you with the background knowledge you will need. The problem sheets aim to provide in-depth quantitative and qualitative understanding of speech processing theory and methods. Together with your attendance at classes, your self-directed reading, completion of problems from the problem sheet and reflection on course materials will form the basis of your understanding of this course.

Assessment

Laboratory work: 30%
Mid-session exam: 10%
Final examination: 60%

Laboratory work: Starting in week 2, the laboratory work will be assessed in order to ensure that you are studying and that you understand the course material. The laboratory assessment is conducted live during the lab sessions, so it is essential that you arrive at each lab having revised the lecture materials (and attempted problems from the problem sheet) and having completed any requested preparation. Without preparation, marks above 50% may be difficult to obtain. No lab reports are required in this course. During the laboratory, you may consult with others in the class, but you must keep your own notes of the laboratory. In particular, note that laboratory assessment will be conducted individually, not on a per-group basis. Please also note that you must pass the laboratory component in order to pass the course.

Mid-session examination: The mid-session examination tests your general understanding of the course material, and questions may be drawn from any course material up to the end of week 6.

Final examination: The exam in this course is a standard closed-book 3-hour written examination, comprising five compulsory questions. University approved calculators are allowed. The examination tests analytical and critical thinking and general understanding of the course material in a controlled fashion. Questions may be drawn from any aspect of the course, unless specifically indicated otherwise by the lecture staff. Please note that you must pass the final exam in order to pass the course.

Course Schedule

Week          Lecture                                 Ref    Lecturer      Laboratory
0 (Mar 4th)   No lecture                                     Ambikairajah  Introductory MATLAB
1             Introduction to speech processing       [1]    Ambikairajah  Introductory speech analysis (no assessment)
2             Speech analysis                         [1]    Ambikairajah  Lab 1: Spectral analysis
3             Linear predictive coding                [1,2]  Ambikairajah  Lab 2: Feature extraction
4             Time-frequency analysis                 [1]    Ambikairajah  Lab 3: Linear predictive coding
5             Speech enhancement                      [1]    Ambikairajah  Lab 4: Speech synthesis using LPC
6             Speech synthesis                               Chen (NICTA)  No lab
7 (Apr 29th)  Front-end processing                    [1]    Epps          No lab
8             Robust front-end, VAD                          Epps          Lab 5: Front-end processing
9             Clustering and Gaussian mixture models         Epps          Lab 6: Robust front-end
10            Speaker recognition                     [1]    Epps          Lab 6: Robust front-end
11            Hidden Markov models                    [2]    Epps          Lab 7: Speaker recognition
12            Speech recognition                      [2]    Epps          Lab 8: Speech recognition

Week 7 also includes the mid-session examination, duration 1 hour 15 min.

Resources

Textbooks

Prescribed textbook

The following textbook is prescribed for the course:
[1] Quatieri, T. F. (2002). Discrete-Time Speech Signal Processing, Prentice-Hall, New Jersey.

You may want to check the coverage of this text before purchasing, as some topics in the syllabus are not featured. Unfortunately there is no single text that covers all topics in satisfactory depth. Additional references, listed below and at the end of some lecture note sets, will in combination provide complete coverage of the course. Lecture notes will be provided; however, note that these do not treat each topic exhaustively and additional reading is required.

Reference books

The following books are good additional resources for speech processing topics:
[2] Rabiner, L. R., and Juang, B.-H. (1993). Fundamentals of Speech Recognition, Prentice-Hall, New Jersey.

Books covering assumed knowledge

The following books cover material which is assumed knowledge for the course:

On-line resources

Some additional on-line resources relevant to the course:
- Course WebCT: http://vista.elearning.unsw.edu.au
- Library resources: http://info.library.unsw.edu.au/web/services/teaching.html
- VOICEBOX: Speech Processing Toolbox for MATLAB: http://www.ee.ic.ac.uk/hp/staff/dmb/voicebox/voicebox.html

Other Matters

Academic Honesty and Plagiarism

Plagiarism is the unacknowledged use of other people's work, including the copying of assignment works and laboratory results from other students. Plagiarism is considered a serious offence by the University and severe penalties may apply. For more information about plagiarism, please refer to http://www.lc.unsw.edu.au/plagiarism

Continual Course Improvement

The course is under constant revision in order to improve the learning outcomes of its students. Please forward any feedback (positive or negative) on the course to the course convener or via the Course and Teaching Evaluation and Improvement Process (surveys at the end of the course).

Administrative Matters

On issues and procedures regarding such matters as special needs, equity and diversity, occupational health and safety, enrolment, rights, and general expectations of students, please refer to the School policies; see http://scoff.ee.unsw.edu.au/.

CATEI Results (S2, 2007)

The university strongly encourages students to give their feedback at the conclusion of the course. Results from an online survey of ELEC9344 Speech and Audio Processing in 2007 are shown below.
In 2008, we will be endeavouring to improve the quality of the feedback given to you, the development of thinking skills, and tutorial support. Please note that the survey assumes that respondents have attended at least 80% of the class contact time.

      Question                                                                          A (%)  D (%)
Q1.   The aims of this course were clear to me
Q2.   I was given helpful feedback on how I was going in the course                     89     11
Q3.   The course was challenging and interesting
Q4.   The course provided effective opportunities for active student participation
      in learning activities
Q5.   The course was effective for developing my thinking skills (e.g. critical
      analysis, problem solving)
Q6.   I was provided with clear information about the assessment requirements for
      this course
Q7.   The assessment methods and tasks in this course were appropriate given the
      course aims
Q8.   The course advanced my ability for independent learning and critical analysis
Q9.   Good resources in laboratories and tutorials supported the learning process      89     11
Q10.  Overall, I was satisfied with the quality of this course

One further pair of values (A 78%, D 22%) is recorded within the Q4-Q8 group, but the question it belongs to is not recoverable here. Values for the remaining questions are not available.