Syllabus for the course «Stochastic Modelling»

Government of the Russian Federation
Federal State Autonomous Educational Institution of Higher Professional Education
National Research University Higher School of Economics
Faculty of Computer Science

Syllabus for the course «Stochastic Modelling»
010402 «Applied Mathematics and Informatics», «Data Sciences» Master Program

Approved by:
Moscow, 2016

1. Scope of Use

The present program establishes the minimum requirements for students' knowledge and skills and determines the content of the course. The present syllabus is intended for the department teaching the course, its teaching assistants, and students of the Master of Science program 010402 «Applied Mathematics and Informatics». This syllabus meets the standards required by:
- the educational standards of NRU HSE;
- the educational program «Data Sciences» of the Federal Master's Degree Program 010402, 2014;
- the university curriculum of the Master's program in «Data Science» (010402) for 2016.

Summary

The first part of the course deals with unsupervised learning techniques for independent observations (including clustering and probabilistic principal component analysis) before moving on to models for sequential data (including the Poisson process and Kalman filtering). Finally, we introduce basic sampling techniques. The second part of the course is devoted to the study of stochastic modelling and the simulation of random processes. It covers Markov chains in discrete and continuous time and methods of finding stationary states of processes described by Markov chains. The course proceeds to practical algorithms based on Markov chains: hidden Markov models (HMM), Markov random fields (MRF), and Markov chain Monte Carlo methods (MCMC).

2. Learning Objectives

The main objective of the course is to learn the fundamental principles used in modelling random systems, as well as advanced algorithms based on them. These algorithms are widely used in modern technologies for information retrieval, processing and recognition of speech and language, bioinformatics, and many other fields. An understanding of these basic algorithms is required for the training of professionals in the field of mathematical modelling.

The learning objective of the course «Stochastic Modelling» is to provide students with essential tools, including:
- clustering (K-means and Gaussian mixture models);
- the EM algorithm;
- Principal Component Analysis (PCA), probabilistic PCA, kernel PCA;
- Kalman filtering;
- basic sampling methods;
- random processes and Markov chains;
- hidden Markov models;
- Monte Carlo methods.

3. Learning outcomes

After completing the study of the discipline «Stochastic Modelling» the student should:
- Know essential techniques for clustering;
- Know principal component analysis and its generalisations;

- Know basic sampling methods;
- Know the basic notions of the theory of random processes and Markov chains;
- Be able to apply the EM algorithm in a wide variety of applications;
- Be able to forecast time series using Kalman filtering;
- Be able to apply Markov chains to simulate random processes in real-world problems (speech recognition, authorship attribution, etc.);
- Understand the capabilities and limitations of the existing algorithms.

4. Place of the discipline in the Master's program structure

The course «Stochastic Modelling» is an elective course taught in the second year of the Master's program «Data Science».

Prerequisites

Students are assumed to have a good background in probability theory and statistics, discrete mathematics, and advanced calculus. The following knowledge and competences are needed to study the discipline:
- a good command of the English language, both spoken and written;
- a good knowledge of probability theory and statistics.

After completing the study of the discipline «Stochastic Modelling» the student should have acquired the following competences:

Competence: The ability to reflect on developed methods of activity.
Code: C-1 (SSC-M1)
Descriptors (indicators of achievement): The student is able to reflect on developed methods in stochastic modelling.
Educative forms and methods: Lectures and tutorials.

Competence: The ability to propose a model and to invent and test methods and tools of professional activity.
Code: C-2 (SC-M2)
Descriptors (indicators of achievement): The student is able to model randomness using probabilistic models and to perform statistical inference to estimate the model parameters.
Educative forms and methods: Examples covered during the lectures and tutorials; assignments.

Competence: The capability to develop new research methods and to change the scientific and industrial profile of one's activities.
Code: C-3 (SC-M3)
Descriptors (indicators of achievement): Students obtain the necessary knowledge of stochastic and probabilistic models, sufficient to apply them and to develop new methods in other disciplines.
Educative forms and methods: Assignments; additional material/reading provided.

5. Schedule

Classes are held in pairs: a 2-academic-hour lecture followed by a 2-academic-hour tutorial.

Topic                                              Total hours  Lectures  Seminars  Self-study
1. Clustering                                           24          4         4         16
2. Principal Component Analysis and its extensions      36          6         6         24
3. Kalman filtering                                     32          4         4         24
4. Sampling methods                                     16          2         2         12
5. Introduction to Markov processes                     14          2         2         10
6. Discrete time Markov chains                          17          2         4         11
7. Continuous time Markov chains                        17          2         4         11
8. Hidden Markov models (HMM)                           19          4         4         11
9. Markov chain Monte Carlo methods (MCMC)              19          4         4         11
Total:                                                 190         30        30        130

(Lectures and seminars together constitute the contact hours.)

Requirements and Grading

Type of work   Module  #  Type of grading
Homework 1       1     1  One written homework.
Homework 2       2     4  Cumulative grade for the written and coding work during the 2nd module.
MidTerm Exam     1     1  Written exam. Preparation time 180 min.
Final Exam       2     1  Written exam. Preparation time 80 min.

9. Assessment

The assessment consists of one homework handed out to the students during the first module, and the average grade of the homeworks from the second module. The MidTerm Exam covers the course material of the first module. The final assessment is the final exam, at which students have to demonstrate knowledge of all the topics covered in the course.

The grade formula: the exam is worth 30% of the final mark. The final course mark is obtained from the following formula:

Final = 0.15*(Homework1) + 0.25*(MidTerm) + 0.3*(Homework2) + 0.3*(Exam)
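For concreteness, here is a minimal sketch of the formula in Python (one of the software products listed in section 16); the component grades used are made up for illustration.

```python
# Minimal sketch of the grade formula; the component grades are hypothetical.
def final_grade(homework1, midterm, homework2, exam):
    """Weighted combination of the four components on the ten-point scale."""
    return 0.15 * homework1 + 0.25 * midterm + 0.3 * homework2 + 0.3 * exam

print(final_grade(homework1=8, midterm=7, homework2=9, exam=6))  # 7.45 -> rounds to 7
```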

Grades are rounded at the examiner's/lecturer's discretion, taking into account regular attendance at classes and homework; all grades with a fractional part greater than 0.5 are rounded up.

Table of Grade Accordance

Ten-point Grading Scale                              Five-point Grading Scale
1 (very bad), 2 (bad), 3 (no pass)                   Unsatisfactory (2) - FAIL
4 (pass), 5 (highly pass)                            Satisfactory (3) - PASS
6 (good), 7 (very good)                              Good (4) - PASS
8 (almost excellent), 9 (excellent), 10 (perfect)    Excellent (5) - PASS

10. Course Description

The following list describes the main mathematical notions considered in the course, in lecture order.

Topic 1. Clustering
The K-means algorithm, initialization of K-means using the k-means++ procedure, the Gaussian mixture model, the general form of the EM algorithm, the EM algorithm applied to Gaussian mixtures, the link between K-means and the Gaussian mixture model.

Reading:
1. Christopher Bishop. Pattern Recognition and Machine Learning. Springer, 2006.

Supplementary reading:
1. D. Arthur and S. Vassilvitskii. k-means++: the advantages of careful seeding. In ACM-SIAM Symposium on Discrete Algorithms, 2007.
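To make the EM updates for Gaussian mixtures concrete, here is a minimal one-dimensional, two-component sketch in Python/NumPy; the synthetic data and initial values are illustrative, and this is a sketch rather than the reference implementation used in class.

```python
import numpy as np

# Minimal EM for a two-component 1-D Gaussian mixture (illustrative sketch).
rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(-2, 1, 200), rng.normal(3, 1, 300)])

pi = np.array([0.5, 0.5])    # mixing weights
mu = np.array([-1.0, 1.0])   # initial means
var = np.array([1.0, 1.0])   # initial variances

for _ in range(50):
    # E-step: responsibilities gamma[n, k] = p(z_n = k | x_n).
    dens = np.exp(-0.5 * (x[:, None] - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)
    gamma = pi * dens
    gamma /= gamma.sum(axis=1, keepdims=True)
    # M-step: re-estimate weights, means and variances.
    nk = gamma.sum(axis=0)
    pi = nk / len(x)
    mu = (gamma * x[:, None]).sum(axis=0) / nk
    var = (gamma * (x[:, None] - mu) ** 2).sum(axis=0) / nk

print(pi, mu, var)  # should approach (0.4, 0.6), (-2, 3), (1, 1)
```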

Topic 2. PCA
Review of Principal Component Analysis, probabilistic PCA (pPCA) and factor analysis, learning pPCA using the EM algorithm and by direct maximization of the likelihood, kernel PCA.

Reading:
1. Christopher Bishop. Pattern Recognition and Machine Learning. Springer, 2006.

Supplementary reading:
1. M. E. Tipping and C. M. Bishop. Probabilistic principal component analysis. Journal of the Royal Statistical Society, Series B, 1999.
2. B. Scholkopf, A. Smola and K.-R. Muller. Kernel Principal Component Analysis. Artificial Neural Networks, ICANN 1997, Vol. 127, pp. 583-588.

Topic 3. Kalman filtering
Derivation of the Kalman filtering equations, definition of the Kalman gain, learning the model using the EM algorithm.

Reading:
1. Christopher Bishop. Pattern Recognition and Machine Learning. Springer, 2006.

Supplementary reading:
1. R. E. Kalman (1960). A new approach to linear filtering and prediction problems. Transactions of the ASME, Journal of Basic Engineering, Vol. 82, pp. 35-45.
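As a concrete instance of the filtering equations, the following one-dimensional sketch filters a random-walk state observed with Gaussian noise; the noise variances and the prior are illustrative choices, not values from the course.

```python
import numpy as np

# Minimal 1-D Kalman filter: random-walk state observed with noise.
rng = np.random.default_rng(1)
q, r = 0.1, 1.0  # process and observation noise variances (illustrative)
x_true = np.cumsum(rng.normal(0, np.sqrt(q), 100))
y = x_true + rng.normal(0, np.sqrt(r), 100)

mean, var = 0.0, 1.0  # prior on the initial state
estimates = []
for obs in y:
    var_pred = var + q                 # predict: propagate uncertainty
    gain = var_pred / (var_pred + r)   # Kalman gain
    mean = mean + gain * (obs - mean)  # update: correct with the observation
    var = (1 - gain) * var_pred
    estimates.append(mean)

# The filtered MSE should be noticeably below the raw observation noise r.
print(np.mean((np.array(estimates) - x_true) ** 2))
```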

Topic 4. Sampling methods
Rejection sampling, importance sampling.

Reading:
1. Christopher Bishop. Pattern Recognition and Machine Learning. Springer, 2006.

Topic 5. Introduction to Markov processes
Stochastic processes. Markov processes. Discrete and continuous time processes. Transition probabilities.

Reading:
1. Sheldon M. Ross. Introduction to Probability Models, Tenth Edition. Academic Press, 2009.
2. Oliver Ibe. Markov Processes for Stochastic Modeling. Academic Press, 2009.

Supplementary reading:
1. Howard M. Taylor, Samuel Karlin. An Introduction to Stochastic Modeling. Academic Press, 1998.

Topic 6. Discrete time Markov chains
Discrete time Markov chains (DTMC). Applications of DTMC. Stationary DTMC. The transition matrix. The n-step transition matrix. The Chapman-Kolmogorov equations. Classification of states. Reachable and communicating states. Equivalence classes. Communicating classes. Irreducibility. Periodicity. Recurrence and transience. First passage time. Positive recurrence. Recurrence time. Ergodic states. The fundamental theorem of Markov chains. Steady state. Steady state equations. Absorbing states. Finite absorbing chains. Transition matrix structure. Absorption probabilities. Limiting distributions.

Reading:
1. Sheldon M. Ross. Introduction to Probability Models, Tenth Edition. Academic Press, 2009.
2. Oliver Ibe. Markov Processes for Stochastic Modeling. Academic Press, 2009.

Supplementary reading:
1. Howard M. Taylor, Samuel Karlin. An Introduction to Stochastic Modeling. Academic Press, 1998.
2. J. R. Norris. Markov Chains. Cambridge University Press, 1998.

Topic 7. Continuous time Markov chains
Continuous time Markov chains (CTMC). Examples of CTMC. Transition probability functions. Holding times and the exponential distribution. The embedded DTMC. Transition rates. Absorbing states. The transition probability diagram. Classification of states. Long-run behavior of CTMC. Positive recurrence. The fundamental theorem for CTMC. Steady state equations. Example: birth and death processes.

Reading:
1. Sheldon M. Ross. Introduction to Probability Models, Tenth Edition. Academic Press, 2009.
2. Oliver Ibe. Markov Processes for Stochastic Modeling. Academic Press, 2009.

Supplementary reading:
1. Howard M. Taylor, Samuel Karlin. An Introduction to Stochastic Modeling. Academic Press, 1998.
2. J. R. Norris. Markov Chains. Cambridge University Press, 1998.

Topic 8. Hidden Markov models (HMM)
Introduction to hidden Markov models. Prediction of states. Evaluation of model parameters. Transition and signal matrices. The three main problems of HMM. Evaluation of a signal sequence: the forward-backward procedure. Finding the most probable sequence: decoding, the Viterbi algorithm. Finding the model parameters: training HMMs, the Baum-Welch algorithm. HMM applications in speech recognition.

Reading:
1. Oliver Ibe. Markov Processes for Stochastic Modeling. Academic Press, 2009.
2. Pierre Bremaud. Markov Chains: Gibbs Fields, Monte Carlo Simulation, and Queues. Springer, 1998.

Supplementary reading:
1. Lawrence R. Rabiner. A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition. Proceedings of the IEEE, Vol. 77, No. 2, 1989, pp. 257-286.
2. B. H. Juang, L. R. Rabiner. Hidden Markov Models for Speech Recognition. Technometrics, Vol. 33, No. 3, 1991, pp. 251-272.

Topic 9. Markov chain Monte Carlo (MCMC)
Random walks. Sampling from a distribution. Monte Carlo integration. Construction of Markov chains. The Metropolis-Hastings algorithm.

Reading:
1. Oliver Ibe. Markov Processes for Stochastic Modeling. Academic Press, 2009.
2. Pierre Bremaud. Markov Chains: Gibbs Fields, Monte Carlo Simulation, and Queues. Springer, 1998.

Supplementary reading:
1. W. K. Hastings. Monte Carlo sampling methods using Markov chains and their applications. Biometrika, 57(1), 1970, p. 97.
2. N. Metropolis, A. W. Rosenbluth, M. N. Rosenbluth, A. H. Teller and E. Teller. Equation of State Calculations by Fast Computing Machines. Journal of Chemical Physics, 21, 1953, pp. 1087-1092.
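To illustrate the Metropolis-Hastings algorithm of Topic 9, here is a minimal random-walk sampler targeting the standard normal distribution; the target, proposal width and chain length are illustrative choices.

```python
import numpy as np

# Random-walk Metropolis-Hastings targeting the standard normal density.
rng = np.random.default_rng(2)

def log_target(x):
    return -0.5 * x * x  # log N(0, 1), up to an additive constant

x = 0.0
samples = []
for _ in range(10000):
    proposal = x + rng.normal(0, 1)  # symmetric random-walk proposal
    # Accept with probability min(1, target(proposal) / target(x)).
    if np.log(rng.uniform()) < log_target(proposal) - log_target(x):
        x = proposal
    samples.append(x)

print(np.mean(samples), np.var(samples))  # approximately 0 and 1
```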

11. Educational Technology

The following educational technologies are used in the study process:
- discussion and analysis of the results during the tutorials;
- solutions of exercises posted on the course website for the students to practice;
- assignments to test the students' progress;
- consultation time on Monday mornings (1st module).

12. Recommendations for the course lecturer

The course lecturer is advised to use interactive learning methods that allow the majority of students to participate, such as slide presentations combined with writing material on the board, and the use of interdisciplinary papers to present connections between probability theory and statistics. The course is intended to be adaptive, so it is normal to differentiate tasks within a group when necessary and to direct fast learners towards more complicated tasks.

13. Recommendations for students and final exam questions

The course is interactive. Lectures are combined with classes. Students are invited to ask questions and to participate actively in group discussions. There will be special office hours for students who would like to get a more precise understanding of each topic, and the lecturer is ready to answer questions online via the official e-mail addresses listed in the contacts section. The additional references in section 14.2 are suggested to help students in their understanding of the material. The course is taught in English, and students can ask the teaching assistants for help with the language.

Examples of control questions for Module 2:
1. Given a Markov chain with two states {0, 1} and transition matrix P = [1/3, 2/3; 3/4, 1/4]. If at n = 0 the chain is in state 0, compute the probability of it being in state 1 after n = 3 steps. What is the probability of being in state 1 after a long time? (A numerical check is sketched below.)
2. A player has $1 at the beginning of the game and at every step can win $1 with probability p or lose $1 with probability 1 - p. The game stops when the player is either ruined or has won $3. Draw a Markov chain representing this game. Compute the probabilities of winning and of being ruined, and compute their numerical values for a fair game with p = 1/2.
3. Formulate the fundamental theorem of continuous time Markov chains. What conditions are equivalent to the stationary state equations?
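Control question 1 can be checked numerically; the sketch below uses the matrix given in the question.

```python
import numpy as np

# Numerical check of control question 1 (matrix as given in the question).
P = np.array([[1/3, 2/3],
              [3/4, 1/4]])

# Probability of being in state 1 after 3 steps, starting from state 0.
print(np.linalg.matrix_power(P, 3)[0, 1])  # ~0.5046

# Long-run behaviour: solve pi P = pi together with sum(pi) = 1.
A = np.vstack([(P.T - np.eye(2))[0], np.ones(2)])
pi = np.linalg.solve(A, np.array([0.0, 1.0]))
print(pi[1])  # 8/17 ~ 0.4706
```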

Examples of exam questions (not complete, for reference only):
1. Describe the fundamental theorem of Markov chains and its necessary conditions. Give an example where these conditions are not satisfied.
2. Find all communicating classes of a Markov chain with 7 states S = {A, B, C, D, E, F, G} and a given transition matrix. Draw the chain diagram, identify which states are recurrent and which are transient, and find the period of each state.
3. Consider a Markov chain consisting of two states S = {1, 2} with a known transition matrix. What is the probability for this system to be in the second state after a long time?
4. Estimate the element P(2, 3) of the Markov chain transition matrix from the observed state sequence: 13231313132212132312.
5. Compute the probability that a hidden Markov model with two states {s, t} generates the signal sequence «BAB». The initial states have probabilities p(s) = 0.85 and p(t) = 0.15, and the signal emission probabilities are Ps(A) = 0.4, Ps(B) = 0.6, Pt(A) = 0.5, Pt(B) = 0.5. The transition matrix is given. (A sketch of the computation follows after this list.)
6. Apply the EM algorithm to the Gaussian mixture model.
7. Describe how probabilistic PCA (pPCA) is related to conventional PCA. What are the advantages of considering pPCA over PCA?
8. Derive the Kalman filtering equations.
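Question 5 is an instance of the forward algorithm. In the sketch below, the initial and emission probabilities are those stated in the question, while the transition matrix, which the question leaves as "given", is a hypothetical placeholder.

```python
import numpy as np

# Forward algorithm for exam question 5, states ordered (s, t).
init = np.array([0.85, 0.15])        # p(s), p(t)
emit = {'A': np.array([0.4, 0.5]),   # Ps(A), Pt(A)
        'B': np.array([0.6, 0.5])}   # Ps(B), Pt(B)
trans = np.array([[0.7, 0.3],        # hypothetical transition matrix:
                  [0.4, 0.6]])       # the exam question supplies its own

alpha = init * emit['B']             # initialise with the first signal of «BAB»
for signal in 'AB':                  # remaining signals
    alpha = (alpha @ trans) * emit[signal]

print(alpha.sum())                   # p(«BAB») under this model
```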

Pool of questions to check the quality of studying the course (2nd module):
- Give a definition of a Markov process. Give an example of a Markov process.
- What is the period of a state?
- What is the fundamental theorem of Markov chains?
- Give a definition of a continuous time Markov process. What is a transition intensity?
- Describe an example of a hidden Markov model. What are the three main problems related to HMMs?
- Describe the construction of the Markov chain in MCMC. Describe how a probability distribution is generated from MCMC.

The final exam will test the students' understanding of each topic discussed during the lectures.

14. Reading and Materials

14.1 Recommended Reading
1. Christopher Bishop. Pattern Recognition and Machine Learning. Springer, 2006.
2. Sheldon M. Ross. Introduction to Probability Models, 10th Edition. Academic Press, 2009.
3. Oliver Ibe. Markov Processes for Stochastic Modeling. Academic Press, 2009.
4. Pierre Bremaud. Markov Chains: Gibbs Fields, Monte Carlo Simulation, and Queues. Springer, 1998.

14.2 Supplementary reading
1. William J. Stewart. Introduction to the Numerical Solution of Markov Chains. Princeton University Press, 1994.
2. Howard M. Taylor, Samuel Karlin. An Introduction to Stochastic Modeling. Academic Press, 1998.
3. J. R. Norris. Markov Chains. Cambridge University Press, 1998.
4. Lawrence R. Rabiner. A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition. Proceedings of the IEEE, Vol. 77, No. 2, 1989, pp. 257-286.
5. B. H. Juang, L. R. Rabiner. Hidden Markov Models for Speech Recognition. Technometrics, Vol. 33, No. 3, 1991, pp. 251-272.
6. S. Chib, E. Greenberg. Understanding the Metropolis-Hastings Algorithm. The American Statistician, Vol. 49, No. 4, 1995, pp. 327-335.
7. Paul Gustafson. A guided walk Metropolis algorithm. Statistics and Computing, 8, 1998, pp. 357-364.
8. W. K. Hastings. Monte Carlo sampling methods using Markov chains and their applications. Biometrika, 57(1), 1970, p. 97.
9. N. Metropolis, A. W. Rosenbluth, M. N. Rosenbluth, A. H. Teller and E. Teller. Equation of State Calculations by Fast Computing Machines. Journal of Chemical Physics, 21, 1953, pp. 1087-1092.
10. G. Casella, E. I. George. Explaining the Gibbs Sampler. The American Statistician, Vol. 46, No. 3, 1992, pp. 167-174.
11. S. Kirkpatrick et al. Optimization by Simulated Annealing. Science, Vol. 220, No. 4598, 1983, pp. 671-680.
12. D. Arthur and S. Vassilvitskii. k-means++: the advantages of careful seeding. In ACM-SIAM Symposium on Discrete Algorithms, 2007.
13. M. E. Tipping and C. M. Bishop. Probabilistic principal component analysis. Journal of the Royal Statistical Society, Series B, 1999.
14. B. Scholkopf, A. Smola and K.-R. Muller. Kernel Principal Component Analysis. Artificial Neural Networks, ICANN 1997, Vol. 127, pp. 583-588.
15. R. E. Kalman (1960). A new approach to linear filtering and prediction problems. Transactions of the ASME, Journal of Basic Engineering, Vol. 82, pp. 35-45.

14.3 Course webpage
All material of the discipline will be posted at http://www.cs.hse.ru/ai/sm
Material from the previous years is available here: http://www.leonidzhukov.net/hse/2014/stochmod/
Students are provided with links to the lecture notes, problem sheets and their solutions, assignments and their solutions, and additional readings.

16. Equipment
The course requires a laptop and projector. The following software products can be used for the practical classes: Matlab, Octave, R, Python.