Syllabus for the course «Stochastic Modelling»

Government of the Russian Federation
Federal State Autonomous Educational Institution of Higher Professional Education
National Research University Higher School of Economics
Faculty of Computer Science

Syllabus for the course «Stochastic Modelling»
010402 «Applied Mathematics and Informatics», «Data Sciences» Master Program

Approved by:
Moscow, 2016

1. Scope of Use

The present program establishes the minimum requirements for students' knowledge and skills and determines the content of the course. The present syllabus is intended for the department teaching the course, its teaching assistants, and students of the Master of Science program 010402 «Applied Mathematics and Informatics». This syllabus meets the standards required by:
- the educational standards of NRU HSE;
- the educational program «Data Sciences» of the Federal Master's Degree Program 010402, 2014;
- the university curriculum of the Master's program in «Data Science» (010402) for 2016.

Summary

The first part of the course deals with unsupervised learning techniques for independent observations (including clustering and probabilistic principal component analysis) before moving on to models for sequential data (including the Poisson process and Kalman filtering). Finally, we introduce basic sampling techniques. The second part of the course is devoted to the study of stochastic modelling and the simulation of random processes. It covers Markov chains in discrete and continuous time and methods of finding stationary states of processes described by Markov chains. The course proceeds to practical algorithms based on Markov chains: hidden Markov models (HMM), Markov random fields (MRF), and Markov chain Monte Carlo methods (MCMC).

2. Learning Objectives

The main objective of the course is to learn the fundamental principles used in modelling random systems, as well as advanced algorithms based on them. These algorithms are widely used in modern technologies for information retrieval, processing and recognition of speech and language, bioinformatics, and many other fields. An understanding of these basic algorithms is required for the training of professionals in the field of mathematical modelling.

The learning objective of the course «Stochastic Modelling» is to provide students with essential tools, including:
- clustering (K-means and Gaussian mixture models);
- the EM algorithm;
- Principal Component Analysis (PCA), probabilistic PCA, kernel PCA;
- Kalman filtering;
- basic sampling methods;
- random processes and Markov chains;
- hidden Markov models;
- Monte Carlo methods.

3. Learning outcomes

After completing the study of the discipline «Stochastic Modelling» the student should:
- Know essential techniques for clustering;
- Know principal component analysis and its generalisations;

- Know basic sampling methods;
- Know the basic notions of the theory of random processes and Markov chains;
- Be able to apply the EM algorithm in a wide variety of applications;
- Be able to forecast time series using Kalman filtering;
- Be able to apply Markov chains to simulate random processes in real-world problems (speech recognition, authorship attribution, etc.);
- Understand the capabilities and limitations of the existing algorithms.

4. Place of the discipline in the Master's program structure

The course «Stochastic Modelling» is an elective course taught in the second year of the Master's program «Data Science».

Prerequisites

Students are assumed to have a good background in probability theory and statistics, discrete mathematics, and advanced calculus. The following knowledge and competences are needed to study the discipline:
- a good command of the English language, both spoken and written;
- a good knowledge of probability theory and statistics.

After completing the study of the discipline «Stochastic Modelling» the student should have acquired the following competences:

Competence: The ability to reflect on developed methods of activity.
Code: C-1 (SSC-M1)
Descriptors (indicators of achievement): The student is able to reflect on developed methods in stochastic modelling.
Educative forms and methods: Lectures and tutorials.

Competence: The ability to propose a model and to invent and test methods and tools of professional activity.
Code: C-2 (SC-M2)
Descriptors (indicators of achievement): The student is able to model randomness using probabilistic models and to perform statistical inference to estimate the model parameters.
Educative forms and methods: Examples covered during the lectures and tutorials; assignments.

Competence: The capability to develop new research methods and to change the scientific and industrial profile of one's activities.
Code: C-3 (SC-M3)
Descriptors (indicators of achievement): Students obtain the necessary knowledge of stochastic and probabilistic models, sufficient to apply them and to develop new methods in other disciplines.
Educative forms and methods: Assignments; additional material/reading provided.

5. Schedule

Classes are held in pairs: a 2-academic-hour lecture followed by a 2-academic-hour tutorial.

Topic                                              Total hours  Lectures  Seminars  Self-study
1. Clustering                                           24          4         4         16
2. Principal Component Analysis and its extensions      36          6         6         24
3. Kalman filtering                                     32          4         4         24
4. Sampling methods                                     16          2         2         12
5. Introduction to Markov processes                     14          2         2         10
6. Discrete time Markov chains                          17          2         4         11
7. Continuous time Markov chains                        17          2         4         11
8. Hidden Markov models (HMM)                           19          4         4         11
9. Markov chain Monte Carlo methods (MCMC)              19          4         4         11
Total:                                                 190         30        30        130

(Lectures and seminars together constitute the contact hours.)

Requirements and Grading

Type of work   Module  #  Type of grading
Homework 1       1     1  One written homework.
Homework 2       2     4  Cumulative grade for the written and coding work during the 2nd module.
MidTerm Exam     1     1  Written exam. Preparation time 180 min.
Final Exam       2     1  Written exam. Preparation time 80 min.

9. Assessment

The assessment consists of one homework handed out to the students during the first module, and the average grade of the homeworks from the second module. The MidTerm Exam covers the course material of the first module. The final assessment is the final exam, at which students have to demonstrate knowledge of all the topics covered in the course.

The grade formula: the exam is worth 30% of the final mark. The final course mark is obtained from the following formula:

Final = 0.15*(Homework1) + 0.25*(MidTerm) + 0.3*(Homework2) + 0.3*(Exam)
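For concreteness, here is a minimal sketch of the formula in Python (one of the software products listed in section 16); the component grades used are made up for illustration.

```python
# Minimal sketch of the grade formula; the component grades are hypothetical.
def final_grade(homework1, midterm, homework2, exam):
    """Weighted combination of the four components on the ten-point scale."""
    return 0.15 * homework1 + 0.25 * midterm + 0.3 * homework2 + 0.3 * exam

print(final_grade(homework1=8, midterm=7, homework2=9, exam=6))  # 7.45 -> rounds to 7
```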

Grades are rounded at the examiner's/lecturer's discretion, taking into account regular attendance at classes and homework; all grades with a fractional part greater than 0.5 are rounded up.

Table of Grade Accordance

Ten-point Grading Scale                              Five-point Grading Scale
1 (very bad), 2 (bad), 3 (no pass)                   Unsatisfactory (2) - FAIL
4 (pass), 5 (highly pass)                            Satisfactory (3) - PASS
6 (good), 7 (very good)                              Good (4) - PASS
8 (almost excellent), 9 (excellent), 10 (perfect)    Excellent (5) - PASS

10. Course Description

The following list describes the main mathematical notions considered in the course, in lecture order.

Topic 1. Clustering
The K-means algorithm, initialization of K-means using the k-means++ procedure, the Gaussian mixture model, the general form of the EM algorithm, the EM algorithm applied to Gaussian mixtures, the link between K-means and the Gaussian mixture model.

Reading:
1. Christopher Bishop. Pattern Recognition and Machine Learning. Springer, 2006.

Supplementary reading:
1. D. Arthur and S. Vassilvitskii. k-means++: the advantages of careful seeding. In ACM-SIAM Symposium on Discrete Algorithms, 2007.
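To make the EM updates for Gaussian mixtures concrete, here is a minimal one-dimensional, two-component sketch in Python/NumPy; the synthetic data and initial values are illustrative, and this is a sketch rather than the reference implementation used in class.

```python
import numpy as np

# Minimal EM for a two-component 1-D Gaussian mixture (illustrative sketch).
rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(-2, 1, 200), rng.normal(3, 1, 300)])

pi = np.array([0.5, 0.5])    # mixing weights
mu = np.array([-1.0, 1.0])   # initial means
var = np.array([1.0, 1.0])   # initial variances

for _ in range(50):
    # E-step: responsibilities gamma[n, k] = p(z_n = k | x_n).
    dens = np.exp(-0.5 * (x[:, None] - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)
    gamma = pi * dens
    gamma /= gamma.sum(axis=1, keepdims=True)
    # M-step: re-estimate weights, means and variances.
    nk = gamma.sum(axis=0)
    pi = nk / len(x)
    mu = (gamma * x[:, None]).sum(axis=0) / nk
    var = (gamma * (x[:, None] - mu) ** 2).sum(axis=0) / nk

print(pi, mu, var)  # should approach (0.4, 0.6), (-2, 3), (1, 1)
```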

Topic 2. PCA
Review of Principal Component Analysis, probabilistic PCA (pPCA) and factor analysis, learning pPCA using the EM algorithm and by direct maximization of the likelihood, kernel PCA.

Reading:
1. Christopher Bishop. Pattern Recognition and Machine Learning. Springer, 2006.

Supplementary reading:
1. M. E. Tipping and C. M. Bishop. Probabilistic principal component analysis. Journal of the Royal Statistical Society, Series B, 1999.
2. B. Scholkopf, A. Smola and K.-R. Muller. Kernel Principal Component Analysis. Artificial Neural Networks, ICANN 1997, Vol. 127, pp. 583-588.

Topic 3. Kalman filtering
Derivation of the Kalman filtering equations, definition of the Kalman gain, learning the model using the EM algorithm.

Reading:
1. Christopher Bishop. Pattern Recognition and Machine Learning. Springer, 2006.

Supplementary reading:
1. R. E. Kalman (1960). A new approach to linear filtering and prediction problems. Transactions of the ASME, Journal of Basic Engineering, Vol. 82, pp. 35-45.
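As a concrete instance of the filtering equations, the following one-dimensional sketch filters a random-walk state observed with Gaussian noise; the noise variances and the prior are illustrative choices, not values from the course.

```python
import numpy as np

# Minimal 1-D Kalman filter: random-walk state observed with noise.
rng = np.random.default_rng(1)
q, r = 0.1, 1.0  # process and observation noise variances (illustrative)
x_true = np.cumsum(rng.normal(0, np.sqrt(q), 100))
y = x_true + rng.normal(0, np.sqrt(r), 100)

mean, var = 0.0, 1.0  # prior on the initial state
estimates = []
for obs in y:
    var_pred = var + q                 # predict: propagate uncertainty
    gain = var_pred / (var_pred + r)   # Kalman gain
    mean = mean + gain * (obs - mean)  # update: correct with the observation
    var = (1 - gain) * var_pred
    estimates.append(mean)

# The filtered MSE should be noticeably below the raw observation noise r.
print(np.mean((np.array(estimates) - x_true) ** 2))
```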

Topic 4. Sampling methods
Rejection sampling, importance sampling.

Reading:
1. Christopher Bishop. Pattern Recognition and Machine Learning. Springer, 2006.

Topic 5. Introduction to Markov processes
Stochastic processes. Markov processes. Discrete and continuous time processes. Transition probabilities.

Reading:
1. Sheldon M. Ross. Introduction to Probability Models, Tenth Edition. Academic Press, 2009.
2. Oliver Ibe. Markov Processes for Stochastic Modeling. Academic Press, 2009.

Supplementary reading:
1. Howard M. Taylor, Samuel Karlin. An Introduction to Stochastic Modeling. Academic Press, 1998.

Topic 6. Discrete time Markov chains
Discrete time Markov chains (DTMC). Applications of DTMC. Stationary DTMC. The transition matrix. The n-step transition matrix. The Chapman-Kolmogorov equations. Classification of states. Reachable and communicating states. Equivalence classes. Communicating classes. Irreducibility. Periodicity. Recurrence and transience. First passage time. Positive recurrence. Recurrence time. Ergodic states. The fundamental theorem of Markov chains. Steady state. Steady state equations. Absorbing states. Finite absorbing chains. Transition matrix structure. Absorption probabilities. Limiting distributions.

Reading:
1. Sheldon M. Ross. Introduction to Probability Models, Tenth Edition. Academic Press, 2009.
2. Oliver Ibe. Markov Processes for Stochastic Modeling. Academic Press, 2009.

Supplementary reading:
1. Howard M. Taylor, Samuel Karlin. An Introduction to Stochastic Modeling. Academic Press, 1998.
2. J. R. Norris. Markov Chains. Cambridge University Press, 1998.

Topic 7. Continuous time Markov chains
Continuous time Markov chains (CTMC). Examples of CTMC. Transition probability functions. Holding times and the exponential distribution. The embedded DTMC. Transition rates. Absorbing states. The transition probability diagram. Classification of states. Long-run behavior of CTMC. Positive recurrence. The fundamental theorem for CTMC. Steady state equations. Example: birth and death processes.

Reading:
1. Sheldon M. Ross. Introduction to Probability Models, Tenth Edition. Academic Press, 2009.
2. Oliver Ibe. Markov Processes for Stochastic Modeling. Academic Press, 2009.

Supplementary reading:
1. Howard M. Taylor, Samuel Karlin. An Introduction to Stochastic Modeling. Academic Press, 1998.
2. J. R. Norris. Markov Chains. Cambridge University Press, 1998.

Topic 8. Hidden Markov models (HMM)
Introduction to hidden Markov models. Prediction of states. Evaluation of model parameters. Transition and signal matrices. The three main problems of HMM. Evaluation of a signal sequence: the forward-backward procedure. Finding the most probable sequence: decoding, the Viterbi algorithm. Finding the model parameters: training HMMs, the Baum-Welch algorithm. HMM applications in speech recognition.

Reading:
1. Oliver Ibe. Markov Processes for Stochastic Modeling. Academic Press, 2009.
2. Pierre Bremaud. Markov Chains: Gibbs Fields, Monte Carlo Simulation, and Queues. Springer, 1998.

Supplementary reading:
1. Lawrence R. Rabiner. A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition. Proceedings of the IEEE, Vol. 77, No. 2, 1989, pp. 257-286.
2. B. H. Juang, L. R. Rabiner. Hidden Markov Models for Speech Recognition. Technometrics, Vol. 33, No. 3, 1991, pp. 251-272.

Topic 9. Markov chain Monte Carlo (MCMC)
Random walks. Sampling from a distribution. Monte Carlo integration. Construction of Markov chains. The Metropolis-Hastings algorithm.

Reading:
1. Oliver Ibe. Markov Processes for Stochastic Modeling. Academic Press, 2009.
2. Pierre Bremaud. Markov Chains: Gibbs Fields, Monte Carlo Simulation, and Queues. Springer, 1998.

Supplementary reading:
1. W. K. Hastings. Monte Carlo sampling methods using Markov chains and their applications. Biometrika, 57(1), 1970, p. 97.
2. N. Metropolis, A. W. Rosenbluth, M. N. Rosenbluth, A. H. Teller and E. Teller. Equation of State Calculations by Fast Computing Machines. Journal of Chemical Physics, 21, 1953, pp. 1087-1092.
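To illustrate the Metropolis-Hastings algorithm of Topic 9, here is a minimal random-walk sampler targeting the standard normal distribution; the target, proposal width and chain length are illustrative choices.

```python
import numpy as np

# Random-walk Metropolis-Hastings targeting the standard normal density.
rng = np.random.default_rng(2)

def log_target(x):
    return -0.5 * x * x  # log N(0, 1), up to an additive constant

x = 0.0
samples = []
for _ in range(10000):
    proposal = x + rng.normal(0, 1)  # symmetric random-walk proposal
    # Accept with probability min(1, target(proposal) / target(x)).
    if np.log(rng.uniform()) < log_target(proposal) - log_target(x):
        x = proposal
    samples.append(x)

print(np.mean(samples), np.var(samples))  # approximately 0 and 1
```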

11. Educational Technology

The following educational technologies are used in the study process:
- discussion and analysis of the results during the tutorials;
- solutions of exercises posted on the course website for the students to practice;
- assignments to test the students' progress;
- consultation time on Monday mornings (1st module).

12. Recommendations for the course lecturer

The course lecturer is advised to use interactive learning methods that allow the majority of students to participate, such as slide presentations combined with writing material on the board, and the use of interdisciplinary papers to present connections between probability theory and statistics. The course is intended to be adaptive, so it is normal to differentiate tasks within a group when necessary and to direct fast learners towards more complicated tasks.

13. Recommendations for students and final exam questions

The course is interactive. Lectures are combined with classes. Students are invited to ask questions and to participate actively in group discussions. There will be special office hours for students who would like to get a more precise understanding of each topic, and the lecturer is ready to answer questions online via the official e-mail addresses listed in the contacts section. The additional references in section 14.2 are suggested to help students in their understanding of the material. The course is taught in English, and students can ask the teaching assistants for help with the language.

Examples of control questions for Module 2:
1. Given a Markov chain with two states {0, 1} and transition matrix P = [1/3, 2/3; 3/4, 1/4]. If at n = 0 the chain is in state 0, compute the probability of it being in state 1 after n = 3 steps. What is the probability of being in state 1 after a long time? (A numerical check is sketched below.)
2. A player has $1 at the beginning of the game and at every step can win $1 with probability p or lose $1 with probability 1 - p. The game stops when the player is either ruined or has won $3. Draw a Markov chain representing this game. Compute the probabilities of winning and of being ruined, and compute their numerical values for a fair game with p = 1/2.
3. Formulate the fundamental theorem of continuous time Markov chains. What conditions are equivalent to the stationary state equations?
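Control question 1 can be checked numerically; the sketch below uses the matrix given in the question.

```python
import numpy as np

# Numerical check of control question 1 (matrix as given in the question).
P = np.array([[1/3, 2/3],
              [3/4, 1/4]])

# Probability of being in state 1 after 3 steps, starting from state 0.
print(np.linalg.matrix_power(P, 3)[0, 1])  # ~0.5046

# Long-run behaviour: solve pi P = pi together with sum(pi) = 1.
A = np.vstack([(P.T - np.eye(2))[0], np.ones(2)])
pi = np.linalg.solve(A, np.array([0.0, 1.0]))
print(pi[1])  # 8/17 ~ 0.4706
```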

Examples of exam questions (not complete, for reference only):
1. Describe the fundamental theorem of Markov chains and its necessary conditions. Give an example where these conditions are not satisfied.
2. Find all communicating classes of a Markov chain with 7 states S = {A, B, C, D, E, F, G} and a given transition matrix. Draw the chain diagram, identify which states are recurrent and which are transient, and find the period of each state.
3. Consider a Markov chain consisting of two states S = {1, 2} with a known transition matrix. What is the probability for this system to be in the second state after a long time?
4. Estimate the element P(2, 3) of the Markov chain transition matrix from the observed state sequence: 13231313132212132312.
5. Compute the probability that a hidden Markov model with two states {s, t} generates the signal sequence «BAB». The initial states have probabilities p(s) = 0.85 and p(t) = 0.15, and the signal emission probabilities are Ps(A) = 0.4, Ps(B) = 0.6, Pt(A) = 0.5, Pt(B) = 0.5. The transition matrix is given. (A sketch of the computation follows after this list.)
6. Apply the EM algorithm to the Gaussian mixture model.
7. Describe how probabilistic PCA (pPCA) is related to conventional PCA. What are the advantages of considering pPCA over PCA?
8. Derive the Kalman filtering equations.
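Question 5 is an instance of the forward algorithm. In the sketch below, the initial and emission probabilities are those stated in the question, while the transition matrix, which the question leaves as "given", is a hypothetical placeholder.

```python
import numpy as np

# Forward algorithm for exam question 5, states ordered (s, t).
init = np.array([0.85, 0.15])        # p(s), p(t)
emit = {'A': np.array([0.4, 0.5]),   # Ps(A), Pt(A)
        'B': np.array([0.6, 0.5])}   # Ps(B), Pt(B)
trans = np.array([[0.7, 0.3],        # hypothetical transition matrix:
                  [0.4, 0.6]])       # the exam question supplies its own

alpha = init * emit['B']             # initialise with the first signal of «BAB»
for signal in 'AB':                  # remaining signals
    alpha = (alpha @ trans) * emit[signal]

print(alpha.sum())                   # p(«BAB») under this model
```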

Pool of questions to check the quality of studying the course (2nd module):
- Give a definition of a Markov process. Give an example of a Markov process.
- What is the period of a state?
- What is the fundamental theorem of Markov chains?
- Give a definition of a continuous time Markov process. What is a transition intensity?
- Describe an example of a hidden Markov model. What are the three main problems related to HMMs?
- Describe the construction of the Markov chain in MCMC. Describe how a probability distribution is generated from MCMC.

The final exam will test the students' understanding of each topic discussed during the lectures.

14. Reading and Materials

14.1 Recommended Reading
1. Christopher Bishop. Pattern Recognition and Machine Learning. Springer, 2006.
2. Sheldon M. Ross. Introduction to Probability Models, 10th Edition. Academic Press, 2009.
3. Oliver Ibe. Markov Processes for Stochastic Modeling. Academic Press, 2009.
4. Pierre Bremaud. Markov Chains: Gibbs Fields, Monte Carlo Simulation, and Queues. Springer, 1998.

14.2 Supplementary reading
1. William J. Stewart. Introduction to the Numerical Solution of Markov Chains. Princeton University Press, 1994.
2. Howard M. Taylor, Samuel Karlin. An Introduction to Stochastic Modeling. Academic Press, 1998.
3. J. R. Norris. Markov Chains. Cambridge University Press, 1998.
4. Lawrence R. Rabiner. A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition. Proceedings of the IEEE, Vol. 77, No. 2, 1989, pp. 257-286.
5. B. H. Juang, L. R. Rabiner. Hidden Markov Models for Speech Recognition. Technometrics, Vol. 33, No. 3, 1991, pp. 251-272.
6. S. Chib, E. Greenberg. Understanding the Metropolis-Hastings Algorithm. The American Statistician, Vol. 49, No. 4, 1995, pp. 327-335.
7. Paul Gustafson. A guided walk Metropolis algorithm. Statistics and Computing, 8, 1998, pp. 357-364.
8. W. K. Hastings. Monte Carlo sampling methods using Markov chains and their applications. Biometrika, 57(1), 1970, p. 97.
9. N. Metropolis, A. W. Rosenbluth, M. N. Rosenbluth, A. H. Teller and E. Teller. Equation of State Calculations by Fast Computing Machines. Journal of Chemical Physics, 21, 1953, pp. 1087-1092.
10. G. Casella, E. I. George. Explaining the Gibbs Sampler. The American Statistician, Vol. 46, No. 3, 1992, pp. 167-174.
11. S. Kirkpatrick et al. Optimization by Simulated Annealing. Science, Vol. 220, No. 4598, 1983, pp. 671-680.
12. D. Arthur and S. Vassilvitskii. k-means++: the advantages of careful seeding. In ACM-SIAM Symposium on Discrete Algorithms, 2007.
13. M. E. Tipping and C. M. Bishop. Probabilistic principal component analysis. Journal of the Royal Statistical Society, Series B, 1999.
14. B. Scholkopf, A. Smola and K.-R. Muller. Kernel Principal Component Analysis. Artificial Neural Networks, ICANN 1997, Vol. 127, pp. 583-588.
15. R. E. Kalman (1960). A new approach to linear filtering and prediction problems. Transactions of the ASME, Journal of Basic Engineering, Vol. 82, pp. 35-45.

14.3 Course webpage
All material of the discipline will be posted at http://www.cs.hse.ru/ai/sm
Material from the previous years is available here: http://www.leonidzhukov.net/hse/2014/stochmod/
Students are provided with links to the lecture notes, problem sheets and their solutions, assignments and their solutions, and additional readings.

16. Equipment
The course requires a laptop and projector. The following software products can be used for the practical classes: Matlab, Octave, R, Python.