Advanced Methods in Probabilistic Modeling


David M. Blei
Princeton University
September 13, 2013

We will study how to use probability models to analyze data, focusing both on the mathematical details of the models and on the technology that implements the corresponding algorithms. We will study advanced methods, such as large-scale inference, model diagnostics and selection, and Bayesian nonparametrics. Our goals are to understand the cutting edge of modern probabilistic modeling, to begin research that contributes to this field, and to develop good practices for specifying and applying probabilistic models to analyze real-world data.

The centerpiece of the course will be the student project. Over the course of the semester, students will develop an applied case study, ideally one that is connected to their graduate research. Each project must involve using probabilistic models to analyze real-world data.

Prerequisites

I assume you are familiar with the basic material from COS513 (Foundations of Probabilistic Modeling). For example, you should be comfortable with

- probabilistic graphical models
- basic statistics
- mixture modeling
- linear regression
- hidden Markov models
- exponential families
- the expectation-maximization algorithm

We will revisit some of the advanced material that was touched on in COS513, such as variational inference and Bayesian nonparametrics.

I assume you are comfortable writing software to analyze data and learning about new tools for that purpose. For example, you should be familiar with a statistical programming language such as R and a scripting language such as Python.
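As a refresher on the prerequisite material, here is a minimal sketch of the expectation-maximization algorithm for a one-dimensional Gaussian mixture, written in Python with NumPy. The model, initialization scheme, and toy data are illustrative choices for this sketch, not course code:

```python
import numpy as np

def em_gmm_1d(x, k=2, n_iter=50):
    """EM for a one-dimensional Gaussian mixture with k components."""
    n = len(x)
    # Initialization: equal mixing weights, means at spread-out
    # quantiles of the data, and the overall variance everywhere.
    w = np.full(k, 1.0 / k)
    mu = np.quantile(x, np.linspace(0.1, 0.9, k))
    var = np.full(k, np.var(x))
    for _ in range(n_iter):
        # E-step: posterior responsibility of each component for each point,
        # computed in log space for numerical stability.
        log_dens = (np.log(w)
                    - 0.5 * np.log(2.0 * np.pi * var)
                    - 0.5 * (x[:, None] - mu) ** 2 / var)
        log_norm = np.logaddexp.reduce(log_dens, axis=1, keepdims=True)
        resp = np.exp(log_dens - log_norm)
        # M-step: re-estimate parameters from expected sufficient statistics.
        nk = resp.sum(axis=0)
        w = nk / n
        mu = (resp * x[:, None]).sum(axis=0) / nk
        var = (resp * (x[:, None] - mu) ** 2).sum(axis=0) / nk
    return w, mu, var

# Toy data: two well-separated Gaussian clusters.
rng = np.random.default_rng(1)
x = np.concatenate([rng.normal(-4.0, 1.0, 300), rng.normal(3.0, 1.0, 200)])
w, mu, var = em_gmm_1d(x)
```

On this toy data the recovered means land near -4 and 3 and the mixing weights near 0.6 and 0.4; COS513 covers why each EM iteration cannot decrease the likelihood.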

Administrative Details

The instructor is David Blei (blei@cs.princeton.edu). The course meets Mondays from 1:30PM to 4:20PM in Room 302 of the CS building. Office hours are Mondays from 10:30AM to 12:30PM in Room 419 of the CS building. The course website is www.cs.princeton.edu/courses/archive/fall13/cos597a/. We will use Piazza to distribute readings and to hold online discussion.

Requirements and Grading

There are four requirements for the course.

1. Participation. This is a seminar based on discussion; each student must contribute substantially in class.

2. Weekly paper. Each student will submit a two-part weekly paper. First, write about what you thought of the week's reading. Second, write about your progress on the class project. This might include summaries of additional reading or interesting intermediate results. These papers should be no longer than two pages; they can be as short as needed.

3. Project. Short progress reports about the final project are due throughout the semester. A final report about your project is due by Dean's Date. I grade final reports on both content and writing quality. Two good books about writing are Strunk and White (1979) and Williams (1981).

4. Demonstration. Each student is required to give a 15-minute demonstration of a tool or technique. As much as possible, he or she should walk us through code or run an interpreter. I do not expect the student to be an expert in the tool; the idea is to gain some experience with it and then lead a discussion. Some example demonstrations include processing text data with nltk, keeping track of research with IPython notebooks, exploring results with plyr, interactive data visualization with D3, probabilistic programming with Stan, and "shell scripts I cannot live without."

Your course grade will be based mainly on the final report, but I will also consider the other requirements. The weekly papers will not be individually graded.

Please prepare all written work using LaTeX. I will provide LaTeX templates before the first assignment.

There are no auditors. Those who cannot enroll (for example, postdocs or visiting researchers) must still complete all of the work.
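For students who have not written in LaTeX before, a minimal article skeleton gives a sense of what such a template looks like. The package choices and section names here are illustrative, not the course template that will be distributed:

```latex
\documentclass[11pt]{article}
\usepackage{amsmath,amssymb} % math environments and symbols
\usepackage{graphicx}        % including figures
\usepackage{natbib}          % author-year citations

\title{Weekly Paper}
\author{Your Name}
\date{\today}

\begin{document}
\maketitle

\section{Reading}
Reaction to this week's reading.

\section{Project progress}
Intermediate results, additional reading, open questions.

\end{document}
```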

Schedule

There is an assigned reading each week, sometimes a choice of readings, and sometimes additional optional readings. Students are also expected to read outside of the syllabus in the service of their final projects. Below is a tentative schedule of course topics and readings; it may change depending on student interests and the overall trajectory of the course.

1. Introduction and overview of the course
2. Applied probabilistic modeling (Blei, 2013)
3. Model specification (Lehmann, 1990; Varian, 1997)
4. Variational inference and stochastic optimization (Wainwright and Jordan, 2008; Hoffman et al., 2013)
5. Hierarchical models, shrinkage, and empirical Bayes (Gelman and Hill, 2007; Efron, 2010)
6. Mixed-membership models (Pritchard et al., 2000; Blei, 2012; Rusch et al., 2013)
7. Model fitness: posterior predictive checks and predictive likelihood (Gelman et al., 1996; Box, 1980; Rubin, 1984; Geisser, 1975)
8. Data visualization and the grammar of graphics (and ggplot2) (Wilkinson, 2009)
9. Bayesian nonparametrics: clustering models (Gershman and Blei, 2012; Teh and Jordan, 2008)
10. Bayesian nonparametrics: latent feature models (Griffiths and Ghahramani, 2011; Broderick et al., 2013)
11. Variational inference with nonconjugate models (Braun and McAuliffe, 2010; Wang and Blei, 2013)
12. Bayesian statistics and the philosophy of science (Gelman and Shalizi, 2012)

In addition to the assigned papers, consider reading Gelman et al. (1995), Bishop (2006), and Murphy (2013). These are excellent sources on applied probabilistic modeling.
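As a preview of the model fitness topic above, a posterior predictive check simulates replicated datasets from the posterior and compares a test statistic on the replications to its observed value. Here is a minimal Python sketch for a Beta-Bernoulli model; the data, the Beta(1, 1) prior, and the longest-run test statistic are hypothetical choices for illustration, not an example from the assigned readings:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical observed binary data. Note the long runs, which an
# i.i.d. Bernoulli model may fail to reproduce.
y = np.array([1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 0, 1, 1, 1, 0])
n = len(y)

def longest_run(seq):
    """Test statistic: length of the longest run of identical outcomes."""
    best = cur = 1
    for a, b in zip(seq[:-1], seq[1:]):
        cur = cur + 1 if a == b else 1
        best = max(best, cur)
    return best

# Posterior for the success probability under a Beta(1, 1) prior.
a_post, b_post = 1 + y.sum(), 1 + n - y.sum()

# Posterior predictive check: draw a parameter from the posterior,
# draw a replicated dataset, and record the test statistic.
t_obs = longest_run(y)
t_rep = []
for _ in range(2000):
    theta = rng.beta(a_post, b_post)
    y_rep = rng.binomial(1, theta, size=n)
    t_rep.append(longest_run(y_rep))

# Posterior predictive p-value: how often the replications look at
# least as extreme as the observed data under this statistic.
ppp = np.mean(np.array(t_rep) >= t_obs)
```

A p-value near 0 or 1 signals that the model fails to capture the aspect of the data measured by the test statistic; Gelman et al. (1996) develop this idea in depth.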

References

Bishop, C. (2006). Pattern Recognition and Machine Learning. Springer, New York.

Blei, D. (2012). Probabilistic topic models. Communications of the ACM, 55(4):77-84.

Box, G. (1980). Sampling and Bayes' inference in scientific modelling and robustness. Journal of the Royal Statistical Society, Series A, 143(4):383-430.

Braun, M. and McAuliffe, J. (2010). Variational inference for large-scale models of discrete choice. Journal of the American Statistical Association.

Broderick, T., Jordan, M. I., and Pitman, J. (2013). Cluster and feature modeling from combinatorial stochastic processes. Statistical Science, 28(3):289-312.

Efron, B. (2010). Large-Scale Inference: Empirical Bayes Methods for Estimation, Testing, and Prediction, volume 1. Cambridge University Press.

Geisser, S. (1975). The predictive sample reuse method with applications. Journal of the American Statistical Association, 70:320-328.

Gelman, A., Carlin, J., Stern, H., and Rubin, D. (1995). Bayesian Data Analysis. Chapman & Hall, London.

Gelman, A. and Hill, J. (2007). Data Analysis Using Regression and Multilevel/Hierarchical Models. Cambridge University Press.

Gelman, A., Meng, X., and Stern, H. (1996). Posterior predictive assessment of model fitness via realized discrepancies. Statistica Sinica, 6:733-807.

Gelman, A. and Shalizi, C. (2012). Philosophy and the practice of Bayesian statistics. British Journal of Mathematical and Statistical Psychology.

Gershman, S. and Blei, D. (2012). A tutorial on Bayesian nonparametric models. Journal of Mathematical Psychology, 56:1-12.

Griffiths, T. and Ghahramani, Z. (2011). The Indian buffet process: An introduction and review. Journal of Machine Learning Research, 12:1185-1224.

Hoffman, M., Blei, D., Wang, C., and Paisley, J. (2013). Stochastic variational inference. Journal of Machine Learning Research, 14:1303-1347.

Lehmann, E. (1990). Model specification: The views of Fisher and Neyman, and later developments. Statistical Science, 5(2):160-168.

Murphy, K. (2013). Machine Learning: A Probabilistic Perspective. MIT Press.

Pritchard, J., Stephens, M., and Donnelly, P. (2000). Inference of population structure using multilocus genotype data. Genetics, 155:945-959.

Rubin, D. (1984). Bayesianly justifiable and relevant frequency calculations for the applied statistician. The Annals of Statistics, 12(4):1151-1172.

Rusch, T., Hofmarcher, P., Hatzinger, R., and Hornik, K. (2013). Model trees with topic model preprocessing: An approach for data journalism illustrated with the WikiLeaks Afghanistan war logs. The Annals of Applied Statistics, 7(2):613-639.

Strunk, W. and White, E. (1979). The Elements of Style. Longman Press.

Teh, Y. and Jordan, M. (2008). Hierarchical Bayesian nonparametric models with applications.

Varian, H. R. (1997). How to build an economic model in your spare time. The American Economist, pages 3-10.

Wainwright, M. and Jordan, M. (2008). Graphical models, exponential families, and variational inference. Foundations and Trends in Machine Learning, 1(1-2):1-305.

Wang, C. and Blei, D. (2013). Variational inference in nonconjugate models. Journal of Machine Learning Research, 14:1005-1031.

Wilkinson, L. (2009). The Grammar of Graphics. Springer.

Williams, J. (1981). Style: Toward Clarity and Grace. University of Chicago Press.