A First Course in Machine Learning. Simon Rogers, Mark Girolami. Chapman & Hall/CRC, CRC Press. Machine Learning & Pattern Recognition Series

Transcription:

Chapman & Hall/CRC Machine Learning & Pattern Recognition Series
A First Course in Machine Learning
Simon Rogers
Mark Girolami
CRC Press, Taylor & Francis Group
Boca Raton, London, New York
CRC Press is an imprint of the Taylor & Francis Group, an Informa business
A CHAPMAN & HALL BOOK

List of Tables xi
List of Figures xiii
Preface xix

1 Linear Modelling: A Least Squares Approach 1
  1.1 Linear modelling 1
    1.1.1 Defining the model 2
    1.1.2 Modelling assumptions 3
    1.1.3 Defining what a good model is 4
    1.1.4 The least squares solution - a worked example 6
    1.1.5 Worked example 9
    1.1.6 Least squares fit to the Olympics data 10
    1.1.7 Summary 11
  1.2 Making predictions 12
    1.2.1 A second Olympics dataset 12
    1.2.2 Summary 15
  1.3 Vector/matrix notation 15
    1.3.1 Example 22
    1.3.2 Numerical example 23
    1.3.3 Making predictions 24
    1.3.4 Summary 24
  1.4 Nonlinear response from a linear model 25
  1.5 Generalisation and overfitting 28
    1.5.1 Validation data 29
    1.5.2 Cross-validation 29
    1.5.3 Computational scaling of K-fold cross-validation 32
  1.6 Regularised least squares 33
  1.7 Exercises 35
  Further reading 37

2 Linear Modelling: A Maximum Likelihood Approach 39
  2.1 Errors as noise 39
    2.1.1 Thinking generatively 40
  2.2 Random variables and probability 41
    2.2.1 Random variables 41
    2.2.2 Probability and distributions 42
    2.2.3 Adding probabilities 44
    2.2.4 Conditional probabilities 44
    2.2.5 Joint probabilities 45
    2.2.6 Marginalisation 47
    2.2.7 Aside - Bayes' rule 49
    2.2.8 Expectations 50
  2.3 Popular discrete distributions 53
    2.3.1 Bernoulli distribution 53
    2.3.2 Binomial distribution 53
    2.3.3 Multinomial distribution 54
  2.4 Continuous random variables - density functions 55
  2.5 Popular continuous density functions 58
    2.5.1 The uniform density function 58
    2.5.2 The beta density function 60
    2.5.3 The Gaussian density function 61
    2.5.4 Multivariate Gaussian 62
    2.5.5 Summary 65
  2.6 Thinking generatively... continued 65
  2.7 Likelihood 67
    2.7.1 Dataset likelihood 68
    2.7.2 Maximum likelihood 69
    2.7.3 Characteristics of the maximum likelihood solution 71
    2.7.4 Maximum likelihood favours complex models 74
  2.8 The bias-variance trade-off 75
    2.8.1 Summary 75
  2.9 Effect of noise on parameter estimates 76
    2.9.1 Uncertainty in estimates 78
    2.9.2 Comparison with empirical values 81
    2.9.3 Variability in model parameters - Olympics data 82
  2.10 Variability in predictions 83
    2.10.1 Predictive variability - an example 85
    2.10.2 Expected values of the estimators 86
    2.10.3 Summary 90
  2.11 Exercises 90
  Further reading 93

3 The Bayesian Approach to Machine Learning 95
  3.1 A coin game 95
    3.1.1 Counting heads 97
    3.1.2 The Bayesian way 98
  3.2 The exact posterior 103
  3.3 The three scenarios 104
    3.3.1 No prior knowledge 104
    3.3.2 The fair coin scenario 111
    3.3.3 A biased coin 114
    3.3.4 The three scenarios - summary 116
    3.3.5 Adding more data 116
  3.4 Marginal likelihoods 117
    3.4.1 Model comparison with the marginal likelihood 118
  3.5 Hyperparameters 119
  3.6 Graphical models 120
    3.6.1 Summary 121
  3.7 A Bayesian treatment of the Olympics 100 m data 122
    3.7.1 The model 122
    3.7.2 The likelihood 124
    3.7.3 The prior 124
    3.7.4 The posterior 124
    3.7.5 A first-order polynomial 126
    3.7.6 Making predictions 129
  3.8 Marginal likelihood for polynomial model order selection 131
  3.9 Chapter summary 133
  3.10 Exercises 133
  Further reading 137

4 Bayesian Inference 139
  4.1 Non-conjugate models 139
  4.2 Binary responses 140
    4.2.1 A model for binary responses 140
  4.3 A point estimate - the MAP solution 143
  4.4 The Laplace approximation 149
    4.4.1 Laplace approximation example: Approximating a gamma density 150
    4.4.2 Laplace approximation for the binary response model 151
  4.5 Sampling techniques 154
    4.5.1 Playing darts 154
    4.5.2 The Metropolis-Hastings algorithm 156
    4.5.3 The art of sampling 164
  4.6 Summary 165
  4.7 Exercises 165
  Further reading 167

5 Classification 169
  5.1 The general problem 169
  5.2 Probabilistic classifiers 170
    5.2.1 The Bayes classifier 170
      5.2.1.1 Likelihood - class-conditional distributions 171
      5.2.1.2 Prior class distribution 171
      5.2.1.3 Example - Gaussian class-conditionals 172
      5.2.1.4 Making predictions 173
      5.2.1.5 The naive Bayes assumption 175
      5.2.1.6 Example - classifying text 175
      5.2.1.7 Smoothing 177
    5.2.2 Logistic regression 179
      5.2.2.1 Motivation 180
      5.2.2.2 Nonlinear decision functions 181
      5.2.2.3 Nonparametric models - the Gaussian process 182
  5.3 Nonprobabilistic classifiers 183
    5.3.1 K-nearest neighbours 183
      5.3.1.1 Choosing K 184
    5.3.2 Support vector machines and other kernel methods 186
      5.3.2.1 The margin 186
      5.3.2.2 Maximising the margin 187
      5.3.2.3 Making predictions 190
      5.3.2.4 Support vectors 191
      5.3.2.5 Soft margins 192
      5.3.2.6 Kernels 193
    5.3.3 Summary 197
  5.4 Assessing classification performance 198
    5.4.1 Accuracy - 0/1 loss 198
    5.4.2 Sensitivity and specificity 198
    5.4.3 The area under the ROC curve 199
    5.4.4 Confusion matrices 201
  5.5 Discriminative and generative classifiers 203
  5.6 Summary 203
  5.7 Exercises 203
  Further reading 205

6 Clustering 207
  6.1 The general problem 207
  6.2 K-means clustering 208
    6.2.1 Choosing the number of clusters 210
    6.2.2 Where K-means fails 212
    6.2.3 Kernelised K-means 212
    6.2.4 Summary 214
  6.3 Mixture models 215
    6.3.1 A generative process 216
    6.3.2 Mixture model likelihood 217
    6.3.3 The EM algorithm 219
      6.3.3.1 Updating π_k 220
      6.3.3.2 Updating μ_k 221
      6.3.3.3 Updating Σ_k 222
      6.3.3.4 Updating q_nk 223
      6.3.3.5 Some intuition 224
    6.3.4 Example 225
    6.3.5 EM finds local optima 226
    6.3.6 Choosing the number of components 228
    6.3.7 Other forms of mixture components 230
    6.3.8 MAP estimates with EM 232
    6.3.9 Bayesian mixture models 233
  6.4 Summary 234
  6.5 Exercises 234
  Further reading 237

7 Principal Components Analysis and Latent Variable Models 239
  7.1 The general problem 239
    7.1.1 Variance as a proxy for interest 239
  7.2 Principal components analysis 242
    7.2.1 Choosing D 247
    7.2.2 Limitations of PCA 247
  7.3 Latent variable models 248
    7.3.1 Mixture models as latent variable models 248
    7.3.2 Summary 249
  7.4 Variational Bayes 249
    7.4.1 Choosing Q(θ) 251
    7.4.2 Optimising the bound 252
  7.5 A probabilistic model for PCA 252
    7.5.1 Q_τ(τ) 254
    7.5.2 Q_xn(x_n) 256
    7.5.3 Q_wm(w_m) 257
    7.5.4 The required expectations 258
    7.5.5 The algorithm 258
    7.5.6 An example 260
  7.6 Missing values 260
    7.6.1 Missing values as latent variables 262
    7.6.2 Predicting missing values 264
  7.7 Non-real-valued data 264
    7.7.1 Probit PPCA 264
    7.7.2 Visualising parliamentary data 268
      7.7.2.1 Aside - relationship to classification 272
  7.8 Summary 273
  7.9 Exercises 273
  Further reading 275

Glossary 277
Index 283