Statistical Methods for Intelligent Information Processing (SMIIP)
Lecture 1: Introduction to Machine Learning
Shuigeng Zhou, School of Computer Science
September 13, 2017
What is machine learning?
Machine learning is an application of artificial intelligence that automates analytical model building, using algorithms that iteratively learn from data without being explicitly programmed where to look.
What is machine learning?
A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P if its performance at tasks in T, as measured by P, improves with experience E.
(Tom M. Mitchell)
What is machine learning?
Machine learning is about predicting the future based on the past.
(Hal Daumé III, University of Maryland)
What is machine learning?
Machine learning is about developing methods that can automatically detect patterns in data, and then using the uncovered patterns to predict future data or other outcomes of interest.
(Kevin P. Murphy)
The key point
More data / experience / knowledge can make methods / algorithms / programs more effective, i.e. smarter.
Machine learning: types & applications
Types of machine learning:
- Supervised learning
- Unsupervised learning
- Semi-supervised learning
- Reinforcement learning
- Transfer learning
- Deep learning
- etc.
Applications:
- Pattern recognition
- Data mining
- Natural language processing (NLP)
- Computer vision
- Bioinformatics
- etc.
Machine learning: from different views
- What data is available: supervised, unsupervised, reinforcement, semi-supervised, active learning
- How we get the data: online vs. offline learning
- Type of model: generative vs. discriminative; parametric vs. non-parametric
Popular machine learning tasks
Discrete label:
- Supervised: classification (k-nearest neighbors, Naive Bayes, support vector machines, decision trees)
- Unsupervised: clustering (k-means, DBSCAN)
Continuous label:
- Supervised: regression (linear, locally weighted linear, ridge, lasso)
- Unsupervised: density estimation (expectation maximization, Parzen window)
Supervised learning examples
[Figure: a set of labeled examples (label 1 ... label 5)]
Supervised learning: given labeled examples.
Supervised learning
[Figure: labeled examples are fed into a model/predictor]
Supervised learning: given labeled examples, learn a model/predictor.
Supervised learning
[Figure: the learned model/predictor assigns a predicted label to a new example]
Supervised learning: learn to predict the label of a new example.
Supervised learning
[Figure: fruit images labeled "apple" and "banana"]
Classification: the labels come from a finite set.
Supervised learning: given labeled examples.
Classifying flowers
[Figure: flower images]
Handwriting recognition
[Figure: handwriting samples]
Unsupervised learning (clustering)
Learning what normally happens.
Clustering: grouping similar instances, i.e. splitting a dataset into groups so as to maximize intra-group similarity and inter-group difference.
Example applications:
- Customer segmentation in CRM
- Image compression: color quantization
- Bioinformatics: learning motifs
Unsupervised learning (clustering)
[Figure: unlabeled examples]
Unsupervised learning: given data, i.e. examples, but no labels.
Reinforcement learning
[Figure: action sequences such as "left, right, straight, left, left, left, straight" marked GOOD and "left, straight, straight, left, right, straight, straight" marked BAD]
Given a sequence of examples/states and a reward after completing that sequence, learn to predict the action to take for an individual example/state.
Applications: game playing, robot in a maze, multiple agents, partial observability, ...
Reinforcement learning
[Figure: backgammon games ending in WIN! or LOSE!]
Given sequences of moves and whether or not the player won at the end, learn to make good moves.
Machine learning methods (supervised learning)
- Given training data (limited and well established)
- Determine the hypothesis space (containing all possible models)
- Apply an evaluation criterion under a certain strategy
- Implement the solving/learning algorithm
- Train to select the optimal model
- Use the model for prediction or analysis
ML method = model + strategy + algorithm
Model
Strategy
Algorithm
Solve learning as an optimization problem.
Many kinds of algorithms are used for convex or non-convex optimization:
- Stochastic gradient descent (SGD)
- Its many tricks and modifications: Adadelta / Adagrad / Adam / other fancy SGD variants
Model evaluation and selection
Model evaluation and selection
Overfitting: the model describes random error or noise instead of the underlying relationship.
Overfitting occurs when a model is excessively complex, e.g. has too many parameters relative to the number of observations.
Model evaluation and selection
Underfitting: the machine learning algorithm cannot capture the underlying trend of the data.
Underfitting occurs, for example, when fitting a linear model to non-linear data; such a model has poor predictive performance.
Model evaluation and selection
Avoiding overfitting:
- Regularization: L1-norm, L2-norm
- Cross-validation: leave-one-out cross-validation (LOOCV), k-fold
Performance evaluation: classification
Confusion matrix / error matrix
- Called a matching matrix in unsupervised learning
- A kind of contingency table with two dimensions ("actual" and "predicted") and identical sets of "classes" in both dimensions
- The name stems from the fact that it makes it easy to see whether the system is confusing two classes (i.e. commonly mislabelling one as another)

                    Actual class
Predicted class   Cat   Dog   Rabbit
Cat                 5     2      0
Dog                 3     3      2
Rabbit              0     1     11
Performance evaluation: classification
Table of confusion (sometimes also called a confusion matrix): a table with two rows and two columns that reports the numbers of true positives, false positives, false negatives, and true negatives.

                    Actual class
Predicted class   Cat                   Non-cat
Cat               5 (true positives)    2 (false positives)
Non-cat           3 (false negatives)   17 (true negatives)
Performance evaluation: classification
Some terms of the confusion matrix:
- Condition positive (P): the number of real positive examples in the data
- Condition negative (N): the number of real negative examples in the data
- True positive (TP), or hit: correctly predicted positive
- True negative (TN), or correct rejection: correctly predicted negative
- False positive (FP), or false alarm, Type I error: incorrectly predicted positive
- False negative (FN), or miss, Type II error: incorrectly predicted negative
Performance evaluation: classification
- Recall, or sensitivity, hit rate, true positive rate (TPR): Recall = TP/P = TP/(TP+FN)
- Specificity, or true negative rate (TNR): Specificity = TN/N = TN/(TN+FP)
- Precision, or positive predictive value (PPV): Precision = TP/(TP+FP)
- False discovery rate (FDR): FDR = FP/(TP+FP) = 1 - PPV
- Accuracy (Acc): Acc = (TP+TN)/(P+N) = (TP+TN)/(TP+FP+TN+FN)
Performance evaluation: multiple-class classification
- Micro-averaging: sum up the individual true positives, false positives, and false negatives over all classes, then compute the statistics from the pooled counts
- Macro-averaging: compute precision and recall per class, then take their averages
Macro-averaging weights all classes equally, while micro-averaging weights all documents (instances) equally; both are illustrated in the sketch below.
Assignment
Reading: Chapter 1 (Introduction) of Murphy's book.
Thanks! Questions?