Statistical Methods for Intelligent Information Processing (SMIIP)
Lecture 1: Introduction to Machine Learning
Shuigeng Zhou, School of Computer Science
September 13, 2017
What is machine learning?
Machine learning is an application of artificial intelligence that automates analytical model building, using algorithms that iteratively learn from data without being explicitly programmed where to look.
What is machine learning?
A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P if its performance at tasks in T, as measured by P, improves with experience E.
(Tom M. Mitchell)
What is machine learning?
Machine learning is about predicting the future based on the past.
(Hal Daumé III, University of Maryland)
What is machine learning?
Machine learning is about developing methods that can automatically detect patterns in data, and then using the uncovered patterns to predict future data or other outcomes of interest.
(Kevin P. Murphy)
The key point
More data / experience / knowledge can make methods / algorithms / programs more effective, i.e. smarter.
Machine learning: types & applications
Types of machine learning:
- Supervised learning
- Unsupervised learning
- Semi-supervised learning
- Reinforcement learning
- Transfer learning
- Deep learning
- etc.
Applications:
- Pattern recognition
- Data mining
- Natural language processing (NLP)
- Computer vision
- Bioinformatics
- etc.
Machine learning: from different views
- What data is available: supervised, unsupervised, reinforcement, semi-supervised, active learning
- How we get the data: online vs. offline learning
- Type of model: generative vs. discriminative; parametric vs. non-parametric
Popular machine learning tasks
Discrete label:
- Supervised: classification (k-nearest neighbors, Naive Bayes, support vector machines, decision trees)
- Unsupervised: clustering (k-means, DBSCAN)
Continuous label:
- Supervised: regression (linear, locally weighted linear, ridge, lasso)
- Unsupervised: density estimation (expectation maximization, Parzen window)
Supervised learning examples
[Figure: a set of labeled examples (label 1 ... label 5)]
Supervised learning: given labeled examples.
Supervised learning
[Figure: labeled examples are fed into a model/predictor]
Supervised learning: given labeled examples, learn a model/predictor.
Supervised learning
[Figure: the learned model/predictor assigns a predicted label to a new example]
Supervised learning: learn to predict the label of a new example.
Supervised learning
[Figure: fruit images labeled "apple" and "banana"]
Classification: the labels come from a finite set.
Supervised learning: given labeled examples.
Classifying flowers
[Figure: flower images]
Handwriting recognition
[Figure: handwriting samples]
Unsupervised learning (clustering)
Learning what normally happens.
Clustering: grouping similar instances, i.e. splitting a dataset into groups so as to maximize intra-group similarity and inter-group difference.
Example applications:
- Customer segmentation in CRM
- Image compression: color quantization
- Bioinformatics: learning motifs
Unsupervised learning (clustering)
[Figure: unlabeled examples]
Unsupervised learning: given data, i.e. examples, but no labels.
Reinforcement learning
[Figure: action sequences such as "left, right, straight, left, left, left, straight" marked GOOD and "left, straight, straight, left, right, straight, straight" marked BAD]
Given a sequence of examples/states and a reward after completing that sequence, learn to predict the action to take for an individual example/state.
Applications: game playing, robot in a maze, multiple agents, partial observability, ...
Reinforcement learning
[Figure: backgammon games ending in WIN! or LOSE!]
Given sequences of moves and whether or not the player won at the end, learn to make good moves.
Machine learning methods (supervised learning)
- Given training data (limited and well established)
- Determine the hypothesis space (containing all possible models)
- Apply an evaluation criterion under a certain strategy
- Implement the solving/learning algorithm
- Train to select the optimal model
- Use the model for prediction or analysis
ML method = model + strategy + algorithm
Model
Strategy
Algorithm
Solve learning as an optimization problem.
Many kinds of algorithms are used for convex or non-convex optimization:
- Stochastic gradient descent (SGD)
- Its many tricks and modifications: Adadelta / Adagrad / Adam / other fancy SGD variants
Model evaluation and selection
Model evaluation and selection
Overfitting: the model describes random error or noise instead of the underlying relationship.
Overfitting occurs when a model is excessively complex, e.g. has too many parameters relative to the number of observations.
Model evaluation and selection
Underfitting: the machine learning algorithm cannot capture the underlying trend of the data.
Underfitting occurs, for example, when fitting a linear model to non-linear data; such a model has poor predictive performance.
Model evaluation and selection
Avoiding overfitting:
- Regularization: L1-norm, L2-norm
- Cross-validation: leave-one-out cross-validation (LOOCV), k-fold
Performance evaluation: classification
Confusion matrix / error matrix
- Called a matching matrix in unsupervised learning
- A kind of contingency table with two dimensions ("actual" and "predicted") and identical sets of "classes" in both dimensions
- The name stems from the fact that it makes it easy to see whether the system is confusing two classes (i.e. commonly mislabelling one as another)

                    Actual class
Predicted class   Cat   Dog   Rabbit
Cat                 5     2      0
Dog                 3     3      2
Rabbit              0     1     11
Performance evaluation: classification
Table of confusion (sometimes also called a confusion matrix): a table with two rows and two columns that reports the numbers of true positives, false positives, false negatives, and true negatives.

                    Actual class
Predicted class   Cat                   Non-cat
Cat               5 (true positives)    2 (false positives)
Non-cat           3 (false negatives)   17 (true negatives)
Performance evaluation: classification
Some terms of the confusion matrix:
- Condition positive (P): the number of real positive examples in the data
- Condition negative (N): the number of real negative examples in the data
- True positive (TP), or hit: correctly predicted positive
- True negative (TN), or correct rejection: correctly predicted negative
- False positive (FP), or false alarm, Type I error: incorrectly predicted positive
- False negative (FN), or miss, Type II error: incorrectly predicted negative
Performance evaluation: classification
- Recall, or sensitivity, hit rate, true positive rate (TPR): Recall = TP/P = TP/(TP+FN)
- Specificity, or true negative rate (TNR): Specificity = TN/N = TN/(TN+FP)
- Precision, or positive predictive value (PPV): Precision = TP/(TP+FP)
- False discovery rate (FDR): FDR = FP/(TP+FP) = 1 - PPV
- Accuracy (Acc): Acc = (TP+TN)/(P+N) = (TP+TN)/(TP+FP+TN+FN)
Performance evaluation: multiple-class classification
- Micro-averaging: sum up the individual true positives, false positives, and false negatives over all classes, then compute the statistics from the pooled counts
- Macro-averaging: compute precision and recall per class, then take their averages
Macro-averaging weights all classes equally, while micro-averaging weights all documents (instances) equally; both are illustrated in the sketch below.
Assignment
Reading: Chapter 1 (Introduction) of Murphy's book.
Thanks! Questions?