CSE 417T: Introduction to Machine Learning
Lecture 1: Introduction
Henry Chai
08/28/18
Course Information
Website: http://classes.cec.wustl.edu/~cse417t/
Piazza (sign up with your Wash U email)
Gradescope (sign up with code M6Z8XD) and SVN
Textbooks:
  Learning From Data (AML), http://amlbook.com/
  Computer Age Statistical Inference: Algorithms, Evidence, and Data Science (CASI), https://web.stanford.edu/~hastie/casi/
Grading:
Homework assignments (6-8): 50%
  Mix of programming and pencil-and-paper problems
  Worst (second-worst) score discounted by 60% (40%)
  5 total late days, no more than 2 usable on any one assignment
Collaboration:
  Feel free to discuss homework with other students
  You must write your own solutions
  You must cite all external sources (including other students)
Tests (2): 50%
  10/4/18, 6:30 PM to 8:30 PM
  12/5/18, 6:30 PM to 8:30 PM
  Location TBD
Overview
First half of the course: Foundations
  Theory, proofs, math, probability, boring stuff, etc.
Second half of the course: Techniques
  Random forests! Support vector machines! Neural networks! Yay!
Tentative Schedule
#    Date   Topic
1    8/28   Introduction
2    8/30   Generalization
3    9/4    Matlab tutorial
4    9/6    Hypothesis sets
5    9/11   Infinite-dimensional hypothesis sets
6    9/13   Bias-variance tradeoff
7    9/18   Linear regression
8    9/20   Logistic regression
9    9/25   Overfitting
10   9/27   Regularization
11   10/2   Exam review
Machine Learning (Then)
Machine Learning (Now)
Machine Learning
Machine learning applies when:
  There exists a pattern
  The pattern is difficult or impossible to describe explicitly
  There is data
The idea: use the data to learn the pattern.
Example: Approving Credit
Formal Setup
Unknown target function $f: \mathcal{X} \to \mathcal{Y}$
Training data $\mathcal{D} = \{(\mathbf{x}_1, y_1), \ldots, (\mathbf{x}_N, y_N)\}$
Hypothesis set $\mathcal{H}$
Learning algorithm $\mathcal{A}$, which uses $\mathcal{D}$ to choose a hypothesis from $\mathcal{H}$
Learned hypothesis $g \in \mathcal{H}$, $g: \mathcal{X} \to \mathcal{Y}$, ideally with $g \approx f$
Learning model: the hypothesis set $\mathcal{H}$ together with the learning algorithm $\mathcal{A}$
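To make the roles of $\mathcal{D}$, $\mathcal{H}$, $\mathcal{A}$ and $g$ concrete, here is a minimal sketch under simplifying assumptions: $\mathcal{H}$ is a small, finite set of threshold rules on credit score alone, and the learning algorithm simply returns the hypothesis with the fewest training errors. The names `learn` and `make_threshold_hypothesis` and all data values are hypothetical; this is not the algorithm used later in the lecture.

```python
from typing import Callable, List, Sequence, Tuple

# A hypothesis maps an input x to a label in {-1, +1}.
Hypothesis = Callable[[Sequence[float]], int]


def make_threshold_hypothesis(threshold: float) -> Hypothesis:
    """Hypothetical rule: approve (+1) iff the first input exceeds `threshold`."""
    return lambda x: 1 if x[0] > threshold else -1


def learn(data: List[Tuple[Sequence[float], int]],
          hypothesis_set: List[Hypothesis]) -> Hypothesis:
    """Toy learning algorithm A: return the h in H with the fewest training errors."""
    def training_errors(h: Hypothesis) -> int:
        return sum(1 for x, y in data if h(x) != y)
    return min(hypothesis_set, key=training_errors)


# Hypothetical training data D = {(x_n, y_n)}: x = (credit score,), y in {-1, +1}.
D = [((580.0,), -1), ((620.0,), -1), ((690.0,), 1), ((740.0,), 1)]
# Hypothetical finite hypothesis set H: four candidate thresholds.
H = [make_threshold_hypothesis(t) for t in (550.0, 600.0, 650.0, 700.0)]

g = learn(D, H)  # the learned hypothesis g, chosen from H using D
print(g((700.0,)), g((600.0,)))
```

Swapping in a different $\mathcal{H}$ or a different selection rule changes the learning model while the setup itself stays the same.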
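Example: Inputs, Outputs and Data
Assumptions:
  Two continuous inputs: credit score and credit line size
  One binary output: approve or deny
  A dataset of $N$ historical observations
Formally:
  Input space $\mathcal{X} = \mathbb{R}^2$
  Output space $\mathcal{Y} = \{-1 \text{ (deny)}, +1 \text{ (approve)}\}$
  Dataset $\mathcal{D} = \{(x_{11}, x_{12}, y_1), \ldots, (x_{N1}, x_{N2}, y_N)\} = (\mathbf{X}, \mathbf{y})$, where
$$\mathbf{X} = \begin{bmatrix} x_{11} & x_{12} \\ \vdots & \vdots \\ x_{N1} & x_{N2} \end{bmatrix} \in \mathbb{R}^{N \times 2} \quad \text{and} \quad \mathbf{y} = \begin{bmatrix} y_1 \\ \vdots \\ y_N \end{bmatrix} \in \mathbb{R}^{N}$$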
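One common way to hold the dataset $(\mathbf{X}, \mathbf{y})$ in code is as a 2-D input array plus a 1-D label array. A minimal NumPy sketch, assuming NumPy is available; the three applicants are invented for illustration and are not data from the lecture.

```python
import numpy as np

# Hypothetical observations (made-up values, for illustration only).
# Each row of X is one applicant: (credit score, credit line size in $).
X = np.array([
    [620.0, 15000.0],
    [800.0, 10000.0],
    [710.0, 4000.0],
])
# y holds the matching outputs: -1 = deny, +1 = approve.
y = np.array([-1, 1, 1])

N = X.shape[0]
assert X.shape == (N, 2)             # X is an element of R^{N x 2}
assert y.shape == (N,)               # y is an element of R^N
assert np.all(np.isin(y, [-1, 1]))   # outputs come from the output space {-1, +1}

# The n-th training example (x_n, y_n) pairs row n of X with entry n of y.
for x_n, y_n in zip(X, y):
    print(x_n, y_n)
```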
Example: Hypothesis Set
Perceptron: given some input $\mathbf{x} = (x_1, x_2)$,
$$h(\mathbf{x}) = \begin{cases} +1 & \text{if } \sum_{i=1}^{2} w_i x_i > -w_0 \\ -1 & \text{otherwise} \end{cases}$$
so $-w_0$ acts as a threshold. Equivalently, $h(\mathbf{x}) = +1$ if $\sum_{i=1}^{2} w_i x_i + w_0 > 0$ and $-1$ otherwise, i.e.
$$h(\mathbf{x}) = \operatorname{sign}\Big(\sum_{i=1}^{2} w_i x_i + w_0\Big)$$
Adding a constant coordinate $x_0 = 1$ to every input, $\mathbf{x} = (x_0 = 1, x_1, x_2)$, absorbs $w_0$ into the sum:
$$h(\mathbf{x}) = \operatorname{sign}\Big(\sum_{i=0}^{2} w_i x_i\Big) = \operatorname{sign}(\mathbf{w}^\top \mathbf{x})$$
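A quick numerical check that the threshold form and the augmented $\operatorname{sign}(\mathbf{w}^\top \mathbf{x})$ form define the same hypothesis. This is a sketch: the weights are illustrative, the function names `h_threshold` and `h_sign` are hypothetical, and the boundary case $\mathbf{w}^\top\mathbf{x} = 0$ (where $\operatorname{sign}$ is ambiguous) is assigned $-1$ by assumption, matching "otherwise" in the threshold form.

```python
import numpy as np


def h_threshold(x: np.ndarray, w: np.ndarray) -> int:
    """Threshold form: +1 if w_1 x_1 + w_2 x_2 > -w_0, else -1. Here w = (w_0, w_1, w_2)."""
    return 1 if float(np.dot(w[1:], x)) > -w[0] else -1


def h_sign(x: np.ndarray, w: np.ndarray) -> int:
    """Augmented form: sign(w^T x) after prepending the constant coordinate x_0 = 1.

    The boundary case w^T x == 0 is mapped to -1 so it agrees with the threshold form.
    """
    x_aug = np.concatenate(([1.0], x))
    return 1 if float(np.dot(w, x_aug)) > 0 else -1


# Illustrative weights (w_0, w_1, w_2); any values would do for this check.
w = np.array([-43000.0, 60.0, 1.0])

# Compare both forms on random inputs in the (credit score, credit line size) range.
rng = np.random.default_rng(0)
for _ in range(1000):
    x = np.array([rng.uniform(400, 850), rng.uniform(0, 20000)])
    assert h_threshold(x, w) == h_sign(x, w)
```

The augmented form is what makes the later algorithm tidy: one weight vector, one dot product, no separate threshold to update.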
Example: Data
[Figure: historical applicants plotted by Credit Score (horizontal axis, 400 to 850) against Credit Line Size in $ (vertical axis, 0 to 20,000)]
Example: Hypothesis
[Figure: the data with the decision boundary of a perceptron with $w_0 = -43{,}000$, $w_1 = 60$, $w_2 = 1$, i.e. approve iff $60 \times \text{credit score} + \text{credit line size} > 43{,}000$]
Example: Hypothesis
[Figure: the data with the decision boundary of a perceptron with $w_0 = -650$, $w_1 = 1$, $w_2 = 0$, i.e. approve iff credit score $> 650$]
Example: Hypothesis
[Figure: the data with the decision boundary of a perceptron with $w_0 = -21{,}200$, $w_1 = 39$, $w_2 = 0.5$]
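Each weight vector is a different member of the perceptron hypothesis set, and different members can disagree on the same applicant. A minimal sketch that evaluates the three example hypotheses on a few made-up applicants. Assumption: the minus signs of the weights were lost in the source text, so the signs below are a reconstruction; `predict` and the applicants are hypothetical.

```python
import numpy as np

# Weight vectors (w_0, w_1, w_2) for the three example hypotheses.
# ASSUMPTION: minus signs were lost in the source text; these signs are reconstructed.
hypotheses = {
    "h1": np.array([-43000.0, 60.0, 1.0]),
    "h2": np.array([-650.0, 1.0, 0.0]),
    "h3": np.array([-21200.0, 39.0, 0.5]),
}


def predict(w: np.ndarray, score: float, line: float) -> int:
    """Perceptron prediction sign(w^T x) with x = (1, credit score, credit line size)."""
    x = np.array([1.0, score, line])
    return 1 if float(w @ x) > 0 else -1  # w^T x == 0 treated as -1 (deny)


# Hypothetical applicants: (credit score, credit line size in $).
applicants = [(700.0, 3000.0), (600.0, 8000.0), (500.0, 2000.0)]
for score, line in applicants:
    votes = {name: predict(w, score, line) for name, w in hypotheses.items()}
    print(score, line, votes)
```

For instance, h1 approves the applicant with score 600 and an $8,000 line while h2 denies them; the learning algorithm's job is to pick, from all such candidates, one that fits the data.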
Example: Learning Algorithm
Perceptron Learning Algorithm (PLA)
PLA finds a linear separator in finite time if the training data is linearly separable.
Given: training data $\mathcal{D} = \{(\mathbf{x}_1, y_1), \ldots, (\mathbf{x}_N, y_N)\}$
Initialize $\mathbf{w}$ to all zeros or (small) random numbers
While there is some misclassified training example, i.e. some $(\mathbf{x}_n, y_n) \in \mathcal{D}$ such that $h(\mathbf{x}_n) = \operatorname{sign}(\mathbf{w}^\top \mathbf{x}_n) \neq y_n$:
  Randomly pick a misclassified training example $(\mathbf{x}, y)$
  Update $\mathbf{w}$: $\mathbf{w} = \mathbf{w} + y\,\mathbf{x}$
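The pseudocode above translates almost line for line into NumPy. This is a sketch, not a reference implementation: the function name `pla`, the `max_iters` safety cap, and the `seed` parameter are additions (plain PLA has no iteration cap and would loop forever on non-separable data), and a score of exactly $\mathbf{w}^\top\mathbf{x}_n = 0$ is counted as a mistake, a common convention.

```python
import numpy as np


def pla(X: np.ndarray, y: np.ndarray, max_iters: int = 10_000, seed: int = 0) -> np.ndarray:
    """Perceptron Learning Algorithm (sketch).

    X: (N, d) inputs without the constant coordinate; x_0 = 1 is prepended here.
    y: (N,) labels in {-1, +1}.
    Returns the weight vector w of length d + 1.
    """
    rng = np.random.default_rng(seed)
    X_aug = np.hstack([np.ones((X.shape[0], 1)), X])  # add x_0 = 1 to every example
    w = np.zeros(X_aug.shape[1])                      # initialize w to all zeros
    for _ in range(max_iters):
        # (x_n, y_n) is misclassified when y_n * w^T x_n <= 0 (a zero score counts as a mistake)
        misclassified = np.flatnonzero(y * (X_aug @ w) <= 0)
        if misclassified.size == 0:
            return w                                  # every training example is classified correctly
        n = rng.choice(misclassified)                 # randomly pick a misclassified example
        w = w + y[n] * X_aug[n]                       # update: w = w + y x
    # max_iters is an added safety cap: PLA never stops on non-separable data.
    raise RuntimeError("no separator found within max_iters; the data may not be linearly separable")


# Hypothetical, linearly separable toy data (the line x_1 = 0 separates the two classes).
X = np.array([[2.0, 1.0], [3.0, 2.0], [-1.0, -1.0], [-2.0, 0.5]])
y = np.array([1, 1, -1, -1])
w = pla(X, y)
print(w)
```

Testing $y_n \mathbf{w}^\top \mathbf{x}_n \le 0$ for all examples at once is equivalent to checking $\operatorname{sign}(\mathbf{w}^\top \mathbf{x}_n) \neq y_n$ one by one, but avoids a Python loop over the data.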
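Perceptron Learning Algorithm (Intuition)
Suppose $(\mathbf{x}, y) \in \mathcal{D}$ is a misclassified training example and $y = +1$, so $\mathbf{w}^\top \mathbf{x}$ is negative.
After the update $\mathbf{w} \leftarrow \mathbf{w} + y\,\mathbf{x}$, the new score on this example is
$$(\mathbf{w} + y\,\mathbf{x})^\top \mathbf{x} = \mathbf{w}^\top \mathbf{x} + y\,\mathbf{x}^\top \mathbf{x},$$
which is less negative than $\mathbf{w}^\top \mathbf{x}$, because $y > 0$ and $\mathbf{x}^\top \mathbf{x} > 0$ (indeed $\mathbf{x}^\top \mathbf{x} \geq x_0^2 = 1$).
A similar argument holds if $y = -1$.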
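The "similar argument" for $y = -1$ can be written out the same way; multiplying by $y$ (and using $y^2 = 1$) then covers both cases in one line.

```latex
\begin{align*}
% Case y = -1: the example is misclassified, so w^T x is positive.
(\mathbf{w} + y\,\mathbf{x})^\top \mathbf{x}
  &= \mathbf{w}^\top \mathbf{x} - \mathbf{x}^\top \mathbf{x} < \mathbf{w}^\top \mathbf{x}
  && \text{(less positive, since } \mathbf{x}^\top \mathbf{x} > 0\text{)} \\
% Both cases at once, using y^2 = 1:
y\,(\mathbf{w} + y\,\mathbf{x})^\top \mathbf{x}
  &= y\,\mathbf{w}^\top \mathbf{x} + \lVert \mathbf{x} \rVert^2 > y\,\mathbf{w}^\top \mathbf{x}
\end{align*}
```

The second line says each update strictly increases $y\,\mathbf{w}^\top\mathbf{x}$ on the example it uses, which is the quantity that must become positive for that example to be classified correctly; a single update does not necessarily get it there.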
Example: PLA
Features rescaled: Credit Score/100 (horizontal axis) and Credit Line Size in $10,000 (vertical axis, 0 to 2)
Start: $\mathbf{w} = (-4.3, 0.6, 1)$
Step 1: misclassified example $\mathbf{x} = (1, 6.2, 1.5)$ with $y = -1$; update $\mathbf{w} + y\,\mathbf{x} = (-5.3, -5.6, -0.5)$
Step 2: misclassified example $\mathbf{x} = (1, 8, 1)$ with $y = +1$; update $\mathbf{w} + y\,\mathbf{x} = (-4.3, 2.4, 0.5)$
[Figures: the data and the decision boundary before and after each update; after the first update the horizontal axis is widened to span -2 to 8]
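A replay of the two PLA updates in this example, as a sketch. The starting weights come from rescaling the first example hypothesis ($w_0 = -43{,}000$, $w_1 = 60$, $w_2 = 1$) to the new units. Assumption: the minus signs and the labels $y$ were lost in the source text and are reconstructed here, chosen so that each picked example really is misclassified before its update.

```python
import numpy as np

# New units: x_1 = credit score / 100, x_2 = credit line size / $10,000.
# Dividing w_0 + w_1 s + w_2 l > 0 by 10,000 gives weights (w_0 / 10^4, w_1 / 10^2, w_2).
w_dollars = np.array([-43000.0, 60.0, 1.0])
w = w_dollars * np.array([1e-4, 1e-2, 1.0])   # approximately (-4.3, 0.6, 1)

# ASSUMPTION: signs and labels y are reconstructed, since minus signs were lost in the source.
updates = [
    (np.array([1.0, 6.2, 1.5]), -1),
    (np.array([1.0, 8.0, 1.0]), +1),
]
for x, y in updates:
    assert y * (w @ x) <= 0, "PLA only updates on misclassified examples"
    w = w + y * x                              # PLA update: w = w + y x
    print(np.round(w, 2))
```

Rescaling by positive factors does not move the decision boundary, which is why the example can switch units without changing the hypothesis.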
Types of Learning
Supervised learning (this class!)
  Training data is (input, output)
  Examples: linear/logistic regression, support vector machines, neural networks
  Variants: active learning and online learning
Unsupervised learning (CSE 517A)
  Training data is (input)
  Examples: clustering, principal component analysis, outlier detection
Reinforcement learning (CSE 511A)
  Training data is (input, action, score)
  Examples: Q-learning, temporal difference learning
[Comic; source: https://www.xkcd.com/242/]