CSL465/603 - Machine Learning Fall 2016 Narayanan C Krishnan ckn@iitrpr.ac.in Introduction CSL465/603 - Machine Learning 1
Administrative Trivia Course Structure 3-0-2 Lecture Timings Monday 9.55-10.45am Tuesday 10.50-11.40am Wednesday 11.45am-12.35pm Lab hours Monday 1.30-4.10pm Tuesday 1.30-4.10pm TA Sanatan Sukhija sanatan@iitrpr.ac.in Second TA - TBD Office Hours Instructor - Monday afternoon during the lab hours or by appointment TA - Monday and Tuesday lab hours Course google group csl603f2016@iitrpr.ac.in Pre-registered students will be automatically added. Others, please send an email by Friday, July 29th. Pseudonym Email your 5-character key by July 29th, else we will assign a random one for you. Introduction CSL465/603 - Machine Learning 2
Reference Material No fixed textbook. Primary reference sources will be announced. Copies of other reference material are available in the library. Introduction CSL465/603 - Machine Learning 3
Pre-requisites Officially CSL201 Data Structures However, we will be using concepts from Probability Statistics Linear Algebra Optimization (operations research) Revision might be helpful Introduction CSL465/603 - Machine Learning 4
Tentative Course Schedule Introduction CSL465/603 - Machine Learning 5
Quizzes 30% Almost every Thursday 9.00-10.00am Room - L3 Covers material discussed from the previous quiz till the current week Duration 30-45m Top 6 out of 8 will be considered towards the final grade. Additional quizzes will not be conducted. Quiz Date Q1 4/8 Q2 11/8 Q3 25/8 Q4 1/9 Q5 6/10 Q6 13/10 Q7 27/10 Q8 3/11 Introduction CSL465/603 - Machine Learning 6
Labs 30% Due every third Friday 11.55pm Programming assignments Start early, experiments will take time to run!!! Individual labs TA is available for any assistance Students are encouraged to contact the TA for clarifications regarding the labs Labs Date L1 19/8 L2 9/9 L3 30/9 L4 21/10 L5 11/11 Introduction CSL465/603 - Machine Learning 7
Project 10% - Tentative If project is included, contribution to the overall grade from quizzes will reduce to 20% Will be decided after the add and drop period is over. Teams of 2 students. Introduction CSL465/603 - Machine Learning 8
Grading Scheme Tentative Breakup Quizzes (6 out of 8) 20-30% Labs (5) 30% Mid-semester exam 20% End-semester exam 20% Attendance Bonus 1% Attendance is not mandatory; however, attendance will be taken for every class and will count towards the bonus points. Passing criteria A student must secure an overall score of 40 (out of 100) and a combined exam score of 60 (out of 200) to pass the course. Introduction CSL465/603 - Machine Learning 9
Honor Code Unless explicitly stated otherwise, for all labs Strictly individual effort Group discussions at a high level are encouraged You are forbidden from trawling the web for answers/code etc. Any infraction will be dealt with in the severest terms allowed. I reserve the right to question you with regard to your submission, if I suspect any misconduct. Introduction CSL465/603 - Machine Learning 10
Course Website http://cse.iitrpr.ac.in/ckn/courses/f2016/csl603/csl603.html All class related material will be accessible from the webpage Labs will be uploaded incrementally and will be notified through email Lab submission is only on moodle No separate handouts; you are encouraged to take notes during the class. PDF version of lecture slides will be available on the class website. Introduction CSL465/603 - Machine Learning 11
What is Machine Learning? Herbert Simon (1970) Any process by which a system improves its performance Tom Mitchell (1990) A computer program that improves its performance at some task through experience Wikipedia Deals with the construction and study of systems that can learn from data, rather than follow only explicitly programmed instructions Introduction CSL465/603 - Machine Learning 12
Why study machine learning? Artificial Intelligence design and analysis of intelligent agents For an agent to exhibit intelligent behavior requires knowledge Explicitly specifying knowledge needed for specific tasks is hard, and often infeasible Learning an automated way to acquire knowledge. Introduction CSL465/603 - Machine Learning 13
Why study machine learning? http://www.gartner.com/newsroom/id/3114217 Introduction CSL465/603 - Machine Learning 14
Related Disciplines Probability and Statistics Applied Mathematics Operations Research Pattern Recognition Artificial Intelligence Data Mining Cognitive Science Neuroscience Big Data Introduction CSL465/603 - Machine Learning 15
General Architecture Pedro Domingos Hundreds (if not thousands) of machine learning algorithms Generic architecture has three components Representation How would you like to characterize what is being learned? Evaluation How would you like to measure the goodness of what is being learned Optimization Given the evaluation and characterization, find the optimum representation. Introduction CSL465/603 - Machine Learning 16
General Architecture - Representation Decision Trees Instances Bayes Networks Neural Networks Support Vector Machines Ensembles Gaussian Clusters Introduction CSL465/603 - Machine Learning 17
General Architecture - Evaluation Accuracy Precision and recall Sum of Squared Error Likelihood Posterior Probability Margin K-L Divergence Entropy Introduction CSL465/603 - Machine Learning 18
General Architecture- Optimization Combinatorial optimization Greedy search Convex optimization Gradient descent Constrained optimization Linear programming Introduction CSL465/603 - Machine Learning 19
Learning Paradigms and Applications Supervised Learning Classification LeCun et al., IEEE 1998; Krizhevsky et al., NIPS 2012 [Figure: examples of handwritten digits from U.S. postal envelopes; ILSVRC-2010 test images] Introduction CSL465/603 - Machine Learning 20
Learning Paradigms and Applications Supervised Learning Classification Regression https://www.flickr.com/photos/30686429@N07/sets/72157622330082619/ Introduction CSL465/603 - Machine Learning 21
Learning Paradigms and Applications Supervised Learning Classification Regression Unsupervised Learning Clustering Wiwie et al., Nature 2015 Introduction CSL465/603 - Machine Learning 22
Learning Paradigms and Applications Supervised Learning Classification Regression Unsupervised Learning Clustering Rule Mining Introduction CSL465/603 - Machine Learning 23
Learning Paradigms and Applications Supervised Learning Classification Regression Unsupervised Learning Clustering Rule Mining Semi-supervised Learning Shah et al., Bioinformatics 2015 Introduction CSL465/603 - Machine Learning 24
Reminder If you have decided to credit this course and have not pre-registered Send me an email at the earliest to add you to the google group. PG (MS, M.Tech, and PhD) students who are crediting the course, please meet me after today's class. There is no audit option in the course You can credit the course, or just attend the lectures If you have pre-registered and have decided to drop the course Please do so at the earliest, as it will help us organize the course and the TAs. Introduction CSL465/603 - Machine Learning 25
Learning Paradigms and Applications Supervised Learning Classification Regression Unsupervised Learning Clustering Rule Mining Semi-supervised Learning Dimensionality Reduction Tenenbaum et al., Science 2000 Introduction CSL465/603 - Machine Learning 26
Learning Paradigms and Applications Supervised Learning Classification Regression Unsupervised Learning Clustering Rule Mining Semi-supervised Learning Dimensionality Reduction Reinforcement Learning Kormushev et al., Robotics 2013 Introduction CSL465/603 - Machine Learning 27
Other Learning Paradigms Transfer Learning Transfer of knowledge between multiple domains Active Learning Learning algorithm interactively queries an oracle to obtain the desired outputs for new data points Online Learning Learning on the fly Zero-shot learning Representation Learning Automatically learning the representation from raw data Deep Learning Introduction CSL465/603 - Machine Learning 28
Topics to be covered in this course* Supervised Learning Decision trees, Naïve Bayes classifier, Instance based learning (k-nn), Linear and Logistic regression, Artificial neural networks, Kernel methods, Ensembles. Unsupervised Learning Clustering Dimensionality reduction Temporal models Hidden Markov model Design and Analysis of Experiments *Tentative Introduction CSL465/603 - Machine Learning 29
Machine Learning in Practice Understanding the domain, prior knowledge, and goals Data collection, integration, selection, cleaning, preprocessing Learning models Interpreting results Consolidating and deploying discovered knowledge Loop... Pedro Domingos Introduction CSL465/603 - Machine Learning 30
Machine Learning Challenges Curse of Dimensionality Intuition fails in high dimensional spaces Overfitting Things look rosy while training, but fail miserably when testing Sample size (number of examples) Often obtaining good examples is a hard, cumbersome, and error-prone process What algorithm to choose? No clear answer on what approach to select from the different options. Too many knobs (hyper-parameters) to turn Carefully conducted experiments that search through the hyper-parameter space for the optimal setting Introduction CSL465/603 - Machine Learning 31
Machine Learning Resources Data Repositories UCI ML repository Challenges Kaggle, KDD cup, Software Weka (Java) R (~ Python) Machine learning open source software (mloss.org/software) LibSVM Conferences and Journals ICDM, ICML, KDD, IJCAI, AAAI, UAI, AISTATS, COLT,... ACM TKDD, IEEE TKDE, JMLR, MLJ,... Introduction CSL465/603 - Machine Learning 32
Supervised Learning Supervised Learning CSL465/603 - Machine Learning 33
Supervised Learning Given a set of training examples (x, f(x) = y), for some unknown function f Estimate a good approximation to f Example applications Face recognition x: raw intensity face image f(x): name of the person. Loan approval x: properties of a customer (age, income, liability, job, ...) f(x): loan approved or not. Autonomous Steering x: image of the road ahead f(x): degrees to turn the steering wheel. Introduction CSL465/603 - Machine Learning 34
Example: Family Car Learning Task Learn to classify cars into one of two classes: family car or otherwise Representation Each car is represented by two features (attributes) - engine power and price Training set Several training examples of already classified cars Goal Learn a classifier that accurately classifies new (unseen) cars Supervised Learning CSL465/603 - Machine Learning 35
Example: Cars [Figure: scatter plot of training examples in the (x1, x2) plane; x1 = price, x2 = engine power] Introduction CSL465/603 - Machine Learning 36
Definitions (1) Feature (attribute): x_j A property of the object to be classified Discrete or continuous E.g., engine power, price Instance: x = [x_1, ..., x_d] The feature values for a specific object E.g., engine power = 100, price = high Instance space: I Space of all possible instances Class: Y Categorical feature of an object Set of instances of objects in this category E.g., family car Introduction CSL465/603 - Machine Learning 37
Example: Family Car [Figure: the target concept C is an axis-aligned rectangle p1 <= price <= p2, e1 <= engine power <= e2; x1 = price, x2 = engine power] Introduction CSL465/603 - Machine Learning 38
Definitions (2) Example: (x, y) Instance along with its class membership Positive example: member of class (y = 1) Negative example: not a member of class (y = 0) Training set: X = {(x_t, y_t)}, 1 <= t <= N Set of N examples Target concept (C) Correct expression of class E.g., (e1 <= engine power <= e2) and (p1 <= price <= p2) Concept class Space of all possible target concepts E.g., axis-aligned rectangles in instance space E.g., power set of instance space Introduction CSL465/603 - Machine Learning 39
Definitions (3) Hypothesis: h(x) → {0, 1} Approximation to target concept Hypothesis class: H Space of all possible hypotheses E.g., axis-aligned rectangles E.g., axis-aligned ellipses Learning goal Find hypothesis h ∈ H that closely approximates target concept C h is the output classifier Target concept may not be in H Introduction CSL465/603 - Machine Learning 40
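The axis-aligned rectangle hypothesis class for the family-car example can be sketched in code. This is a minimal illustration; the feature values and rectangle bounds below are made-up, not taken from the slides.

```python
# A hypothesis h(x) -> {0, 1} from the class of axis-aligned
# rectangles over (price, engine power). Bounds are illustrative.

def make_rectangle_hypothesis(p1, p2, e1, e2):
    """Return h that outputs 1 iff the instance falls inside the
    rectangle p1 <= price <= p2 and e1 <= engine power <= e2."""
    def h(x):
        price, power = x
        return 1 if (p1 <= price <= p2) and (e1 <= power <= e2) else 0
    return h

h = make_rectangle_hypothesis(p1=10, p2=20, e1=100, e2=200)
print(h((15, 150)))  # inside the rectangle -> 1
print(h((25, 150)))  # price outside the bounds -> 0
```

Learning in this hypothesis class amounts to choosing the four bounds from the training examples.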
Example: Hypothesis Error Introduction CSL465/603 - Machine Learning 41
Definitions (4) Empirical error How well h classifies training set X: E(h|X) = (1/N) Σ_{t=1}^{N} 1(h(x_t) ≠ y_t) Generalization error How well h classifies instances not in X True error How well h classifies the entire instance space: E(h) = (1/|I|) Σ_{x∈I} 1(h(x) ≠ y) Most specific hypothesis - S Consistent hypothesis covering fewest instances Most general hypothesis - G Consistent hypothesis covering most instances Version space All hypotheses between S and G Introduction CSL465/603 - Machine Learning 42
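The empirical error, i.e. the fraction of training examples the hypothesis misclassifies, is easy to compute directly. A minimal sketch, with a made-up toy training set and threshold hypothesis:

```python
# Empirical error E(h|X): fraction of (x, y) pairs in the training
# set X where the hypothesis h disagrees with the label y.

def empirical_error(h, X):
    """X is a list of (instance, label) pairs; h maps an instance to 0/1."""
    return sum(1 for x, y in X if h(x) != y) / len(X)

# Toy training set over (price, engine power) instances.
X = [((15, 150), 1), ((25, 150), 0), ((12, 300), 0)]
h = lambda x: 1 if 10 <= x[0] <= 20 else 0
print(empirical_error(h, X))  # h errs only on the third example -> 1/3
```

The generalization error has the same form but is taken over instances outside X, which is why it cannot be measured on the training set alone.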
Example: Version Space [Figure: the version space lies between the most specific hypothesis S and the most general hypothesis G, both consistent with the training data and containing the target concept C; x1 = price, x2 = engine power] Introduction CSL465/603 - Machine Learning 43
Thinking of Supervised Learning Learning is the removal of our remaining uncertainty Suppose we know that the concept is a rectangle, we can use the training data to infer the correct rectangle. In general Model (hypothesis): h(x|θ) Loss function: E(θ|X) = Σ_t L(y_t, h(x_t|θ)) Optimization procedure: θ* = argmin_θ E(θ|X) Introduction CSL465/603 - Machine Learning 44
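The model / loss / optimization recipe on this slide can be sketched end to end. As an illustrative (not prescribed) choice, the model is a 1-D threshold classifier with parameter θ, the loss is 0-1 loss summed over the training set, and the optimizer is a simple grid search:

```python
# Model h(x|theta): a threshold classifier on a single feature.
def h(x, theta):
    return 1 if x >= theta else 0

# Loss E(theta|X): 0-1 loss summed over the training examples.
def loss(theta, X):
    return sum(1 for x, y in X if h(x, theta) != y)

# Optimization: theta* = argmin_theta E(theta|X), via grid search.
X = [(1, 0), (2, 0), (3, 1), (4, 1)]
thetas = [t / 10 for t in range(0, 51)]      # candidate thresholds 0.0..5.0
theta_star = min(thetas, key=lambda t: loss(t, X))
print(loss(theta_star, X))  # a threshold between 2 and 3 separates the data -> 0
```

With a richer model (e.g. the rectangle concept) or a differentiable loss, the search step would be replaced by combinatorial or gradient-based optimization, as listed on the "General Architecture - Optimization" slide.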
Learning under noisy conditions Sources of noise Incorrect feature values Incorrect class labels Hidden or latent (missing) features Impact Overfitting - trying too hard to fit the hypothesis h to the noisy data. Introduction CSL465/603 - Machine Learning 45
Underfitting vs Overfitting [Figure: training data in the (x1, x2) plane; hypothesis h1 underfits the data while h2 overfits it] Introduction CSL465/603 - Machine Learning 46
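Overfitting can be made concrete with a toy "memorizing" hypothesis: it fits the training set perfectly but has learned nothing that transfers to unseen instances. The data and hypothesis below are illustrative only:

```python
# A hypothesis that memorizes the training set exactly and
# predicts a default label (0) everywhere else.

def make_memorizer(train):
    table = {x: y for x, y in train}
    return lambda x: table.get(x, 0)

train = [(1, 1), (2, 0), (3, 1)]   # (instance, label) pairs
test = [(4, 1), (5, 1)]            # unseen instances

h = make_memorizer(train)
train_err = sum(h(x) != y for x, y in train) / len(train)
test_err = sum(h(x) != y for x, y in test) / len(test)
print(train_err, test_err)  # 0.0 on training, 1.0 on the unseen points
```

This is the extreme end of the overfitting axis: zero empirical error, poor generalization error. Underfitting is the opposite failure, where the hypothesis class is too simple to fit even the training data.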
Bias vs Variance [Figure: dartboard illustration of the four combinations of low/high bias and low/high variance] Domingos, CACM 2012 Introduction CSL465/603 - Machine Learning 47
Characterization of Hypothesis Space Is the hypothesis deterministic or stochastic? Deterministic - Training example is either consistent (correctly predicted) or inconsistent (incorrectly predicted) Stochastic Training example is more or less likely (probabilistic output) Parametrization discrete or continuous? (or mixed) Discrete space perform combinatorial search Continuous space perform numerical search Introduction CSL465/603 - Machine Learning 48
Framework for Learning Algorithms Pedro Domingos Search procedure Direct computation solve for hypothesis directly Local search start with an initial hypothesis, make small improvements until a local optimum Timing Eager Analyze training data and construct an explicit hypothesis Online analyze each training example as it is presented Batch collect training examples and analyze them together Lazy Store the training data and wait until a test data point is presented to construct the hypothesis Introduction CSL465/603 - Machine Learning 49