CSL465/603 - Machine Learning

Fall 2016
Instructor: Narayanan C Krishnan (ckn@iitrpr.ac.in)

Administrative Trivia
Course structure: 3-0-2
Lecture timings:
- Monday 9.55-10.45am
- Tuesday 10.50-11.40am
- Wednesday 11.45am-12.35pm
Lab hours:
- Monday 1.30-4.10pm
- Tuesday 1.30-4.10pm
TAs: Sanatan Sukhija (sanatan@iitrpr.ac.in); second TA TBD
Office hours:
- Instructor: Monday afternoon during the lab hours, or by appointment
- TA: Monday and Tuesday lab hours
Course Google group: csl603f2016@iitrpr.ac.in. Pre-registered students will be added automatically; others, please send an email by Friday, July 29th.
Pseudonym: Email your 5-character key by July 29th, or a random one will be assigned to you.

Reference Material
No fixed textbook; the primary reference books will be announced. Copies of other reference material are available in the library.

Pre-requisites
Officially, CSL201 Data Structures. However, we will be using concepts from:
- Probability
- Statistics
- Linear Algebra
- Optimization (operations research)
Revising these topics might be helpful.

Tentative Course Schedule

Quizzes (30%)
- Almost every Thursday, 9.00-10.00am, Room L3
- Covers material discussed from the previous quiz up to the current week
- Duration: 30-45 minutes
- Top 6 out of 8 will be considered towards the final grade; additional quizzes will not be conducted.
Quiz dates: Q1 4/8, Q2 11/8, Q3 25/8, Q4 1/9, Q5 6/10, Q6 13/10, Q7 27/10, Q8 3/11

Labs (30%)
- Due every third Friday, 11.55pm
- Programming assignments: start early, experiments will take time to run!
- Individual labs
- The TA is available for assistance; students are encouraged to contact the TA for clarifications regarding the labs.
Lab due dates: L1 19/8, L2 9/9, L3 30/9, L4 21/10, L5 11/11

Project (10%, tentative)
- If a project is included, the contribution of quizzes to the overall grade will reduce to 20%.
- Will be decided after the add and drop period is over.
- Teams of 2 students.

Grading Scheme (Tentative)
Breakup:
- Quizzes (best 6 out of 8): 20-30%
- Labs (5): 30%
- Mid-semester exam: 20%
- End-semester exam: 20%
- Attendance bonus: 1%. Attendance is not mandatory; however, attendance will be taken every class and will count towards the bonus points.
Passing criteria: a student must secure an overall score of 40 (out of 100) and a combined exam score of 60 (out of 200) to pass the course.

Honor Code
- Unless explicitly stated otherwise, all labs are strictly individual effort.
- High-level group discussions are encouraged.
- You are forbidden from trawling the web for answers, code, etc.
- Any infraction will be dealt with in the severest terms allowed.
- I reserve the right to question you regarding your submission if I suspect any misconduct.

Course Website
http://cse.iitrpr.ac.in/ckn/courses/f2016/csl603/csl603.html
- All class-related material will be accessible from the webpage.
- Labs will be uploaded incrementally; you will be notified through email.
- Lab submission is only on Moodle.
- No separate handouts; you are encouraged to take notes during class. A PDF version of the lecture slides will be available on the class website.

What is Machine Learning?
- Herbert Simon (1970): any process by which a system improves its performance.
- Tom Mitchell (1990): a computer program that improves its performance at some task through experience.
- Wikipedia: deals with the construction and study of systems that can learn from data, rather than follow only explicitly programmed instructions.

Why study machine learning?
- Artificial Intelligence is the design and analysis of intelligent agents.
- For an agent to exhibit intelligent behavior requires knowledge.
- Explicitly specifying the knowledge needed for specific tasks is hard, and often infeasible.
- Learning is an automated way to acquire knowledge.

Why study machine learning?
http://www.gartner.com/newsroom/id/3114217

Related Disciplines
- Probability and Statistics
- Applied Mathematics
- Operations Research
- Pattern Recognition
- Artificial Intelligence
- Data Mining
- Cognitive Science
- Neuroscience
- Big Data

General Architecture (Pedro Domingos)
There are hundreds (if not thousands) of machine learning algorithms, but the generic architecture has three components:
- Representation: how would you like to characterize what is being learned?
- Evaluation: how would you like to measure the goodness of what is being learned?
- Optimization: given the evaluation and characterization, find the optimum representation.

General Architecture - Representation
- Decision Trees
- Instances
- Bayes Networks
- Neural Networks
- Support Vector Machines
- Ensembles
- Gaussian Clusters

General Architecture - Evaluation
- Accuracy
- Precision and recall
- Sum of squared error
- Likelihood
- Posterior probability
- Margin
- K-L divergence
- Entropy

General Architecture - Optimization
- Combinatorial optimization: greedy search
- Convex optimization: gradient descent
- Constrained optimization: linear programming
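Gradient descent, listed above under convex optimization, can be sketched in a few lines. This is a minimal illustration, not course material: the quadratic objective f(theta) = (theta - 3)^2, the learning rate, and the step count are all arbitrary choices.

```python
# Minimal gradient descent sketch: minimize the convex function
# f(theta) = (theta - 3)^2, whose gradient is 2 * (theta - 3).
# Objective, learning rate, and step count are illustrative choices.

def gradient_descent(grad, theta0, lr=0.1, steps=100):
    """Repeatedly step against the gradient, starting from theta0."""
    theta = theta0
    for _ in range(steps):
        theta -= lr * grad(theta)
    return theta

theta_star = gradient_descent(lambda t: 2 * (t - 3), theta0=0.0)
print(round(theta_star, 4))  # converges toward the minimizer theta = 3
```

Each step moves the parameter a fraction of the way toward the minimum; for this objective the distance to the minimizer shrinks by a constant factor per step.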

Learning Paradigms and Applications
Supervised Learning - Classification (LeCun et al., IEEE 1998; Krizhevsky et al., NIPS 2012)
Example (regression): predict the log of prostate specific antigen (lpsa) from a number of measurements, including log cancer volume (lcavol), log prostate weight (lweight), age, log of benign prostatic hyperplasia amount (lbph), seminal vesicle invasion (svi), log of capsular penetration (lcp), Gleason score (gleason), and percent of Gleason scores 4 or 5 (pgg45), in 97 men who were about to receive a radical prostatectomy. Some correlations with lpsa are evident in a scatterplot matrix of the variables, but a good predictive model is difficult to construct by eye. This is a supervised learning problem known as a regression problem, because the outcome measurement is quantitative.
Example (classification): handwritten digit recognition. The data come from handwritten ZIP codes on envelopes from U.S. postal mail; each image is a segment from a five-digit ZIP code, isolating a single digit. The images are 16x16 eight-bit grayscale maps, with each pixel ranging in intensity from 0 to 255, normalized to approximately the same size and orientation. The task is to predict, from the 16x16 matrix of pixel intensities, the identity of each image (0, 1, ..., 9) quickly and accurately. An accurate enough algorithm could be used as part of an automatic sorting procedure for envelopes; the error rate needs to be kept very low to avoid misdirecting mail.
(Figures: sample handwritten digits from U.S. postal envelopes; ILSVRC-2010 test images.)

Learning Paradigms and Applications
- Supervised Learning: Classification, Regression (https://www.flickr.com/photos/30686429@N07/sets/72157622330082619/)
- Unsupervised Learning: Clustering (Wiwie et al., Nature 2015), Rule Mining
- Semi-supervised Learning (Shah et al., Bioinformatics 2015)
- Dimensionality Reduction (Tenenbaum et al., Science 2000)
- Reinforcement Learning (Kormushev et al., Robotics 2013)

Reminder
- If you have decided to credit this course and have not pre-registered, send me an email at the earliest so that I can add you to the Google group.
- PG (MS, M.Tech, and PhD) students who are crediting the course, please meet me after today's class.
- There is no audit option in the course: you can credit the course, or just attend the lectures.
- If you have pre-registered and have decided to drop the course, please do so at the earliest, as it will help us organize the course and the TAs.

Other Learning Paradigms
- Transfer Learning: transfer of knowledge between multiple domains
- Active Learning: the learning algorithm interactively queries an oracle to obtain the desired outputs for new data points
- Online Learning: learning on the fly
- Zero-shot Learning
- Representation Learning: automatically learning the representation from raw data (e.g., Deep Learning)

Topics to be covered in this course (tentative)
- Supervised Learning: decision trees, naive Bayes classifier, instance-based learning (k-NN), linear and logistic regression, artificial neural networks, kernel methods, ensembles
- Unsupervised Learning: clustering, dimensionality reduction
- Temporal models: hidden Markov models
- Design and Analysis of Experiments

Machine Learning in Practice (Pedro Domingos)
- Understanding the domain, prior knowledge, and goals
- Data collection, integration, selection, cleaning, and preprocessing
- Learning models
- Interpreting results
- Consolidating and deploying discovered knowledge
- Loop...

Machine Learning Challenges
- Curse of dimensionality: intuition fails in high-dimensional spaces.
- Overfitting: things look rosy while training, but fail miserably when testing.
- Sample size (number of examples): obtaining good examples is often a hard, cumbersome, and error-prone process.
- What algorithm to choose? There is no clear answer on which approach to select from the different options.
- Too many knobs (hyper-parameters) to turn: carefully conducted experiments are needed to search the hyper-parameter space for the optimal setting.

Machine Learning Resources
- Data repositories: UCI ML repository
- Challenges: Kaggle, KDD Cup, ...
- Software: Weka (Java), R (~ Python), machine learning open source software (mloss.org/software), LibSVM
- Conferences: ICDM, ICML, KDD, IJCAI, AAAI, UAI, AISTATS, COLT, ...
- Journals: ACM TKDD, IEEE TKDE, JMLR, MLJ, ...

Supervised Learning

Supervised Learning
Given a set of training examples (x, y) with y = f(x) for some unknown function f, estimate a good approximation to f.
Example applications:
- Face recognition. x: raw intensity face image; f(x): name of the person.
- Loan approval. x: properties of a customer (age, income, liability, job, ...); f(x): loan approved or not.
- Autonomous steering. x: image of the road ahead; f(x): degrees to turn the steering wheel.

Example: Family Car
- Learning task: learn to classify cars into one of two classes - family car or otherwise.
- Representation: each car is represented by two features (attributes) - engine power and price.
- Training set: several training examples of already-classified cars.
- Goal: learn a classifier that accurately classifies new (unseen) cars.

Example: Cars (figure: scatterplot of the training examples in the plane x1 = price, x2 = engine power)

Definitions (1)
- Feature (attribute) x_j: a property of the object to be classified; discrete or continuous. E.g., engine power, price.
- Instance x = [x_1, ..., x_j, ..., x_d]: the feature values for a specific object. E.g., engine power = 100, price = high.
- Instance space I: the space of all possible instances.
- Class Y: a categorical feature of an object; the set of instances of objects in this category. E.g., family car.

Example: Family Car (figure: the target concept C shown as an axis-aligned rectangle, e1 <= engine power <= e2 and p1 <= price <= p2, in the price-engine power plane)

Definitions (2)
- Example (x, y): an instance along with its class membership. Positive example: member of the class (y = 1). Negative example: not a member of the class (y = 0).
- Training set X = {(x_t, y_t)}, 1 <= t <= N: a set of N examples.
- Target concept C: the correct expression of the class. E.g., (e1 <= engine power <= e2) and (p1 <= price <= p2).
- Concept class: the space of all possible target concepts. E.g., axis-aligned rectangles in instance space; e.g., the power set of the instance space.
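The rectangle target concept above can be written as a tiny predicate. A minimal sketch: the threshold values e1, e2, p1, p2 below are hypothetical, chosen only for illustration.

```python
# Target concept for the family-car example, as an axis-aligned rectangle:
# (e1 <= engine_power <= e2) and (p1 <= price <= p2).
# The default thresholds are made-up illustrative values.

def rectangle_concept(engine_power, price, e1=60, e2=120, p1=10, p2=25):
    """Return 1 (family car) if the instance lies inside the rectangle, else 0."""
    return int(e1 <= engine_power <= e2 and p1 <= price <= p2)

print(rectangle_concept(engine_power=90, price=18))   # inside  -> 1
print(rectangle_concept(engine_power=200, price=18))  # outside -> 0
```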

Definitions (3)
- Hypothesis h: x -> {0, 1}: an approximation to the target concept.
- Hypothesis class H: the space of all possible hypotheses. E.g., axis-aligned rectangles; e.g., axis-aligned ellipses.
- Learning goal: find a hypothesis h in H that closely approximates the target concept C; h is the output classifier. The target concept may not be in H.

Example: Hypothesis Error (figure)

Definitions (4)
- Empirical error: how well h classifies the training set X. E(h | X) = (1/N) * sum_{t=1}^{N} 1(h(x_t) != y_t)
- Generalization error: how well h classifies instances not in X.
- True error: how well h classifies the entire instance space. E(h) = (1/|I|) * sum_{x in I} 1(h(x) != y)
- Most specific hypothesis S: the consistent hypothesis covering the fewest instances.
- Most general hypothesis G: the consistent hypothesis covering the most instances.
- Version space: all hypotheses between S and G.
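The empirical error above is just the fraction of training examples the hypothesis gets wrong. A minimal sketch, with a toy threshold hypothesis and toy data that are both made up for illustration:

```python
# Empirical error of a hypothesis h on a labelled training set X:
# E(h | X) = (1/N) * sum over t of 1(h(x_t) != y_t).
# Hypothesis and data below are illustrative toys.

def empirical_error(h, X):
    """Fraction of labelled examples (x, y) in X that h misclassifies."""
    return sum(1 for x, y in X if h(x) != y) / len(X)

h = lambda x: int(x > 5)                       # toy threshold hypothesis
X = [(2, 0), (4, 0), (6, 1), (8, 1), (3, 1)]   # last example is misclassified
print(empirical_error(h, X))  # 0.2
```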

Example: Version Space (figure: the most specific hypothesis S, the most general hypothesis G, and the target concept C shown as rectangles in the price-engine power plane)
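For the rectangle concept class, the most specific consistent hypothesis S is simply the tightest axis-aligned rectangle enclosing all positive examples. A sketch with made-up data (each example is ((price, engine_power), label)):

```python
# Most specific hypothesis S for axis-aligned rectangles: the bounding
# box of the positive training examples. Data are illustrative toys.

def most_specific_rectangle(examples):
    """Return (x1_min, x1_max, x2_min, x2_max) enclosing the positives."""
    positives = [x for x, y in examples if y == 1]
    x1s = [p for p, _ in positives]
    x2s = [e for _, e in positives]
    return (min(x1s), max(x1s), min(x2s), max(x2s))

examples = [((12, 70), 1), ((20, 110), 1), ((16, 90), 1), ((40, 200), 0)]
print(most_specific_rectangle(examples))  # (12, 20, 70, 110)
```

Any rectangle between this S and the most general consistent rectangle G lies in the version space.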

Thinking of Supervised Learning
Learning is the removal of our remaining uncertainty: if we know that the concept is a rectangle, we can use the training data to infer the correct rectangle.
In general:
- Model (hypothesis): h(x | theta)
- Loss function: E(theta | X) = sum_t L(y_t, h(x_t | theta))
- Optimization procedure: theta* = argmin_theta E(theta | X)
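The model/loss/optimization recipe above can be sketched end to end. Here the model is a hypothetical one-parameter threshold classifier, the loss is 0-1 loss, and the argmin is taken by direct search over candidate thresholds; all of these choices are illustrative, not from the course.

```python
# Sketch of the recipe: theta* = argmin_theta sum_t L(y_t, h(x_t | theta)),
# with a hypothetical one-parameter threshold model and 0-1 loss.

def h(x, theta):
    """One-parameter threshold model: predict 1 iff x >= theta."""
    return int(x >= theta)

def total_loss(theta, X):
    """Total 0-1 loss of the model with parameter theta on data X."""
    return sum(int(h(x, theta) != y) for x, y in X)

X = [(1, 0), (2, 0), (3, 0), (4, 1), (5, 1)]   # toy training set
candidates = [x for x, _ in X]                  # search thresholds directly
theta_star = min(candidates, key=lambda t: total_loss(t, X))
print(theta_star, total_loss(theta_star, X))  # 4 0
```

For discrete parameter spaces like this one the argmin is a combinatorial search; for continuous spaces one would use numerical search such as gradient descent, matching the deterministic/stochastic and discrete/continuous distinctions drawn later.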

Learning under Noisy Conditions
Sources of noise:
- Incorrect feature values
- Incorrect class labels
- Hidden or latent (missing) features
Impact: overfitting - trying too hard to fit the hypothesis h to the noisy data.

Underfitting vs Overfitting (figure: two hypotheses h1 and h2 fit to the same data in the x1-x2 plane)

Bias vs Variance (figure: the low/high bias vs low/high variance combinations; Domingos, CACM 2012)

Characterization of Hypothesis Space
Is the hypothesis deterministic or stochastic?
- Deterministic: a training example is either consistent (correctly predicted) or inconsistent (incorrectly predicted).
- Stochastic: a training example is more or less likely (probabilistic output).
Is the parametrization discrete or continuous (or mixed)?
- Discrete space: perform combinatorial search.
- Continuous space: perform numerical search.

Framework for Learning Algorithms (Pedro Domingos)
Search procedure:
- Direct computation: solve for the hypothesis directly.
- Local search: start with an initial hypothesis and make small improvements until a local optimum.
Timing:
- Eager: analyze the training data and construct an explicit hypothesis. Online: analyze each training example as it is presented. Batch: collect training examples and analyze them together.
- Lazy: store the training data and wait until a test data point is presented to construct the hypothesis.