CSE 417T: Introduction to Machine Learning. Lecture 1: Introduction. Henry Chai 08/28/18



Course Information
- Website: http://classes.cec.wustl.edu/~cse417t/
- Piazza (sign up with your Wash U email)
- Gradescope (sign up with code M6Z8XD) and SVN
- Textbooks:
  - Learning From Data (AML), http://amlbook.com/
  - Computer Age Statistical Inference: Algorithms, Evidence, and Data Science (CASI), https://web.stanford.edu/~hastie/casi/

Course Information
Grading:
- Homework assignments (6-8): 50%
  - Mix of programming and pencil-and-paper problems
  - Worst (second-worst) scores discounted 60% (40%)
  - 5 total late days, no more than 2 usable on any one assignment
- Collaboration:
  - Feel free to discuss homework with other students
  - Must write your own solutions
  - Must cite all external sources (including other students)
- Tests (2): 50%
  - 10/4/18, 6:30 PM to 8:30 PM
  - 12/5/18, 6:30 PM to 8:30 PM
  - Location TBD

Overview
- First half of the course: Foundations (theory, proofs, math, probability, boring stuff, etc.)
- Second half of the course: Techniques (Random Forests! Support Vector Machines! Neural Networks! Yay!)

Tentative Schedule
1. 8/28: Introduction
2. 8/30: Generalization
3. 9/4: Matlab tutorial
4. 9/6: Hypothesis sets
5. 9/11: Infinite-dimensional hypothesis sets
6. 9/13: Bias-variance tradeoff
7. 9/18: Linear regression
8. 9/20: Logistic regression
9. 9/25: Overfitting
10. 9/27: Regularization
11. 10/2: Exam review

Machine Learning (Then)

Machine Learning (Now)

Machine Learning
- There exists a pattern
- The pattern is difficult/impossible to describe
- There is data
- Use data to learn the pattern

Example: Approving Credit

Formal Setup (Learning Model)
- Unknown target function f: X → Y
- Training data D = (x_1, y_1), ..., (x_N, y_N)
- Learning Algorithm A
- Hypothesis Set H
- Learned Hypothesis g ≈ f

Example: Inputs, Outputs and Data
Assumptions:
- Two continuous inputs: credit score and credit line size
- One binary output: approve or deny
- Dataset of N historical observations
Formally:
- Input space X = R^2
- Output space Y = {-1 (deny), +1 (approve)}
- Dataset D = ((x_11, x_12, y_1), ..., (x_N1, x_N2, y_N)) = (X, y), where row i of X in R^(N x 2) is the input pair (x_i1, x_i2) and y = (y_1, ..., y_N) in R^N
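As a minimal sketch of this setup, such a dataset can be held as a matrix X and a label vector y. The specific numbers below are made up for illustration; they are not from the lecture.

```python
import numpy as np

# Hypothetical historical observations: (credit score, credit line size in $)
X = np.array([
    [720.0, 12000.0],
    [540.0,  3000.0],
    [680.0,  8000.0],
    [480.0, 15000.0],
])
# Labels: +1 = approve, -1 = deny
y = np.array([+1, -1, +1, -1])

N, d = X.shape  # N historical observations, d = 2 continuous inputs
```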

Example: Hypothesis Set: Perceptron
Given some input x = (x_1, x_2):
    h(x) = +1 if sum_{i=1}^{2} w_i x_i > threshold, and -1 otherwise
Equivalently,
    h(x) = +1 if sum_{i=1}^{2} w_i x_i - threshold > 0, and -1 otherwise
which can be written as
    h(x) = sign( sum_{i=1}^{2} w_i x_i - threshold )
Adding a constant coordinate x_0 = 1, so that w_0 plays the role of the negated threshold, i.e. x = (x_0 = 1, x_1, x_2):
    h(x) = sign( sum_{i=0}^{2} w_i x_i ) = sign( w^T x )
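The final form h(x) = sign(w^T x) is a one-liner in code. A minimal sketch, with hypothetical weights and inputs chosen only for illustration:

```python
import numpy as np

def perceptron_predict(w, x):
    """Perceptron hypothesis h(x) = sign(w^T x).
    x is assumed to include the constant coordinate x_0 = 1,
    so w[0] plays the role of the negated threshold."""
    return 1 if w @ x > 0 else -1

# Hypothetical weights and inputs (not the lecture's values)
w = np.array([-4.3, 0.6, 1.0])       # (w_0, w_1, w_2)
x_good = np.array([1.0, 7.2, 1.2])   # w @ x_good = -4.3 + 4.32 + 1.2 > 0
x_bad = np.array([1.0, 4.0, 0.5])    # w @ x_bad  = -4.3 + 2.4 + 0.5 < 0
approve = perceptron_predict(w, x_good)  # +1 (approve)
deny = perceptron_predict(w, x_bad)      # -1 (deny)
```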

Example: Data
[Figure: scatter plot of the training data, Credit Line Size ($) vs. Credit Score]

Example: Hypothesis
[Figure: the same plot with a candidate linear separator, threshold = 43 000, w_1 = 60, w_2 = 1]

Example: Hypothesis
[Figure: the same plot with a different separator, threshold = 650, w_1 = 1, w_2 = 0]

Example: Hypothesis
[Figure: the same plot with a third separator, threshold = 21 200, w_1 = 39, w_2 = 0.5]

Example: Learning Algorithm: Perceptron Learning Algorithm (PLA)
PLA finds a linear separator in finite time, if the training data is linearly separable.
- Given: training data D = (x_1, y_1), ..., (x_N, y_N)
- Initialize w to all zeros or (small) random numbers
- While some training example is misclassified, i.e. there exists (x_i, y_i) in D s.t. h(x_i) = sign(w^T x_i) ≠ y_i:
  - Randomly pick a misclassified training example (x, y)
  - Update w: w = w + y x
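The loop above can be sketched in a few lines. This is an illustrative implementation, not the course's reference code, and the toy dataset is made up:

```python
import numpy as np

def pla(X, y, max_iters=10_000, seed=0):
    """Perceptron Learning Algorithm (sketch).
    X: (N, d) array whose first column is the constant coordinate x_0 = 1.
    y: (N,) array of labels in {-1, +1}.
    Returns w with sign(X @ w) == y if the data are linearly separable."""
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])  # initialize w to all zeros
    for _ in range(max_iters):
        misclassified = np.flatnonzero(np.sign(X @ w) != y)
        if misclassified.size == 0:
            return w  # every training example is classified correctly
        i = rng.choice(misclassified)  # randomly pick a misclassified example
        w = w + y[i] * X[i]            # PLA update: w <- w + y_i x_i
    raise RuntimeError("no separator found; data may not be linearly separable")

# Tiny linearly separable toy set (made-up numbers)
X = np.array([[1.0,  2.0,  2.0],
              [1.0,  3.0,  1.0],
              [1.0, -1.0, -1.5],
              [1.0, -2.0, -1.0]])
y = np.array([1, 1, -1, -1])
w = pla(X, y)
```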

Perceptron Learning Algorithm (Intuition)
Suppose (x, y) in D is a misclassified training example and y = +1, so w^T x is negative. After updating w ← w + y x:
    (w + y x)^T x = w^T x + y x^T x
is less negative than w^T x, because y > 0 and x^T x > 0. A similar argument holds if y = -1.
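A quick numeric check of this intuition, using made-up numbers for a misclassified positive example:

```python
import numpy as np

w = np.array([-1.0, 0.5, -2.0])
x = np.array([1.0, 2.0, 1.0])
y = 1  # positive example

before = w @ x       # -1 + 1 - 2 = -2.0: negative, so x is misclassified
w_new = w + y * x    # PLA update
after = w_new @ x    # equals before + y * (x @ x) = -2 + 6 = 4.0
# The update moved w^T x in the correct direction (less negative / positive)
```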

Example: PLA
[Figure: Credit Line Size ($10 000) vs. Credit Score/100, with separator w = (-4.3, 0.6, 1)]

[Figure: the same plot highlighting a misclassified example x = (1, 6.2, 1.5) with y = -1; the update gives w + y x = (-5.3, -5.6, -0.5)]

[Figure: the updated separator w = (-5.3, -5.6, -0.5)]

[Figure: the same plot highlighting a misclassified example x = (1, 8, 1) with y = +1; the update gives w + y x = (-4.3, 2.4, 0.5)]

[Figure: the updated separator w = (-4.3, 2.4, 0.5)]

Types of Learning
- Supervised Learning: training data is (input, output). Examples: linear/logistic regression, support vector machines, neural networks. Variants: active learning and online learning.
- Unsupervised Learning: training data is (input). Examples: clustering, principal component analysis, outlier detection.
- Reinforcement Learning: training data is (input, action, score). Examples: Q-learning, temporal difference learning.

Types of Learning
[Figure: scatter plot]


Types of Learning
(Source: https://www.xkcd.com/242/)

Types of Learning
- Supervised Learning (this class!): training data is (input, output). Examples: linear/logistic regression, support vector machines, neural networks. Variants: active learning and online learning.
- Unsupervised Learning (CSE 517A): training data is (input). Examples: clustering, principal component analysis, outlier detection.
- Reinforcement Learning (CSE 511A): training data is (input, action, score). Examples: Q-learning, temporal difference learning.