Machine Learning Foundations


Machine Learning Foundations (機器學習基石)
Lecture 3: Types of Learning
Hsuan-Tien Lin (林軒田), htlin@csie.ntu.edu.tw
Department of Computer Science & Information Engineering, National Taiwan University (國立台灣大學資訊工程系)
Hsuan-Tien Lin (NTU CSIE) Machine Learning Foundations 0/29

Roadmap
1. When Can Machines Learn?
   Lecture 2: Learning to Answer Yes/No
     PLA $\mathcal{A}$ takes linearly separable $\mathcal{D}$ and perceptrons $\mathcal{H}$ to get hypothesis $g$
   Lecture 3: Types of Learning
     Learning with Different Output Space $\mathcal{Y}$
     Learning with Different Data Label $y_n$
     Learning with Different Protocol $f \Rightarrow (\mathbf{x}_n, y_n)$
     Learning with Different Input Space $\mathcal{X}$
2. Why Can Machines Learn?
3. How Can Machines Learn?
4. How Can Machines Learn Better?

Learning with Different Output Space $\mathcal{Y}$

Credit Approval Problem Revisited
Applicant information: age 23 years; gender female; annual salary NTD 1,000,000; years in residence 1 year; years in job 0.5 year; current debt 200,000
Credit? $\{\text{no}(-1), \text{yes}(+1)\}$
[Figure: learning flow: unknown target function $f: \mathcal{X} \to \mathcal{Y}$ (ideal credit approval formula); training examples $\mathcal{D}: (\mathbf{x}_1, y_1), \ldots, (\mathbf{x}_N, y_N)$ (historical records in bank); learning algorithm $\mathcal{A}$; hypothesis set $\mathcal{H}$ (set of candidate formulas); final hypothesis $g \approx f$ ("learned" formula to be used)]
$\mathcal{Y} = \{-1, +1\}$: binary classification
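To make $\mathcal{Y} = \{-1, +1\}$ concrete, here is a minimal sketch of a perceptron-style hypothesis $h(\mathbf{x}) = \text{sign}(\mathbf{w}^T \mathbf{x})$ applied to the applicant above. The feature encoding (a bias feature $x_0 = 1$, salary and debt in NTD millions) and the weights are hypothetical and were chosen by hand, not learned. The convention that a score of exactly 0 maps to $-1$ is also an assumption.

```python
def perceptron_hypothesis(w, x):
    """Return +1 (approve) or -1 (deny): a binary output space Y = {-1, +1}."""
    score = sum(wi * xi for wi, xi in zip(w, x))
    return 1 if score > 0 else -1  # assumed convention: sign(0) -> -1

# Hypothetical feature order: [bias x0 = 1, age, annual salary (NTD millions),
# years in residence, years in job, current debt (NTD millions)]
applicant = [1.0, 23, 1.0, 1, 0.5, 0.2]
weights = [-1.0, 0.01, 1.0, 0.1, 0.2, -2.0]  # hypothetical, hand-picked

print(perceptron_hypothesis(weights, applicant))  # → 1
```

Whatever the weights are, the output is always one of the two labels. Learning means choosing $\mathbf{w}$ from data, which is what PLA did in Lecture 2.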

More Binary Classification Problems
credit ⇒ approve/disapprove
email ⇒ spam/non-spam
patient ⇒ sick/not sick
ad ⇒ profitable/not profitable
answer ⇒ correct/incorrect (KDDCup 2010)
Binary classification is a core and important problem, with many tools that serve as building blocks of other tools.

Multiclass Classification: Coin Recognition Problem
[Figure: US coins (1c, 5c, 10c, 25c) plotted by size (horizontal axis) and mass (vertical axis)]
classify US coins (1c, 5c, 10c, 25c) by (size, mass)
$\mathcal{Y} = \{1\text{c}, 5\text{c}, 10\text{c}, 25\text{c}\}$, or abstractly $\mathcal{Y} = \{1, 2, \ldots, K\}$
binary classification: the special case with $K = 2$
Other Multiclass Classification Problems
written digits ⇒ 0, 1, ..., 9
pictures ⇒ apple, orange, strawberry
emails ⇒ spam, primary, social, promotion, update (Google)
Multiclass classification has many applications in practice, especially for recognition.
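One simple way to map (size, mass) to one of $K$ coin classes is a nearest-centroid classifier. This technique is named here only for illustration and is not the lecture's method. The sketch averages the training examples of each class and predicts the class whose average is closest. The training points are hypothetical, loosely based on nominal US coin specifications. `math.dist` requires Python 3.8 or later.

```python
from collections import defaultdict
import math

def fit_centroids(data):
    """data: list of ((size, mass), label). Return label -> mean feature vector."""
    sums = defaultdict(lambda: [0.0, 0.0])
    counts = defaultdict(int)
    for (size, mass), label in data:
        sums[label][0] += size
        sums[label][1] += mass
        counts[label] += 1
    return {lab: (s[0] / counts[lab], s[1] / counts[lab]) for lab, s in sums.items()}

def predict(centroids, x):
    """Return the label whose centroid is closest (Euclidean distance) to x."""
    return min(centroids, key=lambda lab: math.dist(x, centroids[lab]))

# Hypothetical training examples: (size in mm, mass in g) -> coin label
D = [
    ((19.0, 2.5), "1c"), ((19.1, 2.5), "1c"),
    ((21.2, 5.0), "5c"), ((21.3, 5.0), "5c"),
    ((17.9, 2.3), "10c"), ((18.0, 2.2), "10c"),
    ((24.3, 5.7), "25c"), ((24.2, 5.6), "25c"),
]
g = fit_centroids(D)
print(predict(g, (24.0, 5.5)))  # → 25c
```

Nothing in the code depends on $K = 4$. With two labels, the same classifier becomes a binary one.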

Regression: Patient Recovery Prediction Problem
binary classification: patient features ⇒ sick or not
multiclass classification: patient features ⇒ which type of cancer
regression: patient features ⇒ how many days before recovery
$\mathcal{Y} = \mathbb{R}$, or $\mathcal{Y} = [\text{lower}, \text{upper}] \subset \mathbb{R}$ (bounded regression); deeply studied in statistics
Other Regression Problems
company data ⇒ stock price
climate data ⇒ temperature
Regression is also a core and important problem, with many statistical tools that serve as building blocks of other tools.
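As a small illustration of $\mathcal{Y} = \mathbb{R}$, here is closed-form least squares for a single feature, $y \approx w x + b$. The feature ("severity score") and the data are hypothetical. Clipping the prediction to $[\text{lower}, \text{upper}]$ is one simple way to obtain a bounded-regression output. It is an assumption for illustration, not a method from the lecture.

```python
def fit_line(xs, ys):
    """Closed-form least squares for y ≈ w * x + b with one feature."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    w = cov / var
    b = mean_y - w * mean_x
    return w, b

def predict_bounded(w, b, x, lower, upper):
    """Real-valued prediction clipped into [lower, upper]."""
    return min(max(w * x + b, lower), upper)

# Hypothetical data: a patient severity score and observed days before recovery
severity = [1.0, 2.0, 3.0, 4.0]
days = [3.0, 5.0, 7.0, 9.0]
w, b = fit_line(severity, days)
print(w, b)                                     # → 2.0 1.0
print(predict_bounded(w, b, 10.0, 0.0, 14.0))   # → 14.0
```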

Structured Learning: Sequence Tagging Problem
I (pronoun) love (verb) ML (noun)
multiclass classification: word ⇒ word class
structured learning: sentence ⇒ structure (class of each word)
$\mathcal{Y} = \{\text{PVN}, \text{PVP}, \text{NVN}, \text{PV}, \ldots\}$, not including VVVVV
a huge multiclass classification problem (structure ≡ hyperclass) without explicit class definition
Other Structured Learning Problems
protein data ⇒ protein folding
speech data ⇒ speech parse tree
structured learning: a fancy but complicated learning problem
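Why is this a "huge" multiclass problem? With $K$ word classes and a sentence of $L$ words, there are $K^L$ candidate tag sequences. The sketch below enumerates them for a hypothetical 3-tag set (P, V, N). The enumeration also contains sequences such as VVV that are not valid structures. This is why $\mathcal{Y}$ is described implicitly rather than listed class by class.

```python
from itertools import product

TAGS = ["P", "V", "N"]  # pronoun, verb, noun: a hypothetical 3-tag set

def candidate_structures(num_words, tags=TAGS):
    """Every possible tag sequence for a sentence of num_words words."""
    return ["".join(seq) for seq in product(tags, repeat=num_words)]

cands = candidate_structures(3)            # "I love ML"
print(len(cands))                          # → 27
print("PVN" in cands)                      # → True
print(len(candidate_structures(10)))       # → 59049
```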

Mini Summary: Learning with Different Output Space $\mathcal{Y}$
binary classification: $\mathcal{Y} = \{-1, +1\}$
multiclass classification: $\mathcal{Y} = \{1, 2, \ldots, K\}$
regression: $\mathcal{Y} = \mathbb{R}$
structured learning: $\mathcal{Y}$ = structures
... and a lot more!!
[Figure: learning flow: unknown target function $f: \mathcal{X} \to \mathcal{Y}$; training examples $\mathcal{D}: (\mathbf{x}_1, y_1), \ldots, (\mathbf{x}_N, y_N)$; learning algorithm $\mathcal{A}$; hypothesis set $\mathcal{H}$; final hypothesis $g \approx f$]
core tools: binary classification and regression

Fun Time: What is this learning problem?
The entrance system of the school gym, which does automatic face recognition based on machine learning, is built to charge four different groups of users differently: Staff, Student, Professor, Other. What type of learning problem best fits the need of the system?
1. binary classification
2. multiclass classification
3. regression
4. structured learning
Reference Answer: 2
There is an explicit $\mathcal{Y}$ that contains four classes.

Learning with Different Data Label $y_n$

Supervised: Coin Recognition Revisited
[Figure: labeled US coins (1c, 5c, 10c, 25c) by size and mass, beside the learning flow: unknown target function $f: \mathcal{X} \to \mathcal{Y}$; training examples $\mathcal{D}: (\mathbf{x}_1, y_1), \ldots, (\mathbf{x}_N, y_N)$; learning algorithm $\mathcal{A}$; hypothesis set $\mathcal{H}$; final hypothesis $g \approx f$]
supervised learning: every $\mathbf{x}_n$ comes with a corresponding $y_n$

Unsupervised: Coin Recognition without $y_n$
[Figure: coins by size and mass; left panel labeled (supervised multiclass classification), right panel unlabeled (unsupervised multiclass classification ≡ clustering)]
Other Clustering Problems
articles ⇒ topics
consumer profiles ⇒ consumer groups
clustering: a challenging but useful problem
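k-means (Lloyd's algorithm) is a standard clustering technique. It is named here for illustration and is not taken from the lecture. It groups the coins using only $\mathbf{x}_n$. The data points are hypothetical. The initial centers are fixed for reproducibility, whereas practical implementations usually initialize randomly or with k-means++. The cluster indices it returns carry no coin names: the $y_n$ were never seen.

```python
import math

def kmeans(points, init_centers, iters=10):
    """Plain k-means: alternate assignment and update steps; no labels used."""
    centers = [tuple(c) for c in init_centers]
    for _ in range(iters):
        # Assignment step: each point joins its nearest center.
        clusters = [[] for _ in centers]
        for p in points:
            k = min(range(len(centers)), key=lambda j: math.dist(p, centers[j]))
            clusters[k].append(p)
        # Update step: move each center to the mean of its members.
        for k, members in enumerate(clusters):
            if members:  # keep an empty cluster's center unchanged
                centers[k] = tuple(sum(c) / len(members) for c in zip(*members))
    return centers

def assign(points, centers):
    """Index of the nearest center for each point."""
    return [min(range(len(centers)), key=lambda j: math.dist(p, centers[j]))
            for p in points]

# Hypothetical unlabeled coins: (size in mm, mass in g)
X = [(19.0, 2.5), (19.1, 2.5), (24.3, 5.7), (24.2, 5.6)]
centers = kmeans(X, init_centers=[X[0], X[2]])
print(assign(X, centers))  # → [0, 0, 1, 1]
```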

Unsupervised: Learning without $y_n$
Other Unsupervised Learning Problems
clustering: $\{\mathbf{x}_n\} \Rightarrow \text{cluster}(\mathbf{x})$ (≈ "unsupervised multiclass classification"), e.g. articles ⇒ topics
density estimation: $\{\mathbf{x}_n\} \Rightarrow \text{density}(\mathbf{x})$ (≈ "unsupervised bounded regression"), e.g. traffic reports with location ⇒ dangerous areas
outlier detection: $\{\mathbf{x}_n\} \Rightarrow \text{unusual}(\mathbf{x})$ (≈ "extreme unsupervised binary classification"), e.g. Internet logs ⇒ intrusion alert
... and a lot more!!
unsupervised learning: diverse, with possibly very different performance goals
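A crude version of $\{\mathbf{x}_n\} \Rightarrow \text{unusual}(\mathbf{x})$: flag a point when its neighborhood is sparse, which amounts to low local density. This radius-count rule is an illustrative choice, not the lecture's method. The "log features" and the thresholds `radius` and `min_neighbors` are hypothetical.

```python
import math

def unusual(points, radius, min_neighbors):
    """Return True for each point with fewer than min_neighbors other points
    within `radius` of it. No labels y_n are used."""
    flags = []
    for i, p in enumerate(points):
        neighbors = sum(
            1 for j, q in enumerate(points) if j != i and math.dist(p, q) <= radius
        )
        flags.append(neighbors < min_neighbors)
    return flags

# Hypothetical 2-D log features: (requests per minute, failed logins)
logs = [(10, 0), (11, 1), (9, 0), (10, 1), (80, 30)]
print(unusual(logs, radius=5.0, min_neighbors=2))
# → [False, False, False, False, True]
```

Note that the output is a binary flag per point, which is why the slide calls outlier detection "extreme unsupervised binary classification".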

Semi-supervised: Coin Recognition with Some $y_n$
[Figure: coins by size and mass in three panels: supervised (all labeled), semi-supervised (a few labeled), unsupervised (clustering)]
Other Semi-supervised Learning Problems
face images with a few labeled ⇒ face identifier (Facebook)
medicine data with a few labeled ⇒ medicine effect predictor
semi-supervised learning: leverage unlabeled data to avoid "expensive" labeling
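Self-training is one simple semi-supervised technique. It is named here for illustration and is not necessarily what the lecture has in mind. The sketch uses a 1-nearest-neighbor rule. In each round, the unlabeled point closest to any labeled point inherits that neighbor's label and joins the labeled set. The points are hypothetical, and the code assumes at least one labeled example.

```python
import math

def self_train_1nn(labeled, unlabeled):
    """labeled: list of (point, label) pairs (the few known y_n);
    unlabeled: list of points (x_n without y_n).
    Returns a dict point -> label covering every point."""
    labeled = list(labeled)
    remaining = list(unlabeled)
    while remaining:
        # Closest (unlabeled point, labeled point) pair across the whole set.
        u, lab, _ = min(
            ((u, lab, math.dist(u, p)) for u in remaining for p, lab in labeled),
            key=lambda t: t[2],
        )
        labeled.append((u, lab))
        remaining.remove(u)
    return dict(labeled)

# Two labeled points and a chain of unlabeled ones (hypothetical data).
labeled = [((0, 0), "A"), ((10, 0), "B")]
unlabeled = [(1, 0), (2, 0), (3, 0), (4, 0), (5, 0), (6, 0)]
result = self_train_1nn(labeled, unlabeled)
print(result[(6, 0)])  # → A

# Plain 1-NN using only the two labeled points would answer "B" for (6, 0).
print(min(labeled, key=lambda pl: math.dist((6, 0), pl[0]))[1])  # → B
```

The two answers differ because the chain of unlabeled points carries label "A" outward. This is the sense in which unlabeled data can replace some of the labeling effort.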

Reinforcement Learning
a very different but natural way of learning
Teach Your Dog: Say "Sit Down"
The dog pees on the ground. BAD DOG. THAT'S A VERY WRONG ACTION.
cannot easily show the dog that $y_n$ = sit when $\mathbf{x}_n$ = "sit down"
but can "punish" to say $\tilde{y}_n$ = pee is wrong
The dog sits down. Good Dog. Let me give you some cookies.
still cannot show $y_n$ = sit when $\mathbf{x}_n$ = "sit down"
but can "reward" to say $\tilde{y}_n$ = sit is good
Other Reinforcement Learning Problems Using $(\mathbf{x}, \tilde{y}, \text{goodness})$
(customer, ad choice, ad click earning) ⇒ ad system
(cards, strategy, winning amount) ⇒ blackjack agent
reinforcement: learn with "partial/implicit information" (often sequentially)
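The (customer, ad choice, ad click earning) example can be sketched as a multi-armed bandit, the simplest reinforcement-learning setting. The technique is named here for illustration and is not taken from the lecture. The learner picks an ad and observes only the goodness of that choice, never the "correct" ad $y$. The earnings are hypothetical and, for reproducibility, deterministic. Greedy selection with optimistic initial estimates is only one simple exploration strategy.

```python
def greedy_bandit(reward_of, num_actions, rounds, optimistic_init=1.0):
    """Greedy action choice with optimistic initial value estimates.

    reward_of(a) returns the goodness of trying action a; the learner never
    sees which action would have been best.
    """
    estimates = [optimistic_init] * num_actions
    counts = [0] * num_actions
    for _ in range(rounds):
        a = max(range(num_actions), key=lambda i: estimates[i])  # first max wins ties
        r = reward_of(a)                     # partial feedback: one number
        counts[a] += 1
        if counts[a] == 1:
            estimates[a] = r                 # first real observation replaces the guess
        else:
            estimates[a] += (r - estimates[a]) / counts[a]  # running average
    return estimates, counts

# Hypothetical ad system: click earning of each of three ads
earning = [0.2, 0.5, 0.9]
est, cnt = greedy_bandit(lambda a: earning[a], num_actions=3, rounds=20)
print(cnt)  # → [1, 1, 18]
```

With noisy rewards, pure greedy selection can settle on a worse ad. This is why strategies such as ε-greedy or UCB add deliberate exploration.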

Mini Summary: Learning with Different Data Label $y_n$
supervised: all $y_n$
unsupervised: no $y_n$
semi-supervised: some $y_n$
reinforcement: implicit $y_n$ by goodness($\tilde{y}_n$)
... and more!!
[Figure: learning flow: unknown target function $f: \mathcal{X} \to \mathcal{Y}$; training examples $\mathcal{D}: (\mathbf{x}_1, y_1), \ldots, (\mathbf{x}_N, y_N)$; learning algorithm $\mathcal{A}$; hypothesis set $\mathcal{H}$; final hypothesis $g \approx f$]
core tool: supervised learning

Fun Time: What is this learning problem?
To build a tree recognition system, a company decides to gather one million pictures from the Internet. Then, it asks each of the 10 company members to view 100 pictures and record whether each picture contains a tree. The pictures and records are then fed to a learning algorithm to build the system. What type of learning problem does the algorithm need to solve?
1. supervised
2. unsupervised
3. semi-supervised
4. reinforcement
Reference Answer: 3
The 1,000 records are the labeled $(\mathbf{x}_n, y_n)$; the other 999,000 pictures are the unlabeled $\mathbf{x}_n$.

Learning with Different Protocol $f \Rightarrow (\mathbf{x}_n, y_n)$

Batch Learning: Coin Recognition Revisited
[Figure: labeled US coins by size and mass, beside the learning flow: unknown target function $f: \mathcal{X} \to \mathcal{Y}$; training examples $\mathcal{D}: (\mathbf{x}_1, y_1), \ldots, (\mathbf{x}_N, y_N)$; learning algorithm $\mathcal{A}$; hypothesis set $\mathcal{H}$; final hypothesis $g \approx f$]
batch supervised multiclass classification: learn from all known data

More Batch Learning Problems
[Figure: coins by size and mass, labeled and unlabeled]
batch of (email, spam?) ⇒ spam filter
batch of (patient, cancer) ⇒ cancer classifier
batch of patient data ⇒ group of patients
batch learning: a very common protocol

Online: Spam Filter that "Improves"
batch spam filter: learn with known (email, spam?) pairs, and predict with a fixed $g$
online spam filter, which sequentially:
1. observes an email $\mathbf{x}_t$
2. predicts spam status with the current $g_t(\mathbf{x}_t)$
3. receives the "desired label" $y_t$ from the user, and then updates $g_t$ with $(\mathbf{x}_t, y_t)$
Connection to What We Have Learned
PLA can be easily adapted to the online protocol (how?)
reinforcement learning is often done online (why?)
online: the hypothesis "improves" through receiving data instances sequentially
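Here is one possible answer to "(how?)". Keep PLA's mistake-driven update, but apply it to each example as it arrives: predict with the current $\mathbf{w}$ (playing the role of $g_t$), receive $y_t$, and correct $\mathbf{w}$ only on a mistake. This is a minimal sketch under assumptions. `OnlinePerceptron` is a hypothetical helper name, and the features (a bias term plus a count of suspicious words) and the email stream are made up. As before, a score of 0 maps to $-1$.

```python
def sign(v):
    return 1 if v > 0 else -1  # assumed convention: sign(0) -> -1

class OnlinePerceptron:
    """PLA under the online protocol: predict, receive y_t, update on mistakes."""

    def __init__(self, dim):
        self.w = [0.0] * dim

    def predict(self, x):
        return sign(sum(wi * xi for wi, xi in zip(self.w, x)))

    def update(self, x, y):
        if self.predict(x) != y:  # mistake on (x_t, y_t): PLA correction
            self.w = [wi + y * xi for wi, xi in zip(self.w, x)]

# Hypothetical email stream: x = (bias, number of suspicious words), y = spam?
stream = [((1.0, 5.0), 1), ((1.0, 0.0), -1), ((1.0, 4.0), 1), ((1.0, 1.0), -1)]
g = OnlinePerceptron(dim=2)
mistakes = 0
for x_t, y_t in stream * 3:        # the same emails arrive three times
    if g.predict(x_t) != y_t:      # step 2: predict with the current g_t
        mistakes += 1
    g.update(x_t, y_t)             # step 3: learn from the desired label
print(g.w, mistakes)               # → [-3.0, 2.0] 5
```

The filter makes mistakes early and then stops making them on this stream, which is the sense in which an online hypothesis "improves".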

Active Learning: Learning by "Asking"
Protocol ⇔ Learning Philosophy
batch: "duck feeding"
online: "passive sequential"
active: "question asking" (sequentially), i.e. query the $y_n$ of the chosen $\mathbf{x}_n$
[Figure: learning flow: unknown target function $f: \mathcal{X} \to \mathcal{Y}$; training examples $\mathcal{D}: (\mathbf{x}_1, y_1), \ldots, (\mathbf{x}_N, y_N)$; learning algorithm $\mathcal{A}$; hypothesis set $\mathcal{H}$; final hypothesis $g \approx f$]
active: improve hypothesis with fewer labels (hopefully) by asking questions strategically
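A classic toy case shows how strategic questions save labels. Suppose, hypothetically, that the target is a 1-D threshold, $f(x) = +1$ if $x \ge \theta$ and $-1$ otherwise. Choosing which $x_n$ to ask about by binary search then needs about $\log_2 N$ labels instead of all $N$. The `oracle` below stands in for the human labeler. The data and $\theta$ are made up.

```python
def active_learn_threshold(xs, oracle):
    """Binary-search active learning for a 1-D threshold target.

    xs must be sorted. Returns (index of the first +1 point, number of
    label queries); the index equals len(xs) if no point is labeled +1.
    """
    lo, hi = 0, len(xs)
    queries = 0
    while lo < hi:
        mid = (lo + hi) // 2
        queries += 1
        if oracle(xs[mid]) == 1:   # ask for the y of the chosen x
            hi = mid
        else:
            lo = mid + 1
    return lo, queries

xs = list(range(1024))                  # 1,024 unlabeled points
theta = 700                             # hidden from the learner
oracle = lambda x: 1 if x >= theta else -1
idx, q = active_learn_threshold(xs, oracle)
print(xs[idx], q)                       # → 700 10
```

Ten questions suffice here, whereas a batch learner would be handed all 1,024 labels.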

Mini Summary: Learning with Different Protocol $f \Rightarrow (\mathbf{x}_n, y_n)$
batch: all known data
online: sequential (passive) data
active: strategically-observed data
... and more!!
[Figure: learning flow: unknown target function $f: \mathcal{X} \to \mathcal{Y}$; training examples $\mathcal{D}: (\mathbf{x}_1, y_1), \ldots, (\mathbf{x}_N, y_N)$; learning algorithm $\mathcal{A}$; hypothesis set $\mathcal{H}$; final hypothesis $g \approx f$]
core protocol: batch

Fun Time: What is this learning problem?
A photographer has 100,000 pictures, each containing one baseball player. He wants to automatically categorize the pictures by the player in each one. He starts by categorizing 1,000 pictures himself, and then writes an algorithm that tries to categorize the other pictures if it is "confident" about the category, while pausing for (and learning from) human input if not. What protocol best describes the nature of the algorithm?
1. batch
2. online
3. active
4. random
Reference Answer: 3
The algorithm takes an active but naïve strategy: ask when "confused". You should probably do the same when taking a class. :-)

Learning with Different Input Space $\mathcal{X}$

Credit Approval Problem Revisited
Applicant information: age 23 years; gender female; annual salary NTD 1,000,000; years in residence 1 year; years in job 0.5 year; current debt 200,000
[Figure: learning flow: unknown target function $f: \mathcal{X} \to \mathcal{Y}$ (ideal credit approval formula); training examples $\mathcal{D}: (\mathbf{x}_1, y_1), \ldots, (\mathbf{x}_N, y_N)$ (historical records in bank); learning algorithm $\mathcal{A}$; hypothesis set $\mathcal{H}$ (set of candidate formulas); final hypothesis $g \approx f$ ("learned" formula to be used)]
concrete features: each dimension of $\mathcal{X} \subseteq \mathbb{R}^d$ represents "sophisticated physical meaning"

More on Concrete Features
(size, mass) for coin classification
customer info for credit approval
patient info for cancer diagnosis
often including "human intelligence" on the learning task
[Figure: coins by size and mass]
concrete features: the "easy" ones for ML

Raw Features: Digit Recognition Problem (1/2)
digit recognition problem: features ⇒ meaning of digit
a typical supervised multiclass classification problem

Raw Features: Digit Recognition Problem (2/2)
by concrete features: $\mathbf{x}$ = (symmetry, density)
by raw features: 16 by 16 gray image $\mathbf{x} \equiv (0, 0, 0.9, 0.6, \ldots) \in \mathbb{R}^{256}$
"simple physical meaning"; thus more difficult for ML than concrete features
Other Problems with Raw Features
image pixels, speech signal, etc.
raw features: often need humans or machines to convert them to concrete ones
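Converting raw pixels into the two concrete features can be sketched as follows. The exact formulas are assumptions for illustration and may differ from those the lecture uses. Density is the average pixel intensity. Symmetry is the negative mean absolute difference between the image and its left-right mirror, so 0 means perfectly symmetric. The 16 by 16 images are hypothetical.

```python
def density(img):
    """Average pixel intensity: how much 'ink' the digit uses."""
    pixels = [v for row in img for v in row]
    return sum(pixels) / len(pixels)

def symmetry(img):
    """Negative mean absolute difference between the image and its
    left-right mirror; 0.0 means perfectly symmetric."""
    total, n = 0.0, 0
    for row in img:
        for v, m in zip(row, reversed(row)):
            total += abs(v - m)
            n += 1
    return 0.0 - total / n  # written as 0.0 - ... to avoid printing -0.0

def concrete_features(img):
    """Map a raw image (256 numbers) to x = (symmetry, density)."""
    return (symmetry(img), density(img))

# Hypothetical 16x16 grayscale "1": a vertical stroke in columns 7 and 8
one = [[1.0 if c in (7, 8) else 0.0 for c in range(16)] for _ in range(16)]
raw = [v for row in one for v in row]   # the raw x in R^256
print(len(raw), concrete_features(one))  # → 256 (0.0, 0.125)
```

The raw vector has 256 dimensions with little individual meaning. The converted one has 2 dimensions, each with a human-interpretable meaning.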

Abstract Features: Rating Prediction Problem
Rating Prediction Problem (KDDCup 2011): given previous (userid, itemid, rating) tuples, predict the rating that some userid would give to itemid.
a regression problem with $\mathcal{Y} \subseteq \mathbb{R}$ as rating and $\mathcal{X} \subseteq \mathbb{N} \times \mathbb{N}$ as (userid, itemid)
"no physical meaning"; thus even more difficult for ML
Other problems with abstract features:
- student ID in online tutoring system (KDDCup 2010)
- advertisement ID in online ad system
abstract: again need "feature" conversion/extraction/construction
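One common first step in converting abstract IDs into something a learning algorithm can use is one-hot encoding, sketched below in Python. It assumes user and item IDs have already been mapped to 0-based contiguous indices; `one_hot_pair` is a hypothetical helper, and learning low-dimensional representations of the IDs is another widely used option.

```python
def one_hot_pair(user_id: int, item_id: int, n_users: int, n_items: int) -> list[float]:
    """Encode an abstract (userid, itemid) pair as a 0/1 vector of length
    n_users + n_items: one slot per user, followed by one slot per item.
    IDs are assumed to be 0-based contiguous indices."""
    if not (0 <= user_id < n_users and 0 <= item_id < n_items):
        raise ValueError("id out of range")
    x = [0.0] * (n_users + n_items)
    x[user_id] = 1.0               # user block
    x[n_users + item_id] = 1.0     # item block starts after the user block
    return x


x = one_hot_pair(user_id=2, item_id=1, n_users=4, n_items=3)
print(x)  # [0.0, 0.0, 1.0, 0.0, 0.0, 1.0, 0.0]
```

A linear model on this vector can only learn one weight per user plus one weight per item, so the ID itself still carries no physical meaning; this is why abstract features usually call for further feature construction.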

Mini Summary
Learning with Different Input Space $\mathcal{X}$
- concrete: sophisticated (and related) physical meaning
- raw: simple physical meaning
- abstract: no (or little) physical meaning
- ... and more!!

Learning flow (easy input: concrete): unknown target function $f\colon \mathcal{X} \to \mathcal{Y}$; training examples $\mathcal{D}\colon (\mathbf{x}_1, y_1), \ldots, (\mathbf{x}_N, y_N)$; learning algorithm $\mathcal{A}$; hypothesis set $\mathcal{H}$; final hypothesis $g \approx f$

Fun Time
What features can be used?
Consider the problem of building an online image advertisement system that shows users the most relevant images. Which features can you choose to use?
1 concrete
2 concrete, raw
3 concrete, abstract
4 concrete, raw, abstract

Reference Answer: 4
concrete user features, raw image features, and maybe abstract user/image IDs
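The reference answer mixes all three feature types. A minimal Python sketch of one simple way to do that is to concatenate them into a single input vector; the helpers `one_hot` and `ad_features`, the choice of user features, and the tiny flattened thumbnail are all hypothetical.

```python
def one_hot(index: int, size: int) -> list[float]:
    """Return a length-`size` vector with 1.0 at `index` and 0.0 elsewhere."""
    if not 0 <= index < size:
        raise ValueError("index out of range")
    vec = [0.0] * size
    vec[index] = 1.0
    return vec


def ad_features(
    user_concrete: list[float],  # e.g. age, gender (concrete)
    image_pixels: list[float],   # flattened gray levels (raw)
    user_id: int,                # abstract ID, assumed 0-based
    image_id: int,               # abstract ID, assumed 0-based
    n_users: int,
    n_images: int,
) -> list[float]:
    """Concatenate concrete, raw, and one-hot abstract features into one vector."""
    return (
        list(user_concrete)
        + list(image_pixels)
        + one_hot(user_id, n_users)
        + one_hot(image_id, n_images)
    )


x = ad_features(
    user_concrete=[23.0, 1.0],
    image_pixels=[0.0, 0.9, 0.6, 0.1],  # hypothetical 2 x 2 thumbnail
    user_id=1,
    image_id=0,
    n_users=3,
    n_images=2,
)
print(len(x))  # 11
```

Concatenation keeps each block's meaning intact, so the learning algorithm can still draw on the easy concrete part while the raw and abstract parts add information.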

Summary
1 When Can Machines Learn?
Lecture 2: Learning to Answer Yes/No
Lecture 3: Types of Learning
- Learning with Different Output Space $\mathcal{Y}$: [classification], [regression], structured
- Learning with Different Data Label $y_n$: [supervised], un/semi-supervised, reinforcement
- Learning with Different Protocol $f \Rightarrow (\mathbf{x}_n, y_n)$: [batch], online, active
- Learning with Different Input Space $\mathcal{X}$: [concrete], raw, abstract
next: learning is impossible?!
2 Why Can Machines Learn?
3 How Can Machines Learn?
4 How Can Machines Learn Better?