Class Overview and General Introduction to Machine Learning

Piyush Rai (www.cs.utah.edu/~piyush)
CS5350/6350: Machine Learning
August 23, 2011

What is Machine Learning?

Machine Learning: designing algorithms that can learn patterns from data (and exploit them).
Approach: a human supplies training examples, and the machine learns.
- Example: show the machine a bunch of spam and legitimate emails and let it learn to predict whether a new email is spam or not (a sketch follows below).

Machine Learning primarily uses a statistically motivated approach:
- No hand-crafted rules: subtle pattern nuances are often difficult to specify.
- Instead, let the machine figure out the rules on its own by looking at data, i.e., by building statistical models of the data.
- The statistical model helps uncover the process that generated the data.

Desirable property: Generalization
- The model shouldn't overfit the training data.
- It should generalize well to unseen (future) test data.
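
To make the spam example concrete, here is a minimal sketch of the "supply training examples, let the machine learn" workflow, assuming scikit-learn is available; the toy emails, labels, and the choice of a naive Bayes classifier are illustrative, not from the lecture.

```python
# A minimal sketch, assuming scikit-learn. The toy emails are hypothetical.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

train_emails = [
    "win money now claim your free prize",    # spam
    "meeting rescheduled to friday at noon",  # legitimate
    "free lottery winner claim prize now",    # spam
    "lecture notes for today are attached",   # legitimate
]
train_labels = [1, 0, 1, 0]  # 1 = spam, 0 = legitimate

# Represent each email by its word counts, then fit a statistical model
vectorizer = CountVectorizer()
X_train = vectorizer.fit_transform(train_emails)
model = MultinomialNB().fit(X_train, train_labels)

# Predict on an unseen email
X_new = vectorizer.transform(["claim your free money prize"])
print(model.predict(X_new))  # [1] -> predicted spam
```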

Generalization (Pictorially)

[Figure not reproduced: four red curves of increasing complexity (indexed by M) fit to the same blue data points; the x-axis is the input, the y-axis is the response.]

- Which of the four red curves fits the data (blue dots) best?
- Which curve is expected to generalize the best?
- Are they both the same? If yes, why? If no, why not?

Lesson: prefer simple models over complicated models; simple models can prevent overfitting.
Caution: too simple a model can underfit (e.g., M = 0 in the figure).
General guideline: choose a model that is not too simple, yet not too complex.
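
Since the figure is not reproduced, here is a minimal numpy sketch in the same spirit, assuming the curves are polynomials of degree M fit by least squares; the sine-plus-noise data is hypothetical, not from the lecture.

```python
# A minimal under/overfitting sketch, assuming only numpy.
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 10)
y = np.sin(2 * np.pi * x) + rng.normal(0, 0.2, x.size)  # noisy training data
x_test = np.linspace(0, 1, 100)
y_test = np.sin(2 * np.pi * x_test)                     # clean "future" data

for M in [0, 1, 3, 9]:
    coeffs = np.polyfit(x, y, deg=M)  # fit a degree-M polynomial
    train_err = np.mean((np.polyval(coeffs, x) - y) ** 2)
    test_err = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    print(f"M={M}: train MSE={train_err:.3f}, test MSE={test_err:.3f}")

# Typically M=0 underfits (high train and test error), M=9 overfits
# (near-zero train error, larger test error), and M=3 generalizes best.
```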

Machine Learning in the Real World

Broadly applicable in many domains (e.g., finance, robotics, bioinformatics, vision, natural language processing). Some applications:
- Spam filtering
- Speech/handwriting recognition
- Object detection/recognition
- Weather prediction
- Stock market analysis
- Search engines (e.g., Google)
- Ad placement on websites
- Adaptive website design
- Credit-card fraud detection
- Webpage clustering (e.g., Google News)
- Machine translation (e.g., Google Translate)
- Recommendation systems (e.g., Netflix, Amazon)
- Classifying DNA sequences
- Automatic vehicle navigation
- Performance tuning of computer systems
- Predicting good compilation flags for programs
- ... and many more

"12 IT skills that employers can't say no to" (Machine Learning is #1):
http://www.computerworld.com/s/article/9026623/12_it_skills_that_employers_can_t_say_no_to_

Major Machine Learning Paradigms

Nomenclature: x denotes an input/example/instance; y denotes a response/output/label/prediction.

Supervised Learning: learning with a teacher
- Given: N labeled training examples {(x_1, y_1), ..., (x_N, y_N)}
- Goal: learn a mapping f that predicts the label y for a test example x
- Examples: spam classification, webpage categorization

Unsupervised Learning: learning without a teacher
- Given: a set of N unlabeled inputs {x_1, ..., x_N}
- Goal: learn some intrinsic structure in the inputs (e.g., groups/clusters)
- Example: automatically grouping news stories (Google News)

Reinforcement Learning: learning by interacting
- Given: an agent acting in an environment (having a set of states)
- Goal: learn a policy (a state-to-action mapping) that maximizes the agent's reward
- Examples: automatic vehicle navigation, a computer learning to play chess

Supervised Learning

Given: N labeled training examples {(x_1, y_1), ..., (x_N, y_N)}
Goal: learn a model that predicts the label y for a test example x
Assumption: the training and test examples are drawn from the same data distribution.

Things to keep in mind:
- No single learning algorithm is universally good ("no free lunch").
- Different learning algorithms work under different assumptions.
- Generalization is particularly important for supervised learning.
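
As a concrete illustration of this setup (not from the lecture), here is a minimal sketch assuming scikit-learn; a synthetic dataset stands in for examples drawn from a common distribution, and the held-out split estimates generalization.

```python
# A minimal supervised-learning sketch, assuming scikit-learn.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

# Synthetic labeled examples {(x_1, y_1), ..., (x_N, y_N)}
X, y = make_classification(n_samples=200, n_features=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

f = LogisticRegression().fit(X_train, y_train)    # learn the mapping f: x -> y
print("test accuracy:", f.score(X_test, y_test))  # generalization estimate
```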

Supervised Learning: Problem Settings

We learn a mapping f : x → y.

Classification: y is a discrete variable
- Discrete variable: takes a value from a discrete set, y ∈ {1, ..., K}
- Example: the category of a webpage (sports, politics, business, science, etc.)

Regression: y is a real-valued variable
- Example: the price of a stock

Supervised Learning: Classification

Problem types:

Binary classification: y is binary (two classes: 0/1 or -1/+1)
- Example: spam filtering (tell whether an email is spam or legitimate)

Multi-class classification: y is discrete, taking one of K > 2 possible values
- Example: predicting your CS5350 grade (e.g., A, A-, B+, B, B-, other)

Multi-label classification: y is a vector of discrete variables
- Each input x has multiple labels; each element of y is one label (individual labels can be binary or multi-class)
- Example: image annotation (each image can have multiple labels)

Structured prediction: y is a vector with structure
- The elements of y are not independent but related to each other
- Example: predicting part-of-speech (POS) tags for a sentence

Supervised Learning: Regression

Problem types:

Univariate regression: y is a single real-valued number
- Example: predicting the future price of a stock

Multivariate regression: y is a real-valued vector
- Each element of y gives the value of one response variable
- Example: torque values at multiple joints of a robotic arm
- Akin to multi-label classification

Supervised Learning: Pictorially

Classification is about finding separation boundaries (linear or non-linear) between classes; regression is more like fitting a curve/surface to the data. [Figures not reproduced.]

Unsupervised Learning

Unsupervised learning: learning without a teacher
- Given: a set of unlabeled inputs {x_1, ..., x_N}
- Goal: learn some intrinsic structure in the data
- Some examples: data clustering, dimensionality reduction

Data clustering (see the sketch below)
- Grouping a given set of inputs based on their similarities
- Example: clustering news stories based on their topics (e.g., Google News)
- Clustering is sometimes also referred to as (probability) density estimation

Dimensionality reduction
- Real-world data is often high dimensional; reducing the dimensionality helps in several ways:
- Computational benefits: speeding up learning algorithms
- Better input representations for supervised learning tasks
- Data visualization by reducing the data to a small number of dimensions
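
A minimal clustering sketch, assuming scikit-learn's KMeans; the two synthetic blobs are hypothetical stand-ins for, say, news stories on two topics. Note that the algorithm sees no labels at all.

```python
# A minimal sketch of grouping unlabeled points by similarity.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(loc=[0, 0], scale=0.5, size=(50, 2)),  # blob 1
    rng.normal(loc=[5, 5], scale=0.5, size=(50, 2)),  # blob 2
])

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(kmeans.cluster_centers_)                  # learned group centers
print(kmeans.labels_[:5], kmeans.labels_[-5:])  # cluster assignments
```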

Unsupervised Learning: Data Clustering

[Figures not reproduced.]

Unsupervised Learning: Dimensionality Reduction

Data can be high-dimensional in its ambient space but intrinsically lower dimensional. [Figures not reproduced: 2-D data lying close to a 1-D space, and 3-D data living on a manifold that is intrinsically 2-D.]
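
A minimal sketch of the "2-D data lying close to a 1-D space" case, using PCA from scikit-learn (the lecture does not prescribe a particular method); the noisy-line data is hypothetical.

```python
# A minimal dimensionality-reduction sketch, assuming scikit-learn.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
t = rng.uniform(-1, 1, size=100)
X = np.column_stack([t, 2 * t]) + rng.normal(0, 0.05, size=(100, 2))  # near a line

pca = PCA(n_components=1)
Z = pca.fit_transform(X)              # 1-D coordinates along the line
print(pca.explained_variance_ratio_)  # ~[0.99]: one dimension captures almost everything
print(X.shape, "->", Z.shape)         # (100, 2) -> (100, 1)
```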

Reinforcement Learning

Unlike supervised/unsupervised learning, RL does not receive labeled examples. Rather, the learner gathers experience by interacting with the world.
- Defined by an agent and the environment the agent acts in
- The agent has a set A of actions; the environment has a set S of states
- Goal: find a sequence of actions by the agent that maximizes its reward
- Output: a policy, which maps states to actions
- RL problems always include time as a variable
- Example problems: chess, robot control, autonomous driving
- In RL, the key trade-off is exploration versus exploitation (see the sketch below)
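
A minimal sketch of the exploration/exploitation trade-off, using an epsilon-greedy multi-armed bandit, i.e., a one-state RL problem; the arm payoffs and the epsilon value are hypothetical choices of mine, not from the lecture.

```python
# A minimal exploration-vs-exploitation sketch, assuming only numpy.
import numpy as np

rng = np.random.default_rng(0)
true_means = [0.2, 0.5, 0.8]  # unknown expected reward of each action
counts = np.zeros(3)
values = np.zeros(3)          # running estimate of each action's reward
epsilon = 0.1                 # fraction of the time we explore

for step in range(1000):
    if rng.random() < epsilon:
        a = rng.integers(3)          # explore: try a random action
    else:
        a = int(np.argmax(values))   # exploit: take the best-looking action
    reward = rng.normal(true_means[a], 0.1)
    counts[a] += 1
    values[a] += (reward - values[a]) / counts[a]  # incremental mean update

print(values)  # estimates roughly approach [0.2, 0.5, 0.8]; arm 2 dominates
```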

Other Paradigms: Semi-supervised Learning

Supervised learning requires labeled data (the more, the better!)
- Problem 1: labeling is expensive (usually done by humans)
- Problem 2: sometimes labels are really hard to get; in speech analysis, transcribing an hour of speech can take several hundred hours!

How can we learn well even with small amounts of labeled data?
- One answer: semi-supervised learning, which uses a small amount of labeled data plus plenty of (freely available) unlabeled data (see the sketch below)

Often the unlabeled data can give a good idea about class separation.
- One intuition: the class boundary is expected to lie in a low-density region, i.e., a region that has very few examples
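
One concrete recipe is self-training: the lecture names the general idea of combining labeled and unlabeled data, not this particular algorithm. A minimal sketch, assuming scikit-learn's SelfTrainingClassifier:

```python
# A minimal semi-supervised sketch, assuming scikit-learn.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.semi_supervised import SelfTrainingClassifier

X, y = make_classification(n_samples=300, n_features=5, random_state=0)
y_partial = y.copy()
y_partial[30:] = -1  # keep only 30 labels; -1 marks "unlabeled"

model = SelfTrainingClassifier(LogisticRegression())
model.fit(X, y_partial)  # confident predictions become pseudo-labels
print("accuracy on all data:", model.score(X, y))
```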

Other Paradigms: Active Learning

Similar motivation as semi-supervised learning (saving data-labeling cost).
- Standard supervised learning is passive: the learner has no choice over the data it learns from
- Not all labeled examples are equally informative; spending labeling effort on uninformative examples isn't really worth it
- Active learning allows the learner to ask for specific labeled examples: the ones it considers the most informative (see the sketch below)
- Active learning can lead to several benefits: less labeled data needed to learn, and better classifiers
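
A minimal sketch of one common query strategy, uncertainty sampling (my choice of strategy; the lecture does not specify one), assuming scikit-learn: the learner repeatedly asks for the label of the pool example it is least sure about.

```python
# A minimal active-learning sketch, assuming scikit-learn.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=500, n_features=5, random_state=0)
labeled = list(range(10))    # start with 10 labeled examples
pool = list(range(10, 500))  # unlabeled pool (labels hidden from the learner)

for _ in range(20):
    clf = LogisticRegression().fit(X[labeled], y[labeled])
    probs = clf.predict_proba(X[pool])
    # Query the pool example the model is least sure about (prob nearest 0.5)
    idx = int(np.argmin(np.abs(probs[:, 1] - 0.5)))
    labeled.append(pool.pop(idx))  # ask the "oracle" for this label

print("labels used:", len(labeled), "accuracy:", clf.score(X, y))
```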

Other Paradigms: Transfer Learning

Suppose we have two related learning tasks A and B:
- Plenty of labeled training data for A: we can learn A well
- Little or no labeled data for B: little or no hope of learning B on its own

Transfer learning allows task B to leverage the data from task A.
- Under suitable task-relatedness assumptions, transfer learning may help
- Caution: incorrect/inappropriate assumptions can hurt learning

Several variants/names of transfer learning:
- Multitask learning
- Domain adaptation
- Covariate shift

Bayesian Learning

Not really a different learning paradigm; rather, a way of doing machine learning (it can be used within any learning paradigm: supervised, unsupervised, etc.).

Most ML algorithms work as: provide them data, get a model out.
- No way to know how confident you should be in the model parameters
- No way to know how confident you should be in the predictions
- But in some problem domains, confidence estimates are important

Bayesian learning gives a way to quantify confidence/uncertainty:
- By maintaining a probability distribution over the parameters/predictions
- So we also have mean and variance estimates of the parameters/predictions (see the sketch below)

Another advantage: by incorporating prior knowledge about the problem, Bayesian methods can automatically control overfitting (and can learn well from small amounts of data).
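
A minimal sketch of maintaining a distribution over a parameter, using a Beta-Bernoulli model for a coin's bias; this toy example is mine, not from the lecture. Instead of a single point estimate, we get a full posterior with a mean and a variance.

```python
# A minimal Bayesian sketch: conjugate Beta-Bernoulli update, pure Python.
a, b = 1.0, 1.0      # Beta(1, 1) prior = no strong prior belief about the bias
heads, tails = 7, 3  # observed data: 7 heads, 3 tails

a_post, b_post = a + heads, b + tails  # Beta posterior (conjugate update)
mean = a_post / (a_post + b_post)
var = (a_post * b_post) / ((a_post + b_post) ** 2 * (a_post + b_post + 1))

print(f"posterior mean of bias: {mean:.3f}")  # 0.667, vs. the MLE of 0.700
print(f"posterior variance:     {var:.4f}")   # shrinks as more data arrives
```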

Machine Learning vs. Statistics

Traditionally, statistics mainly cares about fitting a model to the data:
- The main focus is on explaining the data
- Issues such as generalization are typically ignored (with some exceptions)

ML focuses more on the prediction aspect (generalization is important):
- Knowing the data-generating model can help prediction, but such modeling can sometimes be expensive
- ML therefore often goes easy on the modeling aspect and focuses directly on the prediction task

Statistics traditionally does not focus much on computational issues; most ML algorithms nowadays take them into account.

For some discussion, see: http://brenocon.com/blog/2008/12/statistics-vs-machine-learning-fight/

Data Representation

Data has the form {(x_1, y_1), ..., (x_N, y_N)} (labeled) or {x_1, ..., x_N} (unlabeled).
- What the label y looks like is task-specific (as we saw)
- What about x, which denotes a real-world object (e.g., an image or a text document)?

Each example x is a set of (numeric) features/attributes/dimensions.
- Features encode properties of the object that x represents
- x is commonly represented as a D × 1 vector
- Representing a 28 × 28 image: x can be a 784 × 1 vector of pixel values
- Representing a text document: x can be a vector of word counts of the words appearing in that document (see the sketch below)
- For some problems, non-vectorial representations may be more appropriate
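
A minimal sketch of the word-count representation; the vocabulary and document here are hypothetical.

```python
# A minimal bag-of-words sketch, using only the standard library.
from collections import Counter

vocabulary = ["machine", "learning", "spam", "email", "data"]
doc = "machine learning learns patterns from data data data"

counts = Counter(doc.split())
x = [counts[word] for word in vocabulary]  # D x 1 word-count feature vector
print(x)  # [1, 1, 0, 0, 3]
```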

Some Notations

- R^D denotes the set of all D × 1 real-valued column vectors; x ∈ R^D denotes a D × 1 real-valued column vector
- x^T denotes the transpose of x, a 1 × D row vector
- R^(N×D) denotes the set of all N × D real-valued matrices; X ∈ R^(N×D) denotes an N × D real-valued matrix

Supervised learning: often we write {(x_1, y_1), ..., (x_N, y_N)} as (X, Y), where:
- X is an N × D matrix; each row of X denotes an example, each column denotes a feature
- x_ij denotes the j-th feature of the i-th example
- Y is an N × 1 vector; row i denotes the label of the i-th example

        [ x_1^T ]   [ x_11 ... x_1D ]        [ y_1 ]
    X = [  ...  ] = [  ...      ... ]    Y = [ ...  ]
        [ x_N^T ]   [ x_N1 ... x_ND ]        [ y_N ]
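
The same (X, Y) conventions expressed in numpy, as a minimal sketch; the numbers are arbitrary illustrations.

```python
# A minimal sketch of the N x D data-matrix convention, assuming numpy.
import numpy as np

N, D = 4, 3
X = np.arange(N * D, dtype=float).reshape(N, D)  # N x D: rows = examples
Y = np.array([0, 1, 1, 0])                       # N x 1: labels

print(X[1])      # row 2: all D features of the 2nd example
print(X[:, 0])   # column 1: the 1st feature across all N examples
print(X[1, 2])   # x_23 in the slides' notation (1-indexed): 3rd feature of 2nd example
```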

Next Class

Two supervised learning algorithms:
- K-Nearest Neighbors
- Decision Trees
Both are based more on intuition and less on math. :)