CS 2750: Machine Learning. Introduction. Prof. Adriana Kovashka University of Pittsburgh January 5, 2017

Similar documents
Python Machine Learning

Module 12. Machine Learning. Version 2 CSE IIT, Kharagpur

CSL465/603 - Machine Learning

CS Machine Learning

Introduction to Ensemble Learning Featuring Successes in the Netflix Prize Competition

Lecture 1: Machine Learning Basics

(Sub)Gradient Descent

Active Learning. Yingyu Liang Computer Sciences 760 Fall

Rule Learning With Negation: Issues Regarding Effectiveness

Laboratorio di Intelligenza Artificiale e Robotica

Laboratorio di Intelligenza Artificiale e Robotica

CHAPTER 4: REIMBURSEMENT STRATEGIES 24

Australian Journal of Basic and Applied Sciences

ADVANCED MACHINE LEARNING WITH PYTHON BY JOHN HEARTY DOWNLOAD EBOOK : ADVANCED MACHINE LEARNING WITH PYTHON BY JOHN HEARTY PDF

Probabilistic Latent Semantic Analysis

A Simple VQA Model with a Few Tricks and Image Features from Bottom-up Attention

OPTIMIZATINON OF TRAINING SETS FOR HEBBIAN-LEARNING- BASED CLASSIFIERS

Testing A Moving Target: How Do We Test Machine Learning Systems? Peter Varhol Technology Strategy Research, USA

System Implementation for SemEval-2017 Task 4 Subtask A Based on Interpolated Deep Neural Networks

Generative models and adversarial training

Business Analytics and Information Tech COURSE NUMBER: 33:136:494 COURSE TITLE: Data Mining and Business Intelligence

The Evolution of Random Phenomena

Axiom 2013 Team Description Paper

A Case Study: News Classification Based on Term Frequency

Multi-Lingual Text Leveling

Lecture 1: Basic Concepts of Machine Learning

CS4491/CS 7265 BIG DATA ANALYTICS INTRODUCTION TO THE COURSE. Mingon Kang, PhD Computer Science, Kennesaw State University

Evolutive Neural Net Fuzzy Filtering: Basic Description

Rule Learning with Negation: Issues Regarding Effectiveness

Spring 2014 SYLLABUS Michigan State University STT 430: Probability and Statistics for Engineering

Purdue Data Summit Communication of Big Data Analytics. New SAT Predictive Validity Case Study

Exploration. CS : Deep Reinforcement Learning Sergey Levine

A New Perspective on Combining GMM and DNN Frameworks for Speaker Adaptation

Multivariate k-nearest Neighbor Regression for Time Series data -

CS 446: Machine Learning

Introduction to Simulation

*Net Perceptions, Inc West 78th Street Suite 300 Minneapolis, MN

Twitter Sentiment Classification on Sanders Data using Hybrid Approach

Dublin City Schools Mathematics Graded Course of Study GRADE 4

Course Content Concepts

Speech Recognition at ICSI: Broadcast News and beyond

11/29/2010. Statistical Parsing. Statistical Parsing. Simple PCFG for ATIS English. Syntactic Disambiguation

ACTL5103 Stochastic Modelling For Actuaries. Course Outline Semester 2, 2014

COSI Meet the Majors Fall 17. Prof. Mitch Cherniack Undergraduate Advising Head (UAH), COSI Fall '17: Instructor COSI 29a

Lahore University of Management Sciences. FINN 321 Econometrics Fall Semester 2017

OCR for Arabic using SIFT Descriptors With Online Failure Prediction

Welcome to. ECML/PKDD 2004 Community meeting

Word Segmentation of Off-line Handwritten Documents

The Good Judgment Project: A large scale test of different methods of combining expert predictions

Measurement. When Smaller Is Better. Activity:

Probability and Game Theory Course Syllabus

Machine Learning and Development Policy

Math 96: Intermediate Algebra in Context

Learning From the Past with Experiment Databases

Statewide Framework Document for:

Spring 2016 Stony Brook University Instructor: Dr. Paul Fodor

Machine Learning from Garden Path Sentences: The Application of Computational Linguistics

A Neural Network GUI Tested on Text-To-Phoneme Mapping

The 9 th International Scientific Conference elearning and software for Education Bucharest, April 25-26, / X

Corrective Feedback and Persistent Learning for Information Extraction

Probability and Statistics Curriculum Pacing Guide

QuickStroke: An Incremental On-line Chinese Handwriting Recognition System

Shockwheat. Statistics 1, Activity 1

CLASSIFICATION OF TEXT DOCUMENTS USING INTEGER REPRESENTATION AND REGRESSION: AN INTEGRATED APPROACH

Unsupervised Learning of Word Semantic Embedding using the Deep Structured Semantic Model

School of Innovative Technologies and Engineering

Artificial Neural Networks written examination

Disambiguation of Thai Personal Name from Online News Articles

Individual Component Checklist L I S T E N I N G. for use with ONE task ENGLISH VERSION

PREDICTING SPEECH RECOGNITION CONFIDENCE USING DEEP LEARNING WITH WORD IDENTITY AND SCORE FEATURES

Managerial Decision Making

Iterative Cross-Training: An Algorithm for Learning from Unlabeled Web Pages

Detecting Wikipedia Vandalism using Machine Learning Notebook for PAN at CLEF 2011

INPE São José dos Campos

Using dialogue context to improve parsing performance in dialogue systems

Chinese Language Parsing with Maximum-Entropy-Inspired Parser

arxiv: v2 [cs.cv] 30 Mar 2017

Pitching Accounts & Advertising Sales ADV /PR

Indian Institute of Technology, Kanpur

Office Hours: Mon & Fri 10:00-12:00. Course Description

Switchboard Language Model Improvement with Conversational Data from Gigaword

UNIT ONE Tools of Algebra

Reducing Features to Improve Bug Prediction

Assignment 1: Predicting Amazon Review Ratings

Carnegie Mellon University Department of Computer Science /615 - Database Applications C. Faloutsos & A. Pavlo, Spring 2014.

Truth Inference in Crowdsourcing: Is the Problem Solved?

Students Understanding of Graphical Vector Addition in One and Two Dimensions

Self Study Report Computer Science

Time series prediction

Predicting Students Performance with SimStudent: Learning Cognitive Skills from Observation

Agents and environments. Intelligent Agents. Reminders. Vacuum-cleaner world. Outline. A vacuum-cleaner agent. Chapter 2 Actuators

Quantitative analysis with statistics (and ponies) (Some slides, pony-based examples from Blase Ur)

Grade 6: Correlated to AGS Basic Math Skills

Foothill College Summer 2016

STA 225: Introductory Statistics (CT)

Issues in the Mining of Heart Failure Datasets

Linking Task: Identifying authors and book titles in verbose queries

Human Emotion Recognition From Speech

Maximizing Learning Through Course Alignment and Experience with Different Types of Knowledge

EDCI 699 Statistics: Content, Process, Application COURSE SYLLABUS: SPRING 2016

Transcription:

CS 2750: Machine Learning Introduction Prof. Adriana Kovashka University of Pittsburgh January 5, 2017

About the Instructor Born 1985 in Sofia, Bulgaria Got BA in 2008 at Pomona College, CA (Computer Science & Media Studies) Got PhD in 2014 at University of Texas at Austin (Computer Vision)

Course Info Course website: http://people.cs.pitt.edu/~kovashka/cs2750_sp17/ Instructor: Adriana Kovashka (kovashka@cs.pitt.edu) Use "CS2750" at the beginning of your Subject Office: Sennott Square 5325 Office hours: Tue/Thu, 3:30pm - 5:30pm

TA Longhao Li (lol16@cs.pitt.edu) Office: Sennott Square 5802 Office hours: TBD Do the Doodle by the end of Friday: http://doodle.com/poll/gtaxphum9mutd74x Note: Longhao is out of the country until Jan. 16; please email him any questions

Textbooks Christopher M. Bishop. Pattern Recognition and Machine Learning. Springer, 2006 Kevin P. Murphy. Machine Learning: A Probabilistic Perspective. MIT Press, 2012 More resources available on course webpage Your notes from class are your best study material, slides are not complete with notes

Course Goals To learn the basic machine learning techniques, both from a theoretical and practical perspective To learn how to apply these techniques on toy problems To get experience with these techniques on a real-world problem

Policies and Schedule http://people.cs.pitt.edu/~kovashka/cs2750_sp17/

Should I take this class? It will be a lot of work! But you will learn a lot Some parts will be hard and require that you pay close attention! But I will have periodic ungraded pop quizzes to see how you re doing I will also pick on students randomly to answer questions Use instructor s and TA s office hours!!!

Questions?

Plan for Today Introductions What is machine learning? Example problems and tasks ML in a nutshell Challenges Measuring performance

Introductions What is your name? What one thing outside of school are you passionate about? Do you have any prior experience with machine learning? What do you hope to get out of this class? Every time you speak, please remind me your name

What is machine learning? Finding patterns and relationships in data We can apply these patterns to make useful predictions E.g. we can predict how much a user will like a movie, even though that user never rated that movie

Example machine learning tasks Netflix challenge Given lots of data about how users rated movies (training data) But we don t know how user i will rate movie j and want to predict that (test data)

Example machine learning tasks Spam or not? vs Slide credit: Dhruv Batra

Example machine learning tasks Weather prediction Temperature Slide credit: Carlos Guestrin

Example machine learning tasks Who will win <contest of your choice>?

Example machine learning tasks Machine translation Slide credit: Dhruv Batra, figure credit: Kevin Gimpel

Example machine learning tasks Speech recognition Slide credit: Carlos Guestrin

Example machine learning tasks Pose estimation Slide credit: Noah Snavely

Example machine learning tasks Face recognition Slide credit: Noah Snavely

Example machine learning tasks Image categorization Pizza Wine Stove Slide credit: Dhruv Batra

Example machine learning tasks Slide credit: Derek Hoiem Is it dangerous? Is it alive? How fast does it run? Does it have a tail? Is it soft? Can I poke with it?

Example machine learning tasks Attribute-based image retrieval Kovashka et al., WhittleSearch: Image Search with Relative Attribute Feedback, CVPR 2012

Example machine learning tasks Dating car photographs Lee et al., Style-aware Mid-level Representation for Discovering Visual Connections in Space and Time, ICCV 2013

Example machine learning tasks Inferring visual persuasion Joo et al., Visual Persuasion: Inferring Communicative Intents of Images, CVPR 2014

Example machine learning tasks Answering questions about images Antol et al., VQA: Visual Question Answering, ICCV 2015

Example machine learning tasks What else? What are some problems in your area of research (or from your everyday life) that can be helped by machine learning?

ML in a Nutshell Tens of thousands of machine learning algorithms Decades of ML research oversimplified: Learn a mapping from input to output f: X Y X: emails, Y: {spam, notspam} Slide credit: Pedro Domingos

ML in a Nutshell y = f(x) output prediction function features Training: given a training set of labeled examples {(x 1,y 1 ),, (x N,y N )}, estimate the prediction function f by minimizing the prediction error on the training set Testing: apply f to a never before seen test example x and output the predicted value y = f(x) Slide credit: L. Lazebnik

ML in a Nutshell Apply a prediction function to a feature representation of the image to get the desired output: f( ) = apple f( ) = tomato f( ) = cow Slide credit: L. Lazebnik

Training Training Images ML in a Nutshell Features Training Labels Training Learned model Testing Test Image Features Learned model Prediction Slide credit: D. Hoiem and L. Lazebnik

Training vs Testing What do we want? High accuracy on training data? No, high accuracy on unseen/new/test data! Why is this tricky? Training data Features (x) and labels (y) used to learn mapping f Test data Features used to make a prediction Labels only used to see how well we ve learned f!!! Validation data Held-out set of the training data Can use both features and labels to tune parameters of the model we re learning

Why do we hope this would work? Statistical estimation view: x and y are random variables D = (x 1,y 1 ), (x 2,y 2 ),, (x N,y N ) ~ P(X,Y) Both training & testing data sampled IID from P(X,Y) IID: Independent and Identically Distributed Learn on training set, have some hope of generalizing to test set Adapted from Dhruv Batra

ML in a Nutshell Every machine learning algorithm has: Data representation (x, y) Problem representation Evaluation / objective function Optimization Adapted from Pedro Domingos

Data Representation Let s brainstorm what our X should be for various Y prediction tasks

Problem Representation Decision trees Sets of rules / Logic programs Instances Graphical models (Bayes/Markov nets) Neural networks Support vector machines Model ensembles Etc. Slide credit: Pedro Domingos

Evaluation / objective function Accuracy Precision and recall Squared error Likelihood Posterior probability Cost / Utility Margin Entropy K-L divergence Etc. Slide credit: Pedro Domingos

Optimization Discrete / combinatorial optimization E.g. graph algorithms Continuous optimization E.g. linear programming Adapted from Dhruv Batra, image from Wikipedia

Types of Learning Supervised learning Training data includes desired outputs Unsupervised learning Training data does not include desired outputs Weakly or Semi-supervised learning Training data includes a few desired outputs Reinforcement learning Rewards from sequence of actions Slide credit: Dhruv Batra

Supervised Learning Types of Prediction Tasks x Classification y Discrete x Regression y Continuous Unsupervised Learning x Clustering x' Discrete ID Adapted from Dhruv Batra x Dimensionality x' Continuous Reduction 40

Defining the Learning Task Improve on task, T, with respect to performance metric, P, based on experience, E. T: Categorize email messages as spam or legitimate P: Percentage of email messages correctly classified E: Database of emails, some with human-given labels T: Recognizing hand-written words P: Percentage of words correctly classified E: Database of human-labeled images of handwritten words T: Playing checkers P: Percentage of games won against an arbitrary opponent E: Playing practice games against itself T: Driving on four-lane highways using vision sensors P: Average distance traveled before a human-judged error E: A sequence of images and steering commands recorded while observing a human driver. Slide credit: Ray Mooney

Example of Solving a ML Problem Spam or not? vs Slide credit: Dhruv Batra

Intuition Spam Emails a lot of words like money free bank account Regular Emails word usage pattern is more spread out Slide credit: Dhruv Batra, Fei Sha

Simple strategy: Let s count! This is X This is Y = 1 оr 0? Adapted from Dhruv Batra, Fei Sha

Weigh counts and sum to get prediction Why these words? Adapted from Dhruv Batra, Fei Sha Where do the weights come from?

Klingon vs Mlingon Classification Training Data Klingon: klix, kour, koop Mlingon: moo, maa, mou Testing Data: kap Which language? Why? BOARD Slide credit: Dhruv Batra

Why not just hand-code these weights? We re letting the data do the work rather than develop hand-code classification rules The machine is learning to program itself But there are challenges

Slide credit: Dhruv Batra, figure credit: Liang Huang I saw her duck

Slide credit: Dhruv Batra, figure credit: Liang Huang I saw her duck

Slide credit: Dhruv Batra, figure credit: Liang Huang I saw her duck

I saw her duck with a telescope Slide credit: Dhruv Batra, figure credit: Liang Huang

Slide credit: Larry Zitnick What humans see

What computers see 243 239 240 225 206 185 188 218 211 206 216 225 242 239 218 110 67 31 34 152 213 206 208 221 243 242 123 58 94 82 132 77 108 208 208 215 235 217 115 212 243 236 247 139 91 209 208 211 233 208 131 222 219 226 196 114 74 208 213 214 232 217 131 116 77 150 69 56 52 201 228 223 232 232 182 186 184 179 159 123 93 232 235 235 232 236 201 154 216 133 129 81 175 252 241 240 235 238 230 128 172 138 65 63 234 249 241 245 237 236 247 143 59 78 10 94 255 248 247 251 234 237 245 193 55 33 115 144 213 255 253 251 248 245 161 128 149 109 138 65 47 156 239 255 190 107 39 102 94 73 114 58 17 7 51 137 23 32 33 148 168 203 179 43 27 17 12 8 17 26 12 160 255 255 109 22 26 19 35 24 Slide credit: Larry Zitnick

Challenges Some challenges: ambiguity and context Machines take data representations too literally Humans are much better than machines at generalization, which is needed since test data will rarely look exactly like the training data

Challenges Why might it be hard to: Predict if a viewer will like a movie? Recognize cars in images? Translate between languages?

The Time is Ripe to Study ML Many basic effective and efficient algorithms available. Large amounts of on-line data available. Large amounts of computational resources available. Slide credit: Ray Mooney

Slide credit: Dhruv Batra, Fei Sha Where does ML fit in?

Measuring Performance If y is discrete: Accuracy: # correctly classified / # all test examples True Positive, False Positive, True Negative, False Negative Weighted misclassification via a confusion matrix

Measuring Performance If y is discrete: Precision/recall Precision = TP / (TP + FP) = # predicted true pos / # predicted pos Recall = TP / (TP + FN) = # predicted true pos / # true pos F-measure = 2PR / (P + R) Want evaluation metric to be in some range, e.g. [0 1] 0 = worst possible classifier, 1 = best possible classifier

Precision / Recall / F-measure True positives (images that contain people) True negatives (images that do not contain people) Predicted positives (images predicted to contain people) Predicted negatives (images predicted not to contain people) Precision = 2 / 5 = 0.4 Recall = 2 / 4 = 0.5 F-measure = 2*0.4*0.5 / 0.4+0.5 = 0.44 Accuracy: 5 / 10 = 0.5

Measuring Performance If y is continuous: Sum-of-Squared-Differences (SSD) error between predicted and true y: E n ( f ( x i ) y i 1 i 2 )

Your Homework Fill out Doodle Read entire course website Do first reading

Next Time Linear algebra review Matlab tutorial