Machine Learning CS 697AB Fall 2017
Administrative Stuff
Introduction Instructor: Dr. Kaushik Sinha 2 lectures per week TR 8:00-9:15 am Office Hours TR 9:45-10:45 Jabara Hall 243
Study Groups (2-3 people) This course will cover non-trivial material; learning in a group makes it less hard and more fun! Study groups are recommended (but not required).
Prerequisites Three pillars of ML: Statistics / Probability, Linear Algebra, Multivariate Calculus. You should be confident in at least 1 of the 3, ideally 2 of the 3.
Grades... Your grade is a composite of: Homework (45%) Exams (Mid-term 1, Mid-term 2) (30%) Final Project (20%) Class participation (5%)
Homework You can discuss homework with your peers, but your submitted answers must be your own! Make an honest attempt at all questions (45% of your total grade). Homework will typically include programming assignments in MATLAB.
Exams Exams will be (to some degree) based on homework assignments Best preparation: Make sure you really really understand the homework assignments 2 Exams: Midterm 1 & 2 Will be 30% of your grade.
Final Project 20% of your grade. Individual projects. Sufficient details of the project will be provided in class; you have to fill in the gaps. It will require thinking and in-depth study. Details will be posted on the course website later.
Cheating Don't cheat! Use your common sense. I won't be your friend anymore!
MACHINE LEARNING!!!
What is Machine Learning? Formally (Mitchell 1997): A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E. Informally: algorithms that improve on some task with experience.
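Mitchell's definition can be made concrete with a toy sketch (not course code; the task, data, and classifier here are all made up for illustration): task T is classifying numbers as "small" or "large", experience E is a set of labeled examples, and performance P is accuracy on a held-out test set. More experience yields better performance.

```python
# Hypothetical illustration of Mitchell's (T, E, P) definition.
# T: classify x as "small" (0) or "large" (1); E: labeled examples;
# P: accuracy on a held-out test set.

def train_centroids(examples):
    """Learn one centroid (mean) per class from labeled (x, label) pairs."""
    sums, counts = {0: 0.0, 1: 0.0}, {0: 0, 1: 0}
    for x, label in examples:
        sums[label] += x
        counts[label] += 1
    return {c: sums[c] / counts[c] for c in (0, 1)}

def predict(centroids, x):
    """Assign x to the class with the nearest centroid."""
    return min(centroids, key=lambda c: abs(x - centroids[c]))

def accuracy(centroids, test):
    return sum(predict(centroids, x) == y for x, y in test) / len(test)

# True rule (unknown to the learner): x >= 5.0 means "large".
data = [(x / 10.0, int(x >= 50)) for x in range(100)]
test = data[::7]  # held-out test set

small_e = train_centroids([data[0], data[50]])  # tiny, skewed experience
more_e  = train_centroids(data)                 # much more experience
assert accuracy(more_e, test) > accuracy(small_e, test)
```

Performance at T, as measured by P, improves with experience E: the centroids learned from the full data place the decision boundary near the true threshold, while two skewed examples do not.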
When should we use ML? Not an ML problem: e.g. traveling salesman, bin packing, 3-SAT, etc. These are well-defined problems that can easily be formalized. What if formalizing is impossible? E.g. which picture contains the human, and which one contains the dog?
When should we use ML? Not ML problems: Traveling Salesman, 3-SAT, etc. ML problems: hard to formalize, but a human expert can provide examples / feedback, and the computer needs to learn from that feedback. Is there a sign of cancer in this fMRI scan? What will the Dow Jones be tomorrow? Teach a robot to ride a unicycle.
Sometimes easy for humans, hard for computers Male or Female? Even one-year-old children can identify gender pretty reliably. Easy to come up with examples, but impossible to formalize as a CS problem. You need machine learning!
Example: Problem: Given an image of a handwritten digit, what digit is it? Input: [image of a digit] → Clever Algorithm → Output: 2. Problem: You have absolutely no idea how to do this!
Example: Problem: Given an image of a handwritten digit, what digit is it? Input: [image of a digit] → Clever Algorithm → Output: 2. Problem: You have absolutely no idea how to do this! Good news: You have examples (labeled images of the digits 0-9).
Example: Problem: Given an image of a handwritten digit, what digit is it? The Machine Learning Approach: a Machine Learning Algorithm takes the labeled examples (digits 0-9) and produces the Clever Algorithm. Input: [image of a digit] → Clever Algorithm → Output: 2.
Example: Problem: Given an image of a handwritten digit, what digit is it? Training: labeled examples (digits 0-9) → Machine Learning Algorithm → Learned Algorithm. Testing: Input: [image of a digit] → Learned Algorithm → Output: 2.
Handwritten Digit Recognition (1990-1995) Pretty much solved in the mid-nineties (LeCun et al.) using Convolutional Neural Networks. Now used by the USPS for ZIP codes, by ATMs for automatic check cashing, etc.
TD-Gammon (1994) Gerry Tesauro (IBM) teaches a neural network to play backgammon. The net plays 100K+ games against itself and beats the world champion. [Neural Computation 1994] The algorithm teaches itself how to play so well!!!
Deep Blue (1997) IBM's Deep Blue wins against Kasparov in chess. A crucial winning move was made due to Machine Learning (G. Tesauro).
Watson (2011) IBM's Watson wins the game show Jeopardy! against former winners Brad Rutter and Ken Jennings. Extensive Machine Learning techniques were used.
Face Detection (2001) Viola and Jones solve face detection, previously a very hard problem in computer vision. Now a commodity in off-the-shelf cellphones / cameras.
Grand Challenge (2005) DARPA Grand Challenge: the vehicle must drive autonomously 150 miles through the desert along a difficult route. The 2004 DARPA Grand Challenge was a huge disappointment; the best team made 11.78 of 150 miles. The 2005 DARPA Grand Challenge was completed by several ML-powered teams.
Speech, Netflix,... The iPhone ships with built-in speech recognition. Google mobile search is speech-based (very reliable). Automatic translation...
ML is the engine for many fields... Natural Language Processing, Computer Vision, Computational Biology, Robotics — with Machine Learning at the center.
Internet companies Collecting massive amounts of data Hoping that some smart Machine Learning person makes money out of it. Your future job!
Example: Webmail Spam filtering: given an email, predict whether it is spam or not. Ad matching: given user info, predict which ad will be clicked on.
Example: Websearch Ad matching: given a query, predict which ad will be clicked on. Web-search ranking: given a query, predict which document will be clicked on.
Example: Google News Document clustering Given news articles, automatically identify and sort them by topic.
When will it stop? The human brain is one big learning machine We know that we can still do a lot better! However, it is hard. Very few people can design new ML algorithms. But many people can use them!
What types of ML are there? As far as this course is concerned: Supervised learning: Given labeled examples, find the right prediction for an unlabeled example. (e.g. given annotated images, learn to detect faces.) Unsupervised learning: Given unlabeled data, try to discover patterns and low-dimensional structure. (e.g. automatically cluster news articles by topic.)
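The two settings can be contrasted with a minimal sketch (not course code; the 1-D data and the two toy algorithms are illustrative assumptions): a supervised nearest-neighbor predictor uses labels, while a tiny 1-D k-means finds structure without any labels.

```python
# Minimal sketch contrasting supervised and unsupervised learning on 1-D data.

def nearest_neighbor(train, x):
    """Supervised: predict the label of the closest labeled example."""
    return min(train, key=lambda p: abs(p[0] - x))[1]

def two_means(points, iters=10):
    """Unsupervised: split unlabeled points into two clusters (1-D k-means)."""
    c0, c1 = min(points), max(points)  # initial centers
    for _ in range(iters):
        a = [p for p in points if abs(p - c0) <= abs(p - c1)]
        b = [p for p in points if abs(p - c0) > abs(p - c1)]
        c0, c1 = sum(a) / len(a), sum(b) / len(b)
    return c0, c1

labeled = [(1.0, "dog"), (1.2, "dog"), (8.0, "cat"), (8.5, "cat")]
assert nearest_neighbor(labeled, 1.1) == "dog"   # label supplied by a human

c0, c1 = two_means([1.0, 1.2, 0.9, 8.0, 8.5, 8.2])  # no labels anywhere
```

The supervised learner needs a human to annotate every training point; the clustering algorithm discovers the two groups on its own, which is exactly the news-article-by-topic use case.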
Basic Setup Pre-processing Clean up the data. Boring but necessary. Feature Extraction Use expert knowledge to get representation of data. Learning Focus of this course. (Post-processing) Whatever you do when you are done.
Feature Extraction
Feature Extraction Represent data in terms of vectors. Features are statistics that describe the data. Real World Data Vector Space Each dimension is one feature.
Handwritten digits Features are statistics that describe the data. Feature: width/height — pretty good for 1 vs. 2, not so good for 2 vs. 3. Feature: raw pixels — a 16x16 image becomes a 256x1 vector. Works for digits (to some degree); does not work for trickier stuff.
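Both feature maps from this slide can be sketched in a few lines (a toy 4x4 binary "image" stands in for a real 16x16 digit scan; the bounding-box computation is an assumed way to get width/height):

```python
# Hypothetical sketch of the two digit features: raw pixels and width/height.

image = [
    [0, 1, 1, 0],
    [0, 1, 0, 0],
    [0, 1, 0, 0],
    [0, 1, 1, 0],
]

# Feature 1: raw pixels -- flatten the HxW grid into an (H*W)-dim vector.
raw = [pixel for row in image for pixel in row]   # 16x16 digit -> 256x1

# Feature 2: width/height ratio of the ink's bounding box (a single number).
rows = [r for r, row in enumerate(image) if any(row)]
cols = [c for c in range(len(image[0])) if any(row[c] for row in image)]
height = rows[-1] - rows[0] + 1
width  = cols[-1] - cols[0] + 1
ratio  = width / height
```

The raw-pixel feature preserves everything but is high-dimensional; the width/height feature compresses the image to one number and throws most information away, which is why it separates some digit pairs but not others.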
Bag of Words for Images Extract interest points from the image and, given a dictionary of possible interest points, represent the image as a bag of interest points: a sparse vector whose i-th entry counts how often dictionary entry i occurs in the image.
Text (Bag of Words) Take a dictionary with n words. Represent a text document as an n-dimensional vector, where the i-th dimension contains the number of times word i appears in the document. Most entries are zero, so the vector is sparse.
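The bag-of-words map is short enough to write out directly (the five-word dictionary and the example sentence are toy assumptions; a real dictionary has tens of thousands of words):

```python
# Bag of words: with a fixed n-word dictionary, a document becomes an
# n-dimensional count vector (dimension i = occurrences of word i).

dictionary = ["in", "into", "is", "learning", "machine"]   # toy, n = 5

def bag_of_words(text, dictionary):
    words = text.lower().split()
    return [words.count(word) for word in dictionary]

doc = "machine learning is fun and machine learning is everywhere"
vec = bag_of_words(doc, dictionary)
assert vec == [0, 0, 2, 2, 2]
```

Note that word order is discarded (hence "bag") and words outside the dictionary are ignored; both are deliberate simplifications that make every document comparable in the same n-dimensional space.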
Audio? Movies? Audio: use a sliding window and the Fast Fourier Transform. Movies: treat them as a sequence of images.
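The sliding-window idea for audio can be sketched as follows (window size, step, and the toy signal are assumptions; a naive discrete Fourier transform stands in for a real FFT library):

```python
import cmath

# Sketch: slide a fixed-size window over the signal and take the magnitude
# spectrum of each window, turning a raw waveform into a sequence of
# fixed-length feature vectors.

def dft_magnitudes(window):
    """Naive O(n^2) DFT; real code would call an FFT routine."""
    n = len(window)
    return [abs(sum(x * cmath.exp(-2j * cmath.pi * k * t / n)
                    for t, x in enumerate(window)))
            for k in range(n)]

def windowed_features(signal, size=4, step=2):
    return [dft_magnitudes(signal[i:i + size])
            for i in range(0, len(signal) - size + 1, step)]

signal = [0.0, 1.0, 0.0, -1.0, 0.0, 1.0, 0.0, -1.0]
features = windowed_features(signal)
assert len(features) == 3 and all(len(f) == 4 for f in features)
```

Each window yields one fixed-length vector, so a variable-length audio clip becomes a sequence of points in the same feature space — the same trick the slide suggests for movies, with frames in place of windows.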
Feature Space Everything that can be stored on a computer can be stored as a vector. Representation is critical for successful learning. [Not in this course, though.] Throughout this course we will assume data is just points in a Feature Space. Important distinction: sparse (most features are zero) vs. dense (every feature is present).
Mini-Quiz T/F: Every traditional CS problem is also an ML problem. FALSE T/F: Image Features are always dense. FALSE T/F: The feature space can be very high dimensional. TRUE T/F: Bag of words features are sparse. TRUE