CS534 Machine Learning, Spring 2013. Lecture 1: Introduction to ML; course logistics. Reading: The Discipline of Machine Learning by Tom Mitchell
Course Information. Instructor: Dr. Xiaoli Fern, KEC 3073, xfern@eecs.oregonstate.edu. TA: Travis Moore. Office hours (tentative): Instructor: MW before class (11-12) or by appointment; TA: TBA (see class webpage for updates). Class web page: classes.engr.oregonstate.edu/eecs/spring2013/cs534/ Class email list: cs534-sp13@engr.orst.edu
Course materials. Textbook: Pattern Recognition and Machine Learning by Chris Bishop (Bishop). Slides and reading materials will be provided on the course webpage. Other good references: Machine Learning by Tom Mitchell (TM); Pattern Classification by Duda, Hart and Stork (DHS), 2nd edition. Many online resources on machine learning; check the class website for a few links.
Prerequisites (items shown in green are important). Basic probability theory and statistics concepts: distributions, densities, expectation, variance, parameter estimation; a brief review is provided on the class website. Multivariable calculus and linear algebra; basic review slides and links to useful video lectures are provided on the class webpage. Knowledge of basic CS concepts such as data structures, search strategies, complexity. Please spend some time reviewing these! It will be tremendously helpful!
Homework Policies. Homework is generally due at the beginning of class on the due date. Each student is allowed to hand in one late homework (no more than 48 hours late). Collaboration policy: discussions are allowed, but copying of solutions or code is not. See the Student Conduct page on the OSU website for information regarding academic dishonesty (http://oregonstate.edu/studentconduct/code/index.php#acdis)
Grading policy: Written homework will not be graded based on correctness. We will record the number of problems that were "completed" (either correctly or incorrectly). Completing a problem requires a non-trivial attempt at solving it. The judgment of whether a problem was "completed" is left to the instructor and the TA. Final grade breakdown: Midterm 25%; Final 25%; Final project 25%; Implementation assignments 25%. The resulting letter grade will be decreased by one if a student fails to complete at least 80% of the written homework problems.
What is Machine Learning? Machine learning studies algorithms that improve performance P at some task T based on experience E.
Machine learning in Computer Science. Machine learning is already the preferred approach to speech recognition, natural language processing, computer vision, medical outcomes analysis, robot control. This trend is growing: improved machine learning algorithms; increased data capture and new sensors; increasing demand for self-customization to user and environment.
Fields of Study. Machine learning: supervised learning, semi-supervised learning, unsupervised learning, reinforcement learning.
Supervised Learning. Learn to predict output from input. Output can be continuous: regression problems. Example: predicting the price ($) of a house based on its square footage (scatter plot of price vs. square feet in the original slide).
Supervised Learning. Learn to predict output from input. Output can be continuous: regression problems; or discrete: classification problems. Example: classify a loan applicant as either high risk or low risk based on income and savings amount.
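The house-price regression example can be sketched in a few lines of Python. This is a minimal illustration, not course material: the square-footage/price pairs are invented, and the closed-form slope/intercept is ordinary least squares for a single feature.

```python
# Least-squares line fit for the house-price regression example.
# Data are invented for illustration: (square feet, price in $1000s).
data = [(1000, 200), (1500, 270), (2000, 330), (2500, 410), (3000, 480)]

n = len(data)
mean_x = sum(x for x, _ in data) / n
mean_y = sum(y for _, y in data) / n

# Closed-form simple linear regression: slope = cov(x, y) / var(x)
slope = sum((x - mean_x) * (y - mean_y) for x, y in data) / \
        sum((x - mean_x) ** 2 for x, _ in data)
intercept = mean_y - slope * mean_x

def predict(sqft):
    """Predicted price (in $1000s) for a house of the given size."""
    return intercept + slope * sqft

print(round(predict(1800), 1))  # price estimate for an unseen 1800 sq ft house -> 310.0
```

The point is the supervised-learning pattern: fit parameters on labeled examples, then predict for inputs not seen during training.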
Unsupervised Learning. Given a collection of examples (objects), discover self-similar groups within the data: clustering. Example: clustering artwork.
Unsupervised Learning. Given a collection of examples (objects), discover self-similar groups within the data: clustering. Example: image segmentation.
Unsupervised Learning. Given a collection of examples (objects), discover self-similar groups within the data: clustering. Learn the underlying distribution that generates the data we observe: density estimation. Represent high-dimensional data using a low-dimensional representation for compression or visualization: dimension reduction.
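A minimal sketch of the clustering idea: a hand-rolled one-dimensional k-means. The data points and the choice of k = 2 are assumptions for illustration; in practice one would use a library implementation.

```python
# Tiny 1-D k-means: discover self-similar groups in unlabeled data.
# The points and the choice of k = 2 are illustrative assumptions.
points = [1.0, 1.2, 0.8, 8.0, 8.3, 7.9]
centers = [points[0], points[3]]           # naive initialization

for _ in range(10):                        # a few Lloyd iterations
    clusters = [[], []]
    for p in points:                       # assignment step: nearest center
        nearest = min(range(2), key=lambda j: abs(p - centers[j]))
        clusters[nearest].append(p)
    # update step: move each center to the mean of its cluster
    # (assumes no cluster ends up empty, which holds for this data)
    centers = [sum(c) / len(c) for c in clusters]

print(sorted(round(c, 2) for c in centers))  # two group centers -> [1.0, 8.07]
```

No labels are given anywhere: the algorithm finds the two groups purely from the self-similarity of the inputs, which is what distinguishes unsupervised from supervised learning.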
Reinforcement Learning. Learn to act. An agent observes the environment and takes actions; with each action, it receives rewards/punishments. Goal: learn a policy that optimizes rewards. No examples of optimal outputs are given. Not covered in this class; take 533 if you want to learn about this.
When do we need computers to learn?
Appropriate Applications for Supervised Learning. Situations where there is no human expert: x: bond graph of a new molecule, f(x): predicted binding strength to AIDS protease molecule; x: nano-modification structure of a fuel cell, f(x): predicted power output of the fuel cell. Situations where humans can perform the task but can't describe how they do it: x: picture of a hand-written character, f(x): ASCII code of the character; x: recording of a bird song, f(x): species of the bird. Situations where the desired function is changing frequently: x: description of stock prices and trades for the last 10 days, f(x): recommended stock transactions. Situations where each user needs a customized function f: x: incoming email message, f(x): importance score for presenting to the user (or deleting without presenting).
Supervised learning. Given: a set of training examples {(x_1, y_1), ..., (x_N, y_N)}, where x_i is the input of the i-th example (e.g., a vector) and y_i is its corresponding output (continuous or discrete). We assume there is some underlying function f that maps from x to y: our target function. Goal: find a good approximation h of f so that accurate predictions can be made for previously unseen x.
The underlying function:
Polynomial curve fitting. There are infinitely many functions that fit the training data perfectly. In order to learn, we have to focus on a limited set of possible functions; we call this our hypothesis space. E.g., all M-th order polynomial functions: y(x, w) = w_0 + w_1 x + w_2 x^2 + ... + w_M x^M, where w = (w_0, w_1, ..., w_M) represents the unknown parameters that we wish to learn from the training data. Learning here means finding a good set of parameters w that minimizes some loss function, e.g. the sum-of-squares error E(w) = (1/2) * sum_{n=1..N} (y(x_n, w) - y_n)^2. This optimization problem can be solved easily; we will not focus on solving it at this point, and will revisit it later.
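The curve-fitting setup can be sketched with NumPy. The sine target and the noise level are assumptions in the style of Bishop's running example; `numpy.polyfit` minimizes the same sum-of-squares loss as E(w) (the 1/2 factor does not change the minimizer).

```python
import numpy as np

rng = np.random.default_rng(0)
# Noisy samples from an assumed target function sin(2*pi*x).
x = np.linspace(0.0, 1.0, 10)
y = np.sin(2 * np.pi * x) + rng.normal(scale=0.2, size=x.size)

M = 3                                      # hypothesis space: cubic polynomials
w = np.polyfit(x, y, deg=M)                # least-squares fit of the M+1 weights
E = 0.5 * np.sum((np.polyval(w, x) - y) ** 2)  # training loss E(w)
print(len(w), float(E))                    # M+1 learned parameters, small loss
```

Choosing the hypothesis space here amounts to choosing M; the fit itself is just solving for the weight vector w.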
Important Issue: Model Selection. The red line shows the function learned with different M values. Which M should we choose? This is a model selection problem. Can we use the E(w) that we defined on the previous slides as a criterion to choose M?
Over-fitting. As M increases, the loss on the training data decreases monotonically. However, the loss on test data starts to increase after a while. Why? Is this a fluke, or is it generally true? It turns out this is generally the case, and it is caused by over-fitting.
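The train-versus-test behavior can be checked directly. The sine target, noise level, and the two degrees compared are assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)

def sample(n):
    """Draw n noisy samples from the assumed target sin(2*pi*x)."""
    x = rng.uniform(0.0, 1.0, n)
    return x, np.sin(2 * np.pi * x) + rng.normal(scale=0.2, size=n)

x_train, y_train = sample(10)
x_test, y_test = sample(100)

def rmse(w, x, y):
    return float(np.sqrt(np.mean((np.polyval(w, x) - y) ** 2)))

train_err, test_err = {}, {}
for M in (3, 9):                       # moderate vs. maximal-degree polynomial
    w = np.polyfit(x_train, y_train, deg=M)
    train_err[M] = rmse(w, x_train, y_train)
    test_err[M] = rmse(w, x_test, y_test)

# Degree 9 interpolates the 10 training points (near-zero training error)
# but does far worse on held-out data: over-fitting.
print(train_err[9] < train_err[3], test_err[9] > test_err[3])
```

The degree-9 model has driven its training loss to essentially zero by fitting the noise, which is exactly why its test loss is worse than the cubic's.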
Over-fitting. Over-fitting refers to the phenomenon where the learner adjusts to very specific random features of the training data that differ from the target function. Real example: in the Bug ID project, x: image of a robotically maneuvered bug, f(x): the species of the bug. The initial attempt yielded close-to-perfect accuracy. Reason: the different species were imaged in different batches, and one species happened to have a peculiar air bubble in its images.
Over-fitting. Over-fitting happens when there is too little data (or some systematic bias in the data), or when there are too many parameters.
Key Issues in Machine Learning
What are good hypothesis spaces? Linear functions? Polynomials? Which spaces have been useful in practical applications?
How to select among different hypothesis spaces? The model selection problem: the trade-off between over-fitting and under-fitting.
How can we optimize accuracy on future data points? This is often called the generalization error (error on unseen data points). Related to the issue of over-fitting, i.e., the model fitting the peculiarities rather than the generalities of the data.
What level of confidence should we have in the results? (A statistical question.) How much training data is required to find an accurate hypothesis with high probability? This is the topic of learning theory.
Are some learning problems computationally intractable? (A computational question.) Some learning problems are provably hard; heuristic/greedy approaches are often used when this is the case.
How can we formulate application problems as machine learning problems? (The engineering question.)
Terminology
Training example: an example of the form <x, y>. x: feature vector; y: a continuous value for regression problems, or a class label in {1, 2, ..., K} for classification problems.
Training set: a set of training examples drawn randomly from P(x, y).
Target function: the true mapping from x to y.
Hypothesis: a proposed function h considered by the learning algorithm to be similar to the target function.
Test set: a set of examples, not used in training, used to evaluate a proposed hypothesis h.
Hypothesis space: the space of all hypotheses that can, in principle, be output by a particular learning algorithm.
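The terminology maps onto code directly. A toy sketch, with invented loan-applicant data and an invented threshold hypothesis:

```python
# Training examples <x, y>: feature vector x, class label y in {0, 1}.
# Invented data: x = (income, savings) in $1000s; y = 1 means "low risk".
training_set = [((60, 20), 1), ((25, 2), 0), ((80, 40), 1), ((30, 5), 0)]
test_set     = [((70, 30), 1), ((20, 1), 0)]

# One hypothesis h from a simple hypothesis space: a threshold on savings.
def h(x, threshold=10):
    return 1 if x[1] >= threshold else 0

# Evaluate the proposed hypothesis on the held-out test set.
accuracy = sum(h(x) == y for x, y in test_set) / len(test_set)
print(accuracy)  # -> 1.0
```

The hypothesis space here is the family of all savings thresholds; the target function is whatever true rule generated the labels, which h can only approximate.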