CS534 Machine Learning


CS534 Machine Learning, Spring 2013
Lecture 1: Introduction to ML; course logistics
Reading: "The Discipline of Machine Learning" by Tom Mitchell

Course Information
Instructor: Dr. Xiaoli Fern, Kec 3073, xfern@eecs.oregonstate.edu
TA: Travis Moore
Office hours (tentative): Instructor: MW before class, 11-12, or by appointment; TA: TBA (see class webpage for updates)
Class web page: classes.engr.oregonstate.edu/eecs/spring2013/cs534/
Class email list: cs534 sp13@engr.orst.edu

Course materials
Textbook: "Pattern Recognition and Machine Learning" by Chris Bishop (Bishop)
Slides and reading materials will be provided on the course webpage
Other good references: "Machine Learning" by Tom Mitchell (TM); "Pattern Classification" by Duda, Hart, and Stork (DHS), 2nd edition
There are many online resources on machine learning; check the class website for a few links

Prerequisites
Basic probability theory and statistics concepts: distributions, densities, expectation, variance, parameter estimation (a brief review is provided on the class website)
Multivariable calculus and linear algebra (basic review slides and links to useful video lectures are provided on the class webpage)
Knowledge of basic CS concepts such as data structures, search strategies, complexity
Please spend some time reviewing these! It will be tremendously helpful!

Homework Policies
Homework is generally due at the beginning of class on the due date
Each student has one allowance for handing in late homework (no more than 48 hours late)
Collaboration policy: discussions are allowed, but copying of solutions or code is not
See the Student Conduct page on the OSU website for information regarding academic dishonesty (http://oregonstate.edu/studentconduct/code/index.php#acdis)

Grading policy
Written homework will not be graded for correctness. We will record the number of problems that were "completed" (either correctly or incorrectly). Completing a problem requires a non-trivial attempt at solving it; the judgment of whether a problem was "completed" is left to the instructor and the TA.
Final grade breakdown: midterm 25%; final 25%; final project 25%; implementation assignments 25%. The resulting letter grade will be decreased by one if a student fails to complete at least 80% of the written homework problems.

What is Machine Learning?
Machine learning studies algorithms that improve performance P at some task T based on experience E.

Machine Learning in Computer Science
Machine learning is already the preferred approach to speech recognition, natural language processing, computer vision, medical outcomes analysis, and robot control. This trend is growing, driven by improved machine learning algorithms, increased data capture and new sensors, and increasing demand for self-customization to users and environments.

Fields of Study
Machine learning includes supervised learning, semi-supervised learning, unsupervised learning, and reinforcement learning.

Supervised Learning
Learn to predict output from input. Output can be continuous: regression problems. Example: predicting the price of a house ($) based on its square footage (feet). (Scatter plot of price vs. square footage omitted.)
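The regression example above can be sketched with ordinary least squares; the square-footage and price numbers below are made up for illustration (they are not from the slides):

```python
import numpy as np

# Hypothetical data: square footage and price (in $1000s)
sqft = np.array([800, 1200, 1500, 1800, 2400], dtype=float)
price = np.array([150, 200, 240, 275, 360], dtype=float)

# Fit price ~ w1 * sqft + w0 by ordinary least squares
A = np.vstack([sqft, np.ones_like(sqft)]).T
w1, w0 = np.linalg.lstsq(A, price, rcond=None)[0]

# Predict the price of a previously unseen 2000 sq ft house
pred = w1 * 2000 + w0
```

The learned line plays the role of the hypothesis: it generalizes from the five training examples to new inputs.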

Supervised Learning
Learn to predict output from input. Output can be continuous (regression problems) or discrete (classification problems). Example: classify a loan applicant as either high risk or low risk based on income and savings amount.

Unsupervised Learning
Given a collection of examples (objects), discover self-similar groups within the data: clustering. Example: clustering artwork.

Unsupervised Learning
Given a collection of examples (objects), discover self-similar groups within the data: clustering. Example: image segmentation.

Unsupervised Learning
Given a collection of examples (objects), discover self-similar groups within the data: clustering. Learn the underlying distribution that generates the data we observe: density estimation. Represent high-dimensional data using a low-dimensional representation for compression or visualization: dimension reduction.
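A minimal k-means loop is one way to sketch the clustering idea; the two-blob data and the fixed seed-point initialization below are assumptions made for the demo:

```python
import numpy as np

rng = np.random.default_rng(0)

# Two synthetic, well-separated groups of 2-D points (hypothetical data)
X = np.vstack([rng.normal(0.0, 0.5, size=(50, 2)),
               rng.normal(5.0, 0.5, size=(50, 2))])

# k-means: alternate between assigning points to the nearest centroid
# and moving each centroid to the mean of its assigned points
k = 2
centroids = X[[0, 50]].copy()  # one seed point from each blob keeps the demo deterministic
for _ in range(20):
    dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    labels = dists.argmin(axis=1)
    centroids = np.array([X[labels == j].mean(axis=0) for j in range(k)])
```

Note that no labels are given: the groups are discovered from the geometry of the data alone.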

Reinforcement Learning
Learn to act. An agent observes the environment and takes actions; with each action, it receives rewards or punishments. Goal: learn a policy that optimizes rewards. No examples of optimal outputs are given. Not covered in this class; take 533 if you want to learn about this.

When do we need a computer to learn?

Appropriate Applications for Supervised Learning
Situations where there is no human expert. x: bond graph of a new molecule; f(x): predicted binding strength to the AIDS protease molecule. x: nano-modification structure of a fuel cell; f(x): predicted power output of the fuel cell.
Situations where humans can perform the task but can't describe how they do it. x: picture of a handwritten character; f(x): ASCII code of the character. x: recording of a bird song; f(x): species of the bird.
Situations where the desired function changes frequently. x: description of stock prices and trades for the last 10 days; f(x): recommended stock transactions.
Situations where each user needs a customized function f. x: incoming email message; f(x): importance score for presenting to the user (or deleting without presenting).

Supervised Learning
Given: a set of training examples (x_1, y_1), ..., (x_N, y_N), where x_i is the input of the i-th example (i.e., a vector) and y_i is its corresponding output (continuous or discrete).
We assume there is some underlying function f that maps from x to y: our target function.
Goal: find a good approximation of f so that accurate predictions can be made for previously unseen x.

The underlying function (plot omitted).

Polynomial Curve Fitting
There are infinitely many functions that will fit the training data perfectly. In order to learn, we have to focus on a limited set of possible functions; we call this our hypothesis space. E.g., all M-th order polynomial functions:
y(x, w) = w_0 + w_1 x + w_2 x^2 + ... + w_M x^M
where w = (w_0, w_1, ..., w_M) represents the unknown parameters that we wish to learn from the training data. Learning here means finding a good set of parameters w to minimize some loss function E(w). This optimization problem can be solved easily; we will not focus on solving it at this point, but will revisit it later.
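A least-squares polynomial fit of this form can be sketched with `np.polyfit`; the sin(2*pi*x) data below is a hypothetical setup in the style of Bishop's curve-fitting example:

```python
import numpy as np

rng = np.random.default_rng(1)

# Noisy samples of an assumed underlying function, sin(2*pi*x)
x = np.linspace(0, 1, 10)
t = np.sin(2 * np.pi * x) + rng.normal(0, 0.1, size=x.shape)

M = 3  # order of the polynomial hypothesis space
w = np.polyfit(x, t, deg=M)      # least-squares estimate of the M+1 parameters (highest degree first)
y = np.polyval(w, x)             # predictions y(x, w) on the training inputs
E = 0.5 * np.sum((y - t) ** 2)   # sum-of-squares loss on the training data
```

Changing M changes the hypothesis space; the fitted parameters w are whatever minimizes E(w) within that space.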

Important Issue: Model Selection
The red line shows the function learned with different values of M. Which M should we choose? This is a model selection problem. Can we use the E(w) defined on the previous slides as a criterion to choose M?

Overfitting
As M increases, the loss on the training data decreases monotonically. However, the loss on test data starts to increase after a while. Why? Is this a fluke, or is it generally true? It turns out this is generally the case, and it is caused by overfitting.
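The train/test divergence can be sketched by fitting polynomials of increasing order M to a small noisy sample; the data-generating function and sample sizes below are assumptions for the demo:

```python
import numpy as np

rng = np.random.default_rng(2)

def make_data(x):
    # noisy samples of the assumed underlying function sin(2*pi*x)
    return np.sin(2 * np.pi * x) + rng.normal(0, 0.2, size=x.shape)

x_train = np.linspace(0, 1, 10)
t_train = make_data(x_train)
x_test = np.linspace(0.01, 0.99, 100)
t_test = make_data(x_test)

def rms(w, x, t):
    return np.sqrt(np.mean((np.polyval(w, x) - t) ** 2))

train_err, test_err = {}, {}
for M in (1, 3, 9):
    w = np.polyfit(x_train, t_train, deg=M)
    train_err[M] = rms(w, x_train, t_train)
    test_err[M] = rms(w, x_test, t_test)
# training error shrinks as M grows; with M = 9 the polynomial interpolates
# all 10 training points, but the test error is typically much larger
```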

Overfitting
Overfitting refers to the phenomenon where the learner adjusts to very specific random features of the training data, which differ from the target function. Real example: in the Bug ID project, x: image of a robotically maneuvered bug; f(x): the species of the bug. The initial attempt yielded close-to-perfect accuracy. Reason: the different species were imaged in different batches, and one species had a peculiar air bubble in its images.

Overfitting
Overfitting happens when there is too little data (or some systematic bias in the data), or when there are too many parameters.

Key Issues in Machine Learning
What are good hypothesis spaces? Linear functions? Polynomials? Which spaces have been useful in practical applications?
How do we select among different hypothesis spaces? This is the model selection problem: a trade-off between overfitting and underfitting.
How can we optimize accuracy on future data points? This is often called the generalization error: the error on unseen data points. It is related to the issue of overfitting, i.e., the model fitting to the peculiarities rather than the generalities of the data.
What level of confidence should we have in the results? (A statistical question.) How much training data is required to find an accurate hypothesis with high probability? This is the topic of learning theory.
Are some learning problems computationally intractable? (A computational question.) Some learning problems are provably hard; heuristic/greedy approaches are often used when this is the case.
How can we formulate application problems as machine learning problems? (The engineering question.)

Terminology
Training example: an example of the form <x, y>, where x is a feature vector and y is a continuous value for regression problems, or a class label in {1, 2, ..., K} for classification problems.
Training set: a set of training examples drawn randomly from P(x, y).
Target function: the true mapping from x to y.
Hypothesis: a proposed function h considered by the learning algorithm to be similar to the target function.
Test set: a set of examples used to evaluate a proposed hypothesis h.
Hypothesis space: the space of all hypotheses that can, in principle, be output by a particular learning algorithm.
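The terms above can be made concrete with a tiny classification sketch; the loan-style data, the target function, and the 1-nearest-neighbor hypothesis below are all invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(3)

# Each example is a pair <x, y>: x = (income, savings) feature vector,
# y a class label in {0: low risk, 1: high risk} (hypothetical data)
X = rng.uniform(0, 1, size=(200, 2))
y = (X[:, 0] + X[:, 1] < 0.8).astype(int)  # target function, known here only to generate labels

# Split into a training set and a held-out test set
X_train, y_train = X[:150], y[:150]
X_test, y_test = X[150:], y[150:]

def h(x):
    # hypothesis: predict the label of the nearest training example (1-NN)
    return y_train[np.linalg.norm(X_train - x, axis=1).argmin()]

accuracy = np.mean([h(x) == t for x, t in zip(X_test, y_test)])
```

Here the hypothesis space is the set of all functions a 1-NN rule can represent given some training set, and the test set gives an estimate of performance on unseen examples.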