
Machine Learning 2D1431
Lecture 1: Introduction to Machine Learning

Question of the Day
What is the next symbol in this series?

Machine Learning
Lecturer: Frank Hoffmann, hoffmann@nada.kth.se
Lab assistants: Mikael Huss, hussm@nada.kth.se; Martin Rehn, rehn@nada.kth.se

Course Requirements
Four mandatory labs
- Location: Spelhallen, Sporthalle
- Dates:
  Lab 1: Thursday 14/11/02, 13-17
  Lab 2: Thursday 21/11/02, 13-17
  Lab 3: Thursday 28/11/02, 13-17
  Lab 4: Thursday 5/12/02, 13-17
Written exam
- Location: L21-22
- Date: 14/12/02, 8-13

Grading
Exam grade:
- U: 0-22p
- 3: 23-28p
- 4: 29-34p
- 5: 35-40p
Final grade:
- To pass the course you need at least a 3 in the exam.
- For each lab presented in time you get 1.5 bonus points.
- Example: exam 25 points, 3 labs in time (4.5 bonus points), total 29.5 points, final grade 4 (a short sketch of this calculation follows below).

Labs
Preparation:
- Learn or refresh your knowledge of Matlab.
- Start at least 2 weeks before the lab.
- Read the lab instructions.
- Read the reference material.
- Complete the assignments, write the Matlab code, answer the questions.
Presentation:
- No more than two students per group.
- Both students need to understand the entire assignment and code.
- Book a time for presentation.
- Present results and code to the teaching assistant.

Exam
The exam consists of theoretical questions and small practical exercises.
Scope: it is not sufficient to just study the course book!
- Attend lectures (lecture slides available).
- Study the course book and read additional literature.
- Participate in the labs and complete the assignments.

Course Information
- Course webpage: http://www.nada.kth.se/kurser/kth/2d1431/02/index.html
- Course newsgroup: news:nada.kurser.mi
- Course directory: /info/mi02
- Course module: course join mi02
- Course registration in RES: res checkin mi02
- NADA UNIX account: http://www.sgr.nada.kth.se/
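The grading rule above is a small lookup plus addition. A minimal sketch (Python used here for illustration, although the course labs are in Matlab; the assumption that bonus points only count once the exam itself is passed follows the pass requirement stated above):

def final_grade(exam_points, labs_in_time):
    # Each lab presented in time gives 1.5 bonus points (assumption: bonus
    # points only apply if the exam is passed with at least grade 3).
    if exam_points < 23:
        return "U"  # exam not passed, bonus points do not apply
    total = exam_points + 1.5 * labs_in_time
    if total <= 28:
        return 3
    if total <= 34:
        return 4
    return 5

# Example from the slide: 25 exam points + 3 labs in time = 29.5 points -> grade 4
print(final_grade(25, 3))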

Course Literature
Textbook (required):
- Machine Learning, Tom M. Mitchell, McGraw-Hill, 1997, ISBN 0-07-115467-1 (paperback)
Additional literature:
- Reinforcement Learning: An Introduction, Richard S. Sutton and Andrew G. Barto, MIT Press, 1998, http://www-anw.cs.umass.edu/~rich/book/the-book.html
- Pattern Classification, 2nd edition, Richard O. Duda, Peter E. Hart, David G. Stork
- Neural Networks: A Comprehensive Foundation, 2nd edition, Simon Haykin, Prentice-Hall, 1999

Matlab
Labs in the course are based on Matlab; learn or refresh your knowledge of Matlab.
- Matlab Primer, Kermit Sigmon
- A Practical Introduction to Matlab, Mark S. Gockenbach
- Matlab at Google: http://directory.google.com/top/science/math/software/matlab

Course Overview
- introduction to machine learning
- concept learning
- decision trees
- artificial neural networks
- evolutionary algorithms
- instance based learning
- reinforcement learning
- Bayesian learning
- computational learning theory
- fuzzy logic
- machine learning in robotics

Software Packages & Datasets
- Machine Learning at Google: http://directory.google.com/top/computers/artificial_Intelligence/Machine_Learning
- Matlab Toolbox for Pattern Recognition: http://www.ph.tn.tudelft.nl/~bob/prtools.html
- MIT GALIB in C++: http://lancet.mit.edu/ga
- Machine Learning Data Repository, UC Irvine: http://www.ics.uci.edu/~mlearn/ml/repository.html

Learning & Adaptation
- "Learning: modification of a behavioral tendency by experience." (Webster 1984)
- "A learning machine, broadly defined, is any device whose actions are influenced by past experiences." (Nilsson 1965)
- "Any change in a system that allows it to perform better the second time on repetition of the same task or on another task drawn from the same population." (Simon 1983)
- "An improvement in information processing ability that results from information processing activity." (Tanimoto 1990)

Definition: A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P if its performance at tasks in T, as measured by P, improves with experience E.

Disciplines relevant to ML
- artificial intelligence
- Bayesian methods
- control theory
- information theory
- computational complexity theory
- philosophy
- psychology and neurobiology
- statistics

Applications of ML
- Learning to recognize spoken words: SPHINX (Lee 1989)
- Learning to drive an autonomous vehicle: ALVINN (Pomerleau 1989)
- Learning to classify celestial objects (Fayyad et al. 1995)
- Learning to play world-class backgammon: TD-GAMMON (Tesauro 1992)
- Designing the morphology and control structure of electro-mechanical artefacts: GOLEM (Lipson, Pollack 2000)

ALVINN
- Automated driving at 70 mph on a public highway
- Camera image: 30x32 pixels as inputs
- 4 hidden units, with 30x32 weights into each hidden unit
- 30 outputs for steering
(A small sketch of this network shape follows below.)

Artificial Life: GOLEM Project (Nature: Lipson, Pollack 2000)
http://demo.cs.brandeis.edu/golem
Evolve simple electromechanical locomotion machines from basic building blocks (bars, actuators, artificial neurons) in a simulation of the physical world (gravity, friction). The individuals that demonstrate the best locomotion ability are fabricated through rapid prototyping technology.
(Movies: Evolvable Robot, Golem)
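The ALVINN network described above maps a 30x32-pixel camera image through a few hidden units to 30 steering outputs. A minimal sketch of that shape with random weights (NumPy, untrained, and not the original implementation, so purely illustrative):

import numpy as np

rng = np.random.default_rng(0)
image = rng.random((30, 32))           # 30x32 camera image
x = image.reshape(-1)                  # 960 input units

W_hidden = rng.normal(size=(4, 960))   # 4 hidden units, each fully connected to the image
W_out = rng.normal(size=(30, 4))       # 30 output units coding the steering direction

hidden = np.tanh(W_hidden @ x)
steering = W_out @ hidden
print("steering unit with maximal activation:", int(np.argmax(steering)))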

Evolved Creatures
Evolved creatures: Sims (1994), http://genarts.com/karl/evolved-virtual-creatures.html
Darwinian evolution of virtual block creatures for swimming, jumping, following, and competing for a block.

Learning
Learning problems:
- learning with a teacher
- learning with a critic
- unsupervised learning
Learning tasks:
- pattern association
- pattern recognition (classification)
- function approximation
- control
- filtering

Credit Assignment Problem
The problem of assigning credit or blame for the overall outcomes to each of the internal decisions made by the learning machine which contributed to these outcomes.
- Temporal credit assignment problem: involves the instants of time when the actions that deserve credit were taken.
- Structural credit assignment problem: involves assigning credit to the internal structures of actions generated by the system.

Learning with a Teacher
- supervised learning
- knowledge represented by a set of input-output examples (x_i, y_i)
- minimize the error between the actual response of the learner and the desired response
(Block diagram: the environment supplies state x to the teacher and to the learning system; the teacher's desired response and the learner's actual response are compared to produce the error signal that drives learning.)
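A minimal sketch of this supervised setting, with a linear learner and toy input-output examples (x_i, y_i) assumed for illustration, adjusting the learner to reduce the error between actual and desired response:

# Toy supervised learning: fit y = w*x + b to input-output examples (x_i, y_i)
examples = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]  # desired response y = 2x + 1

w, b, eta = 0.0, 0.0, 0.05
for epoch in range(2000):
    for x, y in examples:
        error = y - (w * x + b)        # desired response minus actual response
        w += eta * error * x           # gradient step on the squared error
        b += eta * error
print(round(w, 2), round(b, 2))        # close to 2.0 and 1.0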

Learning with a Critic
- learning through interaction with the environment
- exploration of states and actions
- feedback through a delayed primary reinforcement signal (temporal credit assignment problem)
- goal: maximize accumulated future reinforcements
(Block diagram: the environment supplies the state to a critic and to the learning system; the critic converts the primary reinforcement signal from the environment into a heuristic reinforcement signal for the learning system, whose actions feed back into the environment.)

Unsupervised Learning
- self-organized learning
- no teacher or critic
- task-independent quality measure
- identify regularities in the data and discover classes automatically
- competitive learning (a short sketch follows below)
(Block diagram: the environment supplies the state directly to the learning system.)

Pattern Recognition
A pattern/signal is assigned to one of a prescribed number of classes/categories (example categories: rice, raisins, soup, sugar, fanta, teabox).

Object Recognition
- goal: recognize objects in the image
- input: cropped raw RGB image
- decision: does the image contain the object (yes/no)
- training examples: images of the object in different poses and different backgrounds
- possible features: raw image data, color histograms, spatial filters, edge and corner detection
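The unsupervised-learning slide above mentions competitive learning as a way to discover classes automatically. A minimal sketch with two prototype vectors on toy 1-D data (all values assumed for illustration):

# Two prototypes compete for each input; the winner moves toward the input.
data = [0.9, 1.1, 1.0, 4.9, 5.2, 5.0, 1.2, 4.8]
prototypes = [data[0], data[3]]   # initialise on two data points to avoid dead units
eta = 0.3

for _ in range(20):
    for x in data:
        j = min(range(len(prototypes)), key=lambda k: abs(x - prototypes[k]))  # winner
        prototypes[j] += eta * (x - prototypes[j])                             # move winner toward x
print([round(p, 1) for p in prototypes])   # approximately [1.1, 4.9], the two cluster centres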

Function Approximation
The goal is to approximate an unknown function d = f(x) such that the mapping F(x) realized by the learning system is close enough to f(x):
|F(x) - f(x)| < ε for all x
System identification and modeling: describe the input-output relationship of an unknown time-invariant multiple-input multiple-output system. (A short example follows below.)

Pose Estimation from Images
- goal: estimate the pose (orientation, position) of an object from its appearance
- input: image data
- output: 3-D pose (x, y, z, θ, ϕ, ψ)
- training examples: pairs of images with known object pose

Control Learning
Adjust the parameters of a controller such that the closed-loop control system demonstrates a desired behaviour.
(Block diagram: the reference signal and the plant output are compared to form the error signal, which the controller turns into the plant input; the plant output is fed back with unity feedback.)

Control Learning
Learning to choose actions:
- a robot learns navigation and obstacle avoidance
- learning to choose actions to optimize a factory output
- learning to play backgammon
Problem characteristics:
- delayed reward instead of immediate reward for good or bad actions (temporal credit assignment problem)
- no supervised learning (no training examples in the form of correct (state, action) pairs)
- learning with a critic
- need for active exploration
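For the function-approximation task above, a minimal sketch: fit a polynomial F(x) to samples of an assumed target function f(x) = sin(x) and compare the largest deviation on the samples against a tolerance ε (the target, the degree and ε are all illustrative assumptions):

import numpy as np

# Samples d_i = f(x_i) of the (pretend-unknown) function f(x) = sin(x)
x = np.linspace(0.0, 3.0, 50)
d = np.sin(x)

# The learning system realises F(x) as a cubic polynomial fitted to the samples
coeffs = np.polyfit(x, d, deg=3)
F = np.polyval(coeffs, x)

eps = 0.1
max_dev = float(np.max(np.abs(F - d)))
print("max |F(x) - f(x)| on the samples:", max_dev)
print("within epsilon:", max_dev < eps)   # typically well below eps here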

Learning to Play Backgammon
- state: board state
- actions: possible moves
- reward function: +100 win, -100 lose, 0 for all other actions/states
- trained by playing 1.5 million games against itself
- now approximately equal to the best human player
- link: http://www.research.ibm.com/massive/tdl.html
- reading assignment: Tesauro (1995)

Reinforcement Learning
(Diagram: at each step the agent observes state s_t and reward r_t from the environment, chooses action a_t, and receives the next state s_{t+1} and reward r_{t+1}, producing a trajectory s_0, a_0, r_1, s_1, a_1, r_2, s_2, a_2, r_3, s_3, ...)
Goal: learn a policy a = π(s) which maximizes future accumulated rewards
R = r_t + γ r_{t+1} + γ² r_{t+2} + ... = Σ_{i=0..∞} γ^i r_{t+i}
(A small numerical example follows below.)

Upswing of an Inverted Pendulum
- reward r: +1000
- penalty r: -1000
- state s: angle ϕ, angular velocity ω
- control action a: left, right, brake
(Movie: upswing_1.mov)
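The discounted return R = r_t + γ r_{t+1} + γ² r_{t+2} + ... defined above can be computed directly for a finite reward sequence (toy rewards assumed for illustration):

gamma = 0.9
rewards = [0, 0, 0, 100]   # e.g. only the final transition (a win) is rewarded

# R = sum over i of gamma^i * r_{t+i}
R = sum(gamma ** i * r for i, r in enumerate(rewards))
print(R)   # 0.9**3 * 100 = 72.9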

Learning Problem: Learning to Play Checkers
Learning: improving with experience at some task
- improve over task T
- with respect to performance measure P
- based on experience E
Example: learn to play checkers
- T: play checkers
- P: percentage of games won in a tournament
- E: opportunity to play against itself
Design questions: What experience? What exactly should be learned? How shall it be represented? What specific algorithm should be used to learn it?

Type of Training Experience
- Direct or indirect?
  Direct: board state -> correct move.
  Indirect: outcome of a complete game (credit assignment problem).
- Teacher or not?
  The teacher selects board states, or the learner can select board states.
- Is the training experience representative of the performance goal?
  Training by playing against itself; performance evaluated by playing against the world champion.

Choose Target Function
- ChooseMove: B -> M (board state -> move): maps a legal board state to a legal move.
- Evaluate: B -> V (board state -> board value): assigns a numerical score to any given board state, such that better board states obtain a higher score.
Select the best move by evaluating all successor states of legal moves and picking the one with the maximal score.
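Selecting the best move by evaluating all successor states, as described above, looks roughly like the following sketch (generate_moves and evaluate stand in for hypothetical game-specific functions; the toy usage at the bottom is only for illustration):

def choose_move(board, generate_moves, evaluate):
    # Evaluate the successor state of every legal move and pick the move
    # whose resulting board gets the highest score.
    best_move, best_value = None, float("-inf")
    for move, successor in generate_moves(board):
        value = evaluate(successor)
        if value > best_value:
            best_move, best_value = move, value
    return best_move

# Toy usage: boards are numbers, a "move" adds 1 or 2, the evaluation prefers larger numbers.
moves = lambda b: [("+1", b + 1), ("+2", b + 2)]
print(choose_move(0, moves, evaluate=lambda b: b))   # "+2"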

Definition of Target Function
- If b is a final board state that is won, then V(b) = 100.
- If b is a final board state that is lost, then V(b) = -100.
- If b is a final board state that is drawn, then V(b) = 0.
- If b is not a final board state, then V(b) = V(b'), where b' is the best final board state that can be achieved starting from b and playing optimally until the end of the game.
This gives correct values but is not operational.

State Space Search
For the player's own move, V(b) = max_i V(b_i) over the successor states reached by the legal moves m_1: b -> b_1, m_2: b -> b_2, m_3: b -> b_3.
For the opponent's move, V(b_1) = min_i V(b_i) over the successors reached by m_4: b_1 -> b_4, m_5: b_1 -> b_5, m_6: b_1 -> b_6.
Final board states: black wins V(b) = -100, red wins V(b) = 100, draw V(b) = 0.
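The recursive definition of V above can be written down directly, which also shows why it is not operational: it expands the complete game tree down to final states. A minimal sketch on a toy take-away game (the game and its rules are an assumption purely for illustration):

def V(board, my_turn=True):
    # Toy game (hypothetical): 'board' is a number of sticks, a move removes
    # 1 or 2 sticks, and whoever takes the last stick wins.
    if board == 0:                        # final board state
        return -100 if my_turn else 100   # no sticks left on my turn: the opponent took the last one
    values = [V(board - take, not my_turn) for take in (1, 2) if take <= board]
    # Own move: pick the best successor; opponent's move: assume optimal play against us.
    # This recursion visits the whole game tree, which is infeasible for checkers.
    return max(values) if my_turn else min(values)

print(V(4))   # 100: from 4 sticks the player to move can force a win
print(V(3))   # -100: from 3 sticks the player to move loses against optimal play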

Number of Board States
- Tic-tac-toe: #board states < 9!/(5!·4!) + 9!/(4!·4!·1!) + ... + 9!/(4!·3!·2!) + 9 = 6045
  (each term counts the arrangements 9!/(#X!·#O!·#empty!) for one legal combination of piece counts)
- 4x4 checkers (no kings): #board states < 8·7·6·5·2²/(2!·2!) = 1680
- Regular checkers (8x8 board, 8 pieces each): #board states < 32!·2^16/(8!·8!·16!) = 5.07·10^17

Representation of Target Function
Possible representations:
- table look-up
- collection of rules
- neural networks
- polynomial function of board features
Trade-off in choosing an expressive representation: approximation accuracy vs. the number of training examples required to learn the target function.

Representation of Target Function
V(b) = ω0 + ω1·bp(b) + ω2·rp(b) + ω3·bk(b) + ω4·rk(b) + ω5·bt(b) + ω6·rt(b)
- bp(b): number of black pieces
- rp(b): number of red pieces
- bk(b): number of black kings
- rk(b): number of red kings
- bt(b): number of red pieces threatened by black
- rt(b): number of black pieces threatened by red

Obtaining Training Examples
- V(b): true target function
- V'(b): learned target function
- V_train(b): training value
Rule for estimating training values: V_train(b) ← V'(Successor(b))

Choose Weight Training Rule
LMS weight update rule: select a training example b at random, then
1. Compute error(b) = V_train(b) - V'(b)
2. For each board feature f_i, update the weight: ω_i ← ω_i + η·f_i·error(b)
(η: learning rate, approximately 0.1)

Example: 4x4 Checkers
V(b) = ω0 + ω1·rp(b) + ω2·bp(b), with initial weights ω0 = -10, ω1 = 75, ω2 = -60.
From the start position b0, the moves m1: b -> b1, m2: b -> b2, m3: b -> b3 all lead to successors with V(b1) = V(b2) = V(b3) = 20, and V(b0) = ω0 + ω1·2 + ω2·2 = 20.
Update after the first move:
1. error(b0) = V_train(b0) - V(b0) = V(b1) - V(b0) = 0
2. ω0 ← ω0 + 0.1·1·0, ω1 ← ω1 + 0.1·2·0, ω2 ← ω2 + 0.1·2·0 (all weights unchanged)
(The full sequence of weight updates in this example is reproduced in a short code sketch further below.)

Example: 4x4 Checkers (continued)
Of the two candidate successors of b3, b4a with V(b4a) = 20 and b4b with V(b4b) = -55, the game actually continues to b4 with V(b4) = -55, while V(b3) = 20.
1. error(b3) = V_train(b3) - V(b3) = V(b4) - V(b3) = -75
2. With ω0 = -10, ω1 = 75, ω2 = -60:
   ω0 ← ω0 - 0.1·1·75, so ω0 = -17.5
   ω1 ← ω1 - 0.1·2·75, so ω1 = 60
   ω2 ← ω2 - 0.1·2·75, so ω2 = -75

With the updated weights ω0 = -17.5, ω1 = 60, ω2 = -75, the next positions evaluate to V(b5) = -107.5 and V(b6) = -167.5.
error(b5) = V_train(b5) - V(b5) = V(b6) - V(b5) = -60
Applying ω_i ← ω_i + η·f_i·error(b):
   ω0 ← ω0 - 0.1·1·60, so ω0 = -23.5
   ω1 ← ω1 - 0.1·1·60, so ω1 = 54
   ω2 ← ω2 - 0.1·2·60, so ω2 = -87

Example: 4x4 Checkers (final state)
Final board state: black won, so V_f(b6) = -100, while V(b6) = -197.5.
error(b6) = V_train(b6) - V(b6) = V_f(b6) - V(b6) = 97.5
With ω0 = -23.5, ω1 = 54, ω2 = -87 and ω_i ← ω_i + η·f_i·error(b):
   ω0 ← ω0 + 0.1·1·97.5, so ω0 = -13.75
   ω1 ← ω1 + 0.1·0·97.5, so ω1 = 54
   ω2 ← ω2 + 0.1·2·97.5, so ω2 = -67.5

Evolution of Value Function
(Figure: value function on the training data before and after training.)

Design Choices
- Determine the type of training experience: games against experts, games against self, table of correct moves
- Determine the target function: Board -> Move, Board -> Value
- Determine the representation of the learned function: polynomial, linear function of six features, artificial neural network
- Determine the learning algorithm: gradient descent, linear programming

Learning Problem Examples
Credit card applications
- Task T: distinguish good applicants from risky applicants
- Performance measure P: ?
- Experience E: ? (direct/indirect)
- Target function: ?
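Returning to the 4x4-checkers example, the whole sequence of weight updates above follows from the single LMS rule. A short sketch reproducing it (features f = (1, rp(b), bp(b)) and training values taken from the successor estimates, as on the slides):

def lms_update(w, features, v_train, eta=0.1):
    # error(b) = V_train(b) - V'(b);  w_i <- w_i + eta * f_i * error(b)
    v_hat = sum(wi * fi for wi, fi in zip(w, features))
    error = v_train - v_hat
    return [wi + eta * fi * error for wi, fi in zip(w, features)]

w = [-10.0, 75.0, -60.0]                 # omega_0, omega_1 (rp), omega_2 (bp)
w = lms_update(w, [1, 2, 2], 20.0)       # error 0: weights unchanged
w = lms_update(w, [1, 2, 2], -55.0)      # error -75 -> [-17.5, 60.0, -75.0]
w = lms_update(w, [1, 1, 2], -167.5)     # error -60 -> [-23.5, 54.0, -87.0]
w = lms_update(w, [1, 0, 2], -100.0)     # error 97.5 -> [-13.75, 54.0, -67.5]
print(w)

The last call confirms the final weights of the worked example, including ω0 = -13.75.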

Performance Measure P
Error based: minimize the percentage of incorrectly classified customers:
P = (N_fp + N_fn) / N
- N_fp: number of false positives (rejected good customers)
- N_fn: number of false negatives (accepted bad customers)
Utility based: maximize the expected profit of the credit card business:
P = N_cp·U_cp + N_fn·U_fn
- U_cp: expected utility of an accepted good customer
- U_fn: expected utility/loss of an accepted bad customer
(A short numerical example follows below.)

Experience E
- Direct: decisions on credit card applications made by a human financial expert; training data: <customer info, reject/accept>
- Direct: actual customer behavior of previously accepted customers; training data: <customer info, good/bad>
  Problem: the distribution of applicants P_applicant is not identical to the training-data distribution P_train.
- Indirect: evaluate a decision policy based on the profit made over the past N years.

(Figures: distributions of good and bad customers among all applicants (Cw = 38) and among accepted customers (Cw = 43). Assuming we want to minimize classification error, what is the optimal decision boundary in each case?)
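Both performance measures above reduce to simple arithmetic on classification counts. A minimal sketch with assumed counts and utilities (all numbers are illustrative, not from the course):

N = 1000                            # applicants in total
N_fp, N_fn, N_cp = 40, 25, 700      # false positives, false negatives, correctly accepted good customers
U_cp, U_fn = 500.0, -2000.0         # expected profit per good customer, expected loss per bad customer

error_rate = (N_fp + N_fn) / N                 # error-based measure, to be minimized
expected_profit = N_cp * U_cp + N_fn * U_fn    # utility-based measure, to be maximized
print(error_rate, expected_profit)             # 0.065 and 300000.0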

Target Function
Customer record: income, owns house, credit history, age, employed, accept
- $40000, yes, good, 38, full-time, yes
- $25000, no, excellent, 25, part-time, no
- $50000, no, poor, 55, unemployed, no
Possible target functions:
- T: customer data -> accept/reject
- T: customer data -> probability of being a good customer
- T: customer data -> expected utility/profit

Learning Methods
- Decision rules: if income < $30,000 then reject
- Bayesian network: P(good | income, credit history, ...)
- Neural network
- Nearest neighbor: take the same decision as for the customer in the database that is most similar to the applicant (a short sketch follows below)

Learning Problem Examples
Obstacle avoidance behavior of a mobile robot
- Task T: navigate the robot safely through an environment.
- Performance measure P: ?
- Experience E: ?
- Target function: ?

Performance Measure P
- P: maximize time until collision with an obstacle
- P: maximize distance travelled until collision with an obstacle
- P: minimize rotational velocity, maximize translational velocity
- P: minimize the error between the control action of a human operator and of the robot controller in the same situation
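Going back to the credit-card learning methods, the nearest-neighbour rule mentioned above takes the same decision as for the most similar customer in the database. A minimal sketch on two numeric features, income and age, with toy records and an ad-hoc feature scaling (all values assumed for illustration):

# Each record: (income, age, accepted?)
database = [
    (40000, 38, True),
    (25000, 25, False),
    (50000, 55, False),
]

def nearest_neighbour_decision(income, age):
    # Scale the features so that income does not dominate the distance.
    def distance(record):
        return ((income - record[0]) / 10000.0) ** 2 + ((age - record[1]) / 10.0) ** 2
    return min(database, key=distance)[2]

print(nearest_neighbour_decision(42000, 35))   # True: closest to the accepted customer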

Training Experience E
- Direct: monitor a human operator and use her control actions as training data: E = { <perception_i, action_i> }
- Indirect: operate the robot in the real world or in a simulation; reward desirable states, penalize undesirable states, e.g.
  V(b) = +1 if v > 0.5 m/s
  V(b) = +2 if ω < 10 deg/s
  V(b) = -100 if bumper state = 1
  (see the sketch at the end of these notes)
  Question: internal or external reward?

Target Function
- Choose an action, A: perception -> action; sonar readings s1(t) ... sn(t) -> <v, ω>
- Evaluate a perception/state, V: s1(t) ... sn(t) -> V(s1(t) ... sn(t))
  Problem: states are only partially observable, therefore the world seems non-deterministic.
  Markov decision process: the successor state s(t+1) is a probabilistic function of the current state s(t) and action a(t).
- Evaluate state/action pairs, V: s1(t) ... sn(t), a(t) -> V(s1(t) ... sn(t), a(t))

Learning Methods
- Neural networks: require direct training experience
- Reinforcement learning: indirect training experience
- Evolutionary algorithms: indirect training experience

Issues in Machine Learning
- What algorithms can approximate functions well, and when?
- How does the number of training examples influence accuracy?
- How does the complexity of the hypothesis representation impact it?
- How does noisy data influence accuracy?
- What are the theoretical limits of learnability?
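For the indirect robot training experience above, the individual reward terms can be collected into one function. A minimal sketch (thresholds taken from the slide; summing the three terms and the exact state fields are assumptions for illustration):

def reward(v, omega, bumper):
    # v: translational velocity [m/s], omega: rotational velocity [deg/s], bumper: collision flag
    r = 0.0
    if v > 0.5:
        r += 1.0       # reward fast forward motion
    if abs(omega) < 10.0:
        r += 2.0       # reward driving straight
    if bumper:
        r -= 100.0     # heavy penalty for hitting an obstacle
    return r

print(reward(0.6, 5.0, False))   # 3.0
print(reward(0.2, 30.0, True))   # -100.0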