Machine Learning 2D5362


Lecture 1: Introduction to Machine Learning

Date/Time: Tuesday ???, Thursday 13.30
Location: BB2?
Course requirements: active participation, homework assignments, course project
Credits: 3-5 credits depending on the course project
Course webpage: http://www.nada.kth.se/~hoffmann/ml.html

Course Material

Textbook (recommended):
Machine Learning, Tom M. Mitchell, McGraw-Hill, 1997, ISBN 0-07-042807-7 (available as paperback)

Further readings:
- An Introduction to Genetic Algorithms, Melanie Mitchell, MIT Press, 1996
- Reinforcement Learning: An Introduction, Richard S. Sutton and Andrew G. Barto, MIT Press, 1998
- Selected publications: check the course webpage

Course Overview
- Introduction to machine learning
- Concept learners
- Decision tree learning
- Neural networks
- Evolutionary algorithms
- Instance-based learning
- Reinforcement learning
- Machine learning in robotics

Software Packages & Datasets
- MLC++: machine learning library in C++, http://www.sgi.com/technology/mlc
- GAlib: MIT genetic algorithm library in C++, http://lancet.mit.edu/ga
- UCI Machine Learning Data Repository, UC Irvine, http://www.ics.uci.edu/~mlearn/ml/repository.html

Possible Course Projects
- Apply machine learning techniques to your own problem, e.g. classification, clustering, data modeling, object recognition
- Investigate combining multiple classifiers
- Compare different approaches in genetic fuzzy systems
- Learn robotic behaviors using evolutionary techniques or reinforcement learning (LEGO Mindstorms, Scout)

Scout Robots
- 16 sonar sensors
- Laser range scanner
- Odometry
- Differential drive
- Simulator
- API in C

LEGO Mindstorms
- Touch sensor
- Light sensor
- Rotation sensor
- Video cam
- Motors

Learning & Adaptation
- "Modification of a behavioral tendency by experience." (Webster 1984)
- "A learning machine, broadly defined, is any device whose actions are influenced by past experiences." (Nilsson 1965)
- "Any change in a system that allows it to perform better the second time on repetition of the same task or on another task drawn from the same population." (Simon 1983)
- "An improvement in information processing ability that results from information processing activity." (Tanimoto 1990)

Learning
Definition: A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E.

Disciplines relevant to ML
- Artificial intelligence
- Bayesian methods
- Control theory
- Information theory
- Computational complexity theory
- Philosophy
- Psychology and neurobiology
- Statistics

Applications of ML
- Learning to recognize spoken words: SPHINX (Lee 1989)
- Learning to drive an autonomous vehicle: ALVINN (Pomerleau 1989)
- Learning to classify celestial objects (Fayyad et al. 1995)
- Learning to play world-class backgammon: TD-GAMMON (Tesauro 1992)
- Designing the morphology and control structure of electro-mechanical artefacts: GOLEM (Lipson, Pollack 2000)

Artificial Life
GOLEM Project (Nature: Lipson, Pollack 2000)
http://golem03.cs-i.brandeis.edu/index.html
Evolve simple electromechanical locomotion machines from basic building blocks (bars, actuators, artificial neurons) in a simulation of the physical world (gravity, friction). The individuals that demonstrate the best locomotion ability are fabricated through rapid prototyping technology.

Evolvable Robot

Arrow and Ratchet (figures: two evolved GOLEM machines)

Tetra (figure: another evolved GOLEM machine)

Evolved Creatures
Evolved creatures: Sims (1994)
http://genarts.com/karl/evolved-virtual-creatures.html
Darwinian evolution of virtual block creatures for swimming, jumping, following, and competing for a block

Learning Problem
Learning: improving with experience at some task
- Improve over task T
- with respect to performance measure P
- based on experience E

Example: learn to play checkers
T: play checkers
P: percentage of games won in a tournament
E: opportunity to play against itself

Learning to play checkers
T: play checkers
P: percentage of games won
- What experience?
- What exactly should be learned?
- How shall it be represented?
- What specific algorithm to learn it?

Type of Training Experience
Direct or indirect?
- Direct: board state -> correct move
- Indirect: outcome of a complete game (credit assignment problem)
Teacher or not?
- Teacher selects board states
- Learner can select board states
Is the training experience representative of the performance goal?
- Training: playing against itself
- Performance: evaluated playing against the world champion

Choose Target Function
ChooseMove: B -> M (board state -> move)
Maps a legal board state to a legal move.
Evaluate: B -> V (board state -> board value)
Assigns a numerical score to any given board state, such that better board states obtain a higher score.
Select the best move by evaluating all successor states of legal moves and picking the one with the maximal score.

Possible Definition of Target Function
- If b is a final board state that is won, then V(b) = 100
- If b is a final board state that is lost, then V(b) = -100
- If b is a final board state that is drawn, then V(b) = 0
- If b is not a final board state, then V(b) = V(b'), where b' is the best final board state that can be achieved starting from b and playing optimally until the end of the game.
This gives correct values but is not operational.

State Space Search
V(b) = ?
V(b) = max_i V(b_i), over the moves m_1: b -> b_1, m_2: b -> b_2, m_3: b -> b_3
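As a concrete sketch of the move-selection step, the snippet below picks the legal move whose successor board scores highest under a learned evaluation function; the helpers legal_moves and successor are hypothetical placeholders, not part of the lecture material:

```python
# Minimal sketch: one-ply move selection with a learned evaluation function.
# legal_moves(b) and successor(b, m) are hypothetical helpers.

def choose_move(b, v_hat, legal_moves, successor):
    """Return the legal move whose resulting board state maximizes v_hat."""
    return max(legal_moves(b), key=lambda m: v_hat(successor(b, m)))
```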

State Space Search
V(b_1) = ?
V(b_1) = min_i V(b_i), over the opponent's moves m_4: b_1 -> b_4, m_5: b_1 -> b_5, m_6: b_1 -> b_6

Final Board States
Black wins: V(b) = -100
Red wins: V(b) = 100
Draw: V(b) = 0

Depth-First Search / Breadth-First Search (figures: two search-tree traversal orders)

Number of Board States
Tic-Tac-Toe: #board states < 9!/(5! 4!) + 9!/(1! 4! 4!) + ... + 9!/(2! 4! 3!) + 9 = 6045
4x4 checkers (no kings): #board states = ? #board states < 8*7*6*5 * 2^2 / (2! * 2!) = 1680
Regular checkers (8x8 board, 8 pieces each): #board states < 32! * 2^16 / (8! * 8! * 16!) = 5.07*10^17

Choose Representation of Target Function
- Table look-up
- Collection of rules
- Neural networks
- Polynomial function of board features
Trade-off in choosing an expressive representation:
- Approximation accuracy
- Number of training examples needed to learn the target function
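The checkers bounds can be checked mechanically; a small sketch using Python's math.factorial (the tic-tac-toe sum is omitted because its middle terms are elided above):

```python
import math

f = math.factorial

# 4x4 checkers bound: 8*7*6*5 * 2^2 / (2! * 2!)
print(8 * 7 * 6 * 5 * 2**2 // (f(2) * f(2)))    # 1680

# Regular checkers bound: 32! * 2^16 / (8! * 8! * 16!)
print(f(32) * 2**16 // (f(8) * f(8) * f(16)))   # ~5.07e17
```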

Representation of Target Function
V(b) = w_0 + w_1*bp(b) + w_2*rp(b) + w_3*bk(b) + w_4*rk(b) + w_5*bt(b) + w_6*rt(b)
- bp(b): #black pieces
- rp(b): #red pieces
- bk(b): #black kings
- rk(b): #red kings
- bt(b): #red pieces threatened by black
- rt(b): #black pieces threatened by red

Obtaining Training Examples
- V(b): true target function
- V'(b): learned target function
- V_train(b): training value
Rule for estimating training values: V_train(b) <- V'(Successor(b))
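A minimal sketch of this linear representation and the training-value rule; the six feature extractors and the successor function are assumed to be provided elsewhere:

```python
# Sketch: linear evaluation function over board features.
# features = [bp, rp, bk, rk, bt, rt] would be callables computing the counts above.

def v_hat(board, w, features):
    """V'(b) = w_0 + w_1*f_1(b) + ... + w_n*f_n(b)."""
    return w[0] + sum(wi * f(board) for wi, f in zip(w[1:], features))

def training_value(board, w, features, successor):
    """V_train(b) <- V'(Successor(b))."""
    return v_hat(successor(board), w, features)
```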

Choose Weight Training Rule
LMS weight update rule:
Select a training example b at random.
1. Compute error(b) = V_train(b) - V'(b)
2. For each board feature f_i, update the weight: w_i <- w_i + eta * f_i * error(b)
eta: learning rate, approx. 0.1

Example: 4x4 checkers
V(b) = w_0 + w_1*rp(b) + w_2*bp(b)
Initial weights: w_0 = -10, w_1 = 75, w_2 = -60
V(b_0) = w_0 + w_1*2 + w_2*2 = 20
m_1: b -> b_1, V(b_1) = 20
m_2: b -> b_2, V(b_2) = 20
m_3: b -> b_3, V(b_3) = 20
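The LMS rule itself is one line per weight; a sketch, treating the bias as a constant feature f_0 = 1:

```python
# Sketch of the LMS weight update:
#   error(b) = V_train(b) - V'(b)
#   w_i <- w_i + eta * f_i * error(b)

def lms_update(w, feature_values, v_train, v_hat_b, eta=0.1):
    """One LMS step; feature_values = [f_1(b), ..., f_n(b)]."""
    error = v_train - v_hat_b
    fs = [1.0] + list(feature_values)   # bias feature f_0 = 1
    return [wi + eta * fi * error for wi, fi in zip(w, fs)]
```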

Example: 4x4 checkers
V(b_0) = 20, V(b_1) = 20
1. Compute error(b_0) = V_train(b_0) - V(b_0) = V(b_1) - V(b_0) = 0
2. For each board feature f_i, update the weight w_i <- w_i + eta * f_i * error(b):
w_0 <- w_0 + 0.1 * 1 * 0
w_1 <- w_1 + 0.1 * 2 * 0
w_2 <- w_2 + 0.1 * 2 * 0
The weights remain unchanged.

Example: 4x4 checkers
V(b_0) = 20, V(b_1) = 20, V(b_2) = 20, V(b_3) = 20

Example: 4x4 checkers
V(b_3) = 20, V(b_4a) = 20, V(b_4b) = -55

Example: 4x4 checkers
V(b_3) = 20, V(b_4) = -55
1. Compute error(b_3) = V_train(b_3) - V(b_3) = V(b_4) - V(b_3) = -75
2. For each board feature f_i, update the weight w_i <- w_i + eta * f_i * error(b), starting from w_0 = -10, w_1 = 75, w_2 = -60:
w_0 <- w_0 - 0.1 * 1 * 75, w_0 = -17.5
w_1 <- w_1 - 0.1 * 2 * 75, w_1 = 60
w_2 <- w_2 - 0.1 * 2 * 75, w_2 = -75

Example: 4x4 checkers
w_0 = -17.5, w_1 = 60, w_2 = -75
V(b_4) = -107.5, V(b_5) = -107.5

Example: 4x4 checkers
V(b_5) = -107.5, V(b_6) = -167.5
error(b_5) = V_train(b_5) - V(b_5) = V(b_6) - V(b_5) = -60
w_0 = -17.5, w_1 = 60, w_2 = -75
w_i <- w_i + eta * f_i * error(b):
w_0 <- w_0 - 0.1 * 1 * 60, w_0 = -23.5
w_1 <- w_1 - 0.1 * 1 * 60, w_1 = 54
w_2 <- w_2 - 0.1 * 2 * 60, w_2 = -87

Example: 4x4 checkers
Final board state: black won, V_f(b) = -100
V(b_6) = -197.5
error(b_6) = V_train(b_6) - V(b_6) = V_f(b_6) - V(b_6) = 97.5
w_0 = -23.5, w_1 = 54, w_2 = -87
w_i <- w_i + eta * f_i * error(b):
w_0 <- w_0 + 0.1 * 1 * 97.5, w_0 = -13.75
w_1 <- w_1 + 0.1 * 0 * 97.5, w_1 = 54
w_2 <- w_2 + 0.1 * 2 * 97.5, w_2 = -67.5

Evolution of Value Function (figure: training data, value function before and after training)
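The whole trace above can be replayed in a few lines to confirm the arithmetic; the feature values (rp, bp) at each step are read off the slide's board values:

```python
eta = 0.1

def v(w, rp, bp):
    return w[0] + w[1] * rp + w[2] * bp

def step(w, rp, bp, v_train):
    error = v_train - v(w, rp, bp)
    return [w[0] + eta * 1 * error,
            w[1] + eta * rp * error,
            w[2] + eta * bp * error]

w = [-10.0, 75.0, -60.0]     # initial weights
w = step(w, 2, 2, 20.0)      # b0: error 0, weights unchanged
w = step(w, 2, 2, -55.0)     # b3 -> b4: [-17.5, 60.0, -75.0]
w = step(w, 1, 2, -167.5)    # b5 -> b6: [-23.5, 54.0, -87.0]
w = step(w, 0, 2, -100.0)    # b6 final, black won
print(w)                     # [-13.75, 54.0, -67.5]
```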

Design Choices
- Determine type of training experience: games against experts, games against self, table of correct moves, ...
- Determine target function: Board -> Move, Board -> Value, ...
- Determine representation of the learned function: polynomial, linear function of six features, artificial neural network, ...
- Determine learning algorithm: gradient descent, linear programming, ...

Learning Problem Examples
Credit card applications
Task T: distinguish good applicants from risky applicants.
Performance measure P: ?
Experience E: ? (direct/indirect)
Target function: ?

Performance Measure P:
Error based: minimize the percentage of incorrectly classified customers:
P = (N_fp + N_fn) / N
- N_fp: # false positives (rejected good customers)
- N_fn: # false negatives (accepted bad customers)
Utility based: maximize the expected profit of the credit card business:
P = N_cp * U_cp + N_fn * U_fn
- U_cp: expected utility of an accepted good customer
- U_fn: expected utility/loss of an accepted bad customer

Experience E:
- Direct: decisions on credit card applications made by a human financial expert. Training data: <customer info, reject/accept>
- Direct: actual customer behavior based on previously accepted customers. Training data: <customer info, good/bad>. Problem: the distribution of applicants P_applicant is not identical to the training-data distribution P_train.
- Indirect: evaluate a decision policy based on the profit made over the past N years.
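A toy computation of the two measures; all counts and utilities below are invented purely for illustration:

```python
# Hypothetical counts for one year of applications
n, n_fp, n_fn, n_cp = 1000, 40, 25, 700

# Error-based measure: fraction of misclassified customers
p_error = (n_fp + n_fn) / n
print(p_error)                # 0.065

# Utility-based measure: expected profit (assumed per-customer utilities)
u_cp, u_fn = 120.0, -900.0
p_utility = n_cp * u_cp + n_fn * u_fn
print(p_utility)              # 61500.0
```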

Distribution of Applicants (figure: overlapping histograms of good and bad customers, decision boundary Cw = 38)
Assume we want to minimize the classification error: what is the optimal decision boundary?

Distribution of Accepted Customers (figure: the same histograms restricted to accepted customers, decision boundary Cw = 43)
What is the optimal decision boundary?

Target Function
Customer record: income, owns house, credit history, age, employed, accept
- $40000, yes, good, 38, full-time, yes
- $25000, no, excellent, 25, part-time, no
- $50000, no, poor, 55, unemployed, no
T: customer data -> accept/reject
T: customer data -> probability of being a good customer
T: customer data -> expected utility/profit

Learning Methods
- Decision rules: if income < $30,000 then reject
- Bayesian network: P(good | income, credit history, ...)
- Neural network
- Nearest neighbor: take the same decision as for the customer in the database that is most similar to the applicant
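A minimal nearest-neighbor sketch over the three example records; the numeric encoding of the categorical fields is an assumption made only for illustration:

```python
# Encoding: (income in $1000s, owns house, credit-history score, age, employment level)
records = [
    ((40, 1, 2, 38, 2), "accept"),   # $40000, yes, good, 38, full-time
    ((25, 0, 3, 25, 1), "reject"),   # $25000, no, excellent, 25, part-time
    ((50, 0, 0, 55, 0), "reject"),   # $50000, no, poor, 55, unemployed
]

def nearest_neighbor(applicant):
    """Take the same decision as the most similar stored customer."""
    def sq_dist(x):
        return sum((a - b) ** 2 for a, b in zip(applicant, x))
    return min(records, key=lambda r: sq_dist(r[0]))[1]

print(nearest_neighbor((45, 1, 2, 40, 2)))   # -> "accept" (closest to the first record)
```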

Learning Problem Examples
Obstacle-avoidance behavior of a mobile robot
Task T: navigate the robot safely through an environment.
Performance measure P: ?
Experience E: ?
Target function: ?

Performance Measure P:
- P: maximize time until collision with an obstacle
- P: maximize distance travelled until collision with an obstacle
- P: minimize rotational velocity, maximize translational velocity
- P: minimize the error between the control action of a human operator and that of the robot controller in the same situation

Training Experience E:
Direct: monitor a human operator and use her control actions as training data: E = {<perception_i, action_i>}
Indirect: operate the robot in the real world or in a simulation. Reward desirable states, penalize undesirable states:
- V(b) = +1 if v > 0.5 m/s
- V(b) = +2 if omega < 10 deg/s
- V(b) = -100 if bumper state = 1
Question: internal or external reward?

Target Function
Choose action: A: perception -> action
Sonar readings: s_1(t) ... s_n(t) -> <v, omega>
Evaluate perception/state: V: s_1(t) ... s_n(t) -> V(s_1(t) ... s_n(t))
Problem: states are only partially observable, therefore the world seems non-deterministic.
Markov Decision Process: the successor state s(t+1) is a probabilistic function of the current state s(t) and action a(t).
Evaluate state/action pairs: V: s_1(t) ... s_n(t), a(t) -> V(s_1(t) ... s_n(t), a(t))
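A sketch of the indirect reward signal above; the slide does not say whether the bonuses combine, so this version simply sums them, and the state fields (v, omega, bumper) are assumed names:

```python
def reward(v, omega, bumper):
    """Indirect training signal: reward desirable states, penalize collisions."""
    if bumper:             # bumper state = 1: collision
        return -100.0
    r = 0.0
    if v > 0.5:            # translational velocity in m/s
        r += 1.0
    if abs(omega) < 10.0:  # rotational velocity in deg/s
        r += 2.0
    return r
```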

Learning Methods
- Neural networks: require direct training experience
- Reinforcement learning: indirect training experience
- Evolutionary algorithms: indirect training experience

Evolutionary Algorithms (figure: a population of bitstring genotypes, e.g. 10111, 10011, 01001, ..., mapped by a coding scheme into phenotype space; a fitness function f(x) drives selection, recombination, and mutation to produce the next population)
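A minimal generational sketch of that selection/recombination/mutation cycle over bitstring genotypes; the fitness function (count of 1-bits) is a stand-in, since the real fitness would come from evaluating the phenotype:

```python
import random

def fitness(g):
    return sum(g)   # stand-in: count of 1-bits

def evolve(pop, generations=50, p_mut=0.01):
    for _ in range(generations):
        new_pop = []
        while len(new_pop) < len(pop):
            # selection: two binary tournaments
            p1 = max(random.sample(pop, 2), key=fitness)
            p2 = max(random.sample(pop, 2), key=fitness)
            # recombination: one-point crossover
            cut = random.randrange(1, len(p1))
            child = p1[:cut] + p2[cut:]
            # mutation: flip each bit with small probability
            child = [bit ^ (random.random() < p_mut) for bit in child]
            new_pop.append(child)
        pop = new_pop
    return pop

pop = [[random.randint(0, 1) for _ in range(5)] for _ in range(8)]
print(max(evolve(pop), key=fitness))
```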

Evolution of Simple Navigation (figure)

Issues in Machine Learning
- What algorithms can approximate functions well, and when?
- How does the number of training examples influence accuracy?
- How does the complexity of the hypothesis representation impact it?
- How does noisy data influence accuracy?
- What are the theoretical limits of learnability?

Machine vs. Robot Learning

Machine Learning                    | Robot Learning
Learning in a vacuum                | Embedded learning
Statistically well-behaved data    | Data distribution not homogeneous
Mostly off-line                     | Mostly on-line
Informative feedback                | Qualitative and sparse feedback
Computational time not an issue    | Time is crucial
Hardware does not matter            | Hardware is a priority
Convergence proof                   | Empirical proof

Learning in Robotics
- Behavioral adaptation: adjust the parameters of individual behaviors according to some direct feedback signal (e.g. adaptive control)
- Evolutionary adaptation: application of artificial evolution to robotic systems
- Sensor adaptation: adapt the perceptual system to the environment (e.g. classification of different contexts, recognition)
- Learning complex, deliberative behaviors: unsupervised learning based on sparse feedback from the environment, credit assignment problem (e.g. reinforcement learning)