CS148 - Building Intelligent Robots Lecture 6: Learning for Robotics. Instructor: Chad Jenkins (cjenkins)


Transcription:

Lecture 6 Robot Learning Slide 1 CS148 - Building Intelligent Robots Lecture 6: Learning for Robotics Instructor: Chad Jenkins (cjenkins)

Lecture 6 Robot Learning Slide 2 Administrivia: good news. No class next Tuesday 10/12; you can show up, but I will not be here. "Rudy, you are like a robotics teacher out of the country." "Yeah, no class! A robotics teacher out of the country?"

Lecture 6 Robot Learning Slide 3 Administrivia: bad news. Someone left the Lego lab open and unattended yesterday!!! This is a huge problem and can lead to disaster for the class: if the kits were to disappear, how would you implement the labs and projects? This situation must be taken seriously; thus, I will deduct 1% from the final grade of ALL students in the standard track if the lab is left open and unattended again. The next infraction will be 2%, then 4%, 8%, ...

Lecture 6 Robot Learning Slide 4 Machine learning (from Wikipedia). Machine learning is an area of artificial intelligence involving the development of techniques that allow computers to "learn". More specifically, machine learning is a method for creating computer programs by the analysis of data sets, rather than by the intuition of engineers. Machine learning overlaps heavily with statistics, since both fields study the analysis of data. Applications: medical diagnosis, detecting credit card fraud, stock market analysis, classifying DNA sequences, speech and handwriting recognition, game playing, and robot locomotion.

Lecture 6 Robot Learning Slide 5 Machine learning taxonomy. Machine learning groups into the following categories:
- supervised learning: an algorithm generates a function that maps inputs to desired outputs (given data for x and y, find f(x) = y); classification, regression
- unsupervised learning: an algorithm generates a model for a set of inputs (given x, find models underlying x); feature extraction, density estimation
- reinforcement learning: an algorithm learns a policy of how to act given an observation of the world (find a policy u such that expected outcomes o = u(x, actions))
- learning to learn: an algorithm learns its own inductive bias based on previous experience

Lecture 6 Robot Learning Slide 8 Supervised learning: regression. Ask N students: x = # of CS classes taken; y = typical Mountain Dew consumption. Supervised problem: find Mountain Dew consumption as a function of CS background, f(x) = y. (Plot: daily consumption of Mountain Dew vs. number of CS classes taken.)

Lecture 6 Robot Learning Slide 9 Supervised learning: regression. Same data as before, now with an outlier (cjenkins) in the plot of daily Mountain Dew consumption vs. number of CS classes taken. Supervised problem: find f(x) = y. Linear regression fits a line: f(x) = ax + b = y.
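
For concreteness, a minimal sketch of that least-squares line fit (numpy; the survey numbers are made up for illustration):

```python
import numpy as np

# Hypothetical survey: CS classes taken (x) vs. daily Mountain Dew cans (y)
x = np.array([0, 1, 2, 4, 5, 7, 8, 10], dtype=float)
y = np.array([0.5, 1.0, 1.0, 2.0, 2.5, 3.0, 3.5, 4.5])

# Fit f(x) = ax + b by least squares: solve X @ [a, b] ~= y
X = np.column_stack([x, np.ones_like(x)])
a, b = np.linalg.lstsq(X, y, rcond=None)[0]
print(f"f(x) = {a:.2f}x + {b:.2f}")
```

An outlier like the one on the slide simply gets averaged in; plain least squares is sensitive to such points, which is one motivation for robust variants.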

Lecture 6 Robot Learning Slide 10 Unsupervised learning: dimension reduction. Ask N students: x1 = # of CS classes taken; x2 = typical Mountain Dew consumption. Unsupervised problem: find the underlying coordinate system. Principal Components Analysis finds the linear system that best expresses the data. (Plot: the same axes, with the recovered axis running from "Newbie" to "Hacker".)
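
A minimal PCA sketch on the same made-up survey data, using the SVD of the centered data matrix (the "Newbie"/"Hacker" axis names are the lecture's joke labels, not PCA output):

```python
import numpy as np

# Rows = students, columns = (# CS classes taken, daily Mountain Dew cans)
X = np.array([[0, 0.5], [2, 1.0], [4, 2.0],
              [6, 2.8], [8, 4.1], [10, 4.9]])

Xc = X - X.mean(axis=0)                      # center the data
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
print("principal axes (rows of Vt):", Vt)    # first row ~ Newbie -> Hacker
print("variance explained:", s**2 / np.sum(s**2))
coords = Xc @ Vt[0]                          # 1-D coordinates along that axis
```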

Lecture 6 Robot Learning Slide 12 Examples for robotics. Inverse dynamics: f(desired states) = control commands; collect control commands and states from robot teleoperation. Inverse kinematics: f(end-effector position) = joint angles.

Lecture 6 Robot Learning Slide 13 Unsupervised learning: clustering. Ask N CS students: x1 = # of systems classes taken; x2 = # of AI classes taken; x3 = # of theory classes taken. Unsupervised problem: find categories of students, i.e., sets of students C1, C2, etc. (Plot: data on Systems, AI, and Theory axes.)

Lecture 6 Robot Learning Slide 14 Unsupervised learning: clustering. Ask N CS students: x1 = # of systems classes taken; x2 = # of AI classes taken; x3 = # of theory classes taken. This gives 3-dimensional data (Systems, AI, and Theory axes). Unsupervised problem: find categories of students, i.e., sets of students C1, C2, etc. Clustering estimates cluster associations. K-means clustering: assume K clusters with initial locations; find the cluster nearest to each point; move each cluster to the centroid of its points; repeat.
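
A minimal sketch of exactly those steps (numpy; drawing initial locations from the data points is one common choice, assumed here):

```python
import numpy as np

def kmeans(X, k, iters=20, seed=0):
    rng = np.random.default_rng(seed)
    # assume K clusters with initial locations (random data points)
    centers = X[rng.choice(len(X), k, replace=False)].astype(float)
    for _ in range(iters):
        # find the cluster nearest to each point
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # move each cluster to the centroid of its points
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return centers, labels

# e.g. 3-D class-count data for N=30 students, K=3 clusters
centers, labels = kmeans(np.random.rand(30, 3) * 10, k=3)
```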

Lecture 6 Robot Learning Slide 15 Supervised learning: classification. From clustering we know: x = classes taken; y = category (AI, systems, ...). (Plot: the clustered data on Systems, AI, and Theory axes.)

Lecture 6 Robot Learning Slide 16 Supervised learning: classification. From clustering we know: x = classes taken; y = category (AI, systems, ...). Find f(x) = y: the decision boundaries.

Lecture 6 Robot Learning Slide 17 Supervised learning: classification. From clustering we know: x = classes taken; y = category (AI, systems, ...). Find f(x) = y: the decision boundaries. How should a new point x_new be classified?

Lecture 6 Robot Learning Slide 18 Supervised learning: classification. From clustering we know: x = classes taken; y = category (AI, systems, ...). Find f(x) = y: the decision boundaries. Classify the new point x_new using the decision boundaries (here, x_new falls in the AI region).
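
One simple way to realize such boundaries is a nearest-centroid rule (my choice for illustration; the slides do not commit to a specific classifier), reusing the cluster centers from k-means:

```python
import numpy as np

def classify(x_new, centers, names=("Systems", "AI", "Theory")):
    """Assign x_new to the category of the nearest cluster center.
    The implied decision boundaries are the perpendicular bisectors
    between centers (a Voronoi partition of the feature space)."""
    dists = np.linalg.norm(centers - x_new, axis=1)
    return names[int(dists.argmin())]
```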

Lecture 6 Robot Learning Slide 19 Examples for robotics. Behavior arbitration: f(sensor readings) = behavior selection. Landmarking for robot navigation: f(sensor readings) = landmark category. Neural navigation of mobile robots: f(brain readings) = controller states.

Lecture 6 Robot Learning Slide 20 Reinforcement learning (from Wikipedia). A class of problems in machine learning which postulates an agent exploring an environment, in which the agent perceives its current state and takes actions. The environment, in return, provides a reward (which can be positive or negative). Reinforcement learning algorithms attempt to find a policy that maximizes cumulative reward for the agent over the course of the problem.

Lecture 6 Robot Learning Slide 21 Reinforcement learning (from Wikipedia). RL differs from supervised learning in that correct input/output pairs are never presented, nor are sub-optimal actions explicitly corrected. RL focuses on on-line performance: balancing exploration (of uncharted territory) against exploitation (of current knowledge).
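
A common way to strike that balance is epsilon-greedy action selection (my example; the slide names the tradeoff but not a mechanism):

```python
import random

def epsilon_greedy(q_values, epsilon=0.1):
    """With probability epsilon take a random action (exploration);
    otherwise take the action with the highest estimated value
    (exploitation of current knowledge)."""
    if random.random() < epsilon:
        return random.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])
```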

Lecture 6 Robot Learning Slide 22 Formal RL model. An RL model consists of:
- a discrete set of states S: models describing the robot's environment
- a discrete set of actions A: actions the robot can take to change state
- a set of scalar reinforcement signals R: functions evaluating short-term and long-term reward
- a robot control policy P: given state s at time t, select action a to maximize rewards r; this is what we are trying to learn

Lecture 6 Robot Learning Slide 23 Formal RL model. Does anyone see a problem with this? An RL model consists of:
- a discrete set of states S: models describing the robot's environment
- a discrete set of actions A: actions the robot can take to change state
- a set of scalar reinforcement signals R: functions evaluating short-term and long-term reward
- a robot control policy P: given state s at time t, select action a to maximize rewards r; this is what we are trying to learn

Lecture 6 Robot Learning Slide 24 Issues for reinforcement learning.
- Estimation of states and state transitions.
- Partial observability: the robot observes noisy or incomplete information about the world.
- Discretization of states: make assumptions or use domain knowledge.
- Discretization of actions/behaviors: hand-code robot controllers or learn them automatically (this is my research).

Lecture 6 Robot Learning Slide 25 Approaches to reinforcement learning. Find policies as the utility or value of actions with respect to outcomes. Two general approaches to learning policies:
- Search: search over the space of actions to find their utility. Techniques: breadth-first, depth-first, genetic algorithms.
- Statistical modeling: probabilistically model the utility of taking actions, using statistical techniques with dynamic programming. Techniques: Markov Decision Processes.

Lecture 6 Robot Learning Slide 26 Genetic algorithm procedure.
1. Randomly generate the DNA of an initial population M(0); an individual has a genotype that encodes a control policy.
2. Compute and save the fitness u(m) for each individual m in the current population M(t); the user defines the fitness function.
3. Define selection probabilities p(m) for each individual m in M(t) so that p(m) is proportional to u(m).
4. Generate a new population M(t+1) by probabilistically selecting individuals from M(t) to produce offspring, using genetic operators: crossover, mutation, ...
5. Repeat from step 2 until a satisfactory solution is obtained.
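
A minimal sketch of that loop for bit-string genotypes (the encoding and the toy fitness function are placeholder assumptions; a real fitness would evaluate the decoded control policy):

```python
import random

def genetic_algorithm(fitness, n_bits=16, pop_size=30,
                      generations=50, p_mut=0.02):
    # Step 1: randomly generate the DNA of an initial population M(0)
    pop = [[random.randint(0, 1) for _ in range(n_bits)]
           for _ in range(pop_size)]
    for _ in range(generations):
        # Steps 2-3: fitness u(m); selection weights proportional to u(m)
        u = [fitness(m) for m in pop]   # assumes nonnegative fitness values
        # Step 4: produce offspring via crossover and mutation
        new_pop = []
        for _ in range(pop_size):
            a, b = random.choices(pop, weights=u, k=2)
            cut = random.randrange(1, n_bits)
            child = a[:cut] + b[cut:]                               # crossover
            child = [bit ^ (random.random() < p_mut) for bit in child]  # mutation
            new_pop.append(child)
        pop = new_pop                   # Step 5: repeat
    return max(pop, key=fitness)

best = genetic_algorithm(fitness=sum)   # toy: maximize the number of 1s
```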

Lecture 6 Robot Learning Slide 27 Constraint optimization. Genetic algorithms are related to constraint optimization. Constraint optimization consists of an objective function to be minimized (the fitness function) and a set of constraint functions to be maintained.

Lecture 6 Robot Learning Slide 28 Markov Decision Processes (MDPs).
- a set of states S
- a set of actions A
- an expected reward function R(s,a) -> real numbers
- a state transition function T(s,a) -> Π(S), where a member of Π(S) is a probability distribution over the set S (it maps states to probabilities); T(s,a,s') is the probability of making a transition from state s to state s' using action a
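
Given T and R in tabular form, the dynamic-programming route mentioned on slide 25 can be sketched as value iteration (a standard MDP solver; the slides define the model but not this algorithm):

```python
import numpy as np

def value_iteration(T, R, gamma=0.9, tol=1e-6):
    """T[s, a, s2] = transition probability T(s,a,s2); R[s, a] = expected
    reward. Returns the optimal value function and a greedy policy."""
    V = np.zeros(T.shape[0])
    while True:
        Q = R + gamma * (T @ V)   # Q(s,a) = R(s,a) + gamma * sum_s2 T * V(s2)
        V_new = Q.max(axis=1)
        if np.max(np.abs(V_new - V)) < tol:
            return V_new, Q.argmax(axis=1)
        V = V_new
```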

Lecture 6 Robot Learning Slide 29 The Markov Property. A system is Markovian if state transitions are independent of previous state transitions or agent actions. The Markov property allows future states to be estimated using only the current state: the past and the future are independent given the present. (This Markov will be hitting the ground regardless of previous situations or actions.)

Lecture 6 Robot Learning Slide 30 Partially Observable MDPs (POMDPs). Robots rarely have complete information: a robot can only estimate the current state of the environment (state estimation for the robot's belief b). Incorporate into the MDP:
- a finite set of observations I
- an observation probability O(s,a,w): the probability of observing w and ending in state s after taking action a
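
The belief b is then maintained by Bayesian filtering over the hidden state; a minimal sketch of one update in the slide's notation (representing T and O as tabular numpy arrays is my assumption):

```python
import numpy as np

def belief_update(b, a, w, T, O):
    """b[s]: current belief. After taking action a and observing w:
    b'(s2) is proportional to O(s2, a, w) * sum_s T(s, a, s2) * b(s)."""
    predicted = T[:, a, :].T @ b     # predict: push the belief through T
    b_new = O[:, a, w] * predicted   # correct: weight by the observation
    return b_new / b_new.sum()       # normalize back to a distribution
```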

Lecture 6 Robot Learning Slide 31 Hidden Markov Models (HMMs)

Lecture 6 Robot Learning Slide 32 Petri-nets

Lecture 6 Robot Learning Slide 33 State estimation: localization. Estimate the distribution of probable robot locations. Each particle is a hypothesis of a probable robot location. By navigating the world, impossible hypotheses are eliminated; over time, the particle distribution identifies the robot's location. [Fox et al.]

Lecture 6 Robot Learning Slide 34 Particle filtering (Condensation). Represent the distribution as particles (particle = hypothesis); evaluate the distribution through observations on the particles.
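
A minimal Condensation-style update for a 1-D localization toy (the Gaussian motion and sensor models are illustrative assumptions):

```python
import numpy as np

def particle_filter_step(particles, control, measurement, rng,
                         motion_noise=0.1, sensor_noise=0.5):
    """particles: 1-D array of position hypotheses."""
    # Predict: move each hypothesis by the control input, plus noise
    particles = particles + control + rng.normal(0, motion_noise,
                                                 size=particles.shape)
    # Weight: evaluate each hypothesis against the observation
    w = np.exp(-0.5 * ((particles - measurement) / sensor_noise) ** 2)
    w /= w.sum()
    # Resample: impossible hypotheses die out, likely ones multiply
    idx = rng.choice(len(particles), size=len(particles), p=w)
    return particles[idx]
```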

Lecture 6 Robot Learning Slide 35 Mapping. Represent the environment as a distribution: estimate the probability of each position in the world being occupied. [Thrun et al., AAAI 1994]
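
Occupancy grids commonly store that per-cell probability as log-odds so updates are additive; a minimal single-cell sketch (the inverse sensor model values are hypothetical):

```python
import math

P_HIT, P_MISS = 0.7, 0.3   # hypothetical inverse sensor model

def logit(p):
    return math.log(p / (1.0 - p))

def update_cell(log_odds, hit):
    """One Bayesian update of a cell; log-odds 0.0 means p(occupied) = 0.5."""
    return log_odds + logit(P_HIT if hit else P_MISS)

def occupancy(log_odds):
    return 1.0 / (1.0 + math.exp(-log_odds))

cell = update_cell(0.0, hit=True)
print(occupancy(cell))     # > 0.5 after a reading that hits the cell
```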

Lecture 6 Robot Learning Slide 36 Learning from demonstration. Humans and the natural world are working models of control and policy learning. Leverage human tutelage and/or performance to build robot controllers.

Lecture 6 Robot Learning Slide 37 Probabilistic road maps: learning phase. Build a map of valid configurations; start with an initial configuration. The configuration space is C = [Θ1, Θ2, ..., ΘN]. (Figure: a robot configuration, the space of valid configurations, the space of invalid configurations, and the boundary of valid configurations.) [Kavraki, Svestka, Latombe, Overmars, 1995]

Lecture 6 Robot Learning Slide 38 Probabilistic road maps: learning phase. Build a map of valid configurations: sample neighbors of the current configuration. [Kavraki, Svestka, Latombe, Overmars, 1995]

Lecture 6 Robot Learning Slide 39 Probabilistic road maps: learning phase. Build a map of valid configurations: sample neighbors of the current configuration, then determine which neighbors are valid (each sample is marked valid or invalid). [Kavraki, Svestka, Latombe, Overmars, 1995]

Lecture 6 Robot Learning Slide 40 Probabilistic road maps: learning phase. Build a map of valid configurations: sample neighbors of the current configuration, determine which are valid, remove the invalid ones, and place edge transitions between valid neighbors. [Kavraki, Svestka, Latombe, Overmars, 1995]

Lecture 6 Robot Learning Slide 41 Probabilistic road maps: learning phase. Build a map of valid configurations: sample neighbors of the current configuration, determine which are valid, and continue exploration from the valid neighbors. [Kavraki, Svestka, Latombe, Overmars, 1995]

Lecture 6 Robot Learning Slide 42 Probabilistic road maps: query phase. Given the learned map, find a valid control path between two configurations: search on an undirected graph. [Kavraki, Svestka, Latombe, Overmars, 1995]
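
A compact sketch of both phases in a unit square (the collision checker is a placeholder, and a real PRM would also validate each edge with a local planner):

```python
import numpy as np
from collections import deque

def build_roadmap(is_valid, n_samples=200, k=8, dim=2, seed=0):
    """Learning phase: sample valid configurations, connect near neighbors."""
    rng = np.random.default_rng(seed)
    nodes = []
    while len(nodes) < n_samples:
        q = rng.uniform(0.0, 1.0, dim)
        if is_valid(q):                    # keep only valid configurations
            nodes.append(q)
    nodes = np.array(nodes)
    edges = {i: set() for i in range(n_samples)}
    for i in range(n_samples):
        d = np.linalg.norm(nodes - nodes[i], axis=1)
        for j in d.argsort()[1:k + 1]:     # k nearest neighbors
            edges[i].add(int(j))           # undirected graph
            edges[int(j)].add(i)           # (edge validity check omitted)
    return nodes, edges

def query(edges, start, goal):
    """Query phase: breadth-first search on the undirected roadmap."""
    parent, frontier = {start: None}, deque([start])
    while frontier:
        u = frontier.popleft()
        if u == goal:                      # walk back along parents
            path = []
            while u is not None:
                path.append(u)
                u = parent[u]
            return path[::-1]
        for v in edges[u]:
            if v not in parent:
                parent[v] = u
                frontier.append(v)
    return None   # start and goal lie in disconnected components

# Toy obstacle: a disc of radius 0.2 around (0.5, 0.5) is invalid
nodes, edges = build_roadmap(lambda q: np.linalg.norm(q - 0.5) > 0.2)
print(query(edges, start=0, goal=len(nodes) - 1))
```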

Lecture 6 Robot Learning Slide 45 Additional references.
Duda and Hart, Pattern Classification.
Bishop, Neural Networks for Pattern Recognition.
L. Kaelbling, M. Littman, A. Moore, "Reinforcement Learning: A Survey," Journal of Artificial Intelligence Research 4 (1996), pp. 237-285.
Sutton and Barto, Reinforcement Learning. MIT Press, 1998.
S. Thrun, "Is Robotics Going Statistics? The Field of Probabilistic Robotics," CACM, 2001.
M. Isard, A. Blake, "CONDENSATION: conditional density propagation for visual tracking," 1998.

Lecture 6 Robot Learning Slide 46 Additional references.
L. Kavraki, P. Svestka, J. Latombe, M. Overmars, "Probabilistic Roadmaps for Path Planning in High-Dimensional Configuration Spaces," IEEE Transactions on Robotics and Automation, 12(4):566-580, 1996.
Read my papers (I command you... Muhuwahaha):
O. Jenkins, M. Mataric, "Performance-Derived Behavior Vocabularies: Deriving Skills from Motion," International Journal of Humanoid Robotics, 2004.