Reinforcement Learning in Continuous Environments


Reinforcement Learning in Continuous Environments 64.425 Integrated Seminar: Intelligent Robotics Oke Martensen University of Hamburg Faculty of Mathematics, Informatics and Natural Sciences Technical Aspects of Multimodal Systems 30. November 2015 Oke Martensen 1

Outline 1. Reinforcement Learning in a Nutshell Basics of RL Standard Approaches Motivation: The Continuity Problem 2. RL in Continuous Environments Continuous Actor Critic Learning Automaton (CACLA) CACLA in Action 3. RL in Robotics Conclusion Oke Martensen 2

Reinforcement Learning in a Nutshell - Basics of RL Classical Reinforcement Learning Agent := algorithm that learns to interact with the environment. Environment := the world (including the actor). Goal: optimize the agent's behaviour w.r.t. a reward signal. Sutton and Barto (1998) The problem is formalized as a Markov Decision Process (MDP): (S, A, R, T) Oke Martensen 3
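One way to picture the MDP tuple (S, A, R, T) is as a plain data structure; the following is a minimal sketch in Python, with field names, types, and the toy problem chosen purely for illustration (they are not from the slides).

```python
from typing import Dict, List, NamedTuple, Tuple

# A minimal sketch of the MDP tuple (S, A, R, T); names and types are
# illustrative assumptions, not taken from the slides.
class MDP(NamedTuple):
    states: List[str]                                      # S
    actions: List[str]                                     # A
    rewards: Dict[Tuple[str, str], float]                  # R(s, a)
    transitions: Dict[Tuple[str, str], Dict[str, float]]   # T(s' | s, a)

# Example: a two-state toy problem.
toy = MDP(
    states=["s0", "s1"],
    actions=["stay", "move"],
    rewards={("s0", "move"): 1.0, ("s0", "stay"): 0.0,
             ("s1", "move"): 0.0, ("s1", "stay"): 0.0},
    transitions={("s0", "move"): {"s1": 1.0}, ("s0", "stay"): {"s0": 1.0},
                 ("s1", "move"): {"s0": 1.0}, ("s1", "stay"): {"s1": 1.0}},
)
```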

Reinforcement Learning in a Nutshell - Basics of RL The General Procedure Policy π := action selection strategy; exploration vs. exploitation trade-off, e.g. ε-greedy, soft-max, ... Different ways to model the environment: value functions V(s), Q(s, a): cumulative discounted reward expected after reaching state s (and after performing action a) Oke Martensen 4
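As an illustration of the exploration/exploitation trade-off, an ε-greedy selection rule over a tabular Q-estimate can be sketched as below; Q, state, and actions are hypothetical names for this example only.

```python
import random

# A minimal sketch of epsilon-greedy action selection over a tabular
# Q(s, a) estimate (hypothetical names; not from the slides).
def epsilon_greedy(Q, state, actions, epsilon=0.1):
    if random.random() < epsilon:
        return random.choice(actions)                           # explore
    return max(actions, key=lambda a: Q.get((state, a), 0.0))   # exploit
```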

Reinforcement Learning in a Nutshell - Standard Approaches Standard Algorithms Sutton and Barto (1998) Temporal-difference (TD) learning: V(s_t) ← V(s_t) + α [r_{t+1} + γ V(s_{t+1}) − V(s_t)] Numerous algorithms are based on TD learning: SARSA, Q-Learning, actor-critic methods (details on next slide) Oke Martensen 5
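A minimal sketch of this tabular TD(0) update is shown below; V is a dict from states to value estimates, alpha is the learning rate, and gamma the discount factor (illustrative names, not from the slides).

```python
# A minimal sketch of the tabular TD(0) update from this slide.
def td0_update(V, s, r_next, s_next, alpha=0.1, gamma=0.95):
    td_error = r_next + gamma * V[s_next] - V[s]   # r_{t+1} + gamma*V(s_{t+1}) - V(s_t)
    V[s] += alpha * td_error                       # move V(s_t) toward the TD target
    return td_error

# Example usage on a two-state toy problem:
V = {"s0": 0.0, "s1": 0.0}
td0_update(V, "s0", r_next=1.0, s_next="s1")
```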

Reinforcement Learning in a Nutshell - Standard Approaches Actor-Critic Models A TD method with a separate memory structure to explicitly represent the policy independently of the value function. Actor: policy structure. Critic: estimated value function. The critic's output, the TD error, drives all the learning. Computationally cheap action selection; biologically more plausible. Sutton and Barto (1998) Oke Martensen 6
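To make the "TD error drives all the learning" point concrete, here is a minimal tabular actor-critic sketch: the critic's TD error updates both the value estimate and the actor's action preferences. All names (V, prefs, beta, ...) are illustrative assumptions, not from the slides.

```python
import math
import random

# One tabular actor-critic step: the same TD error updates critic and actor.
def actor_critic_step(V, prefs, s, a, r_next, s_next,
                      alpha=0.1, beta=0.1, gamma=0.95):
    td_error = r_next + gamma * V[s_next] - V[s]
    V[s] += alpha * td_error           # critic: value-function update
    prefs[(s, a)] += beta * td_error   # actor: strengthen/weaken the taken action
    return td_error

def softmax_action(prefs, s, actions):
    # Cheap action selection from the actor's preferences (soft-max policy).
    weights = [math.exp(prefs[(s, a)]) for a in actions]
    return random.choices(actions, weights=weights)[0]
```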

Reinforcement Learning in a Nutshell - Standard Approaches Why is RL so Cool? It's how humans do sophisticated, hard-to-engineer behaviour; it can cope with uncertain, noisy, non-observable stuff; no need for labels; online learning. "The relationship between [robotics and reinforcement learning] has sufficient promise to be likened to that between physics and mathematics." Kober and Peters (2012) Oke Martensen 7

Reinforcement Learning in a Nutshell - Motivation: The Continuity Problem The Continuity Problem So far: discrete action and state spaces. Problem: the world ain't discrete. Example: moving on a grid world. Continuous state spaces have already been investigated a lot. Continuous action spaces, however, remain a problem. Oke Martensen 8

RL in Continuous Environments Tackling the Continuity Problem 1. Discretize the spaces, then use regular RL methods, e.g. tile coding: group the space into binary features (receptive fields). But: How fine-grained? Where to put the focus? Bad generalization. 2. Use the parameter vector θ_t of a function approximator for the updates; often neural networks are used, with the weights as parameters. Oke Martensen 9
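For approach 1, a minimal sketch of 1-D tile coding is given below: the continuous state is covered by several offset grids ("tilings"), and the indices of the active tiles form a sparse binary feature vector. All parameters and names are illustrative assumptions.

```python
# A minimal sketch of 1-D tile coding (illustrative parameters only).
def active_tiles(x, x_min=0.0, x_max=1.0, n_tilings=4, n_tiles=8):
    width = (x_max - x_min) / n_tiles
    tiles = []
    for t in range(n_tilings):
        offset = t * width / n_tilings          # shift each tiling slightly
        idx = int((x - x_min + offset) / width)
        idx = min(max(idx, 0), n_tiles)         # clamp to a valid tile index
        tiles.append(t * (n_tiles + 1) + idx)   # unique index per tiling
    return tiles

# A linear value estimate is then V(x) = sum(theta[i] for i in active_tiles(x)),
# so each update only touches the handful of weights whose tiles are active.
```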

RL in Continuous Environments - Continuous Actor Critic Learning Automaton (CACLA) CACLA Continuous Actor Critic Learning Automaton Van Hasselt and Wiering (2007) learns undiscretized continuous actions in continuous states; model-free; computes updates and actions very fast; easy to implement (cf. pseudocode next slide) Oke Martensen 10

RL in Continuous Environments - Continuous Actor Critic Learning Automaton (CACLA) CACLA Algorithm θ: parameter vector ψ: feature vector Van Hasselt (2011) Oke Martensen 11
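The pseudocode figure itself is not reproduced in this transcript, so the following is only a sketch of CACLA's core idea with linear function approximation over a feature vector ψ(s): the critic is updated by the usual TD rule, and the actor is moved toward the explored action only when the TD error is positive (Van Hasselt and Wiering, 2007). Variable names, learning rates, and the Gaussian exploration are illustrative assumptions.

```python
import numpy as np

# A CACLA-style update sketch: theta_v parameterizes the critic
# V(s) = theta_v . psi(s), theta_a the actor Ac(s) = theta_a . psi(s)
# (scalar action here). Not the original pseudocode, only an illustration.
def cacla_step(theta_v, theta_a, psi_s, psi_s_next, reward, action_taken,
               alpha=0.01, beta=0.01, gamma=0.95):
    td_error = reward + gamma * (theta_v @ psi_s_next) - (theta_v @ psi_s)
    theta_v += alpha * td_error * psi_s                        # critic: TD update
    if td_error > 0:                                           # actor: only if the
        actor_out = theta_a @ psi_s                            # explored action did
        theta_a += beta * (action_taken - actor_out) * psi_s   # better than expected
    return td_error

def select_action(theta_a, psi_s, sigma=0.1):
    # Gaussian exploration around the actor's current continuous output.
    return float(np.random.normal(theta_a @ psi_s, sigma))
```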

RL in Continuous Environments - CACLA in Action A bio-inspired model of predictive sensorimotor integration Zhong et al. (2012) Latencies in sensory processing make it hard to do real-time robotics; noisy, inaccurate readings may cause failure. 1. Elman network for sensory prediction/filtering 2. CACLA for continuous action generation Elman (1990) Zhong et al. (2012) Oke Martensen 12
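For reference, an Elman (simple recurrent) network feeds a copy of the previous hidden state back in as context, which is what makes it suitable for predicting the next sensory reading. The sketch below shows a generic forward pass; layer sizes, weight initialization, and the tanh nonlinearity are illustrative assumptions, not the architecture of Zhong et al. (2012).

```python
import numpy as np

# A generic Elman (simple recurrent) network forward pass (illustrative only).
class ElmanNet:
    def __init__(self, n_in, n_hidden, n_out, seed=0):
        rng = np.random.default_rng(seed)
        self.W_in = rng.normal(0.0, 0.1, (n_hidden, n_in))
        self.W_ctx = rng.normal(0.0, 0.1, (n_hidden, n_hidden))  # context (recurrent) weights
        self.W_out = rng.normal(0.0, 0.1, (n_out, n_hidden))
        self.context = np.zeros(n_hidden)    # copy of the previous hidden state

    def step(self, x):
        h = np.tanh(self.W_in @ x + self.W_ctx @ self.context)
        self.context = h                     # fed back as context at the next step
        return self.W_out @ h                # e.g. predicted next sensory reading
```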

RL in Continuous Environments - CACLA in Action Robot Docking & Grasping Behaviour Zhong et al. (2012) https://www.youtube.com/watch?v=vf7u18h5ioy more natural and smooth behaviour; flexible wrt. changes in the action space Oke Martensen 13

RL in Robotics - Conclusion Conclusion Challenges: problems with high-dimensional/continuous states and actions; only partially observable, noisy environments; uncertainty (e.g. Which state am I actually in?); hardware/physical system: tedious, time-intensive, costly data generation; reproducibility. Solution approaches: partially observable Markov decision processes (POMDPs); use of filters: raw observations + uncertainty in estimates Oke Martensen 14

Thanks for your attention! Questions? Oke Martensen 15

References Elman, J. L. (1990). Finding structure in time. Cognitive Science, 14(2):179–211. Kober, J. and Peters, J. (2012). Reinforcement Learning in Robotics: A Survey. In Wiering, M. and van Otterlo, M., editors, Reinforcement Learning, volume 12, pages 579–610. Springer Berlin Heidelberg, Berlin, Heidelberg. Sutton, R. S. and Barto, A. G. (1998). Reinforcement Learning: An Introduction, volume 1. MIT Press, Cambridge. Van Hasselt, H. and Wiering, M. (2007). Reinforcement learning in continuous action spaces. In IEEE International Symposium on Approximate Dynamic Programming and Reinforcement Learning (ADPRL 2007), pages 272–279. IEEE. Van Hasselt, H. P. (2011). Insights in Reinforcement Learning. PhD thesis. Zhong, J., Weber, C., and Wermter, S. (2012). A predictive network architecture for a robust and smooth robot docking behavior. Paladyn, Journal of Behavioral Robotics, 3(4):172–180. Oke Martensen 16