Challenges in Deep Reinforcement Learning. Sergey Levine UC Berkeley

- Discuss some recent work in deep reinforcement learning
- Present a few major challenges
- Show some of our recent work toward tackling these challenges

Some recent work on deep RL (stability, efficiency, scale):
- RL on raw visual input (Lange et al. 2009)
- End-to-end visuomotor policies (Levine*, Finn* et al. 2015)
- Guided policy search (Levine et al. 2013)
- Deep deterministic policy gradients (Lillicrap et al. 2015)
- Deep Q-Networks (Mnih et al. 2013)
- AlphaGo (Silver et al. 2016)
- Trust region policy optimization (Schulman et al. 2015)
- Supersizing self-supervision (Pinto & Gupta 2016)

Challenges in Deep Reinforcement Learning 1. Stability 2. Efficiency 3. Scale

Deep RL with Policy Gradients:
- Unbiased but high-variance gradient
- Stable
- Requires many samples
Example: TRPO [Schulman et al. 15]
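
For reference (my addition, not part of the slides), the likelihood-ratio policy gradient estimator that underlies on-policy methods such as TRPO has the standard form

    \nabla_\theta J(\theta) \approx \frac{1}{N} \sum_{i=1}^{N} \sum_{t} \nabla_\theta \log \pi_\theta(a_{i,t} \mid s_{i,t}) \, \hat{A}(s_{i,t}, a_{i,t})

where \hat{A} is an on-policy advantage estimate. The estimator is unbiased, but its variance is what drives the large sample requirement. A minimal runnable toy (mine, using a made-up one-step Gaussian policy, not anything from the talk) shows the unbiased-but-noisy behavior:

    import numpy as np

    rng = np.random.default_rng(0)
    theta = 0.0                 # mean of the Gaussian policy pi_theta(a) = N(theta, 1)

    def reward(a):              # toy quadratic reward, maximized at a = 2
        return -(a - 2.0) ** 2

    def lr_gradient(n_samples):
        a = rng.normal(theta, 1.0, size=n_samples)
        # score function of the Gaussian: d/dtheta log N(a; theta, 1) = a - theta
        return np.mean((a - theta) * reward(a))

    # True gradient of E[reward] w.r.t. theta is 2 * (2 - theta) = 4 at theta = 0.
    print([round(float(lr_gradient(100)), 2) for _ in range(10)])  # estimates scatter around 4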

Deep RL with Off-Policy Q-Function Critic:
- Low-variance but biased gradient
- Much more efficient (because off-policy)
- Much less stable (because biased)
Example: DDPG [Lillicrap et al. 16]
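
For comparison (again my addition, not from the slides), the DDPG-style actor update differentiates through a learned critic:

    \nabla_\theta J(\theta) \approx \mathbb{E}_{s \sim \mathcal{D}} \Big[ \nabla_a Q_\phi(s, a) \big|_{a = \mu_\theta(s)} \, \nabla_\theta \mu_\theta(s) \Big]

with Q_\phi fit by Bellman backups on replayed off-policy data. Any error in Q_\phi enters the gradient directly, which is where the bias, and much of the instability, comes from.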

Improving Efficiency & Stability with Q-Prop (with Shane Gu):
- Unbiased gradient, stable
- Efficient (uses off-policy samples)
- Critic comes from off-policy data
- Gradient comes from on-policy data
- Automatic variance-based adjustment
(The slide's policy gradient, Q-function critic, and Q-Prop equations were shown as images and are not in the transcription; see the reconstruction below.)
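
Based on the Q-Prop paper [Gu et al., 2017], the estimator (my reconstruction, since the slide equations are missing) uses a Taylor expansion of the off-policy critic as a control variate for the on-policy gradient:

    \nabla_\theta J(\theta) \approx \mathbb{E}_{\pi} \Big[ \nabla_\theta \log \pi_\theta(a \mid s) \big( \hat{A}(s, a) - \bar{A}_w(s, a) \big) \Big] + \mathbb{E}_{s} \Big[ \nabla_a Q_w(s, a) \big|_{a = \mu_\theta(s)} \, \nabla_\theta \mu_\theta(s) \Big]

Here \bar{A}_w is the advantage of the critic linearized around the policy mean \mu_\theta(s). The first term remains an unbiased on-policy Monte Carlo estimate; the second, DDPG-like term is its analytically computed expectation and carries most of the signal when the critic is accurate. The "automatic variance-based adjustment" scales the control variate according to the estimated covariance between the Monte Carlo advantage and the critic's advantage.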

Comparisons:
- Works with smaller batches than TRPO
- More efficient than TRPO
- More stable than DDPG with respect to hyperparameters; this is likely responsible for the better performance on the harder tasks

Challenges in Deep Reinforcement Learning 1. Stability 2. Efficiency 3. Scale

Parameter Space vs. Policy Space: why policy space?
- Fewer local optima / easier optimization landscapes
- It can be easier to make updates in policy space than in parameter space
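
To make the contrast concrete (my phrasing, not the slide's): a parameter-space method trusts a Euclidean ball around the current weights, while a policy-space method such as TRPO trusts a KL ball around the current action distribution, e.g.

    \max_\theta \; \mathbb{E}_{s, a \sim \pi_{\theta_\mathrm{old}}} \Big[ \tfrac{\pi_\theta(a \mid s)}{\pi_{\theta_\mathrm{old}}(a \mid s)} \hat{A}(s, a) \Big] \quad \text{s.t.} \quad \mathbb{E}_s \Big[ D_\mathrm{KL}\big( \pi_{\theta_\mathrm{old}}(\cdot \mid s) \,\|\, \pi_\theta(\cdot \mid s) \big) \Big] \le \delta

The same small step in parameter space can be a huge or a tiny step in policy space depending on the local loss landscape, which is why constraining updates in policy space tends to be better behaved.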

Mirror Descent Guided Policy Search (MDGPS)

Mirror Descent Guided Policy Search (MDGPS):
- Projection: supervised learning
- Local policy optimization: trajectory-centric model-based RL [Montgomery 16] or path integral policy iteration [Chebotar 16]
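
In rough outline (my summary of the mirror descent view in [Montgomery 16], not a transcription of the slide), each MDGPS iteration alternates

    \text{C-step:} \quad p_i \leftarrow \arg\min_{p} \; \mathbb{E}_{p}\Big[ \textstyle\sum_t c(s_t, a_t) \Big] \quad \text{s.t.} \quad D_\mathrm{KL}\big( p \,\|\, \pi_\theta \big) \le \epsilon

    \text{S-step (projection):} \quad \theta \leftarrow \arg\min_{\theta} \; \textstyle\sum_i D_\mathrm{KL}\big( p_i \,\|\, \pi_\theta \big)

where the C-step improves each local policy p_i with trajectory-centric model-based RL or PI2-style updates, and the S-step is ordinary supervised learning that projects the local policies onto the global neural network policy \pi_\theta.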

MDGPS with Random Initial States and Local Models (with Harley Montgomery, Anurag Ajay, Chelsea Finn)

Efficiency & Real-World Evaluation: learning 2D reaching (simple benchmark task)
- TRPO (best known value): 3000 trials
- DDPG, NAF (best known value): 2000 trials
- Q-Prop: 2000 trials
- MDGPS: 500 trials

MDGPS with Demonstrations and Path Integral Policy Iteration (with Mrinal Kalakrishnan, Yevgen Chebotar, Ali Yahya, Adrian Li):
- Much better handling of non-smooth problems (e.g. discontinuities)
- Requires more samples; works best with demonstration initialization
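
As background (generic PI2 form, not taken from the slide): path integral policy iteration updates a local policy by reweighting sampled trajectories \tau_i with their exponentiated negative cost,

    w_i = \frac{\exp\big( -S(\tau_i) / \eta \big)}{\sum_j \exp\big( -S(\tau_j) / \eta \big)}

so only the relative ordering of trajectory costs S(\tau_i) matters and no gradients of the cost are needed. That is what makes the method tolerant of discontinuities, at the price of needing more samples.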

Challenges in Deep Reinforcement Learning 1. Stability 2. Efficiency 3. Scale

Ingredients for success in learning:
- Supervised learning: computation, algorithms, data
- Reinforcement learning: computation, algorithms... and what plays the role of data? [L., Pastor, Krizhevsky, Quillen 16]

Policy Learning with Multiple Robots (with Ali Yahya, Adrian Li, Mrinal Kalakrishnan, Yevgen Chebotar):
- Rollout execution
- Local policy optimization
- Global policy optimization

Yahya, Li, Kalakrishnan, Chebotar, L., 16
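
Since the slide describes a pipeline rather than a single equation, a schematic may help. The following is my own sketch of the general asynchronous pattern (robot workers push rollouts into a shared buffer while a learner consumes them); the names and the fake data are hypothetical, and this is not the authors' system:

    import queue, random, threading, time

    experience = queue.Queue()          # shared buffer of rollouts

    def robot_worker(robot_id, n_rollouts):
        # Hypothetical rollout execution on one robot (fake data here).
        for _ in range(n_rollouts):
            rollout = [(robot_id, random.random()) for _ in range(5)]
            experience.put(rollout)
            time.sleep(0.01)            # stand-in for real execution time

    def learner(n_updates):
        # Hypothetical local/global policy optimization on pooled data.
        for step in range(n_updates):
            batch = experience.get()    # blocks until some robot delivers data
            # ... fit local policies / update the global policy from `batch` ...
            print(f"update {step} using data from robot {batch[0][0]}")

    workers = [threading.Thread(target=robot_worker, args=(i, 4)) for i in range(3)]
    for w in workers:
        w.start()
    learner(n_updates=12)               # 3 robots x 4 rollouts = 12 batches
    for w in workers:
        w.join()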

Policy Learning with Multiple Robots: Deep RL with NAF (with Shane Gu, Ethan Holly, Tim Lillicrap) [Gu*, Holly*, Lillicrap, L., 16]
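
For reference, the normalized advantage function from [Gu et al., 2016] (my addition; the parameterization is not in the transcription) restricts the advantage to a quadratic in the action,

    Q_\phi(s, a) = V_\phi(s) - \tfrac{1}{2} \big( a - \mu_\phi(s) \big)^{\top} P_\phi(s) \big( a - \mu_\phi(s) \big)

with P_\phi(s) positive definite, so the greedy action is simply \mu_\phi(s). This makes standard Q-learning updates practical on experience pooled asynchronously from several robots.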

Future Outlook & Future Challenges:
- Stability remains a huge challenge: we can't do hyperparameter sweeps in the real world, and we are likely still missing a few pieces of theory.
- High efficiency is important, but what about diversity? Efficiency seems at odds with generalization. Possible directions include massively off-policy learning and semi-supervised learning (not addressed in this talk).
- What about the reward function? How to set it in the real world is highly non-obvious.

Acknowledgements: Harley Montgomery, Anurag Ajay, Chelsea Finn, Shane Gu, Ethan Holly, Tim Lillicrap, Ali Yahya, Adrian Li, Mrinal Kalakrishnan, Yevgen Chebotar