Reinforcement Learning

Reinforcement Learning. Lecture 1: Introduction. Vien Ngo, MLR, University of Stuttgart

What is Reinforcement Learning? Reinforcement Learning is a subfield of Machine Learning. (from David Silver's lecture) 2/20

RL: A subfield of Machine Learning (from the Machine Learning course, 2011, Marc Toussaint)
Supervised learning: learn from labelled data {(x_i, y_i)}_{i=1}^N
Unsupervised learning: learn from unlabelled data {x_i}_{i=0}^N only
Semi-supervised learning: many unlabelled data, few labelled data
Reinforcement learning: learn from data {(s_t, a_t, r_t, s_{t+1})}:
learn a predictive model (s, a) → s'
learn to predict the reward (s, a) → r
learn a behaviour s → a that maximizes the expected total reward 3/20
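To make the different data formats concrete, here is a minimal Python sketch (not from the lecture; all values and names are invented for illustration):

```python
# Illustrative sketch of the data each setting consumes (made-up values).

# Supervised learning: labelled pairs {(x_i, y_i)}
supervised_data = [([1.0, 2.0], 0), ([0.5, 1.5], 1)]        # (features, label)

# Unsupervised learning: unlabelled points {x_i}
unsupervised_data = [[1.0, 2.0], [0.5, 1.5], [3.0, 0.2]]

# Reinforcement learning: transitions {(s_t, a_t, r_t, s_{t+1})}
# gathered by interacting with an environment.
rl_transitions = [
    ("s0", "right", 0.0, "s1"),
    ("s1", "right", 1.0, "s2"),
]

# From such transitions one can, for example, fit a predictive model (s, a) -> s',
# fit a reward model (s, a) -> r, or learn a behaviour s -> a that maximizes
# the expected total reward.
```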

Success of Reinforcement Learning 4/20

Success of Reinforcement Learning
Games: Backgammon (Tesauro, 1994); deep RL playing Atari games (2014); AlphaGO (2016).
Operations Research: Inventory Management (Van Roy, Bertsekas, Lee, & Tsitsiklis, 1996); investment portfolios; Dynamic Channel Allocation (e.g. Singh & Bertsekas, 1997); online advertisements.
Robotics: Helicopter Control (e.g. Ng, 2003; Abbeel & Ng, 2006); many robots (navigation, bipedal walking, grasping, switching between skills, ...). 5/20

TD-Gammon, by Gerald Tesauro. (See Section 11.1 in Sutton & Barto's book; see also Tesauro, 1992, 1994, 1995.) The only reward is given at the end of the game, for a win. Self-play: use the current policy to sample the moves of both sides! After about 300,000 games against itself, it played near the level of the world's strongest grandmasters. 6/20
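The self-play idea can be sketched in a few lines. This is only an illustrative sketch under an assumed game interface (legal_moves, apply_move, is_terminal, winner, and policy are hypothetical names), not Tesauro's implementation:

```python
def self_play_episode(legal_moves, apply_move, is_terminal, winner, policy, start_state):
    """Generate one game by self-play: the *same* current policy chooses the
    moves of both players, and the only reward arrives at the end of the game
    (+1 if player 0 wins, -1 otherwise)."""
    state, player, history = start_state, 0, []
    while not is_terminal(state):
        move = policy(state, player, legal_moves(state, player))
        history.append((state, player, move))
        state = apply_move(state, player, move)
        player = 1 - player                       # alternate sides
    reward = 1.0 if winner(state) == 0 else -1.0  # terminal reward only
    return history, reward

# A learner such as TD(lambda) with a value-function approximator (as in
# TD-Gammon) would then propagate this single terminal reward back along
# the recorded history.
```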

AlphaGO: AlphaGO by Google DeepMind was awarded the Go grandmaster rank (as of 4 April 2016). 7/20

Reinforcement Learning in Robotics: learning motor skills and autonomous helicopter flight. Examples pictured on the slide: (2000, Schaal, Atkeson, Vijayakumar), (2004, Tedrake et al.), (2007, Andrew Ng et al.), (2014, playing Atari games, Google DeepMind). 8/20

Reinforcement learning in neuroscience (from Yael Niv's ICML 2009 tutorial). 9/20

Reinforcement learning in neuroscience Peter Dayan and Yael Niv, Neurobiology 2008. The brain employs both model-free and model-based decision-making strategies in parallel, with each dominating in different circumstances. 10/20

What is Reinforcement Learning? 11/20

What is Reinforcement Learning? RL is learning from interaction. There is no supervisor, only signals of reward/evaluative feedback. The sequence of decisions matters, as each decision affects the outcome of subsequent decisions. (from Satinder Singh's Introduction to RL) 12/20

What is Reinforcement Learning?
s_1, a_1, r_2, s_2, a_2, r_3, ..., s_i, a_i, r_{i+1}, s_{i+1}, ...
States can be vectors or other structures, defined as sufficient statistics to predict what happens next. Actions/controls can be multi-dimensional. Rewards are scalar but can be arbitrarily uninformative and might be delayed; e.g., r_t tells how well the agent does at time t (after taking action a_t at s_t). The objective is described as the maximization of the expected total reward.
States are sometimes not directly observable; the agent only receives observations:
o_1, a_1, r_2, o_2, a_2, r_3, ..., o_i, a_i, r_{i+1}, o_{i+1}, ...
The agent has only partial knowledge about the environment, e.g. unknown dynamics, reward, or observation functions. 13/20
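The interaction loop on this slide can be written down directly. Below is a minimal sketch with a toy corridor environment invented here for illustration (not course code):

```python
def step(state, action):
    """Toy environment: states 0..4, actions -1/+1, reward 1.0 on reaching state 4."""
    next_state = min(max(state + action, 0), 4)
    reward = 1.0 if next_state == 4 else 0.0
    done = next_state == 4
    return next_state, reward, done

def policy(state):
    return +1                # a fixed behaviour s -> a; a learner would improve this

state, total_reward, done = 0, 0.0, False
while not done:              # produces s_1, a_1, r_2, s_2, a_2, r_3, ...
    action = policy(state)
    next_state, reward, done = step(state, action)
    total_reward += reward
    state = next_state
print("return of the episode:", total_reward)
```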

What is Reinforcement Learning? Examples of rewards:
+1/−1 for winning/losing a game, e.g. Go, Backgammon, ...
+/− for increasing/decreasing the score, e.g. in deep RL algorithms playing Atari games.
+/− rewards for earning/losing money when managing an investment portfolio.
+/− rewards for following the desired trajectory/for crashing when controlling a stunt helicopter.
etc. 14/20

Components of an RL Agent
Policy: defines the behaviour of the agent, e.g. a mapping π : S → A or π : S × A → [0, 1].
Value function: the expected return from a state (if starting from that state), V^π(s) = E_π[ Σ_t γ^t R_t | s_0 = s ].
Model: the agent's internal representation of the environment, e.g. P(s' | s, a), R(s, a, s'). 15/20
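As a rough illustration of the value-function definition (not course material; the two-state MDP and the uniform random policy below are made up), V^π(s) can be estimated by averaging sampled discounted returns:

```python
import random

GAMMA = 0.9
# Made-up transition model: (s, a) -> list of (probability, next state, reward).
P = {
    ("A", "stay"): [(1.0, "A", 0.0)],
    ("A", "go"):   [(0.8, "B", 1.0), (0.2, "A", 0.0)],
    ("B", "stay"): [(1.0, "B", 1.0)],
    ("B", "go"):   [(1.0, "A", 0.0)],
}

def pi(state):
    """A stochastic policy; here simply uniform over the two actions."""
    return random.choice(["stay", "go"])

def rollout(s0, horizon=100):
    """Sample one (truncated) discounted return sum_t gamma^t R_t starting from s0."""
    s, ret, discount = s0, 0.0, 1.0
    for _ in range(horizon):
        a = pi(s)
        outcomes = P[(s, a)]
        _, s, r = random.choices(outcomes, weights=[p for p, _, _ in outcomes])[0]
        ret += discount * r
        discount *= GAMMA
    return ret

n = 2000
print("Monte Carlo estimate of V^pi(A):", sum(rollout("A") for _ in range(n)) / n)
```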

Admin 16/20

Schedule of this course
Part 1: The Basics. Markov Decision Processes (MDP), Partially Observable MDPs (POMDP). Dynamic Programming: Value Iteration, Policy Iteration.
Part 2: Reinforcement Learning Topics. Temporal Difference learning, Q-Learning. Reinforcement learning with function approximation. Policy search.
Part 3: Advanced Topics. Inverse reinforcement learning, imitation learning. Exploration vs. Exploitation: multi-armed bandits, PAC-MDP, Bayesian reinforcement learning. Hierarchical reinforcement learning: macro actions, skill acquisition. Deep reinforcement learning. Reinforcement learning in POMDP environments. 17/20

Schedule of this course. Missing: Relational MDPs; MDP/POMDP/RL as inference. 18/20

Literature
Richard S. Sutton, Andrew Barto: Reinforcement Learning: An Introduction. The MIT Press, Cambridge, Massachusetts; London, England, 1998. http://webdocs.cs.ualberta.ca/~sutton/book/the-book.html
Csaba Szepesvári: Algorithms for Reinforcement Learning. Morgan & Claypool, July 2010. http://www.ualberta.ca/~szepesva/RLBook.html 19/20

Organisation
Course webpage: https://ipvs.informatik.uni-stuttgart.de/mlr/teaching/reinforcement-learning-ss16/ (slides, exercises, links to other resources)
Secretary, admin issues: Carola Stahl, Carola.Stahl@ipvs.uni-stuttgart.de, Room 2.217
Lecture: Wed. 14:00-15:30, Room 0.108
Tutorial: Tue. 17:30-19:30, Room 38.03
Rules for the tutorials: doing the exercises is crucial! At the beginning of each tutorial, sign into a list and mark which exercises you have (successfully) worked on. Students are randomly selected to present their solutions. You need to have completed 50% of the exercises to be admitted to the exam. 20/20