Reinforcement Learning

Reinforcement Learning: Introduction
Vien Ngo
Marc Toussaint
University of Stuttgart

Problems we face in daily life
- This is a sequential decision problem: optimal decision making means maximizing reward or minimizing penalty.
- Why is it hard? Stochasticity and uncertainty; delayed reward or penalty.

What is Reinforcement Learning?
RL is learning from interaction.
(from Satinder Singh's Introduction to RL, videolectures.com)

What is Reinforcement Learning?
Interaction produces a trajectory $s_1, a_1, r_2, s_2, a_2, r_3, \dots, s_i, a_i, r_{i+1}, s_{i+1}$.
- States can be vectors or other structures, defined as sufficient statistics to predict the future.
- Actions can be multi-dimensional.
- Rewards are scalar but can be arbitrarily uninformative.
- States are sometimes not directly observable; the agent then receives observations instead: $o_1, a_1, r_2, o_2, a_2, r_3, \dots, o_i, a_i, r_{i+1}, o_{i+1}$.
- The agent has only partial knowledge about the environment.
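The interaction loop behind such a trajectory can be sketched in a few lines of Python. This is only an illustration: the two-state environment `TwoStateEnv`, its 0.9 success probability, its reward rule, and the random policy are all hypothetical and not taken from the slides.

```python
import random


class TwoStateEnv:
    """Hypothetical toy environment: states 0 and 1, actions 0 (stay) and 1 (switch)."""

    def __init__(self, seed=0):
        self.rng = random.Random(seed)
        self.state = 0

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        # Stochastic dynamics (assumed): a "switch" action only succeeds
        # with probability 0.9; otherwise the state is unchanged.
        if action == 1 and self.rng.random() < 0.9:
            self.state = 1 - self.state
        # Scalar reward (assumed): 1 if the agent ends up in state 1, else 0.
        reward = 1.0 if self.state == 1 else 0.0
        return self.state, reward


def make_random_policy(seed=1):
    """Return a policy that ignores the state and picks an action uniformly."""
    rng = random.Random(seed)
    return lambda state: rng.choice([0, 1])


def run_episode(env, policy, num_steps):
    """Collect the trajectory as a list of (s_i, a_i, r_{i+1}, s_{i+1}) tuples."""
    trajectory = []
    s = env.reset()
    for _ in range(num_steps):
        a = policy(s)
        s_next, r = env.step(a)
        trajectory.append((s, a, r, s_next))
        s = s_next
    return trajectory


trajectory = run_episode(TwoStateEnv(), make_random_policy(), num_steps=5)
for step in trajectory:
    print(step)
```

In a partially observable setting the same loop would apply, except that `step` would return an observation rather than the full state.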

What is Reinforcement Learning?
(from Satinder Singh's Introduction to RL, videolectures.com)

Long history in AI
- Idea of programming a computer to learn by trial and error (Turing, 1954)
- SNARCs (Stochastic Neural-Analog Reinforcement Calculators) (Minsky, 54)
- Checkers-playing program (Samuel, 59)
- Lots of RL in the 60s (e.g., Waltz & Fu 65; Mendel 66; Fu 70)
- MENACE (Matchbox Educable Naughts and Crosses Engine) (Michie, 63)
- RL-based Tic Tac Toe learner (GLEE) (Michie 68)
- Classifier Systems (Holland, 75)
- Adaptive Critics (Barto & Sutton, 81)
- Temporal Differences (Sutton, 88)
(from Satinder Singh's Introduction to RL, videolectures.com)

RL: A subfield of Machine Learning
(from Machine Learning course, 2011, Marc Toussaint)
- Supervised learning: learn from labelled data $\{(x_i, y_i)\}_{i=1}^N$
- Unsupervised learning: learn from unlabelled data $\{x_i\}_{i=1}^N$ only
- Semi-supervised learning: many unlabelled data, few labelled data
- Reinforcement learning: learn from data $\{(s_t, a_t, r_t, s_{t+1})\}$
  - learn a predictive model $(s, a) \mapsto s'$
  - learn to predict reward $(s, a) \mapsto r$
  - learn a behavior $s \mapsto a$ that maximizes reward
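As a rough illustration of the three learning targets for reinforcement learning data, the sketch below estimates a tabular transition model, a reward predictor, and a behavior from logged transitions. The transitions, state names, and action names are made up. The "behavior" here is deliberately simplistic: it is greedy in the estimated immediate reward only, so it ignores delayed reward, which a real RL method would have to account for.

```python
from collections import Counter, defaultdict

# Hypothetical logged transitions (s_t, a_t, r_t, s_{t+1}); values are made up.
transitions = [
    ("A", "go", 0.0, "B"),
    ("A", "go", 0.0, "B"),
    ("A", "go", 0.0, "A"),
    ("A", "stay", 0.0, "A"),
    ("B", "go", 1.0, "A"),
    ("B", "stay", 2.0, "B"),
    ("B", "stay", 2.0, "B"),
]

next_state_counts = defaultdict(Counter)
reward_sums = defaultdict(float)
visit_counts = Counter()
for s, a, r, s_next in transitions:
    next_state_counts[(s, a)][s_next] += 1
    reward_sums[(s, a)] += r
    visit_counts[(s, a)] += 1

# Predictive model (s, a) -> s': empirical next-state distribution.
model = {
    sa: {s2: count / visit_counts[sa] for s2, count in counts.items()}
    for sa, counts in next_state_counts.items()
}

# Reward predictor (s, a) -> r: empirical mean reward.
reward = {sa: reward_sums[sa] / visit_counts[sa] for sa in visit_counts}

# Behavior s -> a: greedy in the estimated immediate reward (myopic assumption).
policy = {}
for (s, a), r_hat in reward.items():
    if s not in policy or r_hat > reward[(s, policy[s])]:
        policy[s] = a

print(model[("A", "go")])
print(reward[("B", "stay")])
print(policy)
```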

Success of Reinforcement Learning
- Games: Backgammon (Tesauro, 1994); Solitaire (X. Yan et al., 2005); Chess, Checkers
- Operations Research: Inventory Management (Van Roy, Bertsekas, Lee, & Tsitsiklis, 1996); Dynamic Channel Allocation (e.g. Singh & Bertsekas, 1997); Vehicle Routing, etc.
- Economics: Trading
- Robotics: Robocup Soccer (e.g. Stone & Veloso, 1999); Helicopter Control (e.g. Ng, 2003; Abbeel & Ng, 2006); many robots (navigation, bi-pedal walking, grasping, switching between skills, ...)
More at http://umichrl.pbworks.com/w/page/7597597/successes of Reinforcement Learning

TD-Gammon, by Gerald Tesauro
(See section 11.1 in Sutton & Barto's book; see also Tesauro, 1992, 1994, 1995.)
- The only reward is given at the end of the game, for a win.
- Self-play: use the current policy to sample moves on both sides!
- After about 300,000 games against itself, it played near the level of the world's strongest grandmasters.
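To see how a reward that arrives only at the end of an episode can still shape value estimates for earlier states, here is a minimal temporal-difference sketch. It is not TD-Gammon, which is described in the literature as combining TD(λ) with a neural network and self-play. It is plain tabular TD(0) on a hypothetical five-state random walk, where the only non-zero reward is 1 for exiting on the right. The step size and episode count are arbitrary choices.

```python
import random


def td0_random_walk(num_episodes=5000, alpha=0.02, seed=0):
    """Tabular TD(0) on a random walk with reward only at the right-hand exit."""
    rng = random.Random(seed)
    n = 5  # non-terminal states 1..5; states 0 and 6 are terminal
    values = [0.0] * (n + 2)  # terminal values stay at 0
    for _ in range(num_episodes):
        s = (n + 1) // 2  # start in the middle state
        while 0 < s < n + 1:
            s_next = s + rng.choice([-1, 1])
            # Reward only at the very end, and only for a "win" (right exit).
            r = 1.0 if s_next == n + 1 else 0.0
            # Undiscounted TD(0) update: V(s) <- V(s) + alpha * (r + V(s') - V(s))
            values[s] += alpha * (r + values[s_next] - values[s])
            s = s_next
    return values[1:n + 1]


values = td0_random_walk()
print([round(v, 2) for v in values])
```

For this walk the true values are 1/6, 2/6, ..., 5/6 (the probability of exiting on the right), so the estimates should end up close to those numbers, with some noise from the constant step size.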

Go using UCT, by Gelly
(See Gelly et al. 2012, Communications of the ACM, for a review.)
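UCT is commonly described as applying the UCB1 bandit rule at every node of a Monte-Carlo search tree: pick the child with the highest mean reward plus an exploration bonus that shrinks as the child is visited more often. The sketch below shows only that selection step, not a full tree search and not a Go player. The function name `ucb1_select` and the default exploration constant `sqrt(2)` are illustrative assumptions.

```python
import math


def ucb1_select(total_reward, visits, c=math.sqrt(2)):
    """Return the index of the child maximizing mean + c * sqrt(ln(N) / n).

    total_reward[i] is the summed reward of child i and visits[i] its visit
    count. Unvisited children are tried first.
    """
    for i, n in enumerate(visits):
        if n == 0:
            return i
    parent_visits = sum(visits)
    scores = [
        total_reward[i] / visits[i]
        + c * math.sqrt(math.log(parent_visits) / visits[i])
        for i in range(len(visits))
    ]
    return max(range(len(scores)), key=scores.__getitem__)


# A rarely visited child can win the selection despite a lower mean reward.
print(ucb1_select([50.0, 0.4], [100, 1]))
```

A larger `c` favors exploration of rarely visited children; a smaller one favors exploiting the current best estimate.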

Reinforcement Learning in Robotics
- Learning motor skills (around 2000, by Schaal, Atkeson, Vijayakumar)
- Autonomous helicopter flight (2007, Andrew Ng et al.)

Reinforcement Learning in Robotics
Planning and exploration in a relational stochastic world (Lang and Marc, JMLR 2012)

Reinforcement learning in neuroscience
(Yael Niv, ICML 2009 tutorial.)

Reinforcement learning in neuroscience
Peter Dayan and Yael Niv, Neurobiology 2008.
The brain employs both model-free and model-based decision-making strategies in parallel, with each dominating in different circumstances.

Schedule of this course
Part 1: The Basics
- Markov Decision Process
- Dynamic Programming: Value Iteration, Policy Iteration
Part 2: Reinforcement Learning Topics
- TD, Q-Learning
- Reinforcement learning with function approximation: LSPI, regression, ...
- Policy search: policy gradient, covariant policy search, entropy policy search, ...
- Actor-Critic
Part 3: Advanced Topics
- Inverse reinforcement learning, imitation learning
- Exploration vs. exploitation: multi-armed bandits, PAC-MDP, Bayesian reinforcement learning
- Hierarchical reinforcement learning: macro actions, skill acquisition
- Intrinsically motivated reinforcement learning
- Connection to control theory
- Reinforcement learning in POMDP environments

Schedule of this course
Missing:
- Relational MDP
- MDP/POMDP/RL as Inference

Literature
Richard S. Sutton, Andrew Barto: Reinforcement Learning: An Introduction. The MIT Press, Cambridge, Massachusetts; London, England, 1998.
http://webdocs.cs.ualberta.ca/~sutton/book/the-book.html

Literature
Csaba Szepesvári: Algorithms for Reinforcement Learning. Morgan & Claypool, July 2010.
http://www.ualberta.ca/~szepesva/rlbook.html

Organisation
- Course webpage: http://ipvs.informatik.uni-stuttgart.de/mlr/reinforcement-learning-ws1314/
  - Slides, exercises
  - Links to other resources
- Secretary, admin issues: Carola Stahl, Carola.Stahl@ipvs.uni-stuttgart.de, Room 2.217
- One exercise session: Friday 08:00-09:30
- Rules for the tutorials (Prof. Marc Toussaint's rules):
  - Doing the exercises is crucial!
  - At the beginning of each tutorial, sign into a list and mark which exercises you have (successfully) worked on.
  - Students are randomly selected to present their solutions.
  - You need 50% of completed exercises to be admitted to the exam.