Reinforcement Learning

1 Reinforcement Learning Introduction Vien Ngo Marc Toussaint University of Stuttgart

2 Problems facing in daily life? 2/20

6 Problems facing in daily life? This is a sequential decision problem: optimal decision making means maximizing reward or minimizing penalty. Why is it hard? Stochasticity and uncertainty; delayed reward or penalty. 3/20

7 What is Reinforcement Learning? RL is learning from interaction. (From Satinder Singh's Introduction to RL, videolectures.com) 4/20

11 What is Reinforcement Learning?
The agent experiences a sequence $s_1, a_1, r_2, s_2, a_2, r_3, \dots, s_i, a_i, r_{i+1}, s_{i+1}$.
- States can be vectors or other structures, defined as sufficient statistics to predict the future.
- Actions can be multi-dimensional.
- Rewards are scalar but can be arbitrarily uninformative.
- States are sometimes not directly observable; the sequence is then $o_1, a_1, r_2, o_2, a_2, r_3, \dots, o_i, a_i, r_{i+1}, o_{i+1}$.
- The agent has only partial knowledge about the environment.
5/20
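The state, action, reward sequence on this slide can be made concrete with a small interaction loop. The following is only a minimal sketch: the `Corridor` environment, the `rollout` helper and the random policy are hypothetical names invented for illustration and do not come from the lecture.

```python
import random

class Corridor:
    """Hypothetical toy environment: positions 0..4, goal at the right end."""

    def __init__(self, length=5):
        self.length = length
        self.state = 0

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        # action is -1 (left) or +1 (right); moves are clipped to the corridor
        self.state = min(max(self.state + action, 0), self.length - 1)
        done = self.state == self.length - 1
        # scalar reward that is uninformative (0) until the goal is reached
        reward = 1.0 if done else 0.0
        return self.state, reward, done

def random_policy(state, rng):
    # assumption: a uniformly random behavior, just to generate data
    return rng.choice([-1, 1])

def rollout(env, policy, max_steps=100, seed=0):
    """Collect transitions (s_i, a_i, r_{i+1}, s_{i+1}) from one episode."""
    rng = random.Random(seed)
    trajectory = []
    s = env.reset()
    for _ in range(max_steps):
        a = policy(s, rng)
        s_next, r, done = env.step(a)
        trajectory.append((s, a, r, s_next))
        s = s_next
        if done:
            break
    return trajectory

trajectory = rollout(Corridor(), random_policy)
for s, a, r, s_next in trajectory[:5]:
    print(f"s={s} a={a:+d} r={r} s'={s_next}")
```

Each tuple in `trajectory` is one step of the sequence above; the reward stays at zero until the goal state is reached, which is one simple way a reward can be delayed and uninformative.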

12 What is Reinforcement Learning? (From Satinder Singh's Introduction to RL, videolectures.com) 6/20

13 Long history in AI
- Idea of programming a computer to learn by trial and error (Turing, 1954)
- SNARCs (Stochastic Neural-Analog Reinforcement Calculators) (Minsky, 54)
- Checkers playing program (Samuel, 59)
- Lots of RL in the 60s (e.g., Waltz & Fu 65; Mendel 66; Fu 70)
- MENACE (Matchbox Educable Noughts and Crosses Engine) (Michie, 63)
- RL based Tic Tac Toe learner (GLEE) (Michie 68)
- Classifier Systems (Holland, 75)
- Adaptive Critics (Barto & Sutton, 81)
- Temporal Differences (Sutton, 88)
(From Satinder Singh's Introduction to RL, videolectures.com) 7/20

16 RL: A subfield of Machine Learning (from Machine Learning course, 2011, Marc Toussaint)
- Supervised learning: learn from labelled data $\{(x_i, y_i)\}_{i=1}^N$
- Unsupervised learning: learn from unlabelled data $\{x_i\}_{i=1}^N$ only
- Semi-supervised learning: many unlabelled data, few labelled data
- Reinforcement learning: learn from data $\{(s_t, a_t, r_t, s_{t+1})\}$
  - learn a predictive model $(s, a) \mapsto s'$
  - learn to predict reward $(s, a) \mapsto r$
  - learn a behavior $s \mapsto a$ that maximizes reward
8/20
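The three things listed as learnable from transition data can be illustrated with simple counting. This is a sketch under assumptions: the tabular estimators, the function names and the tiny hand-made dataset are all hypothetical, and the behavior shown is deliberately myopic (it maximizes estimated immediate reward only, not long-term reward).

```python
from collections import Counter, defaultdict

def fit_tabular_model(transitions):
    """Estimate (s, a) -> s' probabilities and (s, a) -> mean reward by counting."""
    next_counts = defaultdict(Counter)
    reward_sum = defaultdict(float)
    visits = Counter()
    for s, a, r, s_next in transitions:
        next_counts[(s, a)][s_next] += 1
        reward_sum[(s, a)] += r
        visits[(s, a)] += 1
    model = {
        sa: {s2: c / visits[sa] for s2, c in counts.items()}
        for sa, counts in next_counts.items()
    }
    reward = {sa: reward_sum[sa] / visits[sa] for sa in visits}
    return model, reward

def greedy_behavior(reward):
    """Myopic s -> a: pick the action with the highest estimated immediate reward."""
    best = {}
    for (s, a), r in reward.items():
        if s not in best or r > best[s][1]:
            best[s] = (a, r)
    return {s: a for s, (a, _) in best.items()}

# Hypothetical data set of (s_t, a_t, r_t, s_{t+1}) tuples.
data = [
    ("A", "go", 0.0, "B"),
    ("A", "go", 0.0, "B"),
    ("A", "go", 0.0, "A"),
    ("A", "stay", 0.0, "A"),
    ("B", "go", 1.0, "A"),
    ("B", "stay", 0.0, "B"),
]
model, reward = fit_tabular_model(data)
behavior = greedy_behavior(reward)
print(model[("A", "go")], reward[("B", "go")], behavior["B"])
```

A behavior that maximizes long-term reward would additionally have to plan through the learned model; the course schedule lists dynamic programming for exactly that step.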

17 Success of Reinforcement Learning
- Games: Backgammon (Tesauro, 1994), Solitaire (X. Yan et al., 2005), Chess, Checkers, ...
- Operations Research: Inventory Management (Van Roy, Bertsekas, Lee, & Tsitsiklis, 1996), Dynamic Channel Allocation (e.g. Singh & Bertsekas, 1997), Vehicle Routing, etc.
- Economics: Trading, ...
- Robotics: Robocup Soccer (e.g. Stone & Veloso, 1999), Helicopter Control (e.g. Ng, 2003; Abbeel & Ng, 2006), Many Robots (navigation, bi-pedal walking, grasping, switching between skills, ...)
more from of Reinforcement Learning 9/20

18 TD-Gammon, by Gerald Tesauro (see Section 11.1 in Sutton & Barto's book; see also Tesauro, 1992, 1994, 1995)
- The only reward is given at the end of the game, for a win.
- Self-play: use the current policy to sample moves on both sides!
- After about 300,000 games against itself, it played near the level of the world's strongest grandmasters.
10/20
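The combination of self-play and an end-of-game-only outcome can be sketched on a much smaller game. This is not TD-Gammon: as an assumption for illustration, it uses a tabular TD(0) update on a single-pile Nim game (take 1 to 3 stones, whoever takes the last stone wins), whereas TD-Gammon used a neural network value function on backgammon. All names and parameter values below are hypothetical.

```python
import random

def train_self_play(pile=10, games=5000, alpha=0.1, epsilon=0.1, seed=0):
    """Tabular TD(0) with self-play on single-pile Nim (illustrative stand-in)."""
    rng = random.Random(seed)
    # V[n]: estimated chance that the player about to move with n stones wins.
    V = [0.5] * (pile + 1)
    V[0] = 0.0  # no stones left: the player to move has already lost
    for _ in range(games):
        n = pile
        while n > 0:
            moves = [m for m in (1, 2, 3) if m <= n]
            if rng.random() < epsilon:
                m = rng.choice(moves)  # occasional exploratory move
            else:
                # the same value table picks moves for both sides (self-play):
                # leave the opponent in the position that looks worst for them
                m = min(moves, key=lambda k: V[n - k])
            n_next = n - m
            # the only outcome signal is the terminal value V[0] = 0; every
            # other target is bootstrapped from the opponent's estimated chance
            target = 1.0 - V[n_next]
            V[n] += alpha * (target - V[n])
            n = n_next
    return V

V = train_self_play()
print([round(v, 2) for v in V])
```

In this game, pile sizes that are multiples of 4 are known to be losing for the player to move, so one would expect the learned values at those positions to end up low, though the exact numbers depend on the seed and the exploration rate.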

19 Go using UCT, by Gelly (see Gelly et al., 2012, Communications of the ACM, for a review) 11/20

20 Reinforcement Learning in Robotics
- Learning motor skills (around 2000, by Schaal, Atkeson, Vijayakumar)
- Autonomous Helicopter Flight (2007, Andrew Ng et al.)
12/20

22 Reinforcement Learning in Robotics: Planning and exploration in a relational stochastic world (Lang and Marc, JMLR 2012) 13/20

23 Reinforcement learning in neuroscience (Yael Niv, ICML 2009 tutorial) 14/20

24 Reinforcement learning in neuroscience (Peter Dayan and Yael Niv, Neurobiology): The brain employs both model-free and model-based decision-making strategies in parallel, with each dominating in different circumstances. 15/20

25 Schedule of this course
Part 1: The Basics
- Markov Decision Process
- Dynamic Programming: Value Iteration, Policy Iteration
Part 2: Reinforcement Learning Topics
- TD, Q-Learning
- Reinforcement learning with function approximation: LSPI, regression, ...
- Policy search: Policy gradient, covariant policy search, entropy policy search, ...
- Actor-Critic
Part 3: Advanced Topics
- Inverse reinforcement learning, imitation learning
- Exploration vs. Exploitation: Multi-armed bandits, PAC-MDP, Bayesian reinforcement learning
- Hierarchical reinforcement learning: macro actions, skill acquisition
- Intrinsically motivated reinforcement learning
- Connection to control theory
- Reinforcement learning in POMDP environments
16/20

26 Schedule of this course
Missing:
- Relational MDP
- MDP/POMDP/RL as Inference
17/20

27 Literature: Richard S. Sutton, Andrew Barto: Reinforcement Learning: An Introduction. The MIT Press, Cambridge, Massachusetts; London, England. ~sutton/book/the-book.html 18/20

28 Literature: Csaba Szepesvári: Algorithms for Reinforcement Learning. Morgan & Claypool in July ~szepesva/rlbook.html 19/20

29 Organisation
- Course webpage: Slides, Exercises, Links to other resources
- Secretary, admin issues: Carola Stahl, Room
- One exercise: Friday 08:00-09:30
- Rules for the tutorials:
  - Doing the exercises is crucial!
  - At the beginning of each tutorial: sign into a list and mark which exercises you have (successfully) worked on
  - Students are randomly selected to present their solutions
  - You need 50% of completed exercises to be admitted to the exam
(Prof. Marc Toussaint's rules.)
20/20
