Reinforcement Learning
LU 1 - Introduction

Dr. Joschka Bödecker
AG Maschinelles Lernen und Natürlichsprachliche Systeme
Albert-Ludwigs-Universität Freiburg
jboedeck@informatik.uni-freiburg.de

Acknowledgement: slides courtesy of Martin Riedmiller and Martin Lauer
Organisational issues

Dr. Joschka Boedecker
Room 00010, building 079
jboedeck@informatik.uni-freiburg.de
Office hours: Tuesday 2-3 pm

No script - slides available online at
http://ml.informatik.uni-freiburg.de/teaching/ws1516/rl
Dates, winter term 2015/2016 (3+1)

Lecture:
- Monday, 14:00 (c.t.) - 15:30, SR 02-017, building 052
- Wednesday, 16:00 (s.t.) - 17:30, SR 02-017, building 052

Exercise sessions:
- Wednesday, 16:00-17:30, interleaved with the lecture, starting Oct. 28
- held by Jan Wülfing, wuelfj@informatik.uni-freiburg.de
Goal of this lecture

- Introduction to the learning problem type Reinforcement Learning
- Introduction to the mathematical basics of an independently learning system
Goal of the first unit

- Motivation, definition and differentiation
- Outline
- Examples
- Solution approaches
- Machine Learning
- Reinforcement Learning overview
Example: Backgammon

- Can a program independently learn Backgammon?
- Learning from success (win) and failure (loss)
- Neuro-Backgammon: playing at world-champion level (Tesauro, 1992)
Example: pole balancing (control engineering)

- Can a program independently learn balancing?
- Learning from success and failure
- Neural RL controller: noise, inaccuracies, unknown behaviour, non-linearities, ... (Riedmiller et al.)
Example: robot soccer

- Can programs independently learn how to cooperate?
- Learning from success and failure
- Cooperative RL agents: complexity, distributed intelligence, ... (Riedmiller et al.)
Example: autonomous (e.g. humanoid) robots

- Task: movement control similar to humans (walking, running, playing soccer, cycling, skiing, ...)
- Input: images from a camera
- Output: control signals to the joints
- Problems: very complex; consequences of actions hard to predict; interference / noise
Example: Maze
The Agent Concept

"An agent is anything that can be viewed as perceiving its environment through sensors and acting upon that environment through effectors." [Russell and Norvig 1995, page 33]

Examples: a human, a robot arm, an autonomous car, a motor controller, ... (a minimal code sketch of the perceive-act loop follows below)
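A minimal sketch of this perceive-act loop in Python; the toy Environment and the fixed-action agent are illustrative assumptions, not part of the lecture:

    class Environment:
        def __init__(self):
            self.state = 0
        def observe(self):            # what the sensors deliver
            return self.state
        def apply(self, action):      # what the effectors change
            self.state += action

    class Agent:
        def act(self, percept):
            # A real agent implements a policy here; this stub
            # always moves "one step right".
            return 1

    env, agent = Environment(), Agent()
    for _ in range(5):                # the basic perceive-act loop
        env.apply(agent.act(env.observe()))
    print(env.state)                  # -> 5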
Solution approaches in Artificial Intelligence (AI)

- Planning / search (e.g. A*, backtracking)
- Deduction (e.g. logic programming, predicate logic)
- Expert systems (knowledge provided by experts)
- Fuzzy control systems (fuzzy logic)
- Genetic algorithms (evolution of solutions)
- Machine Learning (e.g. reinforcement learning)
Types of learning (in humans)

- Learning from a teacher
- Structuring of objects
- Learning from experience
Types of Machine Learning (ML)

- Learning with a teacher -> Supervised Learning: examples of input / (target) output pairs. Goal: generalization (in general not simply memorization).
- Structuring / recognition of correlations -> Unsupervised Learning. Goal: clustering of similar data points, e.g. for preprocessing.
- Learning through reward / penalty -> Reinforcement Learning. Prerequisite: specification of a target goal (or of events to be avoided). ...
Machine Learning: ingredients

1. Type of the learning problem (what is given / what is sought)
2. Representation of the learned solution knowledge: table, rules, linear mapping, neural network, ...
3. Solution process (observed data -> solution): (heuristic) search, gradient descent, optimization techniques, ...

Not the other way round: "For this problem I need a neural network."
Emphasis of the lecture: Reinforcement Learning

- No information regarding the solution strategy required
- Independent learning of a strategy by smart trial of solutions ("trial and error") - the biggest challenge for a learning system
- Representation of solution knowledge by a function approximator (e.g. tables, linear models, neural networks; see the sketch below)
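A brief sketch of two of the function approximators named above, used to represent value knowledge; the states, features and weights are made-up examples, not from the lecture:

    # (1) Table: one entry per state -- exact, but infeasible for huge state spaces.
    V_table = {(0, 0): 0.0, (1, 0): 0.5, (1, 1): 0.8}

    # (2) Linear model: V(s) ~ w . phi(s) -- generalizes to unseen states.
    def phi(x, v):
        return [1.0, x, v, x * v]        # hand-chosen features (assumption)

    w = [0.0, 0.1, -0.2, 0.05]           # weights a learner would adjust

    def V_linear(x, v):
        return sum(wi * fi for wi, fi in zip(w, phi(x, v)))

    print(V_table[(1, 0)], V_linear(1, 0))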
RL using the example of autonomous robots

- bad: damage (fall, ...)
- good: task done successfully
- better: fast / low energy / smooth movements / ...
-> optimization! (a reward sketch follows below)
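A hedged sketch of how such "bad / good / better" evaluations can be encoded as a single numeric reward signal; the events and weights are illustrative assumptions:

    def reward(fell_over, task_done, energy_used):
        if fell_over:                          # bad: damage (fall, ...)
            return -100.0
        if task_done:                          # good: task done successfully;
            return 100.0 - 0.1 * energy_used   # better if cheap -> optimization
        return 0.0                             # nothing to evaluate yet

    print(reward(False, True, energy_used=50.0))   # -> 95.0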
Reinforcement Learning (RL)

Also called: learning from evaluations, autonomous learning, neuro-dynamic programming

- Defines a type of learning problem, not a method!
- Central feature: an evaluating training signal, e.g. good / bad

RL with immediate evaluation:
- decision -> evaluation
- example: parameters for a basketball throw (see the sketch below)

RL with rewards delayed in time:
- decision, decision, ..., decision -> evaluation
- substantially harder; interesting because of its versatile applications
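A minimal sketch of RL with immediate evaluation, in the spirit of the basketball-throw example: one decision (a throw angle), one evaluation (hit or miss). The simulated success probabilities are an illustrative assumption:

    import random

    angles = [30, 40, 45, 50, 60]              # candidate throw parameters
    value = {a: 0.0 for a in angles}           # estimated success rate per angle
    count = {a: 0 for a in angles}

    def throw(angle):
        # Hypothetical environment: 45 degrees is best, outcomes are noisy.
        p_hit = max(0.0, 1.0 - abs(angle - 45) / 30)
        return 1.0 if random.random() < p_hit else 0.0

    for t in range(2000):
        # epsilon-greedy: mostly exploit the best estimate, sometimes explore
        a = random.choice(angles) if random.random() < 0.1 else max(angles, key=value.get)
        r = throw(a)                           # immediate evaluation
        count[a] += 1
        value[a] += (r - value[a]) / count[a]  # incremental mean update

    print(max(angles, key=value.get))          # converges to 45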
Delayed RL

- decision, decision, ..., decision -> evaluation
- Examples: robotics, control systems, games (chess, backgammon)
- Basic problem: temporal credit assignment
- Basic architecture: actor-critic system
Multistage decision problems
Actor-critic system (Barto, Sutton, 1983)

- Actor: in situation s choose action u (strategy π : S → U)
- Critic: distributes the external evaluation signal onto the single actions (see the sketch below)
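A minimal tabular actor-critic sketch along these lines: the critic learns state values with TD(0), and its TD error tells the actor which actions to reinforce. The chain environment and all constants are illustrative assumptions, not from the lecture:

    import math, random

    N = 5                                    # chain states 0..4, goal at 4
    pref = [[0.0, 0.0] for _ in range(N)]    # actor: preferences for (left, right)
    V = [0.0] * N                            # critic: state-value estimates
    alpha, beta, gamma = 0.1, 0.1, 0.95

    def policy(s):
        # softmax over the actor's action preferences
        e = [math.exp(p) for p in pref[s]]
        return 0 if random.random() < e[0] / sum(e) else 1

    for episode in range(500):
        s = 0
        while s != N - 1:
            a = policy(s)
            s2 = max(0, s - 1) if a == 0 else s + 1
            r = 1.0 if s2 == N - 1 else 0.0
            delta = r + gamma * V[s2] - V[s]   # TD error (V[goal] stays 0)
            V[s] += alpha * delta              # critic update
            pref[s][a] += beta * delta         # actor: reinforce actions the critic liked
            s = s2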
Reinforcement Learning - a short history

- 1959: Samuel's checker player: temporal difference (TD) methods
- 1968: Michie and Chambers: BOXES
- 1983: Barto and Sutton's AHC/ACE; 1987: Sutton's TD(λ)
- Early 90s: connection between dynamic programming (DP) and RL: Werbos, Sutton, Barto, Watkins, Singh, Bertsekas
  - DP: classic optimization technique (late 50s: Bellman), too costly for large tasks
  - Advantage: clean mathematical formulation, convergence results
- 2000: policy gradient methods (Sutton et al., Peters et al., ...)
- 2005: Fitted Q (batch DP method) (Ernst et al., Riedmiller, ...)
- Since then: many examples of successful, or at least practically relevant, applications
Other examples

field       input            goal             example                                 output (actions)
games       board situation  winning          backgammon, chess                       valid move
robotics    sensor data      reference value  pendulum, robot soccer                  control variable
sequencing  state            gain             assembly line, mobile network planning  candidate
benchmark   state            goal position    maze                                    direction
Goal: Autonomous learning system
Approach - rough outline

- Formulation of the learning problem as an optimization task
- Solution by learning, based on the optimization technique of Dynamic Programming (see the sketch below)
- Difficulties: very large state spaces; process behaviour unknown
- Therefore: application of approximation techniques (e.g. neural networks, ...)
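A hedged sketch of Value Iteration, the Dynamic Programming technique the lecture builds on; the tiny deterministic corridor MDP is an illustrative assumption:

    N = 6                                   # states 0..5, goal at 5
    gamma = 0.9
    V = [0.0] * N

    def step(s, a):                         # deterministic model: move left/right
        s2 = min(N - 1, s + 1) if a == "right" else max(0, s - 1)
        return s2, (1.0 if s2 == N - 1 else 0.0)

    for sweep in range(100):                # Bellman backups until (near) convergence
        for s in range(N - 1):              # goal state is terminal, value 0
            V[s] = max(r + gamma * V[s2]
                       for s2, r in (step(s, a) for a in ("left", "right")))

    print([round(v, 3) for v in V])         # values grow toward the goal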
Outline of lecture

Part 1: Introduction
Part 2: Dynamic Programming: Markov Decision Problems, backwards DP, Value Iteration, Policy Iteration
Part 3: Approximate DP / Reinforcement Learning: Monte Carlo methods, stochastic approximation, TD(λ), Q-learning
Part 4: Advanced methods of Reinforcement Learning: policy gradient methods, hierarchical methods, POMDPs, relational Reinforcement Learning
Part 5: Applications of Reinforcement Learning: robot soccer, pendulum, RL competition
Further courses on machine learning

- Lecture: Machine Learning (summer term)
- Lab course: Deep Learning (Wed., 10-12)
- Bachelor's / Master's theses, team projects
Further reading

- D. P. Bertsekas and J. N. Tsitsiklis. Neuro-Dynamic Programming. Athena Scientific, Belmont, Massachusetts, 1996.
- R. S. Sutton and A. G. Barto. Reinforcement Learning: An Introduction. MIT Press, Cambridge, Massachusetts, 1998.
- M. Puterman. Markov Decision Processes: Discrete Stochastic Dynamic Programming. John Wiley and Sons, New York, 1994.
- L. P. Kaelbling, M. L. Littman and A. W. Moore. Reinforcement Learning: A Survey. Journal of Artificial Intelligence Research, 4:237-285, 1996.
- M. Wiering and M. van Otterlo (eds.). Reinforcement Learning: State-of-the-Art. Springer, 2012.

WWW:
- http://www-all.cs.umass.edu/rlr/
- http://richsutton.com/rl-faq.html