Reinforcement Learning

Reinforcement Learning, LU 1 - Introduction
Dr. Joschka Bödecker
AG Maschinelles Lernen und Natürlichsprachliche Systeme, Albert-Ludwigs-Universität Freiburg
jboedeck@informatik.uni-freiburg.de
Acknowledgement: slides courtesy of Prof. Dr. Martin Riedmiller and Dr. Martin Lauer, Machine Learning Lab, University of Freiburg

Organisational issues
- Dr. Joschka Boedecker, room 00010, building 079, jboedeck@informatik.uni-freiburg.de
- Office hours: Tuesday, 2-3 pm
- No script; slides are available online at http://ml.informatik.uni-freiburg.de/teaching/ws1516/rl

Dates, winter term 2015/2016 (3+1)
- Lecture: Monday, 14:00 (c.t.) - 15:30, SR 02-017, building 052, and Wednesday, 16:00 (s.t.) - 17:30, SR 02-017, building 052
- Exercise sessions: Wednesday, 16:00 - 17:30, interleaved with the lecture, starting Oct. 28; held by Jan Wülfing, wuelfj@informatik.uni-freiburg.de

Goal of this lecture
- Introduction to the learning problem type Reinforcement Learning
- Introduction to the mathematical basics of independently learning systems

Goal of the first unit: motivation, definition, and differentiation
Outline: examples; solution approaches; Machine Learning; Reinforcement Learning overview

Example: Backgammon
- Can a program independently learn Backgammon?
- Learning from success (win) and failure (loss)
- Neuro-Backgammon: playing at world-champion level (Tesauro, 1992)

Example: pole balancing (control engineering)
- Can a program independently learn to balance?
- Learning from success and failure
- Neural RL controller: copes with noise, inaccuracies, unknown behaviour, non-linearities, ... (Riedmiller et al.)

Example: robot soccer
- Can programs independently learn how to cooperate?
- Learning from success and failure
- Cooperative RL agents: complexity, distributed intelligence, ... (Riedmiller et al.)

Example: autonomous (e.g. humanoid) robots
- Task: movement control similar to humans (walking, running, playing soccer, cycling, skiing, ...)
- Input: images from a camera
- Output: control signals to the joints
- Problems: very complex; consequences of actions are hard to predict; interference / noise

Example: Maze

The agent concept [Russell and Norvig 1995, page 33]
- "An agent is anything that can be viewed as perceiving its environment through sensors and acting upon that environment through effectors."
- Examples: a human, a robot arm, an autonomous car, a motor controller, ...
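
To make the agent concept concrete, here is a minimal sketch of the perceive-act loop in Python. The Environment and Agent classes and the one-dimensional world are illustrative assumptions, not part of the slides:

```python
# Minimal sketch of the agent concept: percepts come in through sensors,
# actions go out through effectors. All names here are illustrative.

class Environment:
    """A trivial 1-D world; the task is to reach position 3."""
    def __init__(self):
        self.position = 0

    def percept(self):            # what the agent's sensors deliver
        return self.position

    def apply(self, action):      # what the agent's effectors change
        self.position += action

class Agent:
    def act(self, percept):
        return +1                 # a fixed, non-learning strategy: always move right

env, agent = Environment(), Agent()
while env.percept() != 3:
    env.apply(agent.act(env.percept()))
print("goal reached at position", env.percept())
```

A learning agent replaces the fixed strategy in act() with one that improves from experience; that replacement is the subject of this lecture.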

Solution approaches in Artificial Intelligence (AI)
- Planning / search (e.g. A*, backtracking)
- Deduction (e.g. logic programming, predicate logic)
- Expert systems (e.g. knowledge elicited from experts)
- Fuzzy control systems (fuzzy logic)
- Genetic algorithms (evolution of solutions)
- Machine Learning (e.g. Reinforcement Learning)

Types of learning (in humans)
- Learning from a teacher
- Structuring of objects
- Learning from experience

Types of Machine Learning (ML)
- Learning with a teacher. Supervised Learning: examples of input / (target) output pairs. Goal: generalization (in general, not simply memorization).
- Structuring / recognition of correlations. Unsupervised Learning: clustering of similar data points, e.g. for preprocessing.
- Learning through reward / penalty. Reinforcement Learning: prerequisite: a specification of the target goal (or of events to be avoided).

Machine Learning: ingredients
1. Type of the learning problem (what is given / what is sought)
2. Representation of the learned solution knowledge: table, rules, linear mapping, neural network, ...
3. Solution process (observed data → solution): (heuristic) search, gradient descent, optimization techniques, ...
Not the other way around: "For this problem I need a neural network" is the wrong starting point.

Emphasis of the lecture: Reinforcement Learning
- No information about the solution strategy is required
- Independent learning of a strategy by smart trial of solutions ("trial and error") - the biggest challenge for a learning system
- Representation of solution knowledge by a function approximator (e.g. tables, linear models, neural networks, ...); see the sketch below
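
As an illustration of the last point, the following sketch contrasts the two extremes of representing solution knowledge: an exact table and a linear function approximator. The feature map and all values are made-up assumptions:

```python
# Two ways to represent learned solution knowledge (illustrative sketch):
# an exact table, and a linear function approximator.
import numpy as np

# 1) Table: one entry per (state, action) pair -- exact, but only feasible
#    for small, discrete state spaces.
q_table = {(0, +1): 0.5}          # maps (state, action) -> learned value

# 2) Linear function approximator: value = weights . features(state, action).
#    Generalizes across states, so it scales to large / continuous spaces.
weights = np.array([0.1, 0.2, 0.3])

def features(state, action):      # hand-crafted feature map (an assumption)
    return np.array([1.0, state, state * action])

def q_approx(state, action):
    return float(weights @ features(state, action))

print(q_table[(0, +1)], q_approx(2.0, +1))
```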

RL using the example of autonomous robots
- bad: damage (a fall, ...)
- good: task completed successfully
- better: fast / low-energy / smooth movements / ... → optimization!

Reinforcement Learning (RL)
- Also called: learning from evaluations, autonomous learning, neuro-dynamic programming
- Defines a type of learning problem, not a method!
- Central feature: an evaluating training signal, e.g. good / bad
- RL with immediate evaluation: decision → evaluation. Example: the parameters of a basketball throw.
- RL with rewards delayed in time: decision, decision, ..., decision → evaluation. Substantially harder; interesting because of its versatile applications.
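
A minimal sketch of RL with immediate evaluation, loosely following the basketball example: each decision (a throw angle) is evaluated right away, and the learner improves by trial and error. The throw function and all constants are invented stand-ins; the learner only sees the evaluation, not the formula:

```python
# Trial-and-error learning with an immediate evaluation (illustrative).
import random

def throw(angle):
    """Hypothetical throw: noisy immediate evaluation, best near 45 degrees."""
    return -(angle - 45.0) ** 2 + random.gauss(0, 5)

best_angle, best_reward = 0.0, float("-inf")
for _ in range(1000):
    angle = best_angle + random.gauss(0, 5)   # vary the best parameter so far
    reward = throw(angle)                     # decision -> evaluation
    if reward > best_reward:                  # crude hill climbing on the reward
        best_angle, best_reward = angle, reward

print("learned angle:", round(best_angle, 1))  # approaches 45
```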

Delayed RL
- Decision, decision, ..., decision → evaluation
- Examples: robotics, control systems, games (chess, backgammon)
- Basic problem: temporal credit assignment
- Basic architecture: actor-critic system

Multistage decision problems

Actor-critic system (Barto, Sutton, 1983)
- Actor: in situation s, choose action u (strategy π : S → U)
- Critic: distributes the external evaluation signal onto the individual actions
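
The following toy sketch illustrates the actor-critic idea on a small chain world (all environment details and constants are assumptions for illustration): the critic estimates state values from the TD error, and the actor shifts its action preferences wherever the critic signals an improvement:

```python
# Toy actor-critic on a 5-state chain (illustrative assumptions throughout).
import math, random

n_states = 5                                   # states 0..4; reaching 4 ends an episode
V = [0.0] * n_states                           # critic: state-value estimates
prefs = [[0.0, 0.0] for _ in range(n_states)]  # actor: preferences for (left, right)
alpha, beta, gamma = 0.1, 0.1, 0.9

def policy(s):
    exps = [math.exp(p) for p in prefs[s]]
    return random.choices([0, 1], weights=exps)[0]   # softmax action selection

for episode in range(2000):
    s = 0
    while s != n_states - 1:
        a = policy(s)
        s_next = max(0, s - 1) if a == 0 else s + 1
        r = 1.0 if s_next == n_states - 1 else 0.0
        delta = r + gamma * V[s_next] - V[s]   # TD error: the critic's judgement
        V[s] += alpha * delta                  # critic update
        prefs[s][a] += beta * delta            # actor update: reinforce good actions
        s = s_next

print([round(v, 2) for v in V])                # values grow towards the goal
```

Note how the TD error solves the temporal credit assignment problem of the previous slide: an action is reinforced whenever it leads to a state the critic currently values more highly, even if the external reward arrives much later.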

Reinforcement Learning: a short history
- 1959: Samuel's checker player: temporal difference (TD) methods
- 1968: Michie and Chambers: BOXES
- 1983: Barto and Sutton's AHC/ACE; 1987: Sutton's TD(λ)
- Early 1990s: the connection between dynamic programming (DP) and RL: Werbos, Sutton, Barto, Watkins, Singh, Bertsekas. DP is a classic optimization technique (late 1950s: Bellman), too costly for large tasks, but with the advantage of a clean mathematical formulation and convergence results.
- 2000: policy gradient methods (Sutton et al., Peters et al., ...)
- 2005: Fitted Q, a batch DP method (Ernst et al., Riedmiller, ...)
- Since then: many examples of successful, practically relevant applications

Other examples

field     | input           | goal            | example                       | output (actions)
games     | board situation | winning         | backgammon, chess             | valid move
robotics  | sensor data     | reference value | pendulum, robot soccer        | control variable
sequences | state           | gain            | assembly line, mobile network | planning candidate
benchmark | state           | goal position   | maze                          | direction

Goal: Autonomous learning system

Approach - rough outline
- Formulation of the learning problem as an optimization task
- Solution by learning, based on the optimization technique of Dynamic Programming
- Difficulties: a very large state space; unknown process behaviour → application of approximation techniques (e.g. neural networks, ...); a preview sketch follows below
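
As a preview of the Dynamic Programming part, here is value iteration on a toy one-dimensional maze (six cells, goal at the right end, cost 1 per step; all details are illustrative assumptions). The Bellman backup repeatedly improves the cost-to-go estimate of each state:

```python
# Value iteration on a toy 1-D maze (illustrative assumptions).
n = 6                                   # cells 0..5, goal = cell 5
V = [0] * n                             # cost-to-go estimates

for _ in range(50):                     # sweep until the values settle
    for s in range(n - 1):              # the goal state keeps value 0
        left, right = max(0, s - 1), min(n - 1, s + 1)
        V[s] = 1 + min(V[left], V[right])   # Bellman backup

print(V)                                # [5, 4, 3, 2, 1, 0]: steps to the goal
```

The two difficulties above show up even in this sketch: the loop over all states is hopeless for very large state spaces, and the backup assumes the transition behaviour (left/right moves) is known.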

Outline of the lecture
1. Introduction
2. Dynamic Programming: Markov Decision Problems, backwards DP, value iteration, policy iteration
3. Approximate DP / Reinforcement Learning: Monte Carlo methods, stochastic approximation, TD(λ), Q-learning
4. Advanced methods of Reinforcement Learning: policy gradient methods, hierarchical methods, POMDPs, relational Reinforcement Learning
5. Applications of Reinforcement Learning: robot soccer, pendulum, RL competition
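
As a small preview of the third part, here is tabular Q-learning on the same kind of toy chain world (environment, parameters, and variable names are all illustrative assumptions). Unlike value iteration, it needs no model of the transitions; it learns purely from sampled experience:

```python
# Tabular Q-learning on a toy 6-cell chain (illustrative preview of part 3).
import random

n, goal = 6, 5
Q = {(s, a): 0.0 for s in range(n) for a in (-1, +1)}
alpha, gamma, eps = 0.5, 0.9, 0.1

for episode in range(500):
    s = 0
    while s != goal:
        if random.random() < eps:                      # explore
            a = random.choice((-1, +1))
        else:                                          # exploit
            a = max((-1, +1), key=lambda act: Q[(s, act)])
        s_next = min(max(s + a, 0), n - 1)
        r = 1.0 if s_next == goal else 0.0
        best_next = 0.0 if s_next == goal else max(Q[(s_next, b)] for b in (-1, +1))
        Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])  # Q-learning update
        s = s_next

print(max((-1, +1), key=lambda act: Q[(0, act)]))      # learned first move: 1 (right)
```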

Further courses on machine learning
- Lecture: Machine Learning (summer term)
- Lab course: Deep Learning (Wed., 10-12)
- Bachelor / Master theses, team projects

Further readings
- D. P. Bertsekas and J. N. Tsitsiklis. Neuro-Dynamic Programming. Athena Scientific, Belmont, Massachusetts, 1996.
- R. Sutton and A. Barto. Reinforcement Learning: An Introduction. MIT Press, Cambridge, Massachusetts, 1998.
- M. Puterman. Markov Decision Processes: Discrete Stochastic Dynamic Programming. John Wiley and Sons, New York, 1994.
- L. P. Kaelbling, M. L. Littman and A. W. Moore. Reinforcement Learning: A Survey. Journal of Artificial Intelligence Research, 4:237-285, 1996.
- M. Wiering (ed.). Reinforcement Learning: State-of-the-Art. Springer, 2012.
- WWW: http://www-all.cs.umass.edu/rlr/ and http://richsutton.com/rl-faq.html