Reinforcement Learning

1 Reinforcement Learning
LU 1 - Introduction
Dr. Joschka Bödecker
AG Maschinelles Lernen und Natürlichsprachliche Systeme, Albert-Ludwigs-Universität Freiburg
Acknowledgement: Slides courtesy of Martin Riedmiller and Martin Lauer
Prof. Dr. M. Riedmiller, Dr. M. Lauer, Dr. J. Boedecker, Machine Learning Lab, University of Freiburg

2 Organisational issues
- Dr. Joschka Boedecker, Room 00010, building 079
- Office hours: Tuesday 2-3 pm
- No script; slides available online

3 Dates winter term 2015/
- Lecture: Monday, 14:00 (c.t.) - 15:30, SR , building 052
- Lecture: Wednesday, 16:00 (s.t.) - 17:30, SR , building 052
- Exercise sessions: Wednesday, 16:00-17:30, interleaved with the lecture, starting Oct. 28, held by Jan Wülfing

4 Goal of this lecture
- Introduction to the learning problem type Reinforcement Learning
- Introduction to the mathematical basics of an independently learning system

5 Goal of the 1st unit
- Motivation, definition and differentiation
Outline:
- Examples
- Solution approaches
- Machine Learning
- Reinforcement Learning Overview

6 Example: Backgammon
- Can a program independently learn Backgammon?
- Learning from success (win) and failure (loss)
- Neuro-Backgammon: playing at world champion level (Tesauro, 1992)

7 Example: pole balancing (control engineering)
- Can a program independently learn balancing?
- Learning from success and failure
- Neural RL controller: noise, inaccuracies, unknown behaviour, non-linearities, ... (Riedmiller et al.)

8 Example: robot soccer
- Can programs independently learn how to cooperate?
- Learning from success and failure
- Cooperative RL agents: complexity, distributed intelligence, ... (Riedmiller et al.)

9 Example: Autonomous (e.g. humanoid) robots
- Task: movement control similar to humans (walking, running, playing soccer, cycling, skiing, ...)
- Input: image from camera
- Output: control signals to the joints
- Problems: very complex; consequences of actions hard to predict; interference / noise

10 Example: Maze

11 The Agent Concept [Russell and Norvig 1995, page 33]
"An agent is anything that can be viewed as perceiving its environment through sensors and acting upon that environment through effectors."
Examples: a human, a robot arm, an autonomous car, a motor controller, ...
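
The agent definition above can be read as a perceive-act loop. The following Python sketch is purely illustrative: the `Room` environment, the `Thermostat` agent and all method names are hypothetical and do not come from the slides. The agent here follows a fixed rule; nothing is learned yet.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Room:
    """Toy environment (hypothetical): a room temperature in degrees Celsius."""
    temperature: float = 15.0

    def sense(self) -> float:
        # Sensor: the agent perceives the current temperature.
        return self.temperature

    def apply(self, heating_on: bool) -> None:
        # Effector: heating warms the room, otherwise it cools down.
        self.temperature += 1.0 if heating_on else -0.5


class Thermostat:
    """Toy agent (hypothetical): maps a percept to an action."""

    def __init__(self, target: float) -> None:
        self.target = target

    def act(self, percept: float) -> bool:
        # Switch the heating on whenever it is colder than the target.
        return percept < self.target


def run(agent: Thermostat, env: Room, steps: int) -> List[float]:
    """Perceive-act loop: sense, decide, act; returns the temperature trace."""
    trace = []
    for _ in range(steps):
        percept = env.sense()
        action = agent.act(percept)
        env.apply(action)
        trace.append(env.temperature)
    return trace


trace = run(Thermostat(target=20.0), Room(), steps=10)
```

In a learning agent, the fixed rule in `act` would be replaced by a strategy that is adapted from experience.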

12 Solution approaches in Artificial Intelligence (AI)
- Planning / search (e.g. A*, backtracking)
- Deduction (e.g. logic programming, predicate logic)
- Expert systems (e.g. knowledge generated by experts)
- Fuzzy control systems (fuzzy logic)
- Genetic algorithms (evolution of solutions)
- Machine Learning (e.g. reinforcement learning)

13 Types of learning (in humans)
- Learning from a teacher
- Structuring of objects
- Learning from experience

16 Types of Machine Learning (ML)
- Learning with a teacher. Supervised learning: examples of input / (target) output. Goal: generalization (in general not simply memorization)
- Structuring / recognition of correlations. Unsupervised learning. Goal: clustering of similar data points, e.g. for preprocessing
- Learning through reward / penalty. Reinforcement learning. Prerequisite: specification of the target goal (or events to be avoided) ...

17 Machine Learning: ingredients
1. Type of the learning problem (given / sought)
2. Representation of the learned solution knowledge: table, rules, linear mapping, neural network, ...
3. Solution process (observed data → solution): (heuristic) search, gradient descent, optimization technique, ...
Not at all: "For this problem I need a neural network"

18 Emphasis of the lecture: Reinforcement Learning
- No information regarding the solution strategy required
- Independent learning of a strategy by smart trial of solutions ("trial and error")
- Biggest challenge of a learning system
- Representation of solution knowledge by use of a function approximator (e.g. tables, linear models, neural networks, etc.)

19 RL using the example of autonomous robots
- bad: damage (fall, ...)
- good: task done successfully
- better: fast / low energy / smooth movements / ... → optimization!

20 Reinforcement Learning (RL)
- Also: learning from evaluations, autonomous learning, neuro-dynamic programming
- Defines a learning type and not a method!
- Central feature: evaluating training signal, e.g. good / bad
- RL with immediate evaluation: decision → evaluation. Example: parameters for a basketball throw
- RL with rewards delayed in time: decision, decision, ..., decision → evaluation. Substantially harder; interesting because of its versatile applications

21 Delayed RL
- Decision, decision, ..., decision → evaluation
- Examples: robotics, control systems, games (chess, backgammon)
- Basic problem: temporal credit assignment
- Basic architecture: actor-critic system
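
To make the temporal credit assignment problem concrete, here is a minimal Python sketch. It is not a method from the slides: the task (three binary decisions, success if and only if the second decision is 1) and the naive averaging scheme are assumptions chosen for illustration. Every decision in a sequence receives the same final evaluation, so only a comparison across many sequences reveals which decision actually mattered.

```python
from collections import defaultdict
from itertools import product


def evaluate(decisions):
    """Delayed evaluation (hypothetical task): success iff the second decision is 1."""
    return 1.0 if decisions[1] == 1 else 0.0


def average_credit(n_steps=3):
    """Credit every (step, choice) pair with the final outcome and average it."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for decisions in product((0, 1), repeat=n_steps):
        outcome = evaluate(decisions)  # only available after the last decision
        for step, choice in enumerate(decisions):
            totals[(step, choice)] += outcome
            counts[(step, choice)] += 1
    return {key: totals[key] / counts[key] for key in totals}


credit = average_credit()
```

Here `credit[(1, 1)]` is 1.0 and `credit[(1, 0)]` is 0.0, while both choices at the other two steps average 0.5: a single sequence alone cannot tell these decisions apart.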

22 Multistage decision problems

23 Actor-critic system (Barto, Sutton, 1983)
- Actor: in situation s choose action u (strategy π : S → U)
- Critic: distribution of the external signal onto single actions
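
The division of labour between actor and critic can be sketched in a few lines of Python. This is one common tabular variant, not necessarily the 1983 architecture cited on the slide. The chain environment, the softmax actor, the temporal-difference error used by the critic, and all constants (`GAMMA`, `ALPHA`, `BETA`) are assumptions made for this illustration.

```python
import math
import random

N_STATES = 4          # states 0..3; state 3 is the goal (terminal)
ACTIONS = (-1, +1)    # action u: move left or right along the chain
GAMMA = 0.9           # discount factor (assumed, not from the slides)
ALPHA = 0.1           # critic step size (assumed)
BETA = 0.1            # actor step size (assumed)


def softmax_choice(prefs, rng):
    """Actor: sample an action index with probability proportional to exp(preference)."""
    weights = [math.exp(p) for p in prefs]
    return rng.choices(range(len(prefs)), weights=weights)[0]


def train(episodes=200, seed=0):
    rng = random.Random(seed)
    value = [0.0] * N_STATES                       # critic: V(s)
    prefs = [[0.0, 0.0] for _ in range(N_STATES)]  # actor: preference per (s, u)
    for _ in range(episodes):
        s = 0
        while s != N_STATES - 1:
            a = softmax_choice(prefs[s], rng)
            s_next = min(max(s + ACTIONS[a], 0), N_STATES - 1)
            reward = 1.0 if s_next == N_STATES - 1 else 0.0
            # Critic: the TD error turns the delayed external signal into
            # an evaluation of this single action.
            td_error = reward + GAMMA * value[s_next] - value[s]
            value[s] += ALPHA * td_error
            # Actor: strengthen or weaken the action that was just taken.
            prefs[s][a] += BETA * td_error
            s = s_next
    return value, prefs


value, prefs = train()
```

The external signal (reward 1 at the goal) arrives only at the end of an episode; the critic's value estimates pass it back along the chain so that earlier actions also receive an evaluation.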

24 Reinforcement Learning
- 1959 Samuel's Checker-Player: temporal difference (TD) methods
- 1968 Michie and Chambers: Boxes
- 1983 Barto, Sutton's AHC/ACE; 1987 Sutton's TD(λ)
- Early 90s: connection between dynamic programming (DP) and RL: Werbos, Sutton, Barto, Watkins, Singh, Bertsekas
- DP: classic optimization technique (late 50s: Bellman); too much effort for large tasks; advantage: clean mathematical formulation, convergence
- 2000 Policy gradient methods (Sutton et al., Peters et al., ...)
- 2005 Fitted Q (batch DP method) (Ernst et al., Riedmiller, ...)
- Many examples of successful, at least practically relevant applications since

25 Other examples
field             | input           | goal            | example                       | output (actions)
games             | board situation | winning         | backgammon, chess             | valid move
robotics          | sensor data     | reference value | pendulum, robot soccer        | control variable
sequence planning | state           | gain            | assembly line, mobile network | candidate
benchmark         | state           | goal position   | maze                          | direction

26 Goal: Autonomous learning system

29 Approach - rough outline
- Formulation of the learning problem as an optimization task
- Solution by learning, based on the optimization technique of Dynamic Programming
- Difficulties: very large state space; process behaviour unknown
- Application of approximation techniques (e.g. neural networks, ...)
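
As a rough illustration of the first two points, the maze example can be written as an optimization task (minimise the number of steps to the goal) and solved by dynamic programming. The sketch below is a hedged example: the maze layout, the unit step cost and all function names are assumptions. It also presumes a small state space and a known transition function, which are exactly the conditions the slide lists as difficulties for real tasks.

```python
# Small grid maze (hypothetical layout): '#' wall, 'G' goal, '.' free cell.
MAZE = [
    "....",
    ".##.",
    "...G",
]
MOVES = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}


def free_cells(maze):
    return [(r, c) for r, row in enumerate(maze) for c, ch in enumerate(row) if ch != "#"]


def step(maze, cell, move):
    """Deterministic transition: stay in place when the move hits a wall or the border."""
    r, c = cell[0] + move[0], cell[1] + move[1]
    if 0 <= r < len(maze) and 0 <= c < len(maze[0]) and maze[r][c] != "#":
        return (r, c)
    return cell


def value_iteration(maze):
    """DP on the cost-to-go: J(s) = min over u of [1 + J(f(s, u))], with J(goal) = 0."""
    cells = free_cells(maze)
    cost = {cell: 0.0 if maze[cell[0]][cell[1]] == "G" else float("inf") for cell in cells}
    changed = True
    while changed:
        changed = False
        for cell in cells:
            if maze[cell[0]][cell[1]] == "G":
                continue
            best = min(1.0 + cost[step(maze, cell, m)] for m in MOVES.values())
            if best < cost[cell]:
                cost[cell] = best
                changed = True
    return cost


def greedy_policy(maze, cost):
    """Pick, in every non-goal cell, a direction that minimises the cost-to-go."""
    return {
        cell: min(MOVES, key=lambda name: cost[step(maze, cell, MOVES[name])])
        for cell in cost
        if maze[cell[0]][cell[1]] != "G"
    }


cost = value_iteration(MAZE)
policy = greedy_policy(MAZE, cost)
```

For this layout the top-left cell has a cost-to-go of 5 steps, and the resulting strategy maps each state (cell) to an output (direction), matching the maze row of the examples table.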

34 Outline of lecture
- Part 1: Introduction
- Part 2: Dynamic Programming: Markov Decision Problems, Backwards DP, Value Iteration, Policy Iteration
- Part 3: Approximate DP / Reinforcement Learning: Monte Carlo methods, stochastic approximation, TD(λ), Q-learning
- Part 4: Advanced methods of Reinforcement Learning: Policy Gradient methods, hierarchic methods, POMDPs, relational Reinforcement Learning
- Part 5: Applications of Reinforcement Learning: Robot soccer, Pendulum, RL competition

35 Further courses on machine learning
- Lecture: machine learning (summer term)
- Lab course: deep learning (Wed., 10-12)
- Bachelor / Master theses, team projects

36 Further readings
- WWW:
- D. P. Bertsekas and J. N. Tsitsiklis. Neuro Dynamic Programming. Athena Scientific, Belmont, Massachusetts.
- A. Barto and R. Sutton. Reinforcement Learning. MIT Press, Cambridge, Massachusetts.
- M. Puterman. Markov Decision Processes: Discrete Stochastic Dynamic Programming. John Wiley and Sons, New York.
- L. P. Kaelbling, M. L. Littman and A. W. Moore. Reinforcement Learning: A Survey. Journal of Artificial Intelligence Research, 4, 1996.
- M. Wiering (ed.). Reinforcement Learning: State-of-the-Art. Springer.


More information

Reinforcement Learning with Randomization, Memory, and Prediction

Reinforcement Learning with Randomization, Memory, and Prediction Reinforcement Learning with Randomization, Memory, and Prediction Radford M. Neal, University of Toronto Dept. of Statistical Sciences and Dept. of Computer Science http://www.cs.utoronto.ca/ radford CRM

More information

EECS 349 Machine Learning

EECS 349 Machine Learning EECS 349 Machine Learning Instructor: Doug Downey (some slides from Pedro Domingos, University of Washington) 1 Logistics Instructor: Doug Downey Email: ddowney@eecs.northwestern.edu Office hours: Mondays

More information

Learning. Part 6 in Russell / Norvig Book

Learning. Part 6 in Russell / Norvig Book Wisdom is not the product of schooling but the lifelong attempt to acquire it. - Albert Einstein Learning Part 6 in Russell / Norvig Book Gerhard Fischer AI Course, Fall 1996, Lecture October 14 1 Overview

More information

Lecture 29: Artificial Intelligence

Lecture 29: Artificial Intelligence Lecture 29: Artificial Intelligence Marvin Zhang 08/10/2016 Some slides are adapted from CS 188 (Artificial Intelligence) Announcements Roadmap Introduction Functions Data Mutability Objects This week

More information

11. Reinforcement Learning

11. Reinforcement Learning Artificial Intelligence 11. Reinforcement Learning prof. dr. sc. Bojana Dalbelo Bašić doc. dr. sc. Jan Šnajder University of Zagreb Faculty of Electrical Engineering and Computing (FER) Academic Year 2015/2016

More information

Machine Learning. Introduction. Hamid Beigy. Sharif University of Technology. Fall 1395

Machine Learning. Introduction. Hamid Beigy. Sharif University of Technology. Fall 1395 Machine Learning Introduction Hamid Beigy Sharif University of Technology Fall 1395 Hamid Beigy (Sharif University of Technology) Machine Learning Fall 1395 1 / 15 Table of contents 1 What is machine learning?

More information

University of Alberta. Reinforcement Learning and Simulation-Based Search in Computer Go. David Silver

University of Alberta. Reinforcement Learning and Simulation-Based Search in Computer Go. David Silver University of Alberta Reinforcement Learning and Simulation-Based Search in Computer Go by David Silver A thesis submitted to the Faculty of Graduate Studies and Research in partial fulfillment of the

More information

Deep Learning for AI Yoshua Bengio. August 28th, DS3 Data Science Summer School

Deep Learning for AI Yoshua Bengio. August 28th, DS3 Data Science Summer School Deep Learning for AI Yoshua Bengio August 28th, 2017 @ DS3 Data Science Summer School A new revolution seems to be in the work after the industrial revolution. And Machine Learning, especially Deep Learning,

More information

Load Forecasting with Artificial Intelligence on Big Data

Load Forecasting with Artificial Intelligence on Big Data 1 Load Forecasting with Artificial Intelligence on Big Data October 9, 2016 Patrick GLAUNER and Radu STATE SnT - Interdisciplinary Centre for Security, Reliability and Trust, University of Luxembourg 2

More information

Developing Focus of Attention Strategies Using Reinforcement Learning

Developing Focus of Attention Strategies Using Reinforcement Learning Department of Computer Science and Engineering University of Texas at Arlington Arlington, TX 76019 Developing Focus of Attention Strategies Using Reinforcement Learning Srividhya Rajendran rajendra@cse.uta.edu

More information

Reinforcement learning (Chapter 21)

Reinforcement learning (Chapter 21) Reinforcement learning (Chapter 21) Reinforcement learning Regular MDP Given: Transition model P(s s, a) Reward function R(s) Find: Policy π(s) Reinforcement learning Transition model and reward function

More information

Learning and Planning with Tabular Methods

Learning and Planning with Tabular Methods Carnegie Mellon School of Computer Science Deep Reinforcement Learning and Control Learning and Planning with Tabular Methods Lecture 6, CMU 10703 Katerina Fragkiadaki What can I learn by interacting with

More information

Reinforcement Learning in Multidimensional Continuous Action Spaces

Reinforcement Learning in Multidimensional Continuous Action Spaces Reinforcement Learning in Multidimensional Continuous Action Spaces Jason Pazis Department of Computer Science Duke University Durham, NC 27708 0129, USA Email: jpazis@cs.duke.edu Michail G. Lagoudakis

More information

Reinforcement Learning II

Reinforcement Learning II CSC411 Fall 2015 Machine Learning & Data Mining Reinforcement Learning II Slides from Rich Zemel Formula(ng Reinforcement Learning World described by a discrete, 0inite set of states and actions At every

More information

Intelligent monitoring and maintenance of power plants

Intelligent monitoring and maintenance of power plants Intelligent monitoring and maintenance of power plants Dimitrios Kalles 1, Anna Stathaki 1 and Robert E. King 2 1 Computer Technology Institute, PO Box 1122, 261 10, Patras 2 Department of Electrical &

More information

Learning Optimal Dialogue Strategies: A Case Study of a Spoken Dialogue Agent for

Learning Optimal Dialogue Strategies: A Case Study of a Spoken Dialogue Agent for Learning Optimal Dialogue Strategies: A Case Study of a Spoken Dialogue Agent for Email Marilyn A. Walker Jeanne C. Fromer Shrikanth Narayanan walker@research.att.com jeannie@ai.mit.edu shri@research.att.com

More information

Contents. Chapter 1: Introduction to Artificial Intelligence and Soft Computing

Contents. Chapter 1: Introduction to Artificial Intelligence and Soft Computing Contents Chapter 1: Introduction to Artificial Intelligence and Soft Computing 1.1 Evolution of Computing 1.2 Defining AI 1.3 General Problem Solving Approaches in AI 1.4 The Disciplines of AI 1.4.1 The

More information

THE DESIGN OF A LEARNING SYSTEM Lecture 2

THE DESIGN OF A LEARNING SYSTEM Lecture 2 THE DESIGN OF A LEARNING SYSTEM Lecture 2 Challenge: Design a Learning System for Checkers What training experience should the system have? A design choice with great impact on the outcome Choice #1: Direct

More information

M. R. Ahmadzadeh Isfahan University of Technology. M. R. Ahmadzadeh Isfahan University of Technology

M. R. Ahmadzadeh Isfahan University of Technology. M. R. Ahmadzadeh Isfahan University of Technology 1 2 M. R. Ahmadzadeh Isfahan University of Technology Ahmadzadeh@cc.iut.ac.ir M. R. Ahmadzadeh Isfahan University of Technology Textbooks 3 Introduction to Machine Learning - Ethem Alpaydin Pattern Recognition

More information

Reinforcement Learning

Reinforcement Learning Reinforcement Learning based Dialog Manager Speech Group Department of Signal Processing and Acoustics Katri Leino User Interface Group Department of Communications and Networking Aalto University, School

More information

Scaling Up RL Using Evolution Strategies. Tim Salimans, Jonathan Ho, Peter Chen, Szymon Sidor, Ilya Sutskever

Scaling Up RL Using Evolution Strategies. Tim Salimans, Jonathan Ho, Peter Chen, Szymon Sidor, Ilya Sutskever Scaling Up RL Using Evolution Strategies Tim Salimans, Jonathan Ho, Peter Chen, Szymon Sidor, Ilya Sutskever Reinforcement Learning = AI? Definition of RL broad enough to capture all that is needed for

More information

Deep Reinforcement Learning CS

Deep Reinforcement Learning CS Deep Reinforcement Learning CS 294-112 Course logistics Class Information & Resources Sergey Levine Assistant Professor UC Berkeley Abhishek Gupta PhD Student UC Berkeley Josh Achiam PhD Student UC Berkeley

More information

Learning to Communicate and Act using Hierarchical Reinforcement Learning

Learning to Communicate and Act using Hierarchical Reinforcement Learning Learning to Communicate and Act using Hierarchical Reinforcement Learning Mohammad Ghavamzadeh & Sridhar Mahadevan Department of Computer Science, University of Massachusetts Amherst, MA 01003-4610, USA

More information

Introduction to Machine Learning Reykjavík University Spring Instructor: Dan Lizotte

Introduction to Machine Learning Reykjavík University Spring Instructor: Dan Lizotte Introduction to Machine Learning Reykjavík University Spring 2007 Instructor: Dan Lizotte Logistics To contact Dan: dlizotte@cs.ualberta.ca http://www.cs.ualberta.ca/~dlizotte/teaching/ Books: Introduction

More information

Improving Convergence of Deterministic. Policy Gradient Algorithms in. Reinforcement Learning

Improving Convergence of Deterministic. Policy Gradient Algorithms in. Reinforcement Learning Department of Electronic and Electrical Engineering University College London Improving Convergence of Deterministic Policy Gradient Algorithms in Reinforcement Learning Final Report Riashat Islam Supervisor:

More information

Laboratorio di Intelligenza Artificiale e Robotica

Laboratorio di Intelligenza Artificiale e Robotica Laboratorio di Intelligenza Artificiale e Robotica A.A. 2008-2009 Outline 2 Machine Learning Unsupervised Learning Supervised Learning Reinforcement Learning Genetic Algorithms Genetics-Based Machine Learning

More information

Learning Policies by Imitating Optimal Control. CS : Deep Reinforcement Learning Week 3, Lecture 2 Sergey Levine

Learning Policies by Imitating Optimal Control. CS : Deep Reinforcement Learning Week 3, Lecture 2 Sergey Levine Learning Policies by Imitating Optimal Control CS 294-112: Deep Reinforcement Learning Week 3, Lecture 2 Sergey Levine Overview 1. Last time: learning models of system dynamics and using optimal control

More information

Introduction to AI & Intelligent Agents

Introduction to AI & Intelligent Agents Introduction to AI & Intelligent Agents This Lecture Chapters 1 and 2 Next Lecture Chapter 3.1 to 3.4 (Please read lecture topic material before and after each lecture on that topic) What is Artificial

More information

D-VisionDraughts: a Draughts Player Neural Network That Learns by Reinforcement in a High Performance Environment

D-VisionDraughts: a Draughts Player Neural Network That Learns by Reinforcement in a High Performance Environment D-VisionDraughts: a Draughts Player Neural Network That Learns by Reinforcement in a High Performance Environment Ayres Roberto Araújo Barcelos 1, Rita Maria Silva Julia 1 and Rivalino Matias Júnior 1

More information

Lecture I Outline. Course information and details Why do machine learning? What is machine learning? Why now? Type of Learning

Lecture I Outline. Course information and details Why do machine learning? What is machine learning? Why now? Type of Learning Lecture I Outline Course information and details Why do machine learning? What is machine learning? Why now? Type of Learning Association Classification Three types: Linear, Decision Tree, and Nearest

More information

INTRODUCTION TO DATA SCIENCE

INTRODUCTION TO DATA SCIENCE DATA11001 INTRODUCTION TO DATA SCIENCE EPISODE 6: MACHINE LEARNING TODAY S MENU 1. WHAT IS ML? 2. CLASSIFICATION AND REGRESSSION 3. EVALUATING PERFORMANCE & OVERFITTING WHAT IS MACHINE LEARNING? Definition:

More information

10703 Deep Reinforcement Learning and Control

10703 Deep Reinforcement Learning and Control 10703 Deep Reinforcement Learning and Control Russ Salakhutdinov Machine Learning Department rsalakhu@cs.cmu.edu Hierarchical RL and Transfer Learning Used Materials Disclaimer: Some of the material was

More information

CS534 Machine Learning

CS534 Machine Learning CS534 Machine Learning Spring 2013 Lecture 1: Introduction to ML Course logistics Reading: The discipline of Machine learning by Tom Mitchell Course Information Instructor: Dr. Xiaoli Fern Kec 3073, xfern@eecs.oregonstate.edu

More information

Neuro-Fuzzy and Soft Computing chapter 1 J.-S.R. Jang

Neuro-Fuzzy and Soft Computing chapter 1 J.-S.R. Jang Neuro-Fuzzy and chapter 1 J.-S.R. Jang Bill Cheetham Kai Goebel 1 What is covered in this class? We will teach techniques useful in creating intelligent software systems that can deal with the uncertainty

More information

What does Shaping Mean for Computational Reinforcement Learning?

What does Shaping Mean for Computational Reinforcement Learning? What does Shaping Mean for Computational Reinforcement Learning? Tom Erez and William D. Smart Dept. of Computer Science and Engineering Washington University in St. Louis Email: {etom,wds}@cse.wustl.edu

More information

Online Robot Learning by Reward and Punishment for a Mobile Robot

Online Robot Learning by Reward and Punishment for a Mobile Robot Online Robot Learning by Reward and Punishment for a Mobile Robot Dejvuth Suwimonteerabuth, Prabhas Chongstitvatana Department of Computer Engineering Chulalongkorn University, Bangkok, Thailand prabhas@chula.ac.th

More information

Deep Reinforcement Learning From Raw Pixels in Doom

Deep Reinforcement Learning From Raw Pixels in Doom Deep Reinforcement Learning From Raw Pixels in Doom Danijar Hafner arxiv:1610.02164v1 [cs.lg] 7 Oct 2016 July 2016 A thesis submitted for the degree of Bachelor of Science Hasso Plattner Institute, Potsdam

More information

Lecture 1: Introduc4on

Lecture 1: Introduc4on CSC2515 Spring 2014 Introduc4on to Machine Learning Lecture 1: Introduc4on All lecture slides will be available as.pdf on the course website: http://www.cs.toronto.edu/~urtasun/courses/csc2515/csc2515_winter15.html

More information

In-depth: Deep learning (one lecture) Applied to both SL and RL above Code examples

In-depth: Deep learning (one lecture) Applied to both SL and RL above Code examples Introduction to machine learning (two lectures) Supervised learning Reinforcement learning (lab) In-depth: Deep learning (one lecture) Applied to both SL and RL above Code examples 2017-09-30 2 1 To enable

More information

Introduction to Deep Learning

Introduction to Deep Learning Introduction to Deep Learning M S Ram Dept. of Computer Science & Engg. Indian Institute of Technology Kanpur Reading of Chap. 1 from Learning Deep Architectures for AI ; Yoshua Bengio; FTML Vol. 2, No.

More information

1.5. game points #games. #games PIPE 1-Player 1.5. game points 0.5

1.5. game points #games. #games PIPE 1-Player 1.5. game points 0.5 CMAC Models Learn to Play Soccer Proceedings of the 8th International Conference on Articial Neural Networks (ICANN'98), L. Niklasson and M. Boden and T. Ziemkei (eds.), Springer-Verlag, London, pages

More information