Reinforcement Learning


1 Reinforcement Learning, LU 1: Introduction. Dr. Joschka Bödecker, Machine Learning and Natural Language Systems Group (AG Maschinelles Lernen und Natürlichsprachliche Systeme), Albert-Ludwigs-Universität Freiburg. Acknowledgement: slides courtesy of Martin Riedmiller and Martin Lauer. Prof. Dr. M. Riedmiller, Dr. M. Lauer, Dr. J. Boedecker, Machine Learning Lab, University of Freiburg, Reinforcement Learning (1)
2 Organisational issues: Dr. Joschka Boedecker, room 00010, building 079. Office hours: Tuesday, 2-3 pm. No script; slides are available online.
3 Dates, winter term 2015/: Lecture on Monday, 14:00 (c.t.) - 15:30, SR , building 052, and on Wednesday, 16:00 (s.t.) - 17:30, SR , building 052. Exercise sessions on Wednesday, 16:00-17:30, interleaved with the lecture, starting on Oct. 28, held by Jan Wülfing.
4 Goal of this lecture: introduction of the learning problem type Reinforcement Learning; introduction to the mathematical basics of an independently learning system.
5 Goal of the first unit: motivation, definition and differentiation. Outline: examples; solution approaches; Machine Learning; Reinforcement Learning; overview.
6 Example: Backgammon. Can a program independently learn Backgammon? Learning from success (win) and failure (loss). NeuroBackgammon: playing at world champion level (Tesauro, 1992).
7 Example: pole balancing (control engineering). Can a program independently learn balancing? Learning from success and failure. Neural RL controller: noise, inaccuracies, unknown behaviour, nonlinearities, ... (Riedmiller et al.).
8 Example: robot soccer. Can programs independently learn how to cooperate? Learning from success and failure. Cooperative RL agents: complexity, distributed intelligence, ... (Riedmiller et al.).
9 Example: autonomous (e.g. humanoid) robots. Task: movement control similar to humans (walking, running, playing soccer, cycling, skiing, ...). Input: image from camera. Output: control signals to the joints. Problems: very complex; consequences of actions hard to predict; interference / noise.
10 Example: Maze
11 The Agent Concept [Russell and Norvig 1995, page 33]: "An agent is anything that can be viewed as perceiving its environment through sensors and acting upon that environment through effectors." Examples: a human, a robot arm, an autonomous car, a motor controller, ...
12 Solution approaches in Artificial Intelligence (AI): planning / search (e.g. A*, backtracking); deduction (e.g. logic programming, predicate logic); expert systems (e.g. knowledge generated by experts); fuzzy control systems (fuzzy logic); genetic algorithms (evolution of solutions); Machine Learning (e.g. reinforcement learning).
13 Types of learning (in humans): learning from a teacher; structuring of objects; learning from experience.
16 Types of Machine Learning (ML)
  Learning with a teacher: Supervised Learning. Examples of input / (target) output. Goal: generalization (in general not simply memorization).
  Structuring / recognition of correlations: Unsupervised Learning. Goal: clustering of similar data points, e.g. for preprocessing.
  Learning through reward / penalty: Reinforcement Learning. Prerequisite: specification of the target goal (or events to be avoided). ...
17 Machine Learning: ingredients
  1. Type of the learning problem (given / sought)
  2. Representation of the learned solution knowledge: table, rules, linear mapping, neural network, ...
  3. Solution process (observed data → solution): (heuristic) search, gradient descent, optimization technique, ...
  Not at all: "For this problem I need a neural network"
18 Emphasis of the lecture: Reinforcement Learning. No information regarding the solution strategy is required. Independent learning of a strategy by smart trial of solutions ("trial and error"). Biggest challenge of a learning system. Representation of solution knowledge by a function approximator (e.g. tables, linear models, neural networks).
19 RL using the example of autonomous robots. Bad: damage (fall, ...). Good: task done successfully. Better: fast / low energy / smooth movements / ... (optimization!).
20 Reinforcement Learning (RL). Also: learning from evaluations, autonomous learning, neuro-dynamic programming. Defines a learning type and not a method! Central feature: an evaluating training signal, e.g. good / bad. RL with immediate evaluation: decision → evaluation. Example: parameters for a basketball throw. RL with rewards delayed in time: decision, decision, ..., decision → evaluation. Substantially harder; interesting because of its versatile applications.
21 Delayed RL: decision, decision, ..., decision → evaluation. Examples: robotics, control systems, games (chess, backgammon). Basic problem: temporal credit assignment. Basic architecture: actor-critic system.
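One common way to make temporal credit assignment concrete is to pass a delayed evaluation backwards over the preceding decisions with a discount factor. The following is a minimal sketch, not a method taken from the slides: the function name `discounted_returns` and the discount factor `gamma` are assumptions introduced for illustration.

```python
from typing import List


def discounted_returns(rewards: List[float], gamma: float = 0.9) -> List[float]:
    """Compute G_t = r_t + gamma * G_{t+1} for every step t of one episode.

    gamma is an assumed discount factor; the slides do not specify one.
    """
    returns = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards so that each step inherits credit from later steps.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


# Three decisions; the evaluation (+1) arrives only after the last one.
episode_rewards = [0.0, 0.0, 1.0]
credit = discounted_returns(episode_rewards, gamma=0.5)
print(credit)  # [0.25, 0.5, 1.0]
```

Earlier decisions receive a smaller share of the final evaluation, which is one simple convention for spreading a single delayed signal over a sequence of decisions.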
22 Multistage decision problems
23 Actor-critic system (Barto, Sutton, 1983). Actor: in situation s, choose action u (strategy π : S → U). Critic: distribution of the external signal onto single actions.
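The division of labour between actor and critic can be sketched on a toy task. This is an illustrative tabular variant, not the 1983 system itself: the four-state corridor, the softmax action selection, the temporal-difference error used as the critic's signal, and the constants `GAMMA`, `ALPHA` and `BETA` are all assumptions made for this example.

```python
import math
import random

N_STATES = 4             # hypothetical corridor: states 0..3, state 3 is the goal
GOAL = N_STATES - 1
ACTIONS = (-1, +1)       # action u: step left or step right
GAMMA, ALPHA, BETA = 0.9, 0.1, 0.5   # assumed discount factor and learning rates


def softmax_choice(prefs, rng):
    """Sample an action index with probability proportional to exp(preference)."""
    top = max(prefs)
    weights = [math.exp(p - top) for p in prefs]
    return rng.choices(range(len(prefs)), weights=weights)[0]


def train(episodes=2000, seed=0):
    rng = random.Random(seed)
    value = [0.0] * N_STATES                        # critic: estimate V(s)
    prefs = [[0.0, 0.0] for _ in range(N_STATES)]   # actor: preference per (s, u)
    for _ in range(episodes):
        s = 0
        for _ in range(200):                        # cap the episode length
            a = softmax_choice(prefs[s], rng)
            s_next = min(max(s + ACTIONS[a], 0), GOAL)
            reward = 1.0 if s_next == GOAL else 0.0  # external signal, only at the goal
            bootstrap = 0.0 if s_next == GOAL else GAMMA * value[s_next]
            td_error = reward + bootstrap - value[s]  # critic's evaluation of this single action
            value[s] += ALPHA * td_error              # critic update
            prefs[s][a] += BETA * td_error            # actor update
            s = s_next
            if s == GOAL:
                break
    return value, prefs


value, prefs = train()
# Greedy action per non-goal state, read off the actor's preferences.
greedy = [ACTIONS[p.index(max(p))] for p in prefs[:GOAL]]
print(greedy)
```

The actor only ever sees the critic's per-step signal, never the external reward directly; that is the sense in which the critic distributes the external signal onto single actions.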
24 Reinforcement Learning
  1959: Samuel's Checker-Player: temporal difference (TD) methods
  1968: Michie and Chambers: Boxes
  1983: Barto, Sutton's AHC/ACE
  1987: Sutton's TD(λ)
  Early 90s: correlation between dynamic programming (DP) and RL: Werbos, Sutton, Barto, Watkins, Singh, Bertsekas
    DP: classic optimization technique (late 50s: Bellman); too much effort for large tasks
    Advantage: clean mathematical formulation, convergence results
  2000: Policy Gradient methods (Sutton et al., Peters et al., ...)
  2005: Fitted Q (batch DP method) (Ernst et al., Riedmiller, ...)
  Many examples of successful, at least practically relevant applications since
25 Other examples
  field              input            goal             example                        output (actions)
  games              board situation  winning          backgammon, chess              valid move
  robotics           sensor data      reference value  pendulum, robot soccer         control variable
  sequence planning  state            gain             assembly line, mobile network  candidate
  benchmark          state            goal position    maze                           direction
26 Goal: Autonomous learning system
29 Approach: rough outline. Formulation of the learning problem as an optimization task. Solution by learning, based on the optimization technique of Dynamic Programming. Difficulties: very large state space; process behaviour unknown. Application of approximation techniques (e.g. neural networks, ...).
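To illustrate the Dynamic Programming basis named above, here is a minimal value-iteration sketch on a hypothetical five-state corridor. It assumes the process model is fully known and the state space is tiny, which are exactly the conditions the listed difficulties rule out in practice; the task, the cost of -1 per move, and all names are assumptions for this example.

```python
N_STATES = 5          # hypothetical corridor: states 0..4, state 4 is the goal
GOAL = N_STATES - 1
ACTIONS = (-1, +1)    # step left or step right


def step(s, a):
    """Known deterministic model: next state and reward (cost of -1 per move)."""
    s_next = min(max(s + a, 0), GOAL)
    return s_next, -1.0


def value_iteration(sweeps=50):
    """Repeatedly apply the optimality backup V(s) = max_a [r + V(s')]."""
    v = [0.0] * N_STATES
    for _ in range(sweeps):
        new_v = list(v)
        for s in range(GOAL):          # the goal is terminal; its value stays 0
            outcomes = [step(s, a) for a in ACTIONS]
            new_v[s] = max(r + v[s_next] for s_next, r in outcomes)
        v = new_v
    return v


def greedy_action(s, v):
    """Pick the action whose one-step lookahead value is largest."""
    return max(ACTIONS, key=lambda a: step(s, a)[1] + v[step(s, a)[0]])


optimal_values = value_iteration()
policy = [greedy_action(s, optimal_values) for s in range(GOAL)]
print(optimal_values)  # [-4.0, -3.0, -2.0, -1.0, 0.0]
print(policy)          # [1, 1, 1, 1]
```

The optimal value of each state is the negative number of moves to the goal, so minimizing cost and finding the shortest path coincide here. With an unknown model or a very large state space the table `v` and the function `step` are not available, which is where learning and approximation come in.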
34 Outline of lecture
  Part 1: Introduction
  Part 2: Dynamic Programming: Markov Decision Problems, Backwards DP, Value Iteration, Policy Iteration
  Part 3: Approximate DP / Reinforcement Learning: Monte Carlo methods, stochastic approximation, TD(λ), Q-learning
  Part 4: Advanced methods of Reinforcement Learning: Policy Gradient methods, hierarchical methods, POMDPs, relational Reinforcement Learning
  Part 5: Applications of Reinforcement Learning: robot soccer, pendulum, RL competition
35 Further courses on machine learning. Lecture: machine learning (summer term). Lab course: deep learning (Wed., 10-12). Bachelor / Master theses, team projects.
36 Further readings
  WWW:
  D. P. Bertsekas and J. N. Tsitsiklis. Neuro-Dynamic Programming. Athena Scientific, Belmont, Massachusetts.
  A. Barto and R. Sutton. Reinforcement Learning. MIT Press, Cambridge, Massachusetts.
  M. Puterman. Markov Decision Processes: Discrete Stochastic Dynamic Programming. John Wiley and Sons, New York.
  L. P. Kaelbling, M. L. Littman and A. W. Moore. Reinforcement Learning: A Survey. Journal of Artificial Intelligence Research, 4, 1996.
  M. Wiering (ed.). Reinforcement Learning: State-of-the-Art. Springer.
Sequential decision making under uncertainty
Sequential decision making under uncertainty Matthijs Spaan Francisco S. Melo Institute for Systems and Robotics Instituto Superior Técnico Lisbon, Portugal Reading group meeting, January 4, 2007 1/20
More informationMachine Learning. Outline. Reinforcement learning 2. Defining an RL problem. Solving an RL problem. Miscellaneous. Eric Xing /15
Machine Learning 10701/15 701/15781, 781, Spring 2008 Reinforcement learning 2 Eric Xing Lecture 28, April 30, 2008 Reading: Chap. 13, T.M. book Eric Xing 1 Outline Defining an RL problem Markov Decision
More informationReinforcement Learning with Deep Architectures
000 001 002 003 004 005 006 007 008 009 010 011 012 013 014 015 016 017 018 019 020 021 022 023 024 025 026 027 028 029 030 031 032 033 034 035 036 037 038 039 040 041 042 043 044 045 046 047 048 049 050
More informationReinforcement Learning
Reinforcement Learning MariaFlorina Balcan Carnegie Mellon University April 20, 2015 Today: Learning of control policies Markov Decision Processes Temporal difference learning Q learning Readings: Mitchell,
More informationReinforcement Learning
Reinforcement Learning Lecture 1: Introduction Vien Ngo MLR, University of Stuttgart What is Reinforcement Learning? Reinforcement Learning is a subfield of Machine Learning from David Silver s lecture
More informationA Reinforcement Learning Algorithm in Cooperative MultiRobot Domains
Journal of Intelligent and Robotic Systems (2005) 43: 161 174 Springer 2005 DOI: 10.1007/s108460055137x A Reinforcement Learning Algorithm in Cooperative MultiRobot Domains FERNANDO FERNÁNDEZ and DANIEL
More informationCS 242 Final Project: Reinforcement Learning. Albert Robinson May 7, 2002
CS 242 Final Project: Reinforcement Learning Albert Robinson May 7, 2002 Introduction Reinforcement learning is an area of machine learning in which an agent learns by interacting with its environment.
More informationReinforcement Learning and Markov Decision Processes
Reinforcement Learning and Markov Decision Processes Ronald J. Williams CSG0, Spring 007 Contains a few slides adapted from two related Andrew Moore tutorials found at http://www.cs.cmu.edu/~awm/tutorials
More informationReinforcement Learning
Reinforcement Learning Introduction Daniel Hennes 17.04.2017 University Stuttgart  IPVS  Machine Learning & Robotics 1 What is reinforcement learning? Generalpurpose framework for decisionmaking Autonomous
More informationIntro to Reinforcement Learning. Part 2: Ideas and Examples
Intro to Reinforcement Learning Part 2: Ideas and Examples Psychology Artificial Intelligence Reinforcement Learning Neuroscience Control Theory Reinforcement learning The engineering endeavor most closely
More informationLearning Agents: Introduction
Learning Agents: Introduction S Luz luzs@cs.tcd.ie October 28, 2014 Learning in agent architectures Agent Learning in agent architectures Agent Learning in agent architectures Agent perception Learning
More informationModule 12. Machine Learning. Version 2 CSE IIT, Kharagpur
Module 12 Machine Learning 12.1 Instructional Objective The students should understand the concept of learning systems Students should learn about different aspects of a learning system Students should
More informationExploration vs. Exploitation. CS 473: Artificial Intelligence Reinforcement Learning II. How to Explore? Exploration Functions
CS 473: Artificial Intelligence Reinforcement Learning II Exploration vs. Exploitation Dieter Fox / University of Washington [Most slides were taken from Dan Klein and Pieter Abbeel / CS188 Intro to AI
More informationECE 517: Reinforcement Learning in Artificial Intelligence
ECE 517: Reinforcement Learning in Artificial Intelligence Lecture 14: Planning and Learning October 27, 2015 Dr. Itamar Arel College of Engineering Department of Electrical Engineering and Computer Science
More informationREINFORCEMENT LEARNING OF STRATEGIES FOR SETTLERS OF CATAN
REINFORCEMENT LEARNING OF STRATEGIES FOR SETTLERS OF CATAN Michael Pfeiffer Institute for Theoretical Computer Science Graz University of Technology A 8010, Graz Austria Email: pfeiffer@igi.tugraz.at
More informationNeural Reinforcement Learning to Swingup and Balance a Real Pole
Neural Reinforcement Learning to Swingup and Balance a Real Pole Martin Riedmiller Neuroinformatics Group University of Osnabrueck 49069 Osnabrueck martin.riedmiller@uos.de Abstract This paper proposes
More informationLecture Overview. Introduction to Artificial Intelligence COMP 3501 / COMP Lecture 1. Artificial Intelligence.
Lecture Overview COMP 3501 / COMP 47044 Lecture 1 Prof. JGH 318 What is AI? AI History Views/goals of AI Course Overview Artificial Intelligence As humans we have intelligence But what is intelligence?
More informationCPSC 533 Reinforcement Learning. Paul Melenchuk Eva Wong Winson Yuen Kenneth Wong
CPSC 533 Reinforcement Learning Paul Melenchuk Eva Wong Winson Yuen Kenneth Wong Outline Introduction Passive Learning in an Known Environment Passive Learning in an Unknown Environment Active Learning
More informationICRA 2012 Tutorial on Reinforcement Learning 4. Value Function Methods
ICRA 2012 Tutorial on Reinforcement Learning 4. Value Function Methods Pieter Abbeel UC Berkeley Jan Peters TU Darmstadt A Reinforcement Learning Ontology Prior Knowledge Data { (x t, u t, x t+1, r t )
More informationbased on QLearning and Selforganizing Control
ICROSSICE International Joint Conference 2009 August 1821, 2009, Fukuoka International Congress Center, Japan Intelligent Navigation and Control of an Autonomous Underwater Vehicle based on QLearning
More informationAxiom 2013 Team Description Paper
Axiom 2013 Team Description Paper Mohammad Ghazanfari, S Omid Shirkhorshidi, Farbod Samsamipour, Hossein Rahmatizadeh Zagheli, Mohammad Mahdavi, Payam Mohajeri, S Abbas Alamolhoda Robotics Scientific Association
More informationModels. Chapter 9: Planning and Learning. Planning Cont. Planning. for all s, s!, and a "A(s)! Sample model: produces sample experiences
Chapter 9: Planning and Learning Models Objectives of this chapter:! Use of environment models! Integration of planning and learning methods! Model: anything the agent can use to predict how the environment
More informationReinforcement Learning by Comparing Immediate Reward
Reinforcement Learning by Comparing Immediate Reward Punit Pandey DeepshikhaPandey Dr. Shishir Kumar Abstract This paper introduces an approach to Reinforcement Learning Algorithm by comparing their immediate
More informationPlay Ms. PacMan using an advanced reinforcement learning agent
Play Ms. PacMan using an advanced reinforcement learning agent Nikolaos Tziortziotis Konstantinos Tziortziotis Konstantinos Blekas March 3, 2014 Abstract Reinforcement Learning (RL) algorithms have been
More informationA Distriubuted Implementation for Reinforcement Learning
A Distriubuted Implementation for Reinforcement Learning YiChun Chen 1 and YuSheng Chen 1 1 ICME, Stanford University Abstract. In this CME323 project, we implement a distributed algorithm for modelfree
More informationDeep reinforcement learning
Deep reinforcement learning Function approximation So far, we ve assumed a lookup table representation for utility function U(s) or actionutility function Q(s,a) This does not work if the state space is
More information10 Markov Decision Process
10 Markov Decision Process This chapter is an introduction to a generalization of supervised learning where feedback is only given, possibly with delay, in form of reward or punishment. The goal of this
More informationChapter 11: Case Studies
Chapter 11: Case Studies Objectives of this chapter: Illustrate tradeoffs and issues that arise in real applications Illustrate use of domain knowledge Illustrate representation development Some historical
More information20.3 The EM algorithm
20.3 The EM algorithm Many realworld problems have hidden (latent) variables, which are not observable in the data that are available for learning Including a latent variable into a Bayesian network may
More informationAn Analysis of Stochastic Game Theory for Multiagent Reinforcement Learning
An Analysis of Stochastic Game Theory for Multiagent Reinforcement Learning Michael Bowling Manuela Veloso October, 2000 CMUCS00165 School of Computer Science Carnegie Mellon University Pittsburgh,
More informationArtificial Intelligence Recap. Mausam
Artificial Intelligence Recap Mausam What is intelligence? (bounded) Rationality We have a performance measure to optimize Given our state of knowledge Choose optimal action Given limited computational
More informationReinforcement Learning
Reinforcement Learning Policy Op4miza4on and Planning (Material not examinable) Subramanian Ramamoorthy School of Informa4cs 31 March, 2017 Plan for Lecture: Policies and Plans Policy Op5miza5on Policies
More informationMark Hammond Cofounder / CEO. Performant deep reinforcement learning: latency, hazards, and pipeline stalls in the GPU era and how to avoid them 0
Performant deep reinforcement learning: latency, hazards, and pipeline stalls in the GPU era and how to avoid them Mark Hammond Cofounder / CEO Performant deep reinforcement learning: latency, hazards,
More informationMultiAgent Systems. Bernhard Nebel, Felix Lindner, and Thorsten Engesser. Summer Term AlbertLudwigsUniversität Freiburg
MultiAgent Systems AlbertLudwigsUniversität Freiburg Bernhard Nebel, Felix Lindner, and Thorsten Engesser Summer Term 2017 Lecturers Prof. Dr. Bernhard Nebel Room 5200028 Phone: 0761/2038221 email:
More informationReinforcement Learning
Artificial Intelligence Topic 8 Reinforcement Learning passive learning in a known environment passive learning in unknown environments active learning exploration learning actionvalue functions generalisation
More informationBrief Overview of Adaptive and Learning Control
1.10.2007 Outline Introduction Outline Introduction Introduction Outline Introduction Introduction Definition of Adaptive Control Definition of Adaptive Control Zames (reported by Dumont&Huzmezan): A nonadaptive
More informationRobot Learning. Denition. Robot Learning Systems
Robot Learning Jan Peters, Max Planck Institute for Biological Cybernetics Russ Tedrake, Massachusetts Institute of Technology Nick Roy, Massachusetts Institute of Technology Jun Morimoto, Advanced Telecommunication
More informationCS 520: Introduction to Artificial Intelligence CS 520
CS 520: Introduction to Artificial Intelligence Prof. Louis Steinberg 1 Prof. Louis Steinberg CS 520 401 Hill, 4453581, lou@cs Office hours: Thursday 13pm and by appointment TA: Xiaolei Huang (xiaolei@paul)
More informationNeural Dynamics and Reinforcement Learning
Neural Dynamics and Reinforcement Learning Presented By: Matthew Luciw DFT SUMMER SCHOOL, 2013 IDSIA Istituto Dalle Molle Di Studi sull Intelligenza Artificiale IDSIA Lugano, Switzerland www.idsia.ch Our
More informationr t +1 s t +1 TD Prediction Chapter 6: Temporal Difference Learning [ ] [ ] Simplest TD Method Simple Monte Carlo
Chapter 6: emporal Difference Learning D Prediction Objectives of this chapter: Policy Evaluation (the prediction problem: for a given policy!, compute the statevalue function V!! Introduce emporal Difference
More informationMachine Learning: Algorithms and Applications
Machine Learning: Algorithms and Applications Floriano Zini Free University of BozenBolzano Faculty of Computer Science Academic Year 20112012 Lecture 11: 21 May 2012 Unsupervised Learning (cont ) Slides
More informationA Production Scheduling Strategy for an Assembly Plant based on Reinforcement Learning
A Production Scheduling Strategy for an Assembly Plant based on Reinforcement Learning DRANIDIS D., KEHRIS E. Computer Science Department CITY LIBERAL STUDIES  Affiliated College of the University of
More informationTD Gammon. Chapter 11: Case Studies. A Few Details. Multilayer Neural Network. Tesauro 1992, 1994, 1995,... Objectives of this chapter:
Objectives of this chapter: Chapter 11: Case Studies! Illustrate tradeoffs and issues that arise in real applications! Illustrate use of domain knowledge! Illustrate representation development! Some historical
More informationLecture 10: Reinforcement Learning
Lecture 1: Reinforcement Learning Cognitive Systems II  Machine Learning SS 25 Part III: Learning Programs and Strategies Q Learning, Dynamic Programming Lecture 1: Reinforcement Learning p. Motivation
More informationVisionBased Reinforcement Learning Using A Consolidated ActorCritic Model
University of Tennessee, Knoxville Trace: Tennessee Research and Creative Exchange Masters Theses Graduate School 122009 VisionBased Reinforcement Learning Using A Consolidated ActorCritic Model Christopher
More informationIntelligent Tutoring Systems using Reinforcement Learning to teach Autistic Students
Intelligent Tutoring Systems using Reinforcement Learning to teach Autistic Students B. H. Sreenivasa Sarma 1 and B. Ravindran 2 Department of Computer Science and Engineering, Indian Institute of Technology
More informationReinforcement Learning
Reinforcement Learning CITS3001 Algorithms, Agents and Artificial Intelligence Tim French School of Computer Science and Software Engineering The University of Western Australia 2017, Semester 2 Introduc)on
More informationReinforcement Learning in Cooperative Multi Agent Systems
Reinforcement Learning in Cooperative Multi Agent Systems Hao Ren haoren@cs.ubc.ca Abstract Reinforcement Learning is used in cooperative multi agent systems differently for various problems. We provide
More informationMultiAgent Reinforcement Learning in Games
MultiAgent Reinforcement Learning in Games by Xiaosong Lu, M.A.Sc. A thesis submitted to the Faculty of Graduate and Postdoctoral Affairs in partial fulfillment of the requirements for the degree of Doctor
More informationThe OpenSource TEXPLORE Code Release for Reinforcement Learning on Robots
In RoboCup2013 Robot Soccer World Cup XVII, Lecture Notes in Artificial Intelligence, Springer Verlag, Berlin, 2013. The OpenSource TEXPLORE Code Release for Reinforcement Learning on Robots Todd Hester
More informationEECS 349 Machine Learning
EECS 349 Machine Learning Instructor: Doug Downey (some slides from Pedro Domingos, University of Washington) 1 Logistics Instructor: Doug Downey Email: ddowney@eecs.northwestern.edu Office hours: Mondays
More informationReinforcement learning of coordination in heterogeneous cooperative multiagent systems
Reinforcement learning of coordination in heterogeneous cooperative multiagent systems Spiros Kapetanakis and Daniel Kudenko {spiros, kudenko}@cs.york.ac.uk Department of Computer Science University of
More informationContinuous reinforcement learning in cognitive robotics
Continuous reinforcement learning in cognitive robotics Igor Farkaš CNC research group Department of Applied Informatics / Centre for Cognitive Science FMFI, Comenius University in Bratislava AI seminar,
More informationThe Implementation of Machine Learning in the Game of Checkers
The Implementation of Machine Learning in the Game of Checkers William Melicher Computer Systems Lab Thomas Jefferson June 9, 2009 Abstract Most games have a set algorithm that does not change. This means
More informationTD(λ) and QLearning Based Ludo Players
TD(λ) and QLearning Based Ludo Players Majed Alhajry, Faisal Alvi, Member, IEEE and Moataz Ahmed Abstract Reinforcement learning is a popular machine learning technique whose inherent selflearning ability
More informationCMPUT 609/499: Reinforcement Learning for Artificial Intelligence. Instructor: Rich Sutton Dept of Computing Science richsutton.
CMPUT 609/499: Reinforcement Learning for Artificial Intelligence Instructor: Rich Sutton Dept of Computing Science richsutton.com 1 What is Reinforcement Learning? Agentoriented learning learning by
More informationIntroduction to Reinforcement Learning
Introduction to Reinforcement Learning A. LAZARIC (SequeL Team @INRIALille) ENS Cachan  Master 2 MVA SequeL INRIA Lille MVARL Course A Bit of History From Psychology to Machine Learning A. LAZARIC Introduction
More informationForm 4.2. Faculty member + student
Form 4.2 Faculty member + student Course syllabus for Artificial IntelligenceCS370D 1. Faculty member information: Name of faculty member responsible for the course Dr.Abeer Mahmoud Office Hours Office
More informationLearning Methods for Fuzzy Systems
Learning Methods for Fuzzy Systems Rudolf Kruse and Andreas Nürnberger Department of Computer Science, University of Magdeburg Universitätsplatz, D396 Magdeburg, Germany Phone : +49.39.67.876, Fax : +49.39.67.8