Introduction to Reinforcement Learning

Introduction to Reinforcement Learning A. LAZARIC (SequeL Team @INRIA-Lille) ENS Cachan - Master 2 MVA SequeL INRIA Lille MVA-RL Course

A Bit of History From Psychology to Machine Learning A. LAZARIC Introduction to Reinforcement Learning Sept 29th, 2015-2/14

The law of effect [Thorndike, 1911]

Of several responses made to the same situation, those which are accompanied or closely followed by satisfaction to the animal will, other things being equal, be more firmly connected with the situation, so that, when it recurs, they will be more likely to recur; those which are accompanied or closely followed by discomfort to the animal will, other things being equal, have their connections with that situation weakened, so that, when it recurs, they will be less likely to occur. The greater the satisfaction or discomfort, the greater the strengthening or weakening of the bond.

Experimental psychology

Classical (human and) animal conditioning: the magnitude and timing of the conditioned response change as a result of the contingency between the conditioned stimulus and the unconditioned stimulus [Pavlov, 1927].

Operant conditioning (or instrumental conditioning): process by which humans and animals learn to behave in such a way as to obtain rewards and avoid punishments [Skinner, 1938].

Remark: reinforcement denotes any form of conditioning, either positive (rewards) or negative (punishments).

Computational neuroscience

Hebbian learning: development of formal models of how the synaptic weights between neurons are reinforced by simultaneous activation. "Cells that fire together, wire together" [Hebb, 1961].

Emotions theory: model of how emotional processes can bias the decision process [Damasio, 1994].

Dopamine and basal ganglia model: direct link with motor control and decision-making (e.g., [Doya, 1999]).

Remark: reinforcement denotes the effect of dopamine (and surprise).
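The Hebbian idea of weights being reinforced by simultaneous activation can be illustrated with a minimal sketch. The update rule used here (weight change proportional to the product of pre- and postsynaptic activity), the function name `hebbian_update`, and the learning rate `lr` are assumptions chosen for illustration; the slides do not specify a formula.

```python
def hebbian_update(weights, pre, post, lr=0.1):
    """One Hebbian step (assumed rule): w[i][j] += lr * post[i] * pre[j]."""
    return [
        [w_ij + lr * post_i * pre_j for w_ij, pre_j in zip(row, pre)]
        for row, post_i in zip(weights, post)
    ]

# Two presynaptic and two postsynaptic units; all weights start at zero.
weights = [[0.0, 0.0], [0.0, 0.0]]
pre = [1.0, 0.0]    # only presynaptic unit 0 fires
post = [1.0, 0.0]   # only postsynaptic unit 0 fires

# Repeated co-activation strengthens only the connection between the
# two units that fire together; every other weight stays at zero.
for _ in range(5):
    weights = hebbian_update(weights, pre, post)

print(weights)
```

Only `weights[0][0]` grows in this toy run, since it is the only connection whose two endpoints are active at the same time.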

Optimal control theory and dynamic programming

Optimal control: formal framework to define optimization methods to derive control policies in continuous-time control problems [Pontryagin and Neustadt, 1962].

Dynamic programming: set of methods used to solve control problems by decomposing them into subproblems so that the optimal solution to the global problem is the conjunction of the solutions to the subproblems [Bellman, 2003].

Remark: reinforcement denotes an objective function to maximize (or minimize).
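The decomposition into subproblems can be sketched with value iteration, one common dynamic programming method, on a toy problem. The four-state chain, the reward of 1 for reaching the last state, and the discount factor `GAMMA` are hypothetical choices made for this illustration and do not come from the slides.

```python
GAMMA = 0.9
N_STATES = 4            # states 0..3; state 3 is terminal (the goal)
ACTIONS = (-1, +1)      # move left or move right

def step(state, action):
    """Known deterministic model: return (next_state, reward)."""
    nxt = min(max(state + action, 0), N_STATES - 1)
    reward = 1.0 if nxt == N_STATES - 1 else 0.0
    return nxt, reward

def value_iteration(tol=1e-8):
    """Repeat the Bellman backup until the values stop changing."""
    values = [0.0] * N_STATES
    while True:
        delta = 0.0
        for s in range(N_STATES - 1):   # the terminal state keeps value 0
            # The value of s is built from the values of its successors:
            # each state is a subproblem of the global control problem.
            best = max(r + GAMMA * values[n]
                       for n, r in (step(s, a) for a in ACTIONS))
            delta = max(delta, abs(best - values[s]))
            values[s] = best
        if delta < tol:
            return values

values = value_iteration()
print(values)
```

In this sketch the value of each state is computed from the values of its neighbours, so the solution for the whole chain is assembled from the solutions for shorter chains closer to the goal.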

Reinforcement learning

Reinforcement learning is learning what to do (how to map situations to actions) so as to maximize a numerical reward signal in an unknown, uncertain environment. The learner is not told which actions to take, as in most forms of machine learning, but she must discover which actions yield the most reward by trying them (trial and error). In the most interesting and challenging cases, actions may affect not only the immediate reward but also the next situation and, through that, all subsequent rewards (delayed reward).

Reinforcement Learning: An Introduction, Sutton and Barto (1998).
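Trial and error with delayed reward can be illustrated with a short sketch of tabular Q-learning, used here as one possible learning rule and not as the method of the slides. The four-state chain, the constants `ALPHA`, `GAMMA`, and `EPSILON`, and the fixed random seed are all hypothetical choices made for this example. The learner never sees the transition model; it only observes the next state and the reward after acting.

```python
import random

N_STATES = 4          # chain 0..3; reaching state 3 ends the episode
ACTIONS = (-1, +1)    # left, right
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.2   # assumed learning constants

def step(state, action):
    """Environment, hidden from the learner: return (next_state, reward)."""
    nxt = min(max(state + action, 0), N_STATES - 1)
    reward = 1.0 if nxt == N_STATES - 1 else 0.0
    return nxt, reward

def choose_action(q_row, rng):
    """Epsilon-greedy choice with random tie-breaking among best actions."""
    if rng.random() < EPSILON:
        return rng.randrange(len(ACTIONS))
    best = max(q_row)
    return rng.choice([i for i, q in enumerate(q_row) if q == best])

def q_learning(episodes=500, seed=0):
    rng = random.Random(seed)
    q = [[0.0] * len(ACTIONS) for _ in range(N_STATES)]
    for _ in range(episodes):
        state = 0
        while state != N_STATES - 1:
            a = choose_action(q[state], rng)
            nxt, reward = step(state, ACTIONS[a])
            # The only reward arrives at the goal; bootstrapping on the
            # next state's estimate propagates it back to earlier states.
            target = reward + GAMMA * max(q[nxt])
            q[state][a] += ALPHA * (target - q[state][a])
            state = nxt
    return q

q_table = q_learning()
greedy_policy = [ACTIONS[row.index(max(row))] for row in q_table[:-1]]
print(greedy_policy)
```

Under these assumptions the greedy policy should settle on moving right in every non-terminal state, even though the reward is only received at the last step of each episode.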

A Multi-disciplinary Field

[Figure: reinforcement learning shown at the intersection of related fields: A.I., clustering, statistical learning, statistics, cognitive sciences, neural networks, learning theory, applied math, neuroscience, approximation theory, dynamic programming, categorization, optimal control, automatic control, psychology, and active learning.]

A Machine Learning Paradigm

Supervised learning: an expert (supervisor) provides examples of the right strategy (e.g., classification of clinical images). Supervision is expensive.

Unsupervised learning: different objects are clustered together by similarity (e.g., clustering of images on the basis of their content). No actual performance is optimized.

Reinforcement learning: learning by direct interaction (e.g., autonomous robotics). Minimum level of supervision (reward) and maximization of long-term performance.

The Problems

How to model an RL problem
How to solve an RL problem exactly
How to solve an RL problem incrementally
How to explore efficiently in an RL problem
How to solve an RL problem approximately

Bibliography I

Bellman, R. (2003). Dynamic Programming. Dover Books on Computer Science Series. Dover Publications, Incorporated.

Damasio, A. R. (1994). Descartes' Error: Emotion, Reason and the Human Brain. Grosset/Putnam.

Doya, K. (1999). What are the computations of the cerebellum, the basal ganglia, and the cerebral cortex? Neural Networks, 12:961-974.

Hebb, D. O. (1961). Distinctive features of learning in the higher animal. In Delafresnaye, J. F., editor, Brain Mechanisms and Learning. Oxford University Press.

Pavlov, I. (1927). Conditioned Reflexes. Oxford University Press.

Bibliography II

Pontryagin, L. and Neustadt, L. (1962). The Mathematical Theory of Optimal Processes. Number v. 4 in Classics of Soviet Mathematics. Gordon and Breach Science Publishers.

Skinner, B. F. (1938). The Behavior of Organisms. Appleton-Century-Crofts.

Thorndike, E. (1911). Animal Intelligence: Experimental Studies. The animal behaviour series. Macmillan.

Reinforcement Learning Alessandro Lazaric alessandro.lazaric@inria.fr sequel.lille.inria.fr