Reinforcement Learning


10-601 Introduction to Machine Learning
Machine Learning Department, School of Computer Science
Carnegie Mellon University

Reinforcement Learning

Matt Gormley
Lecture 25, April 11, 2018

Reminders
- Homework 7: HMMs. Out: Wed, Apr 04. Due: Mon, Apr 16 at 11:59pm.
Schedule Changes
- Lecture on Fri, Apr 13
- Recitation on Mon, Apr 23

Learning Paradigms (Whiteboard)
- Supervised
  - Regression
  - Classification (binary classification)
  - Structured prediction
- Unsupervised
- Semi-supervised
- Online
- Active learning
- Reinforcement learning

REINFORCEMENT LEARNING

Examples of Reinforcement Learning
- How should a robot behave so as to optimize its performance? (Robotics)
- How to automate the motion of a helicopter? (Control Theory)
- How to make a good chess-playing program? (Artificial Intelligence)

[Slides in this section: Eric Xing @ CMU, 2006-2011]

Autonomous Helicopter
Video: https://www.youtube.com/watch?v=vcdxqn0fcne

Robot in a Room
- What's the strategy to achieve max reward?
- What if the actions were not deterministic?

History of Reinforcement Learning
- Roots in the psychology of animal learning (Thorndike, 1911).
- Another independent thread was the problem of optimal control and its solution using dynamic programming (Bellman, 1957).
- The idea of temporal difference learning (an online method) emerged from work on playing board games (Samuel, 1959).
- A major breakthrough was the discovery of Q-learning (Watkins, 1989).

What is special about RL?
- RL is learning how to map states to actions so as to maximize a numerical reward over time.
- Unlike other forms of learning, it is a multi-stage decision-making process (often Markovian).
- An RL agent must learn by trial and error. (Not entirely supervised, but interactive.)
- Actions may affect not only the immediate reward but also subsequent rewards (delayed effect).

Elements of RL
- A policy: a map from state space to action space. May be stochastic.
- A reward function: maps each state (or state-action pair) to a real number, called the reward.
- A value function: the value of a state (or state-action pair) is the total expected reward, starting from that state (or state-action pair).
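For concreteness, the value of a state s under a policy pi is usually written as a discounted expected sum of rewards (the discount factor gamma in [0, 1) is formally introduced with MDPs below):

V^pi(s) = E[ sum_{t=0}^infinity gamma^t r_t | s_0 = s, actions chosen by pi ]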

Policy

Reward for each step: -2

Reward for each step: -0.1

The Precise Goal
- To find a policy that maximizes the value function. (Transitions and rewards are usually not available.)
- There are different approaches to achieve this goal in various situations.
- Value iteration and policy iteration are two classic approaches to this problem; essentially both are dynamic programming.
- Q-learning is a more recent approach; essentially it is a temporal-difference method.
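For reference, the standard temporal-difference update behind Q-learning (Watkins, 1989) is

Q(s, a) <- Q(s, a) + alpha [ r + gamma max_{a'} Q(s', a') - Q(s, a) ]

where alpha is a learning rate and (s, a, r, s') is one observed transition.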

MARKOV DECISION PROCESSES

Markov Decision Process (Whiteboard)
- Components: states, actions, state transition probabilities, reward function
- Markovian assumption
- MDP model
- MDP goal: infinite-horizon discounted reward
- Deterministic vs. nondeterministic MDP
- Deterministic vs. stochastic policy
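The Markovian assumption in the list above says that the next state depends only on the current state and action, not on the earlier history:

P(s_{t+1} | s_t, a_t, s_{t-1}, a_{t-1}, ..., s_0) = P(s_{t+1} | s_t, a_t)

and the infinite-horizon discounted reward goal is to choose a policy maximizing E[ sum_{t=0}^infinity gamma^t r_t ] with 0 <= gamma < 1.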

Exploration vs. Exploitation (Whiteboard)
- Explore vs. exploit tradeoff
- Ex: k-armed bandits (a small sketch follows this list)
- Ex: Traversing a maze
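The bandit example is worked on the whiteboard; as a minimal sketch of the explore/exploit tradeoff (the function name, the Gaussian reward model, and epsilon=0.1 are illustrative assumptions, not from the lecture):

import random

def epsilon_greedy_bandit(true_means, epsilon=0.1, steps=1000):
    # With probability epsilon pull a random arm (explore);
    # otherwise pull the arm with the best current estimate (exploit).
    k = len(true_means)
    counts = [0] * k          # number of pulls per arm
    estimates = [0.0] * k     # running-average reward per arm
    for _ in range(steps):
        if random.random() < epsilon:
            a = random.randrange(k)
        else:
            a = max(range(k), key=lambda i: estimates[i])
        r = random.gauss(true_means[a], 1.0)   # noisy reward for arm a
        counts[a] += 1
        estimates[a] += (r - estimates[a]) / counts[a]
    return estimates, counts

estimates, counts = epsilon_greedy_bandit([0.2, 0.5, 0.8])
print(estimates, counts)

With a small epsilon the agent spends most pulls on the arm it currently believes is best, while the occasional random pull keeps refining its estimates of the other arms.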

FIXED POINT ITERATION

Fixed Point Iteration for Optimization
Fixed point iteration is a general tool for solving systems of equations. It can also be applied to optimization.
1. Given an objective function: J(theta)
2. Compute the derivative and set it to zero (call this function f): dJ(theta)/d(theta_i) = 0 = f(theta)
3. Rearrange the equation s.t. one of the parameters appears on the LHS: theta_i = g(theta)
4. Initialize the parameters.
5. For i in {1,...,K}, update each parameter and increment t: theta_i^(t+1) = g(theta^(t))
6. Repeat #5 until convergence.

Fixed Point Iteration for Optimization
Fixed point iteration is a general tool for solving systems of equations. It can also be applied to optimization.
1. Given an objective function: J(x) = x^3/3 - (3/2) x^2 + 2x
2. Compute the derivative and set it to zero (call this function f): dJ(x)/dx = f(x) = x^2 - 3x + 2 = 0
3. Rearrange the equation s.t. x appears on the LHS: x = (x^2 + 2)/3 = g(x)
4. Initialize the parameters.
5. Update and increment t: x^(t+1) = g(x^(t))
6. Repeat #5 until convergence.

Fixed Point Iteration for Optimization
J(x) = x^3/3 - (3/2) x^2 + 2x
dJ(x)/dx = f(x) = x^2 - 3x + 2 = 0
=> x = (x^2 + 2)/3 = g(x)
We can implement our example in a few lines of python.
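The script itself is not included in the transcription; here is a minimal sketch that reproduces the trace on the next slide (the filename fixed-point-iteration.py is from the slide, the rest is a reconstruction):

# fixed-point-iteration.py
# Minimize J(x) = x^3/3 - (3/2)x^2 + 2x by driving its derivative
# f(x) = x^2 - 3x + 2 to zero with the fixed-point update x <- g(x).

def f(x):
    return x**2 - 3*x + 2        # derivative of the objective

def g(x):
    return (x**2 + 2) / 3        # f(x) = 0 rearranged as x = g(x)

x = 0.0                          # initial guess
for i in range(21):
    print("i=%2d x=%.4f f(x)=%.4f" % (i, x, f(x)))
    x = g(x)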

Fixed Point Iteration for Optimization

$ python fixed-point-iteration.py
i= 0 x=0.0000 f(x)=2.0000
i= 1 x=0.6667 f(x)=0.4444
i= 2 x=0.8148 f(x)=0.2195
i= 3 x=0.8880 f(x)=0.1246
i= 4 x=0.9295 f(x)=0.0755
i= 5 x=0.9547 f(x)=0.0474
i= 6 x=0.9705 f(x)=0.0304
i= 7 x=0.9806 f(x)=0.0198
i= 8 x=0.9872 f(x)=0.0130
i= 9 x=0.9915 f(x)=0.0086
i=10 x=0.9944 f(x)=0.0057
i=11 x=0.9963 f(x)=0.0038
i=12 x=0.9975 f(x)=0.0025
i=13 x=0.9983 f(x)=0.0017
i=14 x=0.9989 f(x)=0.0011
i=15 x=0.9993 f(x)=0.0007
i=16 x=0.9995 f(x)=0.0005
i=17 x=0.9997 f(x)=0.0003
i=18 x=0.9998 f(x)=0.0002
i=19 x=0.9999 f(x)=0.0001
i=20 x=0.9999 f(x)=0.0001

VALUE ITERATION

Definitions for Value Iteration (Whiteboard)
- State trajectory
- Value function
- Bellman equations (a small value iteration sketch follows this list)
- Optimal policy
- Optimal value function
- Computing the optimal policy
- Ex: Path planning
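These definitions are developed on the whiteboard; as an illustrative sketch (the tabular data structures, gamma=0.9, and the stopping tolerance are assumptions, not from the lecture), value iteration repeatedly applies the Bellman optimality backup until the values stop changing:

def value_iteration(S, A, P, R, gamma=0.9, tol=1e-6):
    # S: list of states; A: list of actions
    # P[s][a]: list of (prob, next_state) pairs
    # R[s][a]: immediate reward for taking action a in state s
    V = {s: 0.0 for s in S}
    while True:
        delta = 0.0
        for s in S:
            # Bellman optimality backup:
            #   V(s) = max_a [ R(s,a) + gamma * sum_s' P(s'|s,a) V(s') ]
            v_new = max(R[s][a] + gamma * sum(p * V[s2] for p, s2 in P[s][a])
                        for a in A)
            delta = max(delta, abs(v_new - V[s]))
            V[s] = v_new
        if delta < tol:
            break
    # Read off a greedy (optimal) policy from the converged values
    pi = {s: max(A, key=lambda a: R[s][a] +
                 gamma * sum(p * V[s2] for p, s2 in P[s][a]))
          for s in S}
    return V, pi

Note the connection to the previous section: the Bellman backup is itself a fixed-point update, V <- T(V), and value iteration simply runs that update to convergence.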