Goals for the Course

Goals for the Course
- Learn the methods and foundational ideas of RL
- Prepare to apply RL
- Prepare to do research in RL
- Learn some new ways of thinking about AI research: the agent perspective; the skeptical perspective

Complete Agent
- Temporally situated
- Continual learning and planning
- Objective is to affect the environment
- Environment is stochastic and uncertain
[Figure: the agent-environment loop. The agent sends actions to the environment; the environment returns a state and a reward.]
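
To make this interaction loop concrete, here is a minimal sketch in Python. It is only an illustration of the loop's shape, not code from the course; the `Environment` and `Agent` classes and their methods are invented for the example.

```python
import random

class Environment:
    """Toy stochastic environment: the state drifts by the action plus noise."""
    def __init__(self):
        self.state = 0

    def step(self, action):
        self.state += action + random.choice([-1, 0, 1])  # stochastic, uncertain
        reward = -abs(self.state)                         # reward for staying near 0
        return self.state, reward

class Agent:
    """Trivial agent that pushes the state back toward 0."""
    def act(self, state):
        return -1 if state > 0 else 1

    def learn(self, state, action, reward, next_state):
        pass  # a real RL agent would update its value estimates here

env, agent = Environment(), Agent()
state = env.state
for t in range(10):  # the continual sense-act-learn loop
    action = agent.act(state)
    next_state, reward = env.step(action)
    agent.learn(state, action, reward, next_state)
    state = next_state
```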

What is Reinforcement Learning?
- An approach to Artificial Intelligence
- Learning from interaction
- Goal-oriented learning
- Learning about, from, and while interacting with an external environment
- Learning what to do: how to map situations to actions so as to maximize a numerical reward signal

Chapter 1: Introduction
[Figure: Reinforcement Learning (RL) shown at the intersection of Artificial Intelligence, Psychology, Control Theory and Operations Research, Neuroscience, and Artificial Neural Networks.]

Key Features of RL
- Learner is not told which actions to take
- Trial-and-error search
- Possibility of delayed reward: sacrifice short-term gains for greater long-term gains
- The need to explore and exploit (see the sketch below)
- Considers the whole problem of a goal-directed agent interacting with an uncertain environment
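
The explore/exploit trade-off can be made concrete with a few lines of Python. This is a hedged sketch of epsilon-greedy action selection, one standard way to balance the two; the `q_values` table and the 10% exploration rate are illustrative assumptions.

```python
import random

def epsilon_greedy(q_values, epsilon=0.1):
    """Exploit the best-known action most of the time; explore occasionally.

    q_values: dict mapping action -> current estimate of its value
    epsilon:  probability of taking a random (exploratory) action
    """
    if random.random() < epsilon:
        return random.choice(list(q_values))   # explore
    return max(q_values, key=q_values.get)     # exploit

actions = {"left": 0.2, "right": 0.5, "stay": 0.1}
print(epsilon_greedy(actions))  # usually "right", occasionally a random action
```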

Examples of Reinforcement Learning
- Robocup soccer teams (Stone & Veloso, Riedmiller et al.): world's best player of simulated soccer, 1999; runner-up, 2000
- Inventory management (Van Roy, Bertsekas, Lee & Tsitsiklis): 10-15% improvement over industry-standard methods
- Dynamic channel assignment (Singh & Bertsekas; Nie & Haykin): world's best assigner of radio channels to mobile telephone calls
- Elevator control (Crites & Barto): (probably) world's best down-peak elevator controller
- Many robots: navigation, bipedal walking, grasping, switching between skills, ...
- TD-Gammon and Jellyfish (Tesauro; Dahl): world's best backgammon player

Get out a pen and paper
- Please write down several things (maybe up to 5) that you hope to learn in this course
- Any other expectations that you have of me for this course

Supervised Learning
Training info = desired (target) outputs
[Figure: inputs flow into a supervised learning system, which produces outputs.]
Error = (target output − actual output)

Reinforcement Learning
Training info = evaluations ("rewards" / "penalties")
[Figure: inputs flow into an RL system, which produces outputs (actions).]
Objective: get as much reward as possible
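
The difference between the two training signals can be stated in a few lines of Python; the numbers here are invented purely for illustration.

```python
# Supervised learning: the trainer supplies the correct output, so the
# error signal is directional; it says what the output should have been.
target, actual = 1.0, 0.6
error = target - actual   # tells the learner which way, and how far, to adjust

# Reinforcement learning: the trainer supplies only a scalar evaluation of
# the action taken; it says how good the action was, not what the correct
# action would have been. The learner must discover that by exploration.
reward = 0.4              # evaluative feedback only
```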

For next time
- Get a copy of the textbook
- Read Chapter 1 through page 9 (up through Section 1.3)
- Jot down some questions, bring them to class
- Please consider committing some serious time and thought to this class

Today
- Give an overview of the whole RL problem, before we break it up into parts to study individually
- Introduce the cast of characters: experience (reward), policies, value functions, models of the environment
- Tic-tac-toe example
- Thought questions

Elements of RL
- Policy: what to do
- Reward: what is good
- Value: what is good because it predicts reward
- Model of the environment: what follows what
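
One way to keep the four elements apart is to treat them as four distinct pieces of data. The following Python sketch is a hypothetical illustration; the state and action names are invented.

```python
# Policy: what to do in each state.
policy = {"s1": "right", "s2": "left"}

# Reward: what is immediately good (supplied by the environment).
reward = {("s1", "right"): 1.0, ("s2", "left"): 0.0}

# Value: what is good in the long run, because it predicts reward.
value = {"s1": 0.9, "s2": 0.4}

# Model of the environment: what follows what.
model = {("s1", "right"): "s2", ("s2", "left"): "s1"}
```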

A Somewhat Less Misleading View
[Figure: the RL agent builds its internal state from external sensations, internal sensations, memory, and reward, and emits actions to the environment.]

An Extended Example: Tic-Tac-Toe
[Figure: a game tree of board positions, branching as x's moves and o's moves alternate.]
Assume an imperfect opponent: he/she sometimes makes mistakes.

An RL Approach to Tic-Tac-Toe
1. Make a table with one entry per state, V(s), the estimated probability of winning from that state:
- .5 for ordinary states (a guess)
- 1 for states where x has already won
- 0 for states where x has lost or drawn
2. Now play lots of games. To pick our moves, look ahead one step from the current state to the various possible next states. Usually just pick the next state with the highest estimated probability of winning, the largest V(s): a greedy move. But 10% of the time pick a move at random: an exploratory move.
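
A hedged sketch of this move-selection rule in Python; the board representation and the helper functions are invented for the example, not taken from the course.

```python
import random

# A board is a tuple of 9 cells, each "x", "o", or " ".
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
         (0, 3, 6), (1, 4, 7), (2, 5, 8),
         (0, 4, 8), (2, 4, 6)]

def winner(board):
    for a, b, c in LINES:
        if board[a] != " " and board[a] == board[b] == board[c]:
            return board[a]
    return "draw" if " " not in board else None

def next_states(board, player):
    """All boards reachable by `player` making one move."""
    return [board[:i] + (player,) + board[i + 1:]
            for i in range(9) if board[i] == " "]

values = {}  # the table: V(s), estimated probability that x wins from s

def V(board):
    if board not in values:
        w = winner(board)
        values[board] = 1.0 if w == "x" else 0.0 if w in ("o", "draw") else 0.5
    return values[board]

def choose_move(board, epsilon=0.1):
    """One-step lookahead: greedy on V(s), exploratory 10% of the time."""
    moves = next_states(board, "x")
    if random.random() < epsilon:
        return random.choice(moves)  # exploratory move
    return max(moves, key=V)         # greedy move: the largest V(s)

print(choose_move((" ",) * 9))  # one of x's nine possible opening boards
```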

RL Learning Rule for Tic-Tac-Toe
[Figure: a sequence of positions from the starting position, alternating between our moves and the opponent's moves, with one exploratory move marked.]
Let s be the state before our greedy move and s' the state after our greedy move. We increment each V(s) toward V(s'), a "backup":

V(s) ← V(s) + α [ V(s') − V(s) ]

where α, the step-size parameter, is a small positive fraction, e.g. .1.
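
Continuing the sketch above, the backup is one line of Python (again an illustration, with `values` and `V` as defined earlier):

```python
ALPHA = 0.1  # the step-size parameter: a small positive fraction

def backup(s, s_prime):
    """Increment V(s) toward V(s'): V(s) <- V(s) + alpha * (V(s') - V(s))."""
    values[s] = V(s) + ALPHA * (V(s_prime) - V(s))
```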

How can we improve this T.T.T. player?
- Take advantage of symmetries: representation/generalization. How might this backfire?
- Do we need random moves? Why? Do we always need a full 10%?
- Can we learn from random moves?
- Can we learn offline? Pre-training from self-play? Using learned models of the opponent?
- ...

e.g. Generalization
[Figure: a lookup table with one V entry per state s_1, s_2, s_3, ..., s_N, contrasted with a generalizing function approximator; training at one state ("train here") also shifts the estimates for similar states.]
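
The table-versus-approximator contrast, as a minimal Python sketch; the linear feature representation is an invented illustration, not from the slides.

```python
# Table: each state has its own independent entry, so training one
# state changes nothing else.
table = {"s1": 0.5, "s2": 0.5, "s3": 0.5}
table["s2"] = 0.7  # only s2's estimate moves

# Generalizing function approximator: V(s) = w . features(s). Training on
# one state shifts the weights, so estimates for similar states move too.
def features(s):
    # toy features: a bias term plus the state's index (invented for illustration)
    return [1.0, float(s[-1]) / 10.0]

w = [0.5, 0.0]

def V_hat(s):
    return sum(wi * xi for wi, xi in zip(w, features(s)))

# One gradient-style update toward a target at "s2" ("train here"):
target, alpha = 0.7, 0.5
err = target - V_hat("s2")
w = [wi + alpha * err * xi for wi, xi in zip(w, features("s2"))]
```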

How is Tic-Tac-Toe Too Easy?
- Finite, small number of states
- One-step look-ahead is always possible
- State completely observable
- ...

The Book
Part I: The Problem
- Introduction
- Evaluative Feedback
- The Reinforcement Learning Problem
Part II: Elementary Solution Methods
- Dynamic Programming
- Monte Carlo Methods
- Temporal Difference Learning
Part III: A Unified View
- Eligibility Traces
- Generalization and Function Approximation
- Planning and Learning
- Dimensions of Reinforcement Learning
- Case Studies

Next Classes
Tuesday:
- Read Chapter 2, "Evaluative Feedback"
- 2 thought questions due
One week from today:
- Chapter 2 exercises due, as in the schedule
- Additional exercises 2.25 and 2.55 are available from the main course page