Introduction to Reinforcement Learning


Introduction to Reinforcement Learning Kevin Chen and Zack Khan

Outline 1. Course Logistics 2. What is Reinforcement Learning? 3. Influences of Reinforcement Learning 4. Agent-Environment Framework 5. Summary 6. Reinforcement Learning Framework

Course Logistics

Course Information and Resources - Course website: cmsc389f.umd.edu (not ready yet) - Piazza: piazza.com/umd/spring2018/cmsc389f - Book (optional): Reinforcement Learning: An Introduction by Sutton & Barto, 2018

Prerequisites Minimum Prerequisites: CMSC216 and CMSC250 Recommended Background: - Basic Statistics - Basic Python - Familiarity with UNIX - Interest in Reinforcement Learning!

Course Topics For the full (tentative) schedule of topics, visit cmsc389f.umd.edu (Intuition / Theory / Application)
- Lecture 2: Reinforcement Learning Framework
- Lecture 3: Markov Decision Processes
- Lecture 4: OpenAI Gym and Universe
- Lecture 5: Bellman Expectation Equations
- Lecture 6: Optimal Policy through Policy and Value Iteration
- Lecture 7: Policy Iteration and Value Iteration in Gridworld
- Lecture 8: Model-Free Methods (Monte Carlo)
- Lecture 9: Monte Carlo Prediction and Control
- Lecture 10: Temporal Difference Learning
- Lecture 11: SARSA and Q-Learning
- Lecture 12: Value Function Approximation
- Lecture 13: Linear Approximation in Mountain Car
- Lecture 14: Deep Reinforcement Learning

Assignments
- Weekly problem sets: short and simple, graded on completion, due 1 hour before class (email to cmsc389f@gmail.com)
- One final research project: create an RL implementation or tackle an RL research problem, and write up a 3-6 page research paper; focused on exploration, doesn't need to be too complex

Grading - Problem Sets: 50% - Take-home Midterm: 20% - Research Project: 30%

You'll Be Able To... 1. Understand modern RL research papers 2. Create your own RL AIs in a variety of games 3. Take more advanced machine learning classes

What is Reinforcement Learning?

Comparison with Other Methods Three categories of machine learning (Silver, 2017): Reinforcement Learning, Supervised Learning, Unsupervised Learning

Comparison with Other Methods: Supervised Learning Supervised Learning: learn a model (a function) to accurately classify data into categories. To learn this model, we teach our model using data that has already been correctly categorized.

Comparison with Other Methods: Unsupervised Learning Unsupervised Learning: finding structure and relationships within unlabelled datasets

Reinforcement Learning Reinforcement Learning is an area of machine learning in which an agent learns by interacting with a surrounding environment. - Decision-making - Goal-oriented learning

Example: Teaching a dog a trick How can we teach Fluffy a trick? Give Fluffy treats! We teach Fluffy how to best behave in an environment by giving him treats, so he knows how to adjust his behavior.

Example: Teaching a dog a trick Takeaway 1: We found a way of teaching Fluffy behavior!

Example: Teaching a dog a trick Takeaway 2: We're not explicitly telling Fluffy what to do. Fluffy is learning what to do based on the reward he encounters.

Example: Teaching a dog a trick Question: How is Fluffy figuring out how to adjust his behavior based on the reward?

Example: Teaching a dog a trick Idea: What if we make a software Fluffy? Something that can learn in an environment on its own... (as long as there's reward)

Videos 1. How to Walk: https://www.youtube.com/watch?v=gn4nrcc9twq 2. Autonomous Stunt Helicopters: https://www.youtube.com/watch?v=vcdxqn0fcne&t=5s

The Reinforcement Learning Problem How should software agents take actions in an environment, to maximize cumulative reward?

Comparison with Other Methods: Overview
- Reinforcement Learning: reward signal; actions affect the environment; delayed feedback; actions affect later data
- Supervised Learning: supervisor; doesn't affect the environment; instant feedback
- Unsupervised Learning: no supervisor/reward; doesn't affect the environment; no feedback

Comparison with Other Methods: Pros/Cons Con: requires a huge amount of data, often more than supervised learning. Con: environments can be hard to describe. RL is useful when: we do not know the optimal actions to take, or we are dealing with large state spaces (e.g., Go).

Reward Hypothesis Reward Hypothesis: We can formulate any goal as the maximization of some reward
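As a toy illustration of the hypothesis (our own sketch, not from the lecture), the goal "win the game" can be encoded as a scalar reward whose maximization achieves the goal:

```python
def reward(game_over, we_won):
    """Encode the goal 'win the game' as a scalar reward signal."""
    if not game_over:
        return 0              # no feedback until the outcome is known
    return 1 if we_won else -1

# An agent that maximizes this reward is, by construction, trying to win.
print(reward(False, False), reward(True, True), reward(True, False))  # 0 1 -1
```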

Influences of Reinforcement Learning

Psychology: Law of Effect Of several responses made to the same situation, those which are accompanied or closely followed by satisfaction to the animal will, other things being equal, be more firmly connected with the situation, so that, when it recurs, they will be more likely to recur... The greater the satisfaction or discomfort, the greater the strengthening or weakening of the bond. (Thorndike, 1911, p. 244)

Optimal Control Finding a control law to achieve some optimality criterion in a system - Closely related to reinforcement learning - Has a richer history

Example: Optimal Control Example: Say Jim is driving back from I-270 after a long day of classes, and he wants to get home as fast as possible. Problem: How much should Jim accelerate to get home as fast as possible? System: Jim and the road. Optimality criterion: minimization of Jim's travel time (under constraints)

Example: Animal Learning Example: 5-year-old Jim walks into the kitchen. Little Jim sees a glowing red circle on the stove. Little Jim reaches out his hand and touches it. Ouch, that hurt! Little Jim decides to never touch the red-hot stove ever again.

Reinforcement Learning in Context Silver (2017)

Why Study RL Now? 1. Computation Power 2. Deep Learning 3. New Ideas in Reinforcement Learning

Reinforcement Learning Today - One of MIT Technology Review's 10 Breakthrough Technologies of 2017 - A main driver of innovation behind industry titans such as Google DeepMind (AlphaGo), OpenAI (Video Games), and Tesla (Self-Driving Cars)

Examples of RL in the Real World Google uses RL to decrease energy used in data centres by 40%, finding operating conditions that optimize energy efficiency. https://environment.google/projects/machine-learning/ More examples can be found at: https://www.oreilly.com/ideas/practical-applications-of-reinforcement-learning-in-industry

Agent-environment Framework

Agent-environment Framework IMPORTANT NOTE: There is no actual learning described in this section. We are only setting up the framework in which learning will occur.

Agent and Environment Two key parts of an RL system: Agent and Environment Agents take actions within an environment Environment responds to agent actions with rewards (or no reward)
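This interaction can be sketched as a loop (a hypothetical toy environment of our own, not course code): the agent picks an action, and the environment responds with the new state and a reward.

```python
import random

class LineWorld:
    """Toy environment: states 0..3 on a line; reaching state 3 gives reward 1."""
    def __init__(self):
        self.state = 0

    def step(self, action):
        # action is -1 (left) or +1 (right); the agent can't move below 0
        self.state = max(0, self.state + action)
        reward = 1 if self.state == 3 else 0
        done = self.state == 3
        return self.state, reward, done

env = LineWorld()
done, total_reward, state = False, 0, 0
while not done:
    action = random.choice([-1, +1])   # a (very dumb) agent: act randomly
    state, reward, done = env.step(action)
    total_reward += reward

print(state, total_reward)  # 3 1
```

Even this random agent eventually stumbles into the rewarding state; learning (later lectures) is about picking actions better than at random.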

Example 1

Example 2

Example 3

Example 3 Sparse Reward: The reward (money) does not arrive until far in the future, too far ahead for us to predict. Since we do not see this reward very often, we call it a Sparse Reward, which should be avoided.

Example 4 Grades would be a more efficient reward, as the rewards arrive more frequently in relation to the action of studying.

Agent-environment Framework II

Agent and Environment II Environment can be represented as a set of states that the agent exists in. When an agent takes an action, it will move into a new state, and receive a reward. To model time: after every action, time t increases by 1. (Diagram: the agent moving between states at T=0, T=1, T=2, T=3.)

Agent Behavior What if we tell the agent which actions to take, based on the state that it is in?

Agent Behavior Example: If the paddle is in a state where it is below the maximum height, take the move up action. This is an AI! (a really dumb one)

Agent Behavior Example 2: If the paddle is in a state where it is below the ball, take the move up action. If the paddle is in a state where it is above the ball, take the move down action. This is also an AI! (a smart one)
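Example 2 is just a mapping from states to actions; a minimal Python sketch (the function and parameter names are ours, and we add a "stay" case for when the paddle is level with the ball):

```python
def policy(paddle_height, ball_height):
    """Hand-coded behavior: always move the paddle toward the ball."""
    if paddle_height < ball_height:
        return "move up"
    if paddle_height > ball_height:
        return "move down"
    return "stay"  # our addition: do nothing when level with the ball

print(policy(3, 7))   # move up
print(policy(9, 2))   # move down
```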

Agent Behavior What if we tell the agent which actions to take, based on the state that it is in? Answer: We get an AI! What if we tell the agent which actions to take, based on the state that it is in, in such a way that those actions will result in maximizing reward? Answer: We get a smart AI! Figuring out how to do the above is what Reinforcement Learning is about!

Pong Example

Pong Example Environment: Pong Game (clock, game physics, etc.) Environment Reward: Scoring a Point Goal: Winning the Game Agent: Paddle Agent Actions: Move up, Move down

Agent and Environment Goal of Reinforcement Learning: Figure out which actions the agent can take in the environment, to maximize some cumulative reward, in order to achieve a goal

Pong Example Agent: Move paddle up Environment: Move paddle into new state New State: - One pixel above - Time increases by 1

Pong Example Example: Paddle is in State 1: (height 6, time 0) Paddle takes action: Move up Environment moves Paddle to State 2 Paddle is in State 2: (height 7, time 1) Paddle takes action: Move down Environment moves Paddle to State 3 Paddle is in State 3: (height 6, time 2) NOTE: State numbering is arbitrary
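The transitions in this example can be traced with a small sketch of the environment's step function (hypothetical, our own code):

```python
def env_step(state, action):
    """State is (height, time); each action moves the paddle one pixel."""
    height, t = state
    if action == "move up":
        height += 1
    elif action == "move down":
        height -= 1
    return (height, t + 1)     # time always advances by 1

s1 = (6, 0)                    # State 1: height 6, time 0
s2 = env_step(s1, "move up")   # State 2: height 7, time 1
s3 = env_step(s2, "move down") # State 3: height 6, time 2
print(s1, s2, s3)  # (6, 0) (7, 1) (6, 2)
```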

Summary 1. Reinforcement Learning (RL) is about an agent maximizing reward by interacting with its surrounding environment 2. RL has distinct advantages over other AI methods, but often requires more data or understanding of the problem/situation 3. Agents take actions within an environment. Environment responds with rewards (or no reward) 4. After an action, the agent moves into a new state of the environment 5. Figuring out how to tell an agent what actions to take, in order to maximize reward, is the key to reinforcement learning and creating a good AI

What's Next Next week, we'll build on our understanding of the Reinforcement Learning Framework. Then, we'll start formalizing the concepts of states, rewards, etc., mathematically. After that, we'll start constructing a solution to the Reinforcement Learning Problem. HOMEWORK: Join Piazza! Problem Set 1 is out on the website! Due by next class; send solutions to cmsc389f@gmail.com

Additional Resources Machine Learning at Maryland - Undergraduate Journal Club (Feb. 7th, 6:00pm, Location: TBD) Machine Learning Faculty - Computer Vision Department, Computational Linguistics (CLIP) Department, etc.