Deep Reinforcement Learning CS 294-112

Today
1. Course logistics (the boring stuff)
2. 20-minute introductions from each instructor

Course Staff
Chelsea Finn, PhD Student, UC Berkeley
John Schulman, Research Scientist, OpenAI
Sergey Levine, Assistant Professor, UC Berkeley

Class Information & Resources
Course website: rll.berkeley.edu/deeprlcourse/
Piazza: UC Berkeley, CS294-112
Subreddit (for non-enrolled students): www.reddit.com/r/berkeleydeeprlcourse/
Office hours: after class each day (but not today); sign up in advance for a 10-minute slot on the course website

Prerequisites & Enrollment
All enrolled students must have taken CS189, CS289, or CS281A
Please contact Sergey Levine if you haven't
Please enroll for 3 units
Wait list is (very) full; everyone near the top has been notified
Lectures will be recorded
Since the class is full, please watch the lectures online if you are not enrolled

What you should know
Assignments will require training neural networks with standard automatic differentiation packages (TensorFlow or Theano)
Review section: Chelsea Finn will teach a review section in week 2
Please fill out the poll here to help us choose a time: tinyurl.com/tfsection
You should be able to at least do the TensorFlow MNIST tutorial (if not, come to the review section and ask questions!)

What we'll cover
Full syllabus on course website
1. From supervised learning to decision making
2. Basic reinforcement learning: Q-learning and policy gradients
3. Advanced model learning and prediction, distillation, reward learning
4. Advanced deep RL: trust region policy gradients, actor-critic methods, exploration
5. Open problems, research talks, invited lectures

Assignments
1. Homework 1: Imitation learning (control via supervised learning)
2. Homework 2: Basic (shallow) RL
3. Homework 3: Deep Q learning
4. Homework 4: Deep policy gradients
5. Final project: Research-level project of your choice (form a group of up to 2-3 students, you're welcome to start early!)
Grading: 40% homework (10% each), 50% project, 10% participation

How do we build intelligent machines?
Imagine you have to build an intelligent machine. Where do you start?

Learning as the basis of intelligence
Some things we can all do (e.g. walking)
Some things we can only learn (e.g. driving a car)
We can learn a huge variety of things, including very difficult things
Therefore our learning mechanism(s) are likely powerful enough to do everything we associate with intelligence
Though it may still be very convenient to hard-code a few really important things

A single algorithm?
An algorithm for each module? Or a single flexible algorithm?
[Figure panels: seeing with your tongue; auditory cortex; human echolocation (sonar). Sources: BrainPort; Martinez et al.; Roe et al. Adapted from A. Ng]

What must that single algorithm do?
Interpret rich sensory inputs
Choose complex actions

Why deep reinforcement learning?
Deep = can process complex sensory input and also compute really complex functions
Reinforcement learning = can choose complex actions

Some evidence in favor of deep learning

Some evidence for reinforcement learning
Percepts that anticipate reward become associated with firing patterns similar to those of the reward itself
The basal ganglia appear to be related to the reward system
Model-free RL-like adaptation is often a good fit for experimental data on animal adaptation
But not always

What can deep learning & RL do well now?
Acquire a high degree of proficiency in domains governed by simple, known rules
Learn simple skills with raw sensory inputs, given enough experience
Learn from imitating enough human-provided expert behavior

What has proven challenging so far?
Humans can learn incredibly quickly; deep RL methods are usually slow
Humans can reuse past knowledge; transfer learning in deep RL is an open problem
Not clear what the reward function should be
Not clear what the role of prediction should be

"Instead of trying to produce a programme to simulate the adult mind, why not rather try to produce one which simulates the child's? If this were then subjected to an appropriate course of education one would obtain the adult brain." - Alan Turing
[Figure: a general learning algorithm and its environment, connected by observations and actions]
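
The loop between a learning algorithm and its environment, exchanging observations and actions, can be sketched in a few lines of Python. This is only an illustrative sketch, not course code: the CorridorEnv class, the train and greedy_policy functions, and all parameter values are hypothetical names and choices made up for this example. The learner uses tabular Q-learning (one of the basic methods named in the syllabus) rather than a deep network, so that the example stays self-contained.

```python
import random


class CorridorEnv:
    """Hypothetical toy environment: a line of cells, reward at the right end."""

    def __init__(self, length=5):
        self.length = length
        self.state = 0

    def reset(self):
        """Start a new episode and return the first observation."""
        self.state = 0
        return self.state

    def step(self, action):
        """Apply an action (0 = left, 1 = right); return (observation, reward, done)."""
        move = 1 if action == 1 else -1
        self.state = min(max(self.state + move, 0), self.length - 1)
        done = self.state == self.length - 1
        reward = 1.0 if done else 0.0
        return self.state, reward, done


def train(env, episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2,
          max_steps=100, seed=0):
    """Tabular Q-learning with epsilon-greedy exploration (assumed settings)."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(env.length)]  # q[state][action]
    for _ in range(episodes):
        obs = env.reset()
        for _ in range(max_steps):
            # The learner chooses an action from its current observation.
            if rng.random() < epsilon:
                action = rng.randrange(2)
            else:
                best = max(q[obs])
                action = rng.choice([a for a in (0, 1) if q[obs][a] == best])
            # The environment answers with a new observation and a reward.
            next_obs, reward, done = env.step(action)
            # Q-learning update toward the one-step bootstrapped target.
            target = reward if done else reward + gamma * max(q[next_obs])
            q[obs][action] += alpha * (target - q[obs][action])
            obs = next_obs
            if done:
                break
    return q


def greedy_policy(q):
    """Return the highest-valued action for each state (ties go to action 0)."""
    return [max((0, 1), key=lambda a: row[a]) for row in q]


if __name__ == "__main__":
    q_table = train(CorridorEnv(length=5))
    print(greedy_policy(q_table))
```

The environment knows nothing about the learner and the learner knows nothing about the environment's rules; everything is learned from the stream of observations, actions, and rewards. A deep RL method would replace the table q with a neural network so that the same loop can handle rich sensory inputs.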