Meta Learning & Self Play

Meta Learning & Self Play. Ilya Sutskever, March 24, 2018

The Reinforcement Learning Problem

Reinforcement Learning (RL)
- Good framework for building intelligent agents
- Acting to achieve goals is a key part of intelligence
- Can specify nearly any AI problem
- RL is interesting because interesting RL algorithms exist
[Diagram: the agent sends an action to the environment; the environment returns an observation and a reward]

Reinforcement Learning
- Formulation: find a policy that maximizes expected reward
- Rewards are given by the environment
- In the real world, environments don't specify rewards. It is up to the agent to determine that a reward has occurred.
[Diagram: agent, action, environment, observation, reward]
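The formulation on this slide can be written compactly. The notation below is an assumption for illustration (a finite horizon T, undiscounted rewards r_t, and trajectories τ sampled by running policy π); the slide itself gives no symbols.

```latex
\pi^{*} \;=\; \arg\max_{\pi}\; \mathbb{E}_{\tau \sim \pi}\!\left[\, \sum_{t=0}^{T} r_t \right]
```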

Reinforcement Learning: Agent = neural network
[Diagram: agent, action, environment, observation, reward]

Reinforcement Learning algorithms in a nutshell
- Add randomness to your actions
- If the result was better than expected, do more of the same in the future
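The two-line recipe above can be sketched on a toy problem. This is a minimal illustration, not the speaker's code: it assumes a three-armed bandit with Gaussian reward noise, a softmax policy, and a running-average baseline standing in for "expected". The names `train_bandit`, `true_means`, and the step sizes are hypothetical.

```python
import math
import random


def softmax(prefs):
    """Turn action preferences into a probability distribution."""
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def train_bandit(true_means, steps=5000, lr=0.1, seed=0):
    """Policy-gradient learning on a toy bandit (assumed setup)."""
    rng = random.Random(seed)
    prefs = [0.0] * len(true_means)  # policy parameters
    baseline = 0.0                   # running estimate of the expected reward
    for t in range(1, steps + 1):
        probs = softmax(prefs)
        # "Add randomness to your actions": sample from the policy.
        action = rng.choices(range(len(prefs)), weights=probs)[0]
        reward = true_means[action] + rng.gauss(0.0, 0.5)
        # "Better than expected": compare the reward with the baseline.
        advantage = reward - baseline
        # "Do more of the same": raise the probability of the sampled
        # action when the advantage is positive, lower it when negative.
        for a in range(len(prefs)):
            indicator = 1.0 if a == action else 0.0
            prefs[a] += lr * advantage * (indicator - probs[a])
        baseline += (reward - baseline) / t
    return softmax(prefs)


final_probs = train_bandit([0.0, 0.5, 2.0])
print(final_probs)
```

In this sketch the policy should end up concentrating most of its probability on the arm with the highest mean reward.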

RL's potential
- An agent running a really good RL algorithm can accomplish an overwhelming variety of tasks: the goal achiever
- A truly good RL algorithm will combine elements of:
  - supervised learning
  - unsupervised and representation learning
  - reasoning and inference and test time
  - and more!
- Today's RL algorithms have a very long way to go
- But it doesn't mean that progress will be slow

Hindsight Experience Replay [Andrychowicz et al., 2017]

Exploration can be hard
- When rewards are sparse, most random attempts result in failure, and thus no learning
- Can we learn from failure?

Learn From Failure
- Setup: build a system that can reach any state
- Goal: reach state A
- Any trajectory ends up in some other state B
- Use this as training data to reach state B?
[Diagram: from the starting point, the agent tries to reach A; the result shows how to reach B]
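The relabeling idea can be sketched in a few lines. This is an illustrative sketch only: it assumes a sparse reward of 1 when the next state equals the goal, integer states, and the simplest relabeling choice (use the final state reached as the substitute goal). The function name `relabel_with_hindsight` and the tuple layout are hypothetical.

```python
def relabel_with_hindsight(trajectory, goal):
    """Return training transitions for the intended goal and for the
    state that was actually reached.

    trajectory: list of (state, action, next_state) tuples.
    Each output transition is (state, action, next_state, goal, reward),
    with reward 1.0 only when next_state equals the goal (assumed sparse reward).
    """
    def label(g):
        return [(s, a, s2, g, 1.0 if s2 == g else 0.0) for s, a, s2 in trajectory]

    achieved = trajectory[-1][2]  # state B: where the attempt actually ended
    # Original goal A (typically all-zero reward) plus hindsight goal B.
    return label(goal) + label(achieved)


# A failed attempt to reach state 5 on a 1-D line; it ends in state 3.
trajectory = [(0, +1, 1), (1, +1, 2), (2, +1, 3)]
transitions = relabel_with_hindsight(trajectory, goal=5)
for tr in transitions:
    print(tr)
```

The first three transitions carry no reward because state 5 was never reached; the relabeled copies treat state 3 as the goal, so the last one does, which gives the learner a signal even from a failed attempt.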

Cool visual explanation of HER

Dynamics randomization for Sim2Real [Peng et al., 2017]

Sim2Real with meta learning [Peng et al., 2017]
- It would be nice to train robots in simulation
- And have the policies succeed on the real robot

Key idea: simulation randomization
- Randomize simulation parameters:
  - Gravity
  - Friction
  - Torques
  - Width and length of different geometric shapes
  - Type of contact simulation
  - Etc.
- Train a policy that can adapt to all settings of simulation parameters

This is a meta learning approach
- Policy quickly infers simulation parameters
- Could it infer the simulation parameters of the real world?
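The data-collection side of simulation randomization can be sketched as follows. Everything here is an assumption for illustration: the parameter ranges, the toy 1-D dynamics in `simulate`, and the fixed action are hypothetical stand-ins, not the simulator or the values used in the cited work.

```python
import random

# Hypothetical ranges; a real setup would randomize many more parameters.
PARAM_RANGES = {
    "friction": (0.5, 1.5),
    "torque_scale": (0.8, 1.2),
}


def sample_sim_params(rng):
    """Draw a fresh set of simulation parameters for one episode."""
    return {name: rng.uniform(lo, hi) for name, (lo, hi) in PARAM_RANGES.items()}


def simulate(params, action, steps=20, dt=0.1):
    """Toy 1-D dynamics: a constant push against friction. Returns final position."""
    x, v = 0.0, 0.0
    for _ in range(steps):
        accel = params["torque_scale"] * action - params["friction"] * v
        v += accel * dt
        x += v * dt
    return x


def collect_randomized_episodes(n_episodes, action=1.0, seed=0):
    """Run each episode under different, randomly sampled dynamics."""
    rng = random.Random(seed)
    episodes = []
    for _ in range(n_episodes):
        params = sample_sim_params(rng)
        episodes.append((params, simulate(params, action)))
    return episodes


episodes = collect_randomized_episodes(5)
for params, final_x in episodes:
    print(params, round(final_x, 3))
```

The same action produces different outcomes in each episode, so a policy trained on such data cannot rely on one fixed set of dynamics and has to infer the current parameters from what it observes.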

Baseline

Results

Learning a hierarchy of actions with meta learning [Frans et al., 2017]

It would be nice if learning was hierarchical
- Current RL learns by trying out random actions at each timestep
- Downsides:
  - Hard to explore in a persistent direction
  - Hard to do credit assignment over long horizons
- Example: suppose all your agents want to maximize your GDP. Should each agent decide if it should go to work on the basis of GDP fluctuations?
- May require a real model to really solve this problem

Meta learning approach to hierarchy
- Ingredients: a distribution over tasks
- Goal: learn a set of meta-actions that solve training tasks as quickly as possible

Evolved Policy Gradients [Houthooft et al., 2018]

Goal: learn a cost function that leads to rapid learning
- Train a cost function such that RL on this cost function learns very quickly
- Ingredients: a distribution over tasks
- Use evolution strategies to learn the cost function
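The outer loop can be illustrated with a generic evolution-strategies update. This is a sketch under strong assumptions: in the approach described, the fitness of a parameter vector would be how well an agent learns when trained on the cost function those parameters define, which is far too large to reproduce here. The `placeholder_fitness` below is a hypothetical quadratic stand-in, and all hyperparameters are illustrative.

```python
import random


def evolution_strategies(fitness, dim, iterations=200, population=20,
                         sigma=0.1, lr=0.05, seed=0):
    """Maximize `fitness` by perturbing parameters with Gaussian noise and
    moving toward perturbations that scored above the population mean."""
    rng = random.Random(seed)
    theta = [0.0] * dim
    for _ in range(iterations):
        noises, scores = [], []
        for _ in range(population):
            eps = [rng.gauss(0.0, 1.0) for _ in range(dim)]
            candidate = [t + sigma * e for t, e in zip(theta, eps)]
            noises.append(eps)
            scores.append(fitness(candidate))
        mean_score = sum(scores) / population
        for i in range(dim):
            # Score-weighted average of the noise estimates the gradient.
            grad_i = sum((s - mean_score) * eps[i]
                         for s, eps in zip(scores, noises)) / (population * sigma)
            theta[i] += lr * grad_i
    return theta


TARGET = [1.0, -2.0]  # hypothetical optimum of the stand-in objective


def placeholder_fitness(params):
    """Stand-in for 'how quickly does RL learn under this cost function'."""
    return -sum((p - t) ** 2 for p, t in zip(params, TARGET))


learned = evolution_strategies(placeholder_fitness, dim=2)
print(learned)
```

Evolution strategies only needs fitness evaluations, not gradients, which is why it suits an outer objective that is itself the outcome of an entire inner learning run.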

Result: a single learning trial

Result: a single learning trial Learned cost: never learned to move right

Self Play

Self Play: TD-Gammon
- TD-Gammon (Tesauro, 1992)
- Incredibly old work: Q-learning + neural networks + self-play
- Beat all humans, discovered unconventional strategies that were deemed to be better!
- Approach was dormant until DQN for Atari

Self Play: AlphaGo Zero

Self Play: Dota 2
- Pure self play
- Popular competitive online e-sports game
- Serious professional scene: $140M awarded in prizes in 2016
- 5v5 is main variant; 1v1 also played
- OpenAI beat all the pros 1v1

Appealing properties of Self Play
- Simple environment, extremely complex strategy
- Convert compute into data
- Perfect curriculum

Self Play: Artificial Life (Karl Sims, 1994)

Self Play for physicality and dexterity [Bansal et al., 2017]
- Environment is simple, behavior is very complex
- Pre-train general dexterity by competing against an opponent

What's next?
- Main open question: design the self play environment so that the result will be useful to some external task

Can Self Play lead all the way to AGI?
- Social life incentivizes evolution of intelligence
- "Because corvids and apes share these cognitive tools, we argue that complex cognitive abilities evolved multiple times in distantly related species with vastly different brain structures in order to solve similar socioecological problems." Science, Vol. 306, Issue 5703, pp. 1903-1907
- Open-ended self play produces: theory of mind, negotiation, social skills, empathy, real language understanding
[Figure: cranial capacity (500 to 1500 cm³) against millions of years ago (-7 to 0) for Sahelanthropus, Australopithecus, Homo habilis, Homo erectus, Homo neanderthalensis, and Homo sapiens; annotated "Mate selection"]

AI Alignment: Learning from human feedback [Christiano et al., 2017]

How to communicate goals quickly?
- One approach: have humans judge the behavior of an algorithm

Human judges select good behavior

Fit a scalar reward function to the human feedback
- Optimize a triplet loss: if a human judge deems that A > B, learn a real-valued reward consistent with the human feedback
[Diagram labels: predicted reward, reward predictor, human feedback, RL algorithm, observation, action, environment]
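Fitting a scalar reward to pairwise judgments can be sketched as below. This is an assumption-laden illustration: it gives each behavior its own free reward value instead of training a neural reward predictor, and it uses a logistic (Bradley-Terry style) pairwise loss as a stand-in for the loss named on the slide. The name `fit_rewards` and the example judgments are hypothetical.

```python
import math


def fit_rewards(num_items, preferences, epochs=500, lr=0.1):
    """Learn one real-valued reward per behavior from pairwise judgments.

    preferences: list of (preferred, other) index pairs, meaning the human
    judged `preferred` > `other`. The model assumes
    P(preferred > other) = sigmoid(r[preferred] - r[other])
    and ascends the log-likelihood of the observed judgments.
    """
    rewards = [0.0] * num_items
    for _ in range(epochs):
        for winner, loser in preferences:
            p_winner = 1.0 / (1.0 + math.exp(-(rewards[winner] - rewards[loser])))
            # Gradient of log p_winner: +(1 - p) for the winner, -(1 - p) for the loser.
            step = lr * (1.0 - p_winner)
            rewards[winner] += step
            rewards[loser] -= step
    return rewards


# Hypothetical judgments over three behaviors: 2 > 1, 1 > 0, 2 > 0.
human_preferences = [(2, 1), (1, 0), (2, 0)]
rewards = fit_rewards(3, human_preferences)
print(rewards)
```

The learned values respect the ordering implied by the judgments, so they can serve as the reward signal an RL algorithm optimizes in place of an environment-specified reward.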

It works: 500 bits of interaction

It works: several thousand bits of interaction to solve Atari games

Can easily convey unusual goals
- Example: drive right behind the competitor

Alignment: the future
- The technical problem of subtle communication will likely be solved
- But what are the right goals? Political problem

Thanks! Visit openai.com to learn more.