# CS 473: Artificial Intelligence, Reinforcement Learning II: Exploration vs. Exploitation


## Transcription

### Exploration vs. Exploitation

CS 473: Artificial Intelligence, Reinforcement Learning II
Dieter Fox / University of Washington
[Most slides were taken from Dan Klein and Pieter Abbeel, CS188 Intro to AI at UC Berkeley; all CS188 materials are available online.]

### How to Explore?

[Video of Demo: Q-Learning Manual Exploration, Bridge Grid]

Several schemes for forcing exploration:

- Simplest: random actions (ε-greedy). Every time step, flip a coin:
  - With (small) probability ε, act randomly.
  - With (large) probability 1 − ε, act on the current policy.
- Problems with random actions? You do eventually explore the space, but you keep thrashing around even once learning is done.
  - One solution: lower ε over time.
  - Another solution: exploration functions.

[Video of Demo: Q-Learning Epsilon-Greedy, Crawler]

### Exploration Functions

When to explore?

- Random actions: explore a fixed amount.
- Better idea: explore areas whose badness is not (yet) established, and eventually stop exploring.

An exploration function takes a value estimate u and a visit count n, and returns an optimistic utility, e.g. f(u, n) = u + k/n.

- Regular Q-update: Q(s, a) ← (1 − α) Q(s, a) + α [r + γ max_a' Q(s', a')]
- Modified Q-update: Q(s, a) ← (1 − α) Q(s, a) + α [r + γ max_a' f(Q(s', a'), N(s', a'))]

Note: this propagates the bonus back to states that lead to unknown states as well!
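The modified Q-update above can be sketched directly in code. A minimal tabular sketch; the constants, the (s, a) encoding, and the use of k/(n + 1) instead of k/n (to avoid division by zero on unvisited pairs) are illustrative choices, not from the course:

```python
from collections import defaultdict

ALPHA, GAMMA, K = 0.5, 0.9, 2.0   # learning rate, discount, bonus strength

Q = defaultdict(float)            # Q[(s, a)] -> current value estimate
N = defaultdict(int)              # N[(s, a)] -> visit count

def f(u, n):
    """Optimistic utility: boost (s, a) pairs that were rarely tried."""
    return u + K / (n + 1)        # n + 1 avoids division by zero

def choose_action(s, actions):
    """Act greedily w.r.t. the exploration function, not the raw Q-values."""
    return max(actions, key=lambda a: f(Q[(s, a)], N[(s, a)]))

def update(s, a, r, s2, actions2, done):
    """Modified Q-update: the target uses optimistic successor utilities,
    so the exploration bonus propagates back to states that lead to
    unknown states."""
    N[(s, a)] += 1
    if done:
        target = r
    else:
        target = r + GAMMA * max(f(Q[(s2, a2)], N[(s2, a2)]) for a2 in actions2)
    Q[(s, a)] = (1 - ALPHA) * Q[(s, a)] + ALPHA * target
```

Unlike ε-greedy, this never takes uniformly random actions: it deliberately prefers actions whose visit counts are low, and the bonus fades as n grows.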

[Video of Demo: Q-Learning Exploration Function, Crawler]

### Regret

- Even if you learn the optimal policy, you still make mistakes along the way!
- Regret is a measure of your total mistake cost: the difference between your (expected) rewards, including youthful suboptimality, and the optimal (expected) rewards.
- Minimizing regret goes beyond learning to be optimal: it requires optimally learning to be optimal.
- Example: random exploration and exploration functions both end up optimal, but random exploration has higher regret.

### Approximate Q-Learning: Generalizing Across States

- Basic Q-learning keeps a table of all Q-values.
- In realistic situations, we cannot possibly learn about every single state:
  - Too many states to visit them all in training.
  - Too many states to hold the Q-tables in memory.
- Instead, we want to generalize:
  - Learn about a small number of training states from experience.
  - Generalize that experience to new, similar situations.
- This is a fundamental idea in machine learning, and we'll see it over and over again. [demo RL pacman]

### Example: Pacman

[Video of Demo: Q-Learning Pacman Tiny Watch All]

Let's say we discover through experience that this state is bad. In naïve Q-learning, we know nothing about this [similar] state, or even this one!

[Demo: Q-learning pacman tiny watch all (L11D5)]
[Demo: Q-learning pacman tiny silent train (L11D6)]
[Demo: Q-learning pacman tricky watch all (L11D7)]
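The regret comparison can be made concrete with a toy calculation. The numbers below are invented for illustration, not from the lecture; the point is only that two learners which both end up optimal can differ in how much reward they wasted getting there:

```python
# Regret is the gap between what an optimal policy would have earned and
# what the learner actually earned along the way.

def cumulative_regret(optimal_rewards, actual_rewards):
    """Sum of per-episode gaps between optimal and achieved reward."""
    assert len(optimal_rewards) == len(actual_rewards)
    return sum(opt - got for opt, got in zip(optimal_rewards, actual_rewards))

# Both learners converge to the optimal reward of 1.0 per episode,
# but the random explorer wastes more reward early on:
optimal         = [1.0] * 5
random_explorer = [0.1, 0.3, 0.5, 1.0, 1.0]   # thrashes early
bonus_explorer  = [0.4, 0.8, 1.0, 1.0, 1.0]   # targeted exploration

# cumulative_regret(optimal, random_explorer) is about 2.1,
# cumulative_regret(optimal, bonus_explorer) is about 0.8.
```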

[Video of Demo: Q-Learning Pacman Tiny Silent Train]
[Video of Demo: Q-Learning Pacman Tricky Watch All]

### Feature-Based Representations

Solution: describe a state using a vector of features (aka "properties").

- Features are functions from states to real numbers (often 0/1) that capture important properties of the state.
- Example features:
  - Distance to closest ghost
  - Distance to closest dot
  - Number of ghosts
  - 1 / (distance to dot)^2
  - Is Pacman in a tunnel? (0/1)
  - ... Is it the exact state on this slide?
- Can also describe a q-state (s, a) with features (e.g. "action moves closer to food").

### Linear Value Functions

Using a feature representation, we can write a Q-function (or value function) for any state using a few weights:

- V(s) = w_1 f_1(s) + w_2 f_2(s) + ... + w_n f_n(s)
- Q(s, a) = w_1 f_1(s, a) + w_2 f_2(s, a) + ... + w_n f_n(s, a)

Advantage: our experience is summed up in a few powerful numbers.
Disadvantage: states may share features but actually be very different in value!

### Approximate Q-Learning

Q-learning with linear Q-functions, for a transition (s, a, r, s'):

- difference = [r + γ max_a' Q(s', a')] − Q(s, a)
- Exact Q's: Q(s, a) ← Q(s, a) + α · difference
- Approximate Q's: w_i ← w_i + α · difference · f_i(s, a)

Intuitive interpretation: adjust the weights of the active features. E.g., if something unexpectedly bad happens, blame the features that were on: disprefer all states with that state's features.

Formal justification: online least squares.

### Example: Q-Pacman

[Demo: approximate Q-learning pacman (L11D1)]
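The approximate Q-update can be written out in a few lines. A minimal sketch; the feature values, rewards, and constants below are made up for illustration:

```python
# Linear Q-function: Q(s, a) = sum_i w_i * f_i(s, a).

ALPHA, GAMMA = 0.1, 0.9

def q_value(weights, features):
    return sum(w * f for w, f in zip(weights, features))

def update_weights(weights, features, reward, next_q_max, q_sa, done=False):
    """One approximate Q-update: shift every active feature's weight
    in proportion to the TD difference."""
    target = reward if done else reward + GAMMA * next_q_max
    difference = target - q_sa
    return [w + ALPHA * difference * f for w, f in zip(weights, features)]

# Something unexpectedly bad (reward -10) happens while features
# [1.0, 0.5] are active:
w = [1.0, 2.0]
feats = [1.0, 0.5]
w = update_weights(w, feats, reward=-10.0, next_q_max=0.0,
                   q_sa=q_value(w, feats), done=True)
# Both active features get "blamed": their weights move down.
```

Note that a feature with value 0 is untouched by the update, which is exactly the "adjust weights of active features" intuition.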

[Video of Demo: Approximate Q-Learning, Pacman]

### Q-Learning and Least Squares

- Linear approximation is just regression: the prediction is ŷ = w_0 + w_1 f_1(x).
- Optimization is least squares: minimize the total squared error, i.e. the sum over observations of (observation − prediction)², where each gap is an error or residual.

### Minimizing Error

Imagine we had only one point x, with features f(x), target value y, and weights w. The approximate Q-update explained: a single gradient step on that point's squared error moves each weight by α · (target − prediction) · f_k(x), which is exactly the approximate Q-update.

### Overfitting: Why Limiting Capacity Can Help

A high-capacity model (e.g. a degree-15 polynomial) can fit the training points exactly while generalizing badly; limiting capacity (a few features, a few weights) can therefore help.
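The "one point" argument can be written out as a short derivation; a sketch of the standard online least-squares justification:

```latex
% Squared error on a single point (x, y) with a linear prediction:
\mathrm{error}(w) = \tfrac{1}{2}\Big(y - \sum_k w_k f_k(x)\Big)^2
% Gradient with respect to one weight w_m:
\frac{\partial\,\mathrm{error}(w)}{\partial w_m}
  = -\Big(y - \sum_k w_k f_k(x)\Big)\, f_m(x)
% A gradient-descent step with rate \alpha is therefore
w_m \leftarrow w_m + \alpha \Big(y - \sum_k w_k f_k(x)\Big) f_m(x),
% which matches the approximate Q-update once the target is
% y = r + \gamma \max_{a'} Q(s', a') and the prediction is Q(s, a).
```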

### Policy Search

Problem: often the feature-based policies that work well (win games, maximize utilities) aren't the ones that approximate V / Q best.

- E.g., your value functions from project 2 were probably horrible estimates of future rewards, but they still produced good decisions.
- Q-learning's priority: get Q-values close (modeling).
- Action selection's priority: get the ordering of Q-values right (prediction).

Solution: learn policies that maximize rewards, not the values that predict them.

Policy search: start with an ok solution (e.g. Q-learning), then fine-tune by hill climbing on feature weights.

Simplest policy search:

- Start with an initial linear value function or Q-function.
- Nudge each feature weight up and down and see if your policy is better than before.

Problems:

- How do we tell whether the policy got better? We need to run many sample episodes!
- If there are a lot of features, this can be impractical.

Better methods exploit lookahead structure, sample wisely, and change multiple parameters. [Andrew Ng]

### PILCO (Probabilistic Inference for Learning Control)

- Model-based policy search to minimize a given cost function.
- Policy: a mapping from state to control.
- Rollout: plan using the current policy and a GP dynamics model.
- Policy parameter update via CG/BFGS.
- Highly data efficient.

[Video: HELICOPTER]

### Demo: Standard Benchmark Problem

- Swing a pendulum up and balance it in the inverted position.
- Learn nonlinear control from scratch.
- 4D state space, 3 controller parameters.
- 7 trials / 17.5 sec of experience; control freq.: 1 Hz.

[Deisenroth et al., ICML-11, RSS-11, ICRA-14, PAMI-14]
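The "simplest policy search" above can be sketched as coordinate-wise hill climbing. The `evaluate` callable is a hypothetical stand-in for the expensive step the slide warns about: running many sample episodes and returning the policy's average reward:

```python
def hill_climb(weights, evaluate, step=0.1, iterations=20):
    """Nudge each feature weight up and down; keep a nudge whenever the
    resulting policy scores better than the best seen so far."""
    weights = list(weights)
    best_score = evaluate(weights)
    for _ in range(iterations):
        for i in range(len(weights)):
            for delta in (step, -step):
                candidate = list(weights)
                candidate[i] += delta
                score = evaluate(candidate)   # expensive: many episodes!
                if score > best_score:
                    weights, best_score = candidate, score
    return weights, best_score
```

Each pass costs two evaluations per weight, and each evaluation needs many episodes, which is why the slide calls this impractical when there are many features.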

### Controlling a Low-Cost Robotic Manipulator

- Low-cost system (\$5 for robot arm and Kinect).
- Very noisy; no sensor information about the robot's joint configuration is used.
- Goal: learn to stack a tower of 5 blocks from scratch.
- Kinect camera for tracking the block in the end-effector.
- State: 3D coordinates of the block center (from the Kinect camera).
- 4 controlled DoF.
- 2 learning trials for stacking 5 blocks (5 seconds long each).
- Accounts for system noise.

### Playing Atari with Deep Reinforcement Learning

Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Daan Wierstra, Alex Graves, Ioannis Antonoglou, Martin Riedmiller (DeepMind Technologies)

> **Abstract.** We present the first deep learning model to successfully learn control policies directly from high-dimensional sensory input using reinforcement learning. The model is a convolutional neural network, trained with a variant of Q-learning, whose input is raw pixels and whose output is a value function estimating future rewards. We apply our method to seven Atari 2600 games from the Arcade Learning Environment, with no adjustment of the architecture or learning algorithm. We find that it outperforms all previous approaches on six of the games and surpasses a human expert on three of them.

> **1 Introduction.** Learning to control agents directly from high-dimensional sensory inputs like vision and speech is one of the long-standing challenges of reinforcement learning (RL). Most successful RL applications that operate on these domains have relied on hand-crafted features combined with linear value functions or policy representations. Clearly, the performance of such systems heavily relies on the quality of the feature representation. Recent advances in deep learning have made it possible to extract high-level features from raw sensory data, leading to breakthroughs in computer vision [11, 22, 16] and speech recognition [6, 7].
> These methods utilise a range of neural network architectures, including convolutional networks, multilayer perceptrons, restricted Boltzmann machines and recurrent neural networks, and have exploited both supervised and unsupervised learning. It seems natural to ask whether similar techniques could also be beneficial for RL with sensory data.

> However, reinforcement learning presents several challenges from a deep learning perspective. Firstly, most successful deep learning applications to date have required large amounts of hand-labelled training data. RL algorithms, on the other hand, must be able to learn from a scalar reward signal that is frequently sparse, noisy and delayed. The delay between actions and resulting rewards, which can be thousands of timesteps long, seems particularly daunting when compared to the direct association between inputs and targets found in supervised learning. Another issue is that most deep learning algorithms assume the data samples to be independent, while in reinforcement learning one typically encounters sequences of highly correlated states. Furthermore, in RL the data distribution changes as the algorithm learns new behaviours, which can be problematic for deep learning methods that assume a fixed underlying distribution.

> This paper demonstrates that a convolutional neural network can overcome these challenges to learn successful control policies from raw video data in complex RL environments. The network is trained with a variant of the Q-learning [26] algorithm, with stochastic gradient descent to update the weights. To alleviate the problems of correlated data and non-stationary distributions, we use ...

[Video: DeepMind AI Playing Atari]

### That's All for Reinforcement Learning!

- Data (experiences with the environment) → Reinforcement Learning Agent → Policy (how to act in the future).
- Very tough problem: how to perform any task well in an unknown, noisy environment!
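The excerpt breaks off where the DQN paper names its fix for correlated data and non-stationary distributions: an experience replay mechanism that stores transitions and trains on random minibatches of past experience. A minimal sketch of such a buffer; the class name, capacity, and interface are illustrative, not from the paper:

```python
import random
from collections import deque

class ReplayBuffer:
    """Store transitions; sampling random minibatches breaks up the
    correlation between consecutive states and smooths over the
    changing data distribution as the policy improves."""

    def __init__(self, capacity=100_000):
        self.buffer = deque(maxlen=capacity)  # oldest transitions fall out

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        """Uniform random minibatch of past transitions."""
        return random.sample(self.buffer, batch_size)

    def __len__(self):
        return len(self.buffer)
```

A typical loop shape: interact with the environment, push each transition, and once the buffer is warm, take one Q-learning gradient step per environment step on a freshly sampled minibatch.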
- Traditionally used mostly for robotics, but becoming more widely used.
- Lots of open research areas:
  - How to best balance exploration and exploitation?
  - How to deal with cases where we don't know a good state/feature representation?

### Conclusion

- We're done with Part I: Search and Planning!
- We've seen how AI methods can solve problems in: Search, Constraint Satisfaction Problems, Games, Markov Decision Problems, Reinforcement Learning.
- Next up, Part II: Uncertainty and Learning!


20.3 The EM algorithm Many real-world problems have hidden (latent) variables, which are not observable in the data that are available for learning Including a latent variable into a Bayesian network may

### arxiv: v3 [cs.lg] 9 Mar 2014

Learning Factored Representations in a Deep Mixture of Experts arxiv:1312.4314v3 [cs.lg] 9 Mar 2014 David Eigen 1,2 Marc Aurelio Ranzato 1 Ilya Sutskever 1 1 Google, Inc. 2 Dept. of Computer Science, Courant

### Linear Models Continued: Perceptron & Logistic Regression

Linear Models Continued: Perceptron & Logistic Regression CMSC 723 / LING 723 / INST 725 Marine Carpuat Slides credit: Graham Neubig, Jacob Eisenstein Linear Models for Classification Feature function

### 18 LEARNING FROM EXAMPLES

18 LEARNING FROM EXAMPLES An intelligent agent may have to learn, for instance, the following components: A direct mapping from conditions on the current state to actions A means to infer relevant properties

### Welcome to CMPS 142 and 242: Machine Learning

Welcome to CMPS 142 and 242: Machine Learning Instructor: David Helmbold, dph@soe.ucsc.edu Office hours: Monday 1:30-2:30, Thursday 4:15-5:00 TA: Aaron Michelony, amichelo@soe.ucsc.edu Web page: www.soe.ucsc.edu/classes/cmps242/fall13/01

### M. R. Ahmadzadeh Isfahan University of Technology. M. R. Ahmadzadeh Isfahan University of Technology

1 2 M. R. Ahmadzadeh Isfahan University of Technology Ahmadzadeh@cc.iut.ac.ir M. R. Ahmadzadeh Isfahan University of Technology Textbooks 3 Introduction to Machine Learning - Ethem Alpaydin Pattern Recognition

### Deep Reinforcement Learning using Memory-based Approaches

Deep Reinforcement Learning using Memory-based Approaches Manish Pandey Synopsys, Inc. 690 Middlefield Rd., Mountain View mpandey2@stanford.edu Dai Shen Stanford University 450 Serra Mall, Stanford dai2@stanford.edu

### CS 242 Final Project: Reinforcement Learning. Albert Robinson May 7, 2002

CS 242 Final Project: Reinforcement Learning Albert Robinson May 7, 2002 Introduction Reinforcement learning is an area of machine learning in which an agent learns by interacting with its environment.

### Automated Curriculum Learning for Neural Networks

Automated Curriculum Learning for Neural Networks Alex Graves, Marc G. Bellemare, Jacob Menick, Remi Munos, Koray Kavukcuoglu DeepMind ICML 2017 Presenter: Jack Lanchantin Alex Graves, Marc G. Bellemare,

### Introduction to Reinforcement Learning

Introduction to Reinforcement Learning sequential decision making under uncertainty? How Can I...? Move around in the physical world (e.g. driving, navigation) Play and win a game Retrieve information

### Intro to Reinforcement Learning. Part 2: Ideas and Examples

Intro to Reinforcement Learning Part 2: Ideas and Examples Psychology Artificial Intelligence Reinforcement Learning Neuroscience Control Theory Reinforcement learning The engineering endeavor most closely