Reinforcement Learning of Artificial Intelligence B659. Class meets Tu & Thur 2:30pm - 3:45pm in BH 330

1 Reinforcement Learning of Artificial Intelligence B659
Class meets Tu & Thur 2:30pm - 3:45pm in BH 330.
Course webpage on Canvas: schedule, slides, assignment submission, info about projects (later).
Instructor: Adam White. You can find me in Lindley 201I; website: adamwhite.ca. Contact me via email, not Canvas.
Your AIs are Matt and Su: contact info on Canvas.

2 Text & other useful resources
Text: Reinforcement Learning: An Introduction (1998). We will use the second edition of Sutton and Barto exclusively: minor rearrangement of topics, changes of notation, new topics. Free online.
Csaba Szepesvári's book: Algorithms for Reinforcement Learning (2010). More theory; a few additional topics covered. Free online.

3 Grading
50% from 5 assignments: mostly questions from Sutton & Barto, plus one or more programming questions (a code framework will be provided).
5% from a short midterm quiz: it gives you an idea of how well you are tracking the course contents, and shows what type of questions I will use on the final.
35% from the final project report or the final exam. PhD students (and MS students with permission) can do a project; the rest must take the final.

4 Thought questions
10% of your mark. The idea is to show you have read and thought about the reading material.
You must ask a question! You must provide at least one possible answer! The answer cannot be one found in the textbook or lecture slides. You are showing me that you have read the text!

5 Thought questions/statements
Good example: "Setting parameters (e.g., the learning rate) in ML often involves cross-validation and a training/testing split; however, in RL data is produced interactively. This seems much more challenging in RL. What is a fair way to compare learning algorithms in RL? <QUESTION> One idea is to discretize the parameter set for each algorithm and report the performance with the best parameter setting. <ANSWER> Another idea would be to use some meta-learning algorithm to automatically tune the parameters of each method. <ANSWER>"

6 Thought questions/statements
If you submit thought questions in the correct form, i.e., a question with an answer, you get half the marks. To get more than half you have to ask a good question.
Bad questions:
"I don't understand Sarsa, can you explain it again?"
"There is a typo on page 7."
"Chapter 3 unrealistically assumes access to the model of the MDP. I think we should skip this chapter."
"How does reinforcement learning avoid overfitting?" <No answer>

7 Assignments
Assignments are one of the best ways to learn about RL.
All assignments will be individual work. You can talk to your friends, but only at the level of ideas: no details, no writing on the whiteboard, etc.
You cannot share written answers or code.
You cannot submit code you found online and modified; you must write your own from scratch.
You cannot use ML packages like RLtoolkit, Python ML packages, TensorFlow, etc.
All programming will be done in C.

8 Academic integrity
If you are caught cheating, copying, working together, or plagiarizing:
you will be reported to the university;
you may get a zero on the assignment/exam/project;
you may fail the course;
you may be expelled.
We have problems with this every single year. Last year people plagiarized and copied assignments. Don't let it happen to you!

9 What is artificial intelligence?
Get out some paper and write down a definition. What is machine learning? How do they differ?

10 What is artificial intelligence?
"Intelligence is the most powerful phenomenon in the universe." (Ray Kurzweil) The phenomenon is that there are systems in the universe that are well thought of as goal-seeking systems.
"A science of mind ... when people finally come to understand the principles of intelligence, what it is and how it works, well enough to design and create beings as intelligent as ourselves." (Sutton)
"It is the science and engineering of making intelligent machines, especially intelligent computer programs. ... Intelligence is the computational part of the ability to achieve goals in the world." (John McCarthy)

11 What is machine learning?
"A branch of computational statistics, with a specific focus on efficiency and scalability." (me)
Machine learning is the subfield of computer science that "gives computers the ability to learn without being explicitly programmed." (Arthur Samuel)
Often AI and ML are used interchangeably. One way to keep it simple: AI defines a problem, a research goal, and ML defines a set of tools and a computational perspective. These tools can be applied to a variety of applications and can be used to help understand and replicate the principles of human intelligence.

12 What is reinforcement learning?

13 Goals and ambitions for the course
Learn the methods and foundational ideas of RL.
Outcome: prepared to apply RL to a novel application.
Outcome: prepared to do research in RL.
Learn some new ways of thinking about AI research: the agent perspective, the interaction between learning and decision making, experience as an unending, temporally correlated stream of actions; the agent lives a life.

14 Things we will not get into
Deep reinforcement learning, neural networks, nonlinear representations of state, large applications of RL.
When we are done, you will be able to learn these topics on your own. This course will help demystify modern RL.

15 What is Reinforcement Learning?
An approach to Artificial Intelligence: it is a problem specification, a class of methods, and a field of study.
Learning from interaction. Goal-oriented learning. Learning about, from, and while interacting with an external and unknown environment. Learning what to do: how to map situations to actions so as to maximize a numerical reward signal.

16 Key Features of RL
The learner is not told which actions to take: no teacher, no labels (not supervised learning).
Trial-and-error search (learn by doing).
Possibility of delayed reward: sacrifice short-term gains for greater long-term gains, e.g., in games like backgammon and in most interesting problems.
The need to trade off exploration and exploitation.
Considers the whole problem of a goal-directed agent interacting with an uncertain environment.

17 Supervised, Unsupervised, & RL
In SL the system is told what the correct response should have been: given a set of labelled training examples (the teacher provides labels), the objective is to do well on unlabelled examples (generalize well to new examples).
Unsupervised learning is about learning/uncovering the structure of unlabelled examples, e.g., finding a lower-dimensional representation of the data. This certainly could be useful to an RL agent, but it does not address the key problem of maximizing reward.

18 RL is influenced by and is influencing
[Diagram after David Silver: reinforcement learning at the intersection of computer science (machine learning), engineering (optimal control), mathematics (operations research), neuroscience (the reward system), psychology (classical/operant conditioning), and economics (bounded rationality).]

19 Agent-environment interaction stream
Interaction produces a temporal stream of data: continual learning, acting, and planning. The objective is to affect the environment. The environment is stochastic and uncertain.
[Diagram: the agent-environment loop. The environment (world) sends the agent a state (stimulus, situation) and a reward (gain, payoff, cost); the agent sends back an action (response, control).]

20 Example: Hajime Kimura's RL Robots (slide from Sutton)
[Videos: a robot learning to move backward, shown before and after learning; then a new robot running the same algorithm.]

21 RL + Deep Learning: Performance on Atari Games
[Learning curves for Space Invaders, Breakout, and Enduro.]

22 RL + Deep Learning, applied to Classic Atari Games
Google DeepMind 2015, Bowling et al. Learned to play 49 games for the Atari 2600 game console, without labels or human input, from self-play and the score alone: no hand-crafted input mapping; raw screen pixels map to predictions of the final score for each of 18 joystick actions.
[Figure: schematic illustration of the convolutional neural network. The input is a preprocessed image, followed by three convolutional layers (each filter slides across the input image) and two fully connected layers, with a single output for each valid action; each hidden layer is followed by a rectifier nonlinearity, i.e., max(0, x).]
Learned to play better than all previous algorithms, and at human level for more than half the games. The same learning algorithm was applied to all 49 games, without human tuning!

23 Classic Examples of Reinforcement Learning
Elevator control, Crites & Barto: (probably) the world's best down-peak elevator controller.
Helicopter control, Ng & Abbeel (watch?v=vcdxqn0fcne#t=22): can perform maneuvers that no human operator can; model-based RL.
TD-Gammon and Jellyfish, Tesauro, Dahl: the world's best backgammon player.

24 More Examples of RL
Robot learning to walk, Schuitema (v=sbf5efeiw, 00:45): uses methods you will learn about in this class.
Octopus arm simulator, Engel (icml07_engel_demo/, 9:00, 10:55): Bayesian temporal difference learning.
Keepaway soccer, Sutton & Stone: higher-dimensional control using RL and tile coding.
Go, Hearts, other games.

25 Elements of RL
Policy: what to do; a mapping from situations to actions.
Reward: what is good, immediately.
Value: what is good because it predicts reward; longer term.
Model: what follows what.

26 Rewards
A single scalar number, provided by you, the designer. (Not really that hard.)
The agent seeks to maximize total future reward; next-step reward optimization is usually suboptimal.
The agent cannot change how rewards are generated; they are outside the agent's control.
In general, rewards are stochastic functions of the state of the environment.
A good example of reward is physical pain and pleasure (negative and positive reward).

27 Value functions
The value of a state specifies how good the state is in terms of total future reward: the value function predicts long-term reward.
Reward tells us the immediate goodness of moving from one state to another. Value specifies the long-term desirability of states, taking into account the states that usually follow. We might incur short-term negative reward to achieve higher value in the long term.
The environment produces rewards in response to the agent's actions. The agent constructs an estimate of the value function to help select actions that yield high long-term reward.
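As a rough formalization (a sketch in Sutton & Barto's notation for the undiscounted episodic case, since discounting has not been introduced yet), the value of a state s under a policy π is the expected total future reward:

```latex
V^{\pi}(s) \,=\, \mathbb{E}_{\pi}\!\left[\, R_{t+1} + R_{t+2} + \cdots + R_T \,\middle|\, S_t = s \,\right]
```

where T is the final time step of the episode.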

28 Policies
A mapping from perceived states to actions, encoding how the agent behaves over time.
Policies may be stochastic. They may be represented as a table, or may involve some complex optimization.

29 Models
A model mimics the environment. A model might predict the next state and reward, given the current state and action: a prediction of what would happen in the environment if the agent took an action in some state. A simulation.
Models are often used for planning: deciding on a course of action by, for example, simulating different courses of action and choosing among them.
In this course we will consider an approach to planning that is strongly connected with value-function learning: very different from the classic approaches considered in GOFAI.
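A minimal sketch in C (the course language) of what a one-step model's interface might look like; the type and function names here are hypothetical, not part of any course framework:

```c
/* A one-step model predicts what the environment would do:
   given a state and an action, return the predicted next
   state and reward. */
typedef struct {
    int    next_state;  /* predicted successor state */
    double reward;      /* predicted one-step reward */
} Prediction;

/* A model is any function with this shape. A planner can call
   it repeatedly to simulate courses of action, and choose among
   them, without ever acting in the real environment. */
typedef Prediction (*Model)(int state, int action);
```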

30 RLtoolkit demo

31 Limitations of course contents
The methods we will study assume there exists some state with certain properties; almost all RL theory assumes this. The algorithms work very well even when we have only limited or incomplete access to the state.
Most methods covered will estimate value functions, a large chunk of modern RL. Other, non-value-function methods, like evolutionary methods, are possible but outside our scope. We will cover policy gradient methods, which learn a parameterized policy and a value function.

32 An Extended Example: Tic-Tac-Toe
A two-player, turn-based game; you win with three in a row. Assume a draw and a loss are equally bad.
We assume we do not know anything about the opponent's strategy, and that the opponent is not perfect. Instead we learn from experience generated by playing many games.
Can we build an agent to exploit our opponent? To maximize the chance of winning?

33 Possible solutions
Specify or learn a model of our opponent: is their strategy stationary?
Evolutionary search: search the space of policies, maintaining a population, with fitness measured via the probability of winning.
Reinforcement learning: we are player X.

34 One way to approach this as an RL task
Create a value function: a table of numbers, one for each state of the game, giving the probability of winning from each state (which takes into account what the policy does in the future). All states with three X's in a row have value 1.0; all states with three O's in a row have value 0.0; all draw states have value 0.0; initially set the rest to 0.5.
The policy: to select a move, examine the possible next states from the current one (look ahead). Most of the time pick the one with the largest value: greedy, or exploitive. Occasionally pick randomly: exploratory.

35 An RL Approach to Tic-Tac-Toe
1. Make a table with one entry per state: V(s), the estimated probability of winning from state s. [Table sketch: a winning position has V(s) = 1, a losing or drawn position has V(s) = 0, and all other positions start at 0.5.]
To pick our moves, look ahead one step from the current state to the various possible next states, and just pick the next state with the highest estimated probability of winning, the largest V(s): a greedy move.
2. Now play lots of games. But 10% of the time pick a move at random: an exploratory move.
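This selection rule is easy to state in code. Below is a minimal C sketch of ε-greedy selection over the possible next states; the state encoding, the table V, and all names here are illustrative assumptions, not the course framework:

```c
#include <stdlib.h>

#define NUM_STATES 19683   /* 3^9 possible board configurations */
#define EPSILON    0.1     /* fraction of exploratory moves     */

double V[NUM_STATES];      /* V[s]: estimated prob. of winning  */

/* One-step lookahead: `next[0..n-1]` lists the states reachable by
   our legal moves. Usually take the one with the largest V (greedy),
   but with probability EPSILON take a random one (exploratory).
   Returns the chosen next state. */
int select_move(const int *next, int n)
{
    if ((double)rand() / RAND_MAX < EPSILON)
        return next[rand() % n];        /* exploratory move */

    int best = next[0];                 /* greedy move      */
    for (int i = 1; i < n; i++)
        if (V[next[i]] > V[best])
            best = next[i];
    return best;
}
```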

36 RL Learning Rule for Tic-Tac-Toe
[Backup diagram: a game tree alternating the opponent's moves and our moves down from the starting position; solid arrows are greedy moves, a dashed branch is an exploratory move, and curved arrows show values being backed up the tree.]
Let s be the state before our greedy move and s′ the state after our greedy move. We increment each V(s) toward V(s′), a "backup":

    V(s) ← V(s) + α [ V(s′) − V(s) ]

where α, the step-size parameter, is a small positive fraction, e.g., α = 0.1.
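The same backup as a C sketch, continuing the hypothetical V table from the previous sketch (ALPHA and the function name are illustrative):

```c
#define ALPHA 0.1   /* the step-size parameter alpha */

/* After each greedy move, move V(s) a fraction ALPHA of the way
   toward V(s'), where s is the state before our greedy move and
   s_prime the state after it:
       V(s) <- V(s) + ALPHA * (V(s') - V(s))
   No backup is made after exploratory moves (see slide 40). */
void backup(double *V, int s, int s_prime)
{
    V[s] += ALPHA * (V[s_prime] - V[s]);
}
```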

37 Learning rule
If we reduce the step size over time, this approach converges, giving the optimal moves against a fixed opponent.
If we keep the step size small but constant, this approach tracks, and plays well against, opponents that change their strategy over time.
Where is the reward?
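The two step-size regimes can be made concrete as schedules for the n-th update of a value. This is a sketch under the assumption that αₙ = 1/n is the decaying schedule meant here (the standard choice that makes the estimate a plain average of past targets):

```c
/* Decaying step size: the n-th update uses alpha = 1/n, so the
   estimate averages all past targets equally and converges
   (appropriate against a fixed opponent). */
double alpha_decaying(int n) { return 1.0 / (double)n; }

/* Constant step size: recent targets are weighted more heavily,
   so the estimate keeps tracking an opponent whose strategy
   changes over time. */
double alpha_constant(int n) { (void)n; return 0.1; }
```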

38 Attributes of this simple task
Learn while interacting: learning affects how we play, which requires learning the values of the new policy, which changes how we play, which...
A clear goal. Delayed consequences of actions.
Sophisticated behavior without a model of the opponent or a search over action sequences: just a value function plus a one-step model. RL methods can also be applied when no model is available.
In the beginning the agent didn't know anything about its action consequences; how could we inject prior knowledge?

39 How can we improve this T.T.T. player?
Do we need random moves? Why? Do we always need a full 10%?
Can we learn from random moves?
Can we learn offline? Pre-training from self-play? Using learned models of the opponent?...

40 What would happen if we learned from exploratory moves?
Not doing so: we learn the probability of winning under optimal play, i.e., the probability of winning from the current state if we choose some action and then play optimally from then on.
Learning from exploratory moves: we learn the probability of winning under the policy that includes exploration. The estimates take into account that we sometimes explore, and this will likely result in different moves (sometimes safer moves).
If we continue to explore forever, this second approach may end up being better: winning more games.

41 How is Tic-Tac-Toe Too Easy?
A finite, small number of states: backgammon, for example, has vastly more, and Go has more unique configurations than there are atoms in the universe!
One-step lookahead is always possible in TTT.
The state is completely observable....

42 The Course
Part I: The Problem. Introduction; Evaluative Feedback; The Reinforcement Learning Problem.
Part II: Elementary Solution Methods. Dynamic Programming; Monte Carlo Methods; Temporal Difference Learning.
Part III: A Unified View. Eligibility Traces; Generalization and Function Approximation; Planning and Learning.
Advanced topics (see Canvas): RL in psychology and animal learning; Case Studies.

43 Next Class
Thursday: read Chapter 2 of Sutton & Barto (2016); you can skip 2.7. I find the history section at the end particularly interesting!
Two thought questions about chapters 1 & 2 are due Monday the 16th (the day before class, so that we can discuss them in class).
Assignment #1 will be released today. We will talk about it next time.
