CS 188: Artificial Intelligence. Reinforcement Learning.


CS 188: Artificial Intelligence
Reinforcement Learning
Dan Klein, Pieter Abbeel
University of California, Berkeley

Reinforcement Learning
Agent/environment loop: the agent observes a state s and a reward r, and chooses an action a that the environment responds to.
Basic idea: receive feedback in the form of rewards. The agent's utility is defined by the reward function. The agent must (learn to) act so as to maximize expected rewards. All learning is based on observed samples of outcomes!

Example: Learning to Walk
Before Learning / A Learning Trial / After Learning [1K Trials] [Kohl and Stone, ICRA 2004]

The Crawler!

Reinforcement Learning
Still assume a Markov decision process (MDP): a set of states s in S; a set of actions (per state) A; a model T(s,a,s'); a reward function R(s,a,s'). Still looking for a policy π(s). New twist: we don't know T or R, i.e. we don't know which states are good or what the actions do. Must actually try actions and states out to learn! [You, in Project 3]

Offline (MDPs) vs. Online (RL)
Offline solution vs. online learning.

Passive Reinforcement Learning
Simplified task: policy evaluation. Input: a fixed policy π(s). You don't know the transitions T(s,a,s'). You don't know the rewards R(s,a,s'). Goal: learn the state values. In this case: the learner is along for the ride; there is no choice about what actions to take; just execute the policy and learn from experience. This is NOT offline planning! You actually take actions in the world.

Direct Evaluation
Goal: compute values for each state under π. Idea: average together observed sample values. Act according to π. Every time you visit a state, write down what the sum of discounted rewards turned out to be. Average those samples. This is called direct evaluation.

Example: Direct Evaluation
Input policy π; assume γ = 1. Observed episodes (training):
  Episode 1: B, east, C, -1; C, east, D, -1; D, exit, x, +10
  Episode 2: B, east, C, -1; C, east, D, -1; D, exit, x, +10
  Episode 3: E, north, C, -1; C, east, D, -1; D, exit, x, +10
  Episode 4: E, north, C, -1; C, east, A, -1; A, exit, x, -10
Output: a value estimate for each state.

Problems with Direct Evaluation
What's good about direct evaluation? It's easy to understand. It doesn't require any knowledge of T, R. It eventually computes the correct average values, using just sample transitions. What's bad about it? It wastes information about state connections. Each state must be learned separately. So, it takes a long time to learn. If B and E both go to C under this policy, how can their values be different?
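As a concrete illustration, here is a minimal Python sketch of direct evaluation under the assumptions above (γ = 1, episodes given as lists of (state, action, next_state, reward) tuples); the function and variable names are illustrative, not from the course code:

```python
from collections import defaultdict

def direct_evaluation(episodes, gamma=1.0):
    """Estimate V(s) under pi by averaging the observed returns from each visit."""
    totals = defaultdict(float)  # sum of observed returns per state
    counts = defaultdict(int)    # number of visits per state
    for episode in episodes:
        # Walk backwards so each state's return accumulates as G = r + gamma * G.
        G = 0.0
        for (s, a, s_next, r) in reversed(episode):
            G = r + gamma * G
            totals[s] += G
            counts[s] += 1
    return {s: totals[s] / counts[s] for s in totals}

# The four observed episodes from the example above:
episodes = [
    [("B", "east", "C", -1), ("C", "east", "D", -1), ("D", "exit", "x", 10)],
    [("B", "east", "C", -1), ("C", "east", "D", -1), ("D", "exit", "x", 10)],
    [("E", "north", "C", -1), ("C", "east", "D", -1), ("D", "exit", "x", 10)],
    [("E", "north", "C", -1), ("C", "east", "A", -1), ("A", "exit", "x", -10)],
]
print(direct_evaluation(episodes))  # B: 8.0, E: -2.0, C: 4.0, D: 10.0, A: -10.0
```

Note how B and E end up with different values (+8 vs. -2) even though both lead straight to C: direct evaluation learns each state separately, which is exactly the inefficiency the slide points out.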

Why Not Use Policy Evaluation?
Simplified Bellman updates calculate V for a fixed policy: each round, replace V with a one-step-lookahead layer over V:
  V_0(s) = 0
  V_{k+1}(s) <- Σ_s' T(s, π(s), s') [ R(s, π(s), s') + γ V_k(s') ]
This approach fully exploited the connections between the states. Unfortunately, we need T and R to do it! Key question: how can we do this update to V without knowing T and R? In other words, how do we take a weighted average without knowing the weights?

Example: Expected Age
Goal: compute the expected age of cs188 students.
Known P(A): E[A] = Σ_a P(a) · a.
Without P(A), instead collect samples [a_1, a_2, ..., a_N].
Unknown P(A), model-based: estimate P(a) as num(a)/N, then E[A] ≈ Σ_a P(a) · a. Why does this work? Because eventually you learn the right model.
Unknown P(A), model-free: E[A] ≈ (1/N) Σ_i a_i. Why does this work? Because samples appear with the right frequencies.

Model-Based Learning
Model-based idea: learn an approximate model based on experiences; solve for values as if the learned model were correct.
Step 1: Learn an empirical MDP model (sketched in code below). Count outcomes s' for each (s, a). Normalize to give an estimate of T(s,a,s'). Discover each R(s,a,s') when we experience (s, a, s').
Step 2: Solve the learned MDP. For example, use policy evaluation.

Example: Model-Based Learning
Input policy π; assume γ = 1. Observed episodes (training): the same four episodes as in the direct evaluation example.
Learned model:
  T(s,a,s'): T(B, east, C) = 1.00; T(C, east, D) = 0.75; T(C, east, A) = 0.25; ...
  R(s,a,s'): R(B, east, C) = -1; R(C, east, D) = -1; R(D, exit, x) = +10; ...

Model-Free Learning
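Step 1 of the model-based recipe (count, then normalize) fits in a few lines; the representation below (dicts keyed by (s, a, s') tuples) is an assumption for illustration:

```python
from collections import defaultdict

def learn_empirical_mdp(episodes):
    """Count outcomes s' for each (s, a); normalize to T_hat; record R_hat."""
    counts = defaultdict(lambda: defaultdict(int))  # counts[(s, a)][s'] = n
    R_hat = {}
    for episode in episodes:
        for (s, a, s_next, r) in episode:
            counts[(s, a)][s_next] += 1
            R_hat[(s, a, s_next)] = r  # rewards in this example are deterministic
    T_hat = {}
    for (s, a), outcomes in counts.items():
        total = sum(outcomes.values())
        for s_next, n in outcomes.items():
            T_hat[(s, a, s_next)] = n / total
    return T_hat, R_hat

# Run on the four episodes above, this reproduces the learned model:
# T_hat[("C", "east", "D")] == 0.75 and R_hat[("D", "exit", "x")] == 10.
```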

Sample-Based Policy Evaluation?
We want to improve our estimate of V by computing these averages:
  V_{k+1}(s) <- Σ_s' T(s, π(s), s') [ R(s, π(s), s') + γ V_k(s') ]
Idea: take samples of outcomes s' (by doing the action!) and average:
  sample_i = R(s, π(s), s'_i) + γ V_k(s'_i)
  V_{k+1}(s) <- (1/n) Σ_i sample_i
Almost! But we can't rewind time to get sample after sample from state s.

Temporal Difference Learning
Big idea: learn from every experience! Update V(s) each time we experience a transition (s, a, s', r). Likely outcomes s' will contribute updates more often. Temporal difference learning of values: the policy is still fixed, we're still doing evaluation! Move values toward the value of whatever successor occurs: a running average.
  Sample of V(s): sample = r + γ V(s')
  Update to V(s): V(s) <- (1-α) V(s) + α · sample
  Same update:    V(s) <- V(s) + α (sample - V(s))

Exponential Moving Average
The running interpolation update: x̄_n = (1-α) · x̄_{n-1} + α · x_n. This makes recent samples more important and forgets about the past (distant past values were wrong anyway). A decreasing learning rate (alpha) can give converging averages.

Example: Temporal Difference Learning
Observed transitions: B, east, C, -2, then C, east, D, -2. Assume γ = 1, α = 1/2. (This example is replayed in code below.)

Problems with TD Value Learning
TD value learning is a model-free way to do policy evaluation, mimicking Bellman updates with running sample averages. However, if we want to turn values into a (new) policy, we're sunk:
  π(s) = argmax_a Q(s,a)
  Q(s,a) = Σ_s' T(s,a,s') [ R(s,a,s') + γ V(s') ]
Idea: learn Q-values, not values. That makes action selection model-free too!
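The TD update is one line of arithmetic. Here is a sketch that replays the example's two transitions, assuming all values start at zero (the slide's figure may show different initial values):

```python
def td_update(V, s, s_next, r, alpha=0.5, gamma=1.0):
    """Move V(s) part of the way toward the observed sample r + gamma * V(s')."""
    sample = r + gamma * V.get(s_next, 0.0)
    V[s] = (1 - alpha) * V.get(s, 0.0) + alpha * sample

V = {}  # assumption: all values start at 0
td_update(V, "B", "C", -2)  # V(B): 0 -> 0.5*0 + 0.5*(-2 + 0) = -1.0
td_update(V, "C", "D", -2)  # V(C): 0 -> 0.5*0 + 0.5*(-2 + 0) = -1.0
print(V)  # {'B': -1.0, 'C': -1.0}
```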

Active Reinforcement Learning
Full reinforcement learning: optimal policies (like value iteration). You don't know the transitions T(s,a,s'). You don't know the rewards R(s,a,s'). You choose the actions now. Goal: learn the optimal policy / values. In this case: the learner makes choices! Fundamental tradeoff: exploration vs. exploitation. This is NOT offline planning! You actually take actions in the world and find out what happens.

Detour: Q-Value Iteration
Value iteration: find successive (depth-limited) values. Start with V_0(s) = 0, which we know is right. Given V_k, calculate the depth k+1 values for all states:
  V_{k+1}(s) <- max_a Σ_s' T(s,a,s') [ R(s,a,s') + γ V_k(s') ]
But Q-values are more useful, so compute them instead. Start with Q_0(s,a) = 0, which we know is right. Given Q_k, calculate the depth k+1 q-values for all q-states:
  Q_{k+1}(s,a) <- Σ_s' T(s,a,s') [ R(s,a,s') + γ max_a' Q_k(s',a') ]

Q-Learning
Q-learning: sample-based Q-value iteration. Learn Q(s,a) values as you go. Receive a sample (s, a, s', r). Consider your old estimate Q(s,a). Consider your new sample estimate:
  sample = r + γ max_a' Q(s',a')
Incorporate the new estimate into a running average (sketched in code at the end of this page's notes):
  Q(s,a) <- (1-α) Q(s,a) + α · sample

Q-Learning Properties
Amazing result: Q-learning converges to an optimal policy, even if you're acting suboptimally! This is called off-policy learning. Caveats: you have to explore enough; you have to eventually make the learning rate small enough, but not decrease it too quickly; basically, in the limit, it doesn't matter how you select actions (!). [demo grid, crawler Q]

CS 188: Artificial Intelligence
Reinforcement Learning II
Dan Klein, Pieter Abbeel
University of California, Berkeley
We still assume an MDP: a set of states s in S; a set of actions (per state) A; a model T(s,a,s'); a reward function R(s,a,s'). Still looking for a policy π(s). New twist: we don't know T or R, i.e. we don't know which states are good or what the actions do. Must actually try actions and states out to learn.
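A minimal sketch of the Q-learning update just described; legal_actions is an assumed helper returning the actions available in a state (empty for terminal states):

```python
def q_update(Q, s, a, s_next, r, legal_actions, alpha=0.5, gamma=1.0):
    """One Q-learning update from a single experienced sample (s, a, s', r)."""
    # Best q-value available at the successor (0.0 if s' is terminal).
    next_value = max((Q.get((s_next, a2), 0.0) for a2 in legal_actions(s_next)),
                     default=0.0)
    sample = r + gamma * next_value
    # Running average: keep (1 - alpha) of the old estimate.
    Q[(s, a)] = (1 - alpha) * Q.get((s, a), 0.0) + alpha * sample
```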

The Story So Far: MDPs and RL
Known MDP (offline solution):
  Compute V*, Q*, π*: value / policy iteration.
  Evaluate a fixed policy π: policy evaluation.
Unknown MDP, model-based:
  Compute V*, Q*, π*: VI/PI on the approximate MDP.
  Evaluate a fixed policy π: policy evaluation on the approximate MDP.
Unknown MDP, model-free:
  Compute V*, Q*, π*: Q-learning.
  Evaluate a fixed policy π: value learning.

Model-Free Learning
Model-free (temporal difference) learning: experience the world through episodes; update estimates after each transition; over time, the updates will mimic Bellman updates.

Q-Learning
We'd like to do Q-value updates to each Q-state:
  Q_{k+1}(s,a) <- Σ_s' T(s,a,s') [ R(s,a,s') + γ max_a' Q_k(s',a') ]
But we can't compute this update without knowing T, R. Instead, compute an average as we go. Receive a sample transition (s, a, r, s'). This sample suggests
  Q(s,a) ≈ r + γ max_a' Q(s',a')
But we want to average over results from (s,a). (Why?) So keep a running average:
  Q(s,a) <- (1-α) Q(s,a) + α [ r + γ max_a' Q(s',a') ]

Q-Learning Properties
Amazing result: Q-learning converges to an optimal policy, even if you're acting suboptimally! This is called off-policy learning. Caveats: you have to explore enough; you have to eventually make the learning rate small enough, but not decrease it too quickly; basically, in the limit, it doesn't matter how you select actions (!). [demo off policy]

Exploration vs. Exploitation

How to Explore?
Several schemes for forcing exploration. Simplest: random actions (ε-greedy). Every time step, flip a coin: with (small) probability ε, act randomly; with (large) probability 1-ε, act on the current policy. Problems with random actions? You do eventually explore the space, but you keep thrashing around once learning is done. One solution: lower ε over time. Another solution: exploration functions. An ε-greedy selector is sketched below. [demo crawler]
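The ε-greedy coin flip, as a sketch (legal_actions is the same assumed helper as before):

```python
import random

def epsilon_greedy(Q, s, legal_actions, epsilon=0.1):
    """With probability epsilon act randomly; otherwise act on the current policy."""
    actions = legal_actions(s)
    if random.random() < epsilon:
        return random.choice(actions)  # explore
    return max(actions, key=lambda a: Q.get((s, a), 0.0))  # exploit
```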

Exploration Functions
When to explore? Random actions explore a fixed amount. Better idea: explore areas whose badness is not (yet) established, and eventually stop exploring. Exploration function: takes a value estimate u and a visit count n, and returns an optimistic utility, e.g.
  f(u, n) = u + k/n
Regular Q-update:  Q(s,a) <- (1-α) Q(s,a) + α [ r + γ max_a' Q(s',a') ]
Modified Q-update: Q(s,a) <- (1-α) Q(s,a) + α [ r + γ max_a' f(Q(s',a'), N(s',a')) ]
Note: this propagates the "bonus" back to states that lead to unknown states as well! (A sketch of the modified update appears at the end of this page's notes.) [demo crawler]

Regret
Even if you learn the optimal policy, you still make mistakes along the way! Regret is a measure of your total mistake cost: the difference between your (expected) rewards, including youthful suboptimality, and optimal (expected) rewards. Minimizing regret goes beyond learning to be optimal: it requires optimally learning to be optimal. Example: random exploration and exploration functions both end up optimal, but random exploration has higher regret.

Approximate Q-Learning

Generalizing Across States
Basic Q-learning keeps a table of all q-values. In realistic situations, we cannot possibly learn about every single state! There are too many states to visit them all in training, and too many states to hold the q-tables in memory. Instead, we want to generalize: learn about some small number of training states from experience, and generalize that experience to new, similar situations. This is a fundamental idea in machine learning, and we'll see it over and over again.

Example: Pacman
Let's say we discover through experience that this state is bad. In naïve q-learning, we know nothing about this state, or even this one!

Feature-Based Representations
Solution: describe a state using a vector of features (properties). Features are functions from states to real numbers (often 0/1) that capture important properties of the state. Example features: distance to closest ghost; distance to closest dot; number of ghosts; 1 / (distance to dot)^2; is Pacman in a tunnel? (0/1); ...; is it the exact state on this slide? Can also describe a q-state (s, a) with features (e.g. action moves closer to food). [demo RL pacman]
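Returning to the exploration-function slide above, here is the modified Q-update as a sketch. The visit counts N and the form f(u, n) = u + k/n follow the slide; using k/(n+1) to keep the bonus finite for unvisited q-states is an added assumption:

```python
def q_update_explore(Q, N, s, a, s_next, r, legal_actions,
                     alpha=0.5, gamma=1.0, k=1.0):
    """Q-update where successor values pass through an exploration function."""
    def f(u, n):
        # Optimistic utility; n + 1 keeps the bonus finite when n = 0
        # (the slide's f(u, n) = u + k/n is undefined for unvisited q-states).
        return u + k / (n + 1)

    next_value = max((f(Q.get((s_next, a2), 0.0), N.get((s_next, a2), 0))
                      for a2 in legal_actions(s_next)), default=0.0)
    sample = r + gamma * next_value
    Q[(s, a)] = (1 - alpha) * Q.get((s, a), 0.0) + alpha * sample
    N[(s, a)] = N.get((s, a), 0) + 1  # we just visited (s, a)
```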

Linear Value Functions
Using a feature representation, we can write a q function (or value function) for any state using a few weights:
  V(s)   = w_1 f_1(s) + w_2 f_2(s) + ... + w_n f_n(s)
  Q(s,a) = w_1 f_1(s,a) + w_2 f_2(s,a) + ... + w_n f_n(s,a)
Advantage: our experience is summed up in a few powerful numbers. Disadvantage: states may share features but actually be very different in value!

Approximate Q-Learning
Q-learning with linear Q-functions: given a transition (s, a, r, s'), compute
  difference = [ r + γ max_a' Q(s',a') ] - Q(s,a)
Exact Q's:       Q(s,a) <- Q(s,a) + α · difference
Approximate Q's: w_i <- w_i + α · difference · f_i(s,a)   (sketched in code below)
Intuitive interpretation: adjust the weights of the active features. E.g., if something unexpectedly bad happens, blame the features that were on: disprefer all states with that state's features. Formal justification: online least squares.

Example: Q-Pacman [demo RL pacman]

Q-Learning and Least Squares

Linear Approximation: Regression*
Prediction: ŷ = w_0 + w_1 f_1(x), and likewise with more features.

Optimization: Least Squares*
Total error = Σ_i (y_i - ŷ_i)^2, where y_i is the observation, ŷ_i = Σ_k w_k f_k(x_i) is the prediction, and their difference is the error or "residual".
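The approximate Q-update above in code; features(s, a) is an assumed helper returning a dict from feature names to values (e.g. {"closest-food": 0.25, "bias": 1.0}):

```python
def approx_q_update(w, features, s, a, s_next, r, legal_actions,
                    alpha=0.05, gamma=1.0):
    """Adjust the weights of the active features by alpha * difference * f_i."""
    def q(state, action):
        # Linear Q-function: a weighted sum of the features.
        return sum(w.get(name, 0.0) * v
                   for name, v in features(state, action).items())

    next_value = max((q(s_next, a2) for a2 in legal_actions(s_next)), default=0.0)
    difference = (r + gamma * next_value) - q(s, a)
    for name, v in features(s, a).items():
        w[name] = w.get(name, 0.0) + alpha * difference * v
```

A feature whose value is 0 receives no blame or credit, which is exactly the "adjust weights of active features" intuition from the slide.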

Minimizing Error*
Imagine we had only one point x, with features f(x), target value y, and weights w:
  error(w) = (1/2) ( y - Σ_k w_k f_k(x) )^2
  d error / d w_m = - ( y - Σ_k w_k f_k(x) ) f_m(x)
  w_m <- w_m + α ( y - Σ_k w_k f_k(x) ) f_m(x)
Approximate q-update explained: with target [ r + γ max_a' Q(s',a') ] and prediction Q(s,a),
  w_m <- w_m + α [ target - prediction ] f_m(s,a)

Overfitting: Why Limiting Capacity Can Help*
[Figure: a degree-15 polynomial fit to a handful of points swings wildly between them.]

Policy Search
Problem: often the feature-based policies that work well (win games, maximize utilities) aren't the ones that approximate V / Q best. E.g., your value functions from project 2 were probably horrible estimates of future rewards, but they still produced good decisions. Q-learning's priority: get Q-values close (modeling). Action selection's priority: get the ordering of Q-values right (prediction). We'll see this distinction between modeling and prediction again later in the course. Solution: learn policies that maximize rewards, not the values that predict them. Policy search: start with an ok solution (e.g. Q-learning), then fine-tune by hill climbing on feature weights.

Policy Search
Simplest policy search: start with an initial linear value function or Q-function; nudge each feature weight up and down and see if your policy is better than before. Problems: how do we tell if the policy got better? We need to run many sample episodes! If there are a lot of features, this can be impractical. Better methods exploit lookahead structure, sample wisely, and change multiple parameters. (A sketch of the simplest nudge-and-evaluate loop appears after the conclusion.)

Conclusion
We're done with Part I: Search and Planning! We've seen how AI methods can solve problems in: search; constraint satisfaction problems; games; Markov decision problems. Next up: Part II: Uncertainty and Learning!
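As promised above, a sketch of the simplest nudge-and-evaluate policy search; evaluate_policy is an assumed helper that runs many sample episodes with the greedy policy induced by the weights and returns the average reward:

```python
def hill_climb_weights(w, evaluate_policy, step=0.1, passes=10):
    """Nudge each feature weight up and down; keep a change only if it helps."""
    best = evaluate_policy(w)
    for _ in range(passes):
        for name in list(w):
            for delta in (step, -step):
                w[name] += delta
                score = evaluate_policy(w)
                if score > best:
                    best = score       # the policy got better; keep the nudge
                else:
                    w[name] -= delta   # revert the nudge
    return w
```

Each pass costs one full policy evaluation per nudge, which is why the slide warns this becomes impractical with many features.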
