Robot Autonomy Inverse Reinforcement Learning
1 Robot Autonomy Inverse Reinforcement Learning Katharina Muelling NSH 4521
2 Last Lecture Autonomous learning from scratch is hard: real-world exploration, reward function design. What can we do: use an effective representation and prior knowledge. Imitation learning creates good starting points (prior knowledge); Dynamical System Motor Primitives represent motor skills. "I learned to ride with RL" Pic: researchers.lille.inria.fr/~munos/
3 Effective Representation of Motor Skills Dynamic System Motor Primitives: arbitrarily shaped smooth movements, simple to adapt, stable and robust, linear in the parameters w. Easy to learn through imitation and reinforcement learning. They encode the shape, not the goal or intention!
4 Dynamical System Motor Primitives What do we gain from this representation? A motor policy representation that performs an automatic mapping of states to actions over time: π_w(g, θ_t, θ̇_t, t, T) = a_{t+1}. The mapping depends on the shape parameters w.
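To illustrate what "linear in the shape parameters w" buys, here is a minimal one-dimensional DMP rollout in the style of Ijspeert-type primitives. The gains, basis layout, and function name are illustrative assumptions, not the exact formulation used in the lecture:

```python
import numpy as np

def dmp_rollout(w, g, y0, tau=1.0, dt=0.01, T=1.0, alpha=25.0, beta=6.25, alpha_x=8.0):
    """Roll out a one-dimensional discrete DMP (sketch, assumed gains).

    Transformation system, linear in the shape weights w:
        tau^2 * ydd = alpha * (beta * (g - y) - tau * yd) + f(x)
    with forcing term f(x) = (psi(x) . w) / sum(psi(x)) * x * (g - y0).
    """
    n = len(w)
    centers = np.exp(-alpha_x * np.linspace(0, 1, n))       # basis centers in phase space
    widths = n ** 1.5 / centers                             # heuristic widths
    y, yd, x = y0, 0.0, 1.0
    traj = []
    for _ in range(int(round(T / dt))):
        psi = np.exp(-widths * (x - centers) ** 2)
        f = (psi @ w) / (psi.sum() + 1e-10) * x * (g - y0)  # forcing term, linear in w
        ydd = (alpha * (beta * (g - y) - tau * yd) + f) / tau ** 2
        yd += ydd * dt
        y += yd * dt
        x += (-alpha_x * x / tau) * dt                      # canonical system: xd = -alpha_x * x
        traj.append(y)
    return np.array(traj)

# With zero weights the spring-damper dominates and the movement converges to g.
traj = dmp_rollout(np.zeros(10), g=1.0, y0=0.0)
print(traj[-1])   # close to the goal g = 1.0
```

Because the forcing term is linear in w, both imitation (regression on a demonstration) and reinforcement learning (perturbing w) stay simple.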
5 Concept Imitation Learning: given a set of labeled training data (demonstrations), learn a function that maps the (observed) state to an action. Teacher → recording → record mapping → embodiment mapping → learner. Problems: the correspondence problem; need to know what to imitate.
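The state-to-action mapping on this slide can be sketched, under toy assumptions, as plain supervised regression from recorded states to actions (a hypothetical linear teacher; a deterministic simplification of π(a|s)):

```python
import numpy as np

# Toy demonstrations: the "teacher" uses the linear feedback law a = -2 * s.
rng = np.random.default_rng(0)
states = rng.uniform(-1, 1, size=(200, 1))
actions = -2.0 * states + 0.01 * rng.normal(size=(200, 1))   # slightly noisy labels

# Imitation as supervised learning: fit the state -> action mapping directly.
K, *_ = np.linalg.lstsq(states, actions, rcond=None)
policy = lambda s: s @ K

print(policy(np.array([[0.5]])))   # approximately -1.0, the teacher's action at s = 0.5
```

Note this sidesteps the correspondence problem entirely: it assumes teacher and learner share the same state and action spaces.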
6 Today's Lecture Case study: learning motor skills in Ball-in-a-Cup. Inverse Reinforcement Learning. Examples of Inverse Reinforcement Learning. Case study: learning strategies. Shortcomings of Inverse Reinforcement Learning.
7 How to Learn from Demonstrations Expert demonstration {s_i, a_i, r_i}_{i=1:T} → Behavioral Cloning → control policy π(a|s) for the learner.
8 How to Learn from Demonstrations Expert demonstration {s_i, a_i, r_i}_{i=1:T} → Behavioral Cloning → control policy π; or → Inverse Reinforcement Learning → reward R → Reinforcement Learning / Optimal Control (with dynamical model T) → control policy π.
9 Learning from Demonstration Case Study: Learning motor skills from demonstration
10 Learning Hitting Motions in Table Tennis Represent the motor policy as a DMP: this reduces the learning problem to finding the right trajectory weights. Initialize a good policy through demonstration. Learn through interactions with the world which DMP to associate with each state.
11 Case Study: Ball-in-a-Cup Goal: get the ball into the cup. 1) Represent the motor policy as a dynamical system motor primitive: θ̈ ~ π_w(θ_t, s_t). 2) Learn the initial parameters w from demonstration (mind the number of local models). 3) Perturb the parameters to change the acceleration pattern by sampling perturbations ε_t from a normal distribution and updating w' = w + E[Σ_{t=1}^T ε_t Q^π] / E[Σ_{t=1}^T Q^π]. J. Kober and J. Peters, Policy Search for Motor Primitives in Robotics, NIPS, 2008
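The perturbation step on this slide (a PoWER-style return-weighted average of parameter noise, as in Kober and Peters, 2008) can be sketched on a toy problem. `episode_return` is a made-up stand-in for the real rollout return, and all numbers are illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)
w_star = np.array([0.4, -0.2, 0.7])          # assumed "good" parameters of the toy problem

def episode_return(w):
    """Made-up stand-in for the rollout return: peaks at w_star."""
    return np.exp(-np.sum((w - w_star) ** 2))

w = np.zeros(3)
sigma = 0.3
for _ in range(100):
    eps = sigma * rng.normal(size=(50, 3))   # 50 perturbed rollouts w + eps per update
    q = np.array([episode_return(w + e) for e in eps])
    # Return-weighted average of the perturbations, mirroring the slide's update:
    # w' = w + E[sum_t eps_t Q^pi] / E[sum_t Q^pi]
    w = w + (q[:, None] * eps).sum(axis=0) / (q.sum() + 1e-10)

print(np.round(w, 2))                        # w has moved toward w_star
```

High-return perturbations pull the mean toward themselves, so no gradient of the return is ever needed, which is why this works directly on a real robot.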
12 Case Study: Ball in a Cup Reward J. Kober and J. Peters, Policy Search for Motor Primitives in Robotics, NIPS, 2008
13 Solving an MDP Reward R → Reinforcement Learning / Optimal Control (with dynamical model T) → control policy π; alternatively, expert demonstration → Behavioral Cloning → policy, or → Inverse Reinforcement Learning → reward. Katharina Muelling (NREC, Carnegie Mellon University)
14 Imitation Learning Demonstrated Behavior Novel Scene Ratliff et al.: Maximum Margin Planning, 2006
15 Imitation Learning Demonstrated Behavior Learned Behavior Ratliff et al.: Maximum Margin Planning, 2006
16 Inverse Reinforcement Learning What is this robot up to?
17 Inverse Reinforcement Learning What is this robot up to?
18 Inverse Reinforcement Learning What is this robot up to?
19 Inverse Reinforcement Learning What is this robot up to?
20 Imitation Learning Demonstrated Behavior Novel Scene Ratliff et al.: Maximum Margin Planning, 2006
21 Imitation Learning Demonstrated Behavior Learned Behavior Ratliff et al.: Maximum Margin Planning, 2006
22 Inverse Reinforcement Learning Learning: input features → behavior.
23 Inverse Reinforcement Learning Learning: input features → behavior.
24 Inverse Reinforcement Learning Learning: input features → reward function; RL: reward function → behavior.
25 Inverse Reinforcement Learning Input features → reward function → behavior. Ratliff et al.: Maximum Margin Planning, 2006
26 Inverse Reinforcement Learning Reinforcement learning goal: given an MDP, maximize the expected return: π* = argmax_π J(π), with J(π) = E[Σ_{t=0}^∞ γ^t R(s_t, a_t) | π]. Hand-designed reward + observable environment → reinforcement learning → behavior.
27 Inverse Reinforcement Learning Reinforcement learning goal: given an MDP, maximize the expected return: π* = argmax_π J(π), with J(π) = E[Σ_{t=0}^∞ γ^t R(s_t, a_t) | π]. Problems: the reward function defines the desired behavior, and it can be hard to define a good reward function that guides the learning process, especially when human behavior is considered.
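The expected return J(π) above can be estimated by simple Monte Carlo averaging over rollouts. A small sketch, with an assumed episode sampler that returns the reward sequence of one rollout under π:

```python
def discounted_return(rewards, gamma=0.9):
    """sum_t gamma^t * r_t for one episode."""
    return sum(gamma ** t * r for t, r in enumerate(rewards))

def estimate_J(sample_episode, n_episodes=1000, gamma=0.9):
    """Monte Carlo estimate of J(pi) = E[sum_t gamma^t R(s_t, a_t) | pi]."""
    return sum(discounted_return(sample_episode(), gamma)
               for _ in range(n_episodes)) / n_episodes

# Deterministic toy policy: reward 1 for 5 steps -> J = 1 + 0.9 + ... + 0.9^4 ≈ 4.0951
print(estimate_J(lambda: [1.0] * 5))
```

In practice `sample_episode` would run the policy on the robot or in simulation; the averaging itself is this simple.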
28 Inverse Reinforcement Learning Idea: if you really want to imitate, you need to find the reward function rather than the policy! A Markov decision process without a reward function is denoted MDP\R. Environment → reward → reinforcement learning → behavior.
29 IRL: Basic Idea Given an MDP\R and a set of demonstrations D = {τ_n}_{n=1}^N from an expert, find a reward function R(s, a) = Σ_{i=1}^m w_i f_i(s, a) that satisfies, for all policies π: J(π_E) ≥ J(π). Basic assumption: the reward function can be written as a linear combination of known reward features, R(s, a) = Σ_{i=1}^m w_i f_i(s, a) = w^T f(s, a).
30 Inverse Reinforcement Learning Idea: change the reward so the demonstrated behavior scores higher and alternative policies π score lower.
31 Inverse Reinforcement Learning Idea: change the reward so the demonstrated behavior scores higher and alternative policies π score lower.
32 IRL: Basic Idea Given an MDP\R and a set of demonstrations D = {τ_n}_{n=1}^N from an expert, find a reward function R(s, a) = Σ_{i=1}^m w_i f_i(s, a) that satisfies, for all policies π: J(π_E) ≥ J(π). Basic assumption: the reward function can be written as a linear combination of known reward features, R(s, a) = Σ_{i=1}^m w_i f_i(s, a) = w^T f(s, a).
33 IRL: Basic Idea Basic assumption: the reward function can be written as a linear combination of known reward features, R(s, a) = Σ_{i=1}^m w_i f_i(s, a) = w^T f(s, a). Rewrite the expected return as: J(π) = E[Σ_{t=0}^∞ γ^t R(s_t, a_t) | π] = E[Σ_{t=0}^∞ γ^t w^T f(s_t, a_t) | π].
34 IRL: Basic Idea Basic assumption: the reward function can be written as a linear combination of known reward features, R(s, a) = Σ_{i=1}^m w_i f_i(s, a) = w^T f(s, a). Based on this assumption we can rewrite the expected return as: J(π) = E[Σ_{t=0}^∞ γ^t R(s_t, a_t) | π] = E[Σ_{t=0}^∞ γ^t w^T f(s_t, a_t) | π] = w^T E[Σ_{t=0}^∞ γ^t f(s_t, a_t) | π] = w^T μ(π), where μ(π) is the feature expectation (feature count).
35 IRL: Basic Idea With J(π) = w^T E[Σ_{t=0}^∞ γ^t f(s_t, a_t) | π] = w^T μ(π), the condition J(π_E) ≥ J(π) becomes: find a weight vector w such that w^T μ(π_E) ≥ w^T μ(π) for all π. The feature expectation/count μ(π) can be estimated from sample trajectories. Problems: we do not have the policy π_E, we only have some observed trajectories; reward function ambiguity: a large class of reward functions may lead to the same optimal policy; assumes we can enumerate all policies.
36 IRL: Basic Idea Reward function ambiguity: we need additional constraints! Much of the literature in IRL focuses on solving this problem. How did Abbeel and Ng address it? Maximize the margin between the expert's feature expectations μ(π_E) and those of all other policies μ(π).
37 Apprenticeship Learning via IRL Assumptions: we can observe the state-action pairs; the agent is goal-driven and follows some optimal policy; we have access to a reinforcement learning solver that returns an optimal policy. Abbeel and Ng: Apprenticeship Learning via Inverse Reinforcement Learning, ICML 2004
38 Apprenticeship Learning via IRL Given a set of m demonstrations, compute the expected feature counts μ_E = (1/m) Σ_{i=1}^m Σ_{t=0}^∞ γ^t f(s_t^(i)). Goal: find a policy π whose performance is close to that of the expert demonstrator: |E[Σ_{t=0}^∞ γ^t R(s_t) | π_E] − E[Σ_{t=0}^∞ γ^t R(s_t) | π]| = |w^T μ_E − w^T μ_π| ≤ ||w||_2 ||μ_E − μ_π||_2 ≤ 1 · ε = ε, since ||w||_2 ≤ 1. Abbeel and Ng: Apprenticeship Learning via Inverse Reinforcement Learning, ICML 2004
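The empirical feature counts μ_E on this slide are just a discounted sum of feature vectors over the recorded demonstrations, averaged over trajectories. A small sketch (the feature map and trajectories are toy assumptions):

```python
import numpy as np

def feature_expectations(demos, feature_fn, gamma=0.9):
    """Empirical feature counts: mu_E = (1/m) sum_i sum_t gamma^t f(s_t^(i))."""
    mu = None
    for traj in demos:
        for t, s in enumerate(traj):
            contrib = gamma ** t * feature_fn(s)
            mu = contrib if mu is None else mu + contrib
    return mu / len(demos)

# Toy 1-D states with a two-dimensional feature map [s, s^2].
feature_fn = lambda s: np.array([s, s ** 2])
demos = [[1.0, 2.0], [0.0, 1.0]]
print(feature_expectations(demos, feature_fn))   # [1.85, 2.75]
```

The same function estimates μ(π) for any candidate policy by rolling π out and feeding the sampled trajectories in as `demos`.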
39 Apprenticeship Learning via IRL Initialize: random w, and compute μ^(0). Algorithm: 1. Compute t^(i) = max_{w: ||w||_2 ≤ 1} min_{j < i} w^T (μ_E − μ^(j)), with w^(i) the w that realizes this maximum. 2. If t^(i) ≤ ε: terminate. 3. Compute π^(i+1) using the RL solver with R = w^T f. 4. Compute the new feature counts μ^(i+1). Abbeel and Ng: Apprenticeship Learning via Inverse Reinforcement Learning, ICML 2004
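Step 1's max-min is a quadratic program; Abbeel and Ng also describe a simpler projection variant of the same loop, sketched here. The `rl_solver` argument is an assumed oracle that, given w, returns the feature expectations of an optimal policy for R = w^T f:

```python
import numpy as np

def apprenticeship_projection(mu_E, rl_solver, mu_0, eps=1e-3, max_iter=100):
    """Projection variant of Abbeel & Ng's algorithm (a sketch).

    mu_bar tracks the point in the convex hull of visited feature
    expectations that is closest to the expert's mu_E.
    """
    mu_bar = mu_0
    w, t = mu_E - mu_bar, np.linalg.norm(mu_E - mu_bar)
    for _ in range(max_iter):
        w = mu_E - mu_bar                    # current reward weights
        t = np.linalg.norm(w)                # margin to the expert
        if t <= eps:
            break
        mu = rl_solver(w)                    # feature counts of the new policy
        # Project mu_bar onto the segment toward mu (closest point to mu_E).
        d = mu - mu_bar
        lam = np.clip(d @ (mu_E - mu_bar) / (d @ d + 1e-12), 0.0, 1.0)
        mu_bar = mu_bar + lam * d
    return w, t

# Toy check: the "RL solver" picks the candidate feature vector maximizing w^T mu.
candidates = [np.array([1.0, 0.0]), np.array([0.0, 1.0]), np.array([1.0, 1.0])]
solver = lambda w: max(candidates, key=lambda c: w @ c)
w, t = apprenticeship_projection(np.array([0.9, 0.9]), solver, np.array([0.0, 0.0]))
print(t)   # margin shrinks below eps
```

The terminating margin t ≤ ε is exactly the ε of the performance bound on the previous slide.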
40 Inverse Reinforcement Learning Examples: route planning (Ratliff et al., 2006); parking lot navigation (Abbeel et al., 2008); quadruped locomotion (Kolter et al., 2008).
41 Inverse Reinforcement Learning Examples: pedestrian prediction (Ziebart et al., 2009); activity forecasting (Kitani et al., 2012).
42 Case Study: Table Tennis Can we learn higher level strategies with inverse reinforcement learning?
43 How can we learn a manipulation task? Learning strategies: learning strategic elements from demonstrations using inverse reinforcement learning. Learning movements: learning motor skills from demonstration; learning how to select and generalize motor primitives. Pipeline: state s → supervisory system → augmented state → motion generation (joint values) → execution (motor torques u) → action. The teacher provides the learning signal for policy learning.
44 How can we represent such a strategy? Representing the strategy: Markov Decision Process (S,A,T,R)
45 How can we represent such a strategy? Representing the strategy: Markov Decision Process (S,A,T,R)
46 How can we represent such a strategy? Representing the strategy: Markov Decision Process (S,A,T,R)
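To make the (S, A, T, R) tuple concrete, here is a toy MDP solved by value iteration; given the full tuple (reward included), the strategy follows mechanically. All numbers are made-up illustrations:

```python
import numpy as np

# A tiny (S, A, T, R): 3 states, 2 actions. T[a][s, s'] are transition
# probabilities, R[s] is the state reward.
gamma = 0.9
T = np.array([
    [[0.9, 0.1, 0.0],    # action 0: drift toward state 2
     [0.0, 0.9, 0.1],
     [0.0, 0.0, 1.0]],
    [[1.0, 0.0, 0.0],    # action 1: jump back toward state 0
     [1.0, 0.0, 0.0],
     [0.0, 1.0, 0.0]],
])
R = np.array([0.0, 0.0, 1.0])

# Value iteration: Q[s, a] = R[s] + gamma * sum_s' T[a, s, s'] * V[s'].
V = np.zeros(3)
for _ in range(200):
    Q = R[:, None] + gamma * np.einsum('asn,n->sa', T, V)
    V = Q.max(axis=1)
policy = Q.argmax(axis=1)
print(V.round(3), policy)   # every state prefers drifting toward the rewarding state
```

IRL inverts exactly this picture: the demonstrations and (S, A, T) are given, and the missing piece R must be recovered.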
47 Finding a reward function for table tennis Coming back to the table tennis example: can we find a reward function from which we can generate a higher-level strategy? Problems in the table tennis experiment: we do not have a perfect dynamical model, and we cannot compute all possible policies π. Testing three model-free IRL methods: two model-free versions of max-margin IRL (P. Abbeel and A. Ng, Apprenticeship learning via inverse reinforcement learning, ICML 2004) and model-free relative entropy IRL (Boularias et al., Relative entropy inverse reinforcement learning, AISTATS 2011).
48 Finding a reward function for table tennis Model-free maximum margin: use additional trajectories of non-optimal strategies. Maximize max_w Σ_{τ ∈ D} Σ_{t=1}^T [J_E(s_t, w) − J_{N_k}(s_t, w)] − λ||w||², where the value of a state is estimated from the most similar recorded states as J(s_1, w) = (1/H) Σ_{i=1}^H w^T f(s_i, a_i). Set the horizon to H = 3, which corresponds to planning two steps ahead.
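The H-step value J(s_1, w) used here is just an average of feature rewards over the next H recorded steps. A sketch with a hypothetical helper (`segment_value` and the toy features are my own illustrations, not the lecture's code):

```python
import numpy as np

def segment_value(w, features, start, H=3):
    """Empirical H-step value: J(s_start, w) = (1/H) * sum over the next H
    recorded state-action feature vectors."""
    window = features[start:start + H]
    return np.mean([w @ f for f in window])

# Toy: per-step feature vectors of one recorded rally, weights from IRL.
features = [np.array([1.0, 0.0]), np.array([0.0, 1.0]), np.array([1.0, 1.0])]
w = np.array([0.5, 2.0])
print(segment_value(w, features, start=0))   # (0.5 + 2.0 + 2.5) / 3
```

Because the value is read off recorded segments (via most-similar-state matching), neither a dynamics model nor a full RL solver is needed, which is what makes the method model-free.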
49 Experimental Setup We need many non-optimal and/or random trajectories: how can we generate them? What should we record, and how? Pilot studies.
50 Experimental Setup Subjects: 5 naïve players, 2 skilled players, 1 permanent (skilled) opponent. Experiments: 1) 10 min cooperative table tennis; 2) semi-competitive game (cooperative opponent, competitive subject); 3) competitive game. K. Muelling et al., 2014
51 IRL for Table Tennis Reward features that describe the world: table preferences; distance to the edge (δ_t); distance to the opponent (δ_o); moving direction of the opponent (v_o); ball velocity (v_b); ball orientation (θ_y, θ_z); elbow proximity (δ_elbow); smash.
52 IRL for Table Tennis What do you think? Which features are important?
53 IRL for Table Tennis What do you think? Which features are important?
54 What did the system learn? Preferences of the expert: forehands are avoided, backhands are preferred; the ball is played flat and cross towards the backhand area; the distance between ball and opponent is increased.
55 Main Findings A possible strategy that distinguishes expert from non-expert players (states s_{T-2}, s_{T-1}): planning ahead. The expert plans up to two steps ahead!
56 Evaluation The method is able to distinguish between the skill levels of players on the strategic level and between different playing styles.
57 Inverse Reinforcement Learning Problems: needs a dynamics model; needs an RL solver or planner; depends on hand-designed features.
58 Summary Imitation learning: learning from demonstration is a great tool to initialize learning and to make learning on real robots possible. Representing movements with DMPs allows movements to be learned efficiently from demonstration and through self-improvement. When learning from demonstration, keep in mind: what you want to learn; whether it is possible to map the human demonstration to the robot learner; whether it makes sense to map the human demonstration to the robot. There are different ways to learn from demonstration.
59 Summary Inverse reinforcement learning vs. behavioral cloning: the reward function defines the underlying behavior! Can we recover the reward function from demonstrations? Apprenticeship learning: can we find a policy that is at least as good as the demonstrated one with IRL? Or can we directly learn the policy? Formulated as a supervised learning problem: 1) fix a policy class; 2) find a suitable ML method; 3) learn the policy directly from demonstrations.
Intelligent Agents Chapter 2 1 Outline Agents and environments Rationality PEAS (Performance measure, Environment, Actuators, Sensors) Agent types 2 Agents and environments sensors environment percepts
More informationQuickStroke: An Incremental On-line Chinese Handwriting Recognition System
QuickStroke: An Incremental On-line Chinese Handwriting Recognition System Nada P. Matić John C. Platt Λ Tony Wang y Synaptics, Inc. 2381 Bering Drive San Jose, CA 95131, USA Abstract This paper presents
More informationVocational Training Dropouts: The Role of Secondary Jobs
Vocational Training Dropouts: The Role of Secondary Jobs Katja Seidel Insitute of Economics Leuphana University Lueneburg katja.seidel@leuphana.de Nutzerkonferenz Bildung und Beruf: Erwerb und Verwertung
More informationTeam Formation for Generalized Tasks in Expertise Social Networks
IEEE International Conference on Social Computing / IEEE International Conference on Privacy, Security, Risk and Trust Team Formation for Generalized Tasks in Expertise Social Networks Cheng-Te Li Graduate
More informationCOMPUTATIONAL COMPLEXITY OF LEFT-ASSOCIATIVE GRAMMAR
COMPUTATIONAL COMPLEXITY OF LEFT-ASSOCIATIVE GRAMMAR ROLAND HAUSSER Institut für Deutsche Philologie Ludwig-Maximilians Universität München München, West Germany 1. CHOICE OF A PRIMITIVE OPERATION The
More informationAN EXAMPLE OF THE GOMORY CUTTING PLANE ALGORITHM. max z = 3x 1 + 4x 2. 3x 1 x x x x N 2
AN EXAMPLE OF THE GOMORY CUTTING PLANE ALGORITHM Consider the integer programme subject to max z = 3x 1 + 4x 2 3x 1 x 2 12 3x 1 + 11x 2 66 The first linear programming relaxation is subject to x N 2 max
More informationarxiv: v1 [cs.lg] 3 May 2013
Feature Selection Based on Term Frequency and T-Test for Text Categorization Deqing Wang dqwang@nlsde.buaa.edu.cn Hui Zhang hzhang@nlsde.buaa.edu.cn Rui Liu, Weifeng Lv {liurui,lwf}@nlsde.buaa.edu.cn arxiv:1305.0638v1
More informationProbabilistic Latent Semantic Analysis
Probabilistic Latent Semantic Analysis Thomas Hofmann Presentation by Ioannis Pavlopoulos & Andreas Damianou for the course of Data Mining & Exploration 1 Outline Latent Semantic Analysis o Need o Overview
More informationHARPER ADAMS UNIVERSITY Programme Specification
HARPER ADAMS UNIVERSITY Programme Specification 1 Awarding Institution: Harper Adams University 2 Teaching Institution: Askham Bryan College 3 Course Accredited by: Not Applicable 4 Final Award and Level:
More informationAn empirical study of learning speed in backpropagation
Carnegie Mellon University Research Showcase @ CMU Computer Science Department School of Computer Science 1988 An empirical study of learning speed in backpropagation networks Scott E. Fahlman Carnegie
More informationPurdue Data Summit Communication of Big Data Analytics. New SAT Predictive Validity Case Study
Purdue Data Summit 2017 Communication of Big Data Analytics New SAT Predictive Validity Case Study Paul M. Johnson, Ed.D. Associate Vice President for Enrollment Management, Research & Enrollment Information
More information(Sub)Gradient Descent
(Sub)Gradient Descent CMSC 422 MARINE CARPUAT marine@cs.umd.edu Figures credit: Piyush Rai Logistics Midterm is on Thursday 3/24 during class time closed book/internet/etc, one page of notes. will include
More informationQuantitative Evaluation of an Intuitive Teaching Method for Industrial Robot Using a Force / Moment Direction Sensor
International Journal of Control, Automation, and Systems Vol. 1, No. 3, September 2003 395 Quantitative Evaluation of an Intuitive Teaching Method for Industrial Robot Using a Force / Moment Direction
More informationSaliency in Human-Computer Interaction *
From: AAA Technical Report FS-96-05. Compilation copyright 1996, AAA (www.aaai.org). All rights reserved. Saliency in Human-Computer nteraction * Polly K. Pook MT A Lab 545 Technology Square Cambridge,
More informationTeachable Robots: Understanding Human Teaching Behavior to Build More Effective Robot Learners
Teachable Robots: Understanding Human Teaching Behavior to Build More Effective Robot Learners Andrea L. Thomaz and Cynthia Breazeal Abstract While Reinforcement Learning (RL) is not traditionally designed
More informationSoftprop: Softmax Neural Network Backpropagation Learning
Softprop: Softmax Neural Networ Bacpropagation Learning Michael Rimer Computer Science Department Brigham Young University Provo, UT 84602, USA E-mail: mrimer@axon.cs.byu.edu Tony Martinez Computer Science
More informationMTH 141 Calculus 1 Syllabus Spring 2017
Instructor: Section/Meets Office Hrs: Textbook: Calculus: Single Variable, by Hughes-Hallet et al, 6th ed., Wiley. Also needed: access code to WileyPlus (included in new books) Calculator: Not required,
More informationDialog-based Language Learning
Dialog-based Language Learning Jason Weston Facebook AI Research, New York. jase@fb.com arxiv:1604.06045v4 [cs.cl] 20 May 2016 Abstract A long-term goal of machine learning research is to build an intelligent
More informationA Comparison of Annealing Techniques for Academic Course Scheduling
A Comparison of Annealing Techniques for Academic Course Scheduling M. A. Saleh Elmohamed 1, Paul Coddington 2, and Geoffrey Fox 1 1 Northeast Parallel Architectures Center Syracuse University, Syracuse,
More informationProbability and Game Theory Course Syllabus
Probability and Game Theory Course Syllabus DATE ACTIVITY CONCEPT Sunday Learn names; introduction to course, introduce the Battle of the Bismarck Sea as a 2-person zero-sum game. Monday Day 1 Pre-test
More informationA simulated annealing and hill-climbing algorithm for the traveling tournament problem
European Journal of Operational Research xxx (2005) xxx xxx Discrete Optimization A simulated annealing and hill-climbing algorithm for the traveling tournament problem A. Lim a, B. Rodrigues b, *, X.
More informationAgents and environments. Intelligent Agents. Reminders. Vacuum-cleaner world. Outline. A vacuum-cleaner agent. Chapter 2 Actuators
s and environments Percepts Intelligent s? Chapter 2 Actions s include humans, robots, softbots, thermostats, etc. The agent function maps from percept histories to actions: f : P A The agent program runs
More informationDOCTOR OF PHILOSOPHY HANDBOOK
University of Virginia Department of Systems and Information Engineering DOCTOR OF PHILOSOPHY HANDBOOK 1. Program Description 2. Degree Requirements 3. Advisory Committee 4. Plan of Study 5. Comprehensive
More informationAC : DESIGNING AN UNDERGRADUATE ROBOTICS ENGINEERING CURRICULUM: UNIFIED ROBOTICS I AND II
AC 2009-1161: DESIGNING AN UNDERGRADUATE ROBOTICS ENGINEERING CURRICULUM: UNIFIED ROBOTICS I AND II Michael Ciaraldi, Worcester Polytechnic Institute Eben Cobb, Worcester Polytechnic Institute Fred Looft,
More informationSoftware Maintenance
1 What is Software Maintenance? Software Maintenance is a very broad activity that includes error corrections, enhancements of capabilities, deletion of obsolete capabilities, and optimization. 2 Categories
More informationCarter M. Mast. Participants: Peter Mackenzie-Helnwein, Pedro Arduino, and Greg Miller. 6 th MPM Workshop Albuquerque, New Mexico August 9-10, 2010
Representing Arbitrary Bounding Surfaces in the Material Point Method Carter M. Mast 6 th MPM Workshop Albuquerque, New Mexico August 9-10, 2010 Participants: Peter Mackenzie-Helnwein, Pedro Arduino, and
More information