Sequential decision making under uncertainty

Matthijs Spaan and Francisco S. Melo
Institute for Systems and Robotics, Instituto Superior Técnico, Lisbon, Portugal
Reading group meeting, January 4, 2007

Introduction
This meeting: an overview of the field, covering motivation, assumptions, models, and methods.
What topics shall we address? Fix a schedule.

Motivation
A major goal of Artificial Intelligence is to build intelligent agents. Russell and Norvig (2003): "an agent is anything that can be viewed as perceiving its environment through sensors and acting upon that environment through actuators."
Problem: how to act?
Example: a robot performing an assigned task.

Applications
Reinforcement learning applications:
Aibo gait optimization (Kohl and Stone, 2004a,b; Saggar et al., 2006)
Helicopter control (Bagnell and Schneider, 2001; Ng et al., 2004)
Air hockey (Bentivegna et al., 2002)
More at http://neuromancer.eecs.umich.edu/cgi-bin/twiki/view/main/

Sequential decision making under uncertainty
Assumptions:
Sequential decisions: problems are formulated as a sequence of independent decisions.
Markovian environment: the state at time t depends only on the events at time t-1.
Evaluative feedback: a reinforcement signal is used as the performance measure (reinforcement learning).

Sequential decision making under uncertainty (1)
Possible variations:
Type of uncertainty.
Full vs. partial state observability.
Single vs. multiple decision-makers.
Model-based vs. model-free methods.
Finite vs. infinite state space.
Discrete vs. continuous time.
Finite vs. infinite horizon.

Sequential decision making under uncertainty (2)
Models: MDPs, POMDPs, etc.
Methods: DP, TD, etc.
Applications.

Basic model: Markov chains
The basic model of Markov chains describes (first-order) discrete-time dynamic systems.
[Diagram: from state X_{t-1} = i the chain moves to X_t = j_1, j_2, or j_3 with probabilities P(i, j_1), P(i, j_2), P(i, j_3), and then on to X_{t+1} = k.]
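The diagram above can be summarized compactly in code. Below is a minimal sketch, assuming an arbitrary three-state chain (the transition matrix is illustrative, not taken from the slides), of how a first-order, discrete-time Markov chain is represented and sampled:

```python
import numpy as np

# Assumed 3-state chain: a first-order, discrete-time Markov chain is fully
# described by a row-stochastic matrix P with P[i, j] = P(X_{t+1} = j | X_t = i).
P = np.array([
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
    [0.3, 0.3, 0.4],
])

def simulate(P, x0, steps, seed=0):
    """Sample a trajectory; the next state depends only on the current state."""
    rng = np.random.default_rng(seed)
    trajectory = [x0]
    for _ in range(steps):
        trajectory.append(rng.choice(len(P), p=P[trajectory[-1]]))
    return trajectory

print(simulate(P, x0=0, steps=10))
```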

Adding control
In controlled Markov chains, the transition probabilities depend on a control parameter a.
[Diagram: from X_{t-1} = i to X_t = j with probability P_a(i, j), shown for actions a_1, a_2, a_3.]

Markov decision processes
A Markov decision process (MDP) is a controlled Markov chain endowed with a performance criterion (Puterman, 1994; Bertsekas, 2000).
The decision-maker receives a numerical reward R_t at each time instant t.
The decision-maker must optimize some long-run optimality criterion, e.g.,
$$J_{\text{av}} = \lim_{T \to \infty} \frac{1}{T}\, E\!\left[\sum_{t=1}^{T} R_t\right]; \qquad J_{\text{disc}} = E\!\left[\sum_{t=1}^{\infty} \gamma^t R_t\right].$$
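To make the discounted criterion concrete, here is a small sketch that evaluates the discounted value of a fixed policy (the quantity behind J_disc, up to the indexing convention) by solving the linear system V = r_pi + gamma * P_pi * V. The two-state, two-action MDP and all numbers are assumed placeholders, not an example from the slides:

```python
import numpy as np

# Hypothetical 2-state, 2-action MDP; numbers are assumptions for illustration.
gamma = 0.95
P = np.array([  # P[a, s, s'] = transition probability under action a
    [[0.9, 0.1], [0.2, 0.8]],   # action 0
    [[0.5, 0.5], [0.1, 0.9]],   # action 1
])
R = np.array([  # R[a, s] = expected immediate reward for taking a in s
    [1.0, 0.0],
    [0.5, 2.0],
])

pi = np.array([0, 1])  # a fixed deterministic policy: one action per state

# For a fixed policy, the discounted value solves V = r_pi + gamma * P_pi @ V.
P_pi = P[pi, np.arange(2)]   # P_pi[s, s'] = P(s' | s, pi(s))
r_pi = R[pi, np.arange(2)]   # r_pi[s] = R(s, pi(s))
V = np.linalg.solve(np.eye(2) - gamma * P_pi, r_pi)
print("Discounted value of the fixed policy:", V)
```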

Considering partial observability
A partially observable MDP (POMDP) is an MDP where the decision-maker cannot access all information relevant to the decision-making process (Kaelbling et al., 1998).
The decision-maker receives an observation Z_t at each time instant t.
The observation depends on the state of the underlying Markov chain.

Considering partial observability (1)
[Diagram: dynamic Bayesian network with control nodes A_{t-1}, A_t; state nodes X_{t-1}, X_t, X_{t+1}; and sensor (observation) nodes Z_{t-1}, Z_t.]
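Because the decision-maker only sees Z_t, it typically maintains a belief over the hidden state and updates it with Bayes' rule after each action and observation. A minimal sketch of this standard belief update, with assumed transition and observation matrices (not taken from the slides):

```python
import numpy as np

# T[a][s, s'] = P(s' | s, a);  O[a][s', z] = P(z | s', a).  Values are assumed.
def belief_update(b, a, z, T, O):
    """Return the new belief over states after taking action a and observing z."""
    predicted = b @ T[a]                   # predict: sum_s b(s) P(s' | s, a)
    unnormalized = predicted * O[a][:, z]  # correct: weight by P(z | s', a)
    return unnormalized / unnormalized.sum()

T = {0: np.array([[0.8, 0.2], [0.3, 0.7]])}
O = {0: np.array([[0.9, 0.1], [0.4, 0.6]])}
b = np.array([0.5, 0.5])
print(belief_update(b, a=0, z=1, T=T, O=O))
```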

Multiple decision-makers
Stochastic games (also known as Markov games) provide a multi-agent generalization of MDPs (Shapley, 1953).
In stochastic games, the control parameter depends on the choices of several independent decision-makers.
Each decision-maker k can receive a different reward R^k_t at each time instant t.

Multiple decision-makers (1)
In stochastic games, as in MDPs, each decision-maker k must optimize its own long-run optimality criterion, e.g.,
$$J^k_{\text{av}} = \lim_{T \to \infty} \frac{1}{T}\, E\!\left[\sum_{t=1}^{T} R^k_t\right]; \qquad J^k_{\text{disc}} = E\!\left[\sum_{t=1}^{\infty} \gamma^t R^k_t\right].$$
Partial state observability can also be considered, leading to the framework of partially observable stochastic games (POSGs).

Multiagent models
Fully observable: multiagent MDPs (Boutilier, 1996).
Partially observable:
Partially observable stochastic games (Hansen et al., 2004).
Decentralized POMDPs (Bernstein et al., 2002).
Interactive POMDPs (Gmytrasiewicz and Doshi, 2005).
Each agent only observes its own observations.

Solution methods: MDPs
Model based:
Basic: dynamic programming (Bellman, 1957), value iteration, policy iteration.
Advanced: prioritized sweeping, function approximators.
Model free: reinforcement learning (Sutton and Barto, 1998):
Basic: Q-learning, TD(λ), SARSA, actor-critic.
Advanced: generalization in infinite state spaces, exploration/exploitation issues.
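As an illustration of the basic model-based approach listed above, here is a minimal tabular value iteration sketch; the two-state MDP is an assumed toy example, not one discussed in the slides:

```python
import numpy as np

def value_iteration(P, R, gamma=0.95, tol=1e-8):
    """P[a, s, s'] and R[a, s] define the MDP; returns V* and a greedy policy."""
    n_actions, n_states, _ = P.shape
    V = np.zeros(n_states)
    while True:
        # Q(s, a) = R(s, a) + gamma * sum_s' P(s' | s, a) V(s')
        Q = R + gamma * (P @ V)      # shape (n_actions, n_states)
        V_new = Q.max(axis=0)
        if np.max(np.abs(V_new - V)) < tol:
            return V_new, Q.argmax(axis=0)
        V = V_new

# Assumed toy MDP (same placeholder numbers as the earlier sketch).
P = np.array([
    [[0.9, 0.1], [0.2, 0.8]],
    [[0.5, 0.5], [0.1, 0.9]],
])
R = np.array([[1.0, 0.0], [0.5, 2.0]])
V_star, policy = value_iteration(P, R)
print("V*:", V_star, "greedy policy:", policy)
```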

Techniques for partially observable environments
Model based (POMDP):
Exact methods (Monahan, 1982; Cheng, 1988; Cassandra et al., 1994; Zhang and Liu, 1996).
Heuristic methods: based on the MDP solution.
Approximate methods: gradient descent, policy search, point-based techniques.
Other topics:
Predictive state representations (Littman et al., 2002).
Reinforcement learning in POMDPs and PSRs.
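One well-known instance of the "heuristic methods based on the MDP solution" item is QMDP (not cited on the slide): solve the underlying MDP as if the state were fully observable, then weight the resulting Q-values by the current belief. A minimal sketch with assumed numbers:

```python
import numpy as np

def qmdp_action(belief, Q_mdp):
    """Q_mdp[a, s] from the fully observable MDP; pick argmax_a sum_s b(s) Q(s, a)."""
    return int(np.argmax(Q_mdp @ belief))

Q_mdp = np.array([[1.0, 0.2], [0.3, 1.5]])  # assumed Q-values, 2 actions x 2 states
belief = np.array([0.7, 0.3])
print(qmdp_action(belief, Q_mdp))
```

The heuristic is cheap because it reuses the MDP solution, but it ignores the value of actions taken purely to gather information.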

Multiagent methods
Model based:
Dynamic programming of Hansen et al. (2004).
JESP (Nair et al., 2003).
Bayesian game approximation (Emery-Montemerlo et al., 2004).
Model free:
Minimax-Q (Littman, 1994).
Friend-or-Foe Q-learning (Littman, 2001).
Nash-Q, multi-agent Dyna-Q, Correlated-Q.
Learning coordination.

Reading group
Questions to be answered:
What topics shall we cover?
When shall we meet? How often?
Schedule, volunteers?

References
J. A. Bagnell and J. G. Schneider. Autonomous helicopter control using reinforcement learning policy search methods. In Proceedings of the 2001 IEEE International Conference on Robotics and Automation, pages 1615-1620, 2001.
R. Bellman. Dynamic Programming. Princeton University Press, 1957.
D. C. Bentivegna, A. Ude, C. G. Atkeson, and G. Cheng. Humanoid robot learning and game playing using PC-based vision. In Proceedings of the 2002 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS'02), pages 2449-2454, October 2002.
D. S. Bernstein, R. Givan, N. Immerman, and S. Zilberstein. The complexity of decentralized control of Markov decision processes. Mathematics of Operations Research, 27(4):819-840, 2002.
D. P. Bertsekas. Dynamic Programming and Optimal Control. Athena Scientific, Belmont, MA, 2nd edition, 2000.
C. Boutilier. Planning, learning and coordination in multiagent decision processes. In Theoretical Aspects of Rationality and Knowledge, 1996.
A. R. Cassandra, L. P. Kaelbling, and M. L. Littman. Acting optimally in partially observable stochastic domains. In Proceedings of the National Conference on Artificial Intelligence, 1994.
H. T. Cheng. Algorithms for partially observable Markov decision processes. PhD thesis, University of British Columbia, 1988.
R. Emery-Montemerlo, G. Gordon, J. Schneider, and S. Thrun. Approximate solutions for partially observable stochastic games with common payoffs. In Proceedings of the International Joint Conference on Autonomous Agents and Multi Agent Systems, 2004.
P. J. Gmytrasiewicz and P. Doshi. A framework for sequential planning in multi-agent settings. Journal of Artificial Intelligence Research, 24:49-79, 2005.
E. A. Hansen, D. Bernstein, and S. Zilberstein. Dynamic programming for partially observable stochastic games. In Proceedings of the National Conference on Artificial Intelligence, 2004.
L. P. Kaelbling, M. L. Littman, and A. R. Cassandra. Planning and acting in partially observable stochastic domains. Artificial Intelligence, 101:99-134, 1998.
N. Kohl and P. Stone. Machine learning for fast quadrupedal locomotion. In Proceedings of the 19th National Conference on Artificial Intelligence (AAAI'04), pages 611-616, July 2004a.
N. Kohl and P. Stone. Policy gradient reinforcement learning for fast quadrupedal locomotion. In Proceedings of the 2004 IEEE International Conference on Robotics and Automation (ICRA'04), pages 2619-2624, May 2004b.
M. L. Littman. Markov games as a framework for multi-agent reinforcement learning. In International Conference on Machine Learning, 1994.
M. L. Littman. Friend-or-foe Q-learning in general-sum games. In International Conference on Machine Learning, 2001.
M. L. Littman, R. S. Sutton, and S. Singh. Predictive representations of state. In Advances in Neural Information Processing Systems 14. MIT Press, 2002.
G. E. Monahan. A survey of partially observable Markov decision processes: theory, models and algorithms. Management Science, 28(1), January 1982.
R. Nair, M. Tambe, M. Yokoo, D. Pynadath, and S. Marsella. Taming decentralized POMDPs: Towards efficient policy computation for multiagent settings. In Proceedings of the International Joint Conference on Artificial Intelligence, 2003.
A. Y. Ng, A. Coates, M. Diel, V. Ganapathi, J. Schulte, B. Tse, E. Berger, and E. Liang. Inverted autonomous helicopter flight via reinforcement learning. In Proceedings of the 2004 International Symposium on Experimental Robotics (ISER'04), 2004.
M. L. Puterman. Markov Decision Processes: Discrete Stochastic Dynamic Programming. John Wiley & Sons, Inc., New York, NY, 1994.
S. J. Russell and P. Norvig. Artificial Intelligence: A Modern Approach. Prentice Hall, 2nd edition, 2003.
M. Saggar, T. D'Silva, N. Kohl, and P. Stone. Autonomous learning of stable quadruped locomotion. In Proceedings of the 2006 International RoboCup Symposium (to appear), 2006.
L. Shapley. Stochastic games. Proceedings of the National Academy of Sciences, 39:1095-1100, 1953.
R. S. Sutton and A. G. Barto. Reinforcement Learning: An Introduction. MIT Press, 1998.
N. L. Zhang and W. Liu. Planning in stochastic domains: problem characteristics and approximations. Technical Report HKUST-CS96-31, Department of Computer Science, The Hong Kong University of Science and Technology, 1996.