Ad Hoc Autonomous Agent Teams: Collaboration without Pre-Coordination


Ad Hoc Autonomous Agent Teams: Collaboration without Pre-Coordination
Peter Stone, Director, Learning Agents Research Group, Department of Computer Science, The University of Texas at Austin
Joint work with Gal A. Kaminka and Sarit Kraus (Bar-Ilan University) and Jeffrey S. Rosenschein (Hebrew University)

Teamwork. Typical scenario: pre-coordination. People practice together; robots are given coordination languages and protocols, e.g. the locker-room agreement [Stone & Veloso, 99].

Ad Hoc Teams. An ad hoc team player is an individual facing unknown teammates (programmed by others). It may or may not be able to communicate with them, and the teammates are likely sub-optimal: it has no control over them. The challenge: create a good team player.

Illustration: an individual placed with teammates made by others. The team is heterogeneous; members may not be able to communicate, may have different capabilities and/or maneuverability, and may be of previously unknown types.

Human Ad Hoc Teams. Ad hoc teams of humans arise in military and industrial settings and in outsourcing, and agents have been built to support human ad hoc team formation [Just et al., 2004; Kildare, 2004]. Autonomous agents (robots), by contrast, have so far been deployed for short times as teams developed as cohesive groups, tuned to interact well together.

Challenge Statement. Create an autonomous agent that is able to efficiently and robustly collaborate with previously unknown teammates on tasks to which they are all individually capable of contributing as team members. Aspects can be approached theoretically, but it is ultimately an empirical challenge.

Empirical Evaluation: A Metric. Compare two candidate agents, a0 and a1; the comparison is most meaningful when a0 and a1 have similar individual competencies. Fix a domain D consisting of tasks and a set A of possible teammates. Then repeat: draw a random task from D; draw a random team from A and check that it is competent; replace a random team member with a0, then with a1; and evaluate the difference in team performance.

Evaluate(a0, a1, A, D):
  Initialize performance (reward) counters r0 and r1 for agents a0 and a1 respectively to r0 = r1 = 0.
  Repeat:
    Sample a task d from D.
    Randomly draw a subset of agents B, |B| ≥ 2, from A such that E[s(B, d)] ≥ s_min.
    Randomly select one agent b ∈ B to remove from the team, creating the team B⁻ = B \ {b}.
    Increment r0 by s({a0} ∪ B⁻, d).
    Increment r1 by s({a1} ∪ B⁻, d).
If r0 > r1, then we conclude that a0 is a better ad hoc team player than a1 in domain D over the set of possible teammates A.
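To make the metric concrete, here is a minimal Python sketch of Evaluate. The representations (a caller-supplied score function s(team, task), lists for A and D, a fixed number of trials) are assumptions of this sketch, and a single sampled score stands in for the expected competence E[s(B, d)]:

```python
import random

def evaluate(a0, a1, A, D, s, s_min, trials=1000):
    """Monte Carlo sketch of Evaluate(a0, a1, A, D).

    a0, a1 : candidate ad hoc team players
    A      : list of potential teammates (len(A) >= 2 assumed)
    D      : list of tasks
    s      : s(team, task) -> score, supplied by the domain
    s_min  : minimum competence a sampled team must reach
    """
    r0 = r1 = 0.0
    for _ in range(trials):
        d = random.choice(D)  # sample a task d from D
        # Draw a team B, |B| >= 2, whose (sampled) competence reaches s_min.
        while True:
            B = random.sample(A, random.randint(2, len(A)))
            if s(B, d) >= s_min:  # single sample stands in for E[s(B, d)]
                break
        i = random.randrange(len(B))  # remove one random member b
        B_minus = B[:i] + B[i + 1:]
        r0 += s([a0] + B_minus, d)  # a0 joins the reduced team
        r1 += s([a1] + B_minus, d)  # a1 joins the same reduced team
    return r0, r1  # r0 > r1 suggests a0 is the better ad hoc team player
```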

Technical Requirements. Assess the capabilities of other agents (teammate modeling). Assess the other agents' knowledge states. Estimate the effects of actions on teammates. Be prepared to interact with many types of teammates, which may or may not be able to communicate, may be more or less mobile, and may be better or worse at sensing. A good team player's best actions will differ depending on its teammates' characteristics.

Preliminary Theoretical Progress. Aspects can be approached theoretically, though the problem is ultimately an empirical challenge: be prepared to interact with many types of teammates. We begin with minimal representative scenarios: one teammate, no communication, fixed and known behavior.

Scenarios. (1) Cooperative iterated normal form game [w/ Kaminka & Rosenschein, AMEC 09], e.g. the payoff matrix M1:

M1    b0   b1   b2
a0    25    1    0
a1    10   30   10
a2     0   33   40

(2) Cooperative k-armed bandit [w/ Kraus, AAMAS 10].
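The game-theoretic scenario studies leading a teammate that plays the best response to our most recent action. Below is a minimal dynamic-programming sketch for M1 under that assumption; the horizon, the teammate's starting action b0, and the helper names are illustrative:

```python
from functools import lru_cache

# Payoff matrix M1: rows are our actions a0-a2, columns are teammate actions b0-b2.
M1 = [[25, 1, 0],
      [10, 30, 10],
      [0, 33, 40]]

def best_response(row):
    """Column a best-response teammate plays after observing our action `row`."""
    return max(range(3), key=lambda col: M1[row][col])

@lru_cache(maxsize=None)
def plan(col, steps):
    """Best total payoff over `steps` rounds given the teammate is about to
    play `col`, and the first of our actions that achieves it."""
    if steps == 0:
        return 0, None
    best_value, best_action = float("-inf"), None
    for row in range(3):
        value = M1[row][col] + plan(best_response(row), steps - 1)[0]
        if value > best_value:
            best_value, best_action = value, row
    return best_value, best_action

print(plan(0, 3))  # horizon 3, teammate starting at b0 -> (83, 1)
```

Run as plan(0, 3), the leader first plays a1 (immediate payoff only 10) to pull the teammate to b1, then plays a2 twice, collecting 33 and 40 for a total of 83; myopically repeating a0 would yield only 75. Sacrificing immediate reward to steer the teammate is the kind of behavior the AMEC 09 analysis characterizes.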

The 3-armed bandit. There are three arms, Arm*, Arm1, and Arm2; each pull returns a random value from that arm's distribution, with expected values µ* > µ1 > µ2. Agent A, the teacher, knows the payoff distributions; its objective is to maximize the expected sum of payoffs, and if alone it would always pull Arm*. Agent B, the learner, can only pull Arm1 or Arm2, and selects the arm with the highest observed sample average.

Assumptions. The agents alternate actions (teacher first); the results of all actions are fully observable to both; and the number of rounds remaining is finite and known to the teacher. Objective: maximize the expected sum of payoffs.
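To make the teacher's dilemma concrete, here is a small simulation sketch comparing two extreme teacher policies: always exploit (pull Arm*) and always teach (pull Arm1). The Gaussian noise, the particular µ values, and the single initial observation per learner arm are assumptions of this sketch; the model's teacher instead decides round by round:

```python
import random

random.seed(0)

# Illustrative (assumed) expected payoffs, with mu* > mu1 > mu2.
MU = {"arm_star": 10.0, "arm1": 8.0, "arm2": 5.0}
SIGMA = 6.0  # Gaussian payoff noise: an assumption of this sketch

def pull(arm):
    return random.gauss(MU[arm], SIGMA)

def episode(rounds, teacher_teaches):
    """Teacher and learner alternate, teacher first; all pulls are observable.

    The learner may pull only arm1 or arm2 and greedily picks the one with
    the highest observed sample average. A teaching teacher pulls arm1,
    adding evidence the learner sees; an exploiter always pulls arm_star.
    """
    samples = {"arm1": [pull("arm1")], "arm2": [pull("arm2")]}  # assumed seed observations
    total = 0.0
    for _ in range(rounds):
        t_arm = "arm1" if teacher_teaches else "arm_star"
        reward = pull(t_arm)
        total += reward
        if t_arm in samples:
            samples[t_arm].append(reward)  # learner observes the teacher's pull
        averages = {a: sum(v) / len(v) for a, v in samples.items()}
        l_arm = max(averages, key=averages.get)  # learner's greedy choice
        reward = pull(l_arm)
        total += reward
        samples[l_arm].append(reward)
    return total

episodes = 2000
for teaches in (False, True):
    mean = sum(episode(10, teaches) for _ in range(episodes)) / episodes
    print("teach" if teaches else "exploit", round(mean, 2))
```

Depending on the noise level and horizon, the always-teach policy can come out ahead: each teaching pull sacrifices µ* − µ1 but reduces the chance that the learner locks onto Arm2, which costs µ1 − µ2 on every learner pull.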

Summary of Findings. Pulling Arm1 is sometimes optimal for the teacher; pulling Arm2 is never optimal. We derive the optimal solution when the arms have discrete distributions, find interesting patterns in the optimal action, and give extensions to more arms. The underlying tension is exploitation vs. teaching.
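A worked instance of when Arm1 is optimal, under the stated model: suppose one teacher pull and one learner pull remain, and the learner currently favors Arm2. Pulling Arm* yields µ* + µ2 in expectation, while pulling Arm1 yields µ1 + p·µ1 + (1 − p)·µ2, where p is the probability that the teacher's Arm1 pull lifts Arm1's observed sample average above Arm2's. Teaching is therefore optimal exactly when p·(µ1 − µ2) > µ* − µ1, i.e., when the expected gain from flipping the learner exceeds the teacher's one-step sacrifice.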

Challenge Statement Create an autonomous agent that is able to efficiently and robustly collaborate with previously unknown teammates on tasks to which they are all individually capable of contributing as team members.

Suggested Research Plan.
1. Identify the full range of possible teamwork situations that a complete ad hoc team player needs to be capable of addressing (D and A).
2. For each such situation, find theoretically optimal and/or empirically effective algorithms for behavior.
3. Develop methods for identifying, online, which type of teamwork situation the agent is currently in.
Steps 2 and 3 are the core technical challenges; steps 1 and 3 provide a knob for incrementally increasing difficulty. A sketch of the online identification in step 3 follows below.
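Step 3 is, at its core, an online inference problem. One natural approach, sketched below, is Bayesian: maintain a posterior over a finite set of candidate teammate types, each of which assigns a likelihood to observed actions. The type set, the likelihood models, and the names here are hypothetical:

```python
def update_posterior(posterior, likelihoods, observation):
    """One Bayesian update over candidate teammate types.

    posterior   : dict mapping type -> current probability
    likelihoods : dict mapping type -> function(observation) -> P(obs | type)
    observation : the teammate action just observed
    """
    unnormalized = {t: p * likelihoods[t](observation)
                    for t, p in posterior.items()}
    z = sum(unnormalized.values())
    if z == 0.0:  # no candidate type explains the observation; keep the prior
        return posterior
    return {t: p / z for t, p in unnormalized.items()}

# Hypothetical usage: two candidate types for a teammate that just played "b1".
prior = {"best_responder": 0.5, "random_player": 0.5}
models = {"best_responder": lambda a: 0.9 if a == "b1" else 0.05,
          "random_player": lambda a: 1.0 / 3.0}
print(update_posterior(prior, models, "b1"))  # posterior shifts to best_responder
```

Repeated updates of this kind would let the agent switch to the behavior algorithm (from step 2) matched to the situation it identifies.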

Related Work. Multiagent learning [Claus & Boutilier, 98], [Littman, 01]; opponent modeling [Conitzer & Sandholm, 03], [Powers & Shoham, 05], [Chakraborty & Stone, 08]; intended plan recognition [Sidner, 85], [Lochbaum, 91], [Carberry, 01]; SharedPlans [Grosz & Kraus, 96]; recursive modeling [Vidal & Durfee, 95]. Human-robot-agent teams pose overlapping but different challenges, including HRI [Klein, 04], and are out of scope here, as is much more work pertaining to specific teammate characteristics.

Acknowledgements. Fulbright and Guggenheim Foundations; Israel Science Foundation.
