An Investigation into Team-Based Planning

Dionysis Kalofonos and Timothy J. Norman
Computing Science Department, University of Aberdeen
{dkalofon,tnorman}@csd.abdn.ac.uk

Abstract

Models of plan formation by teams of autonomous, social agents are of interest in the application of multi-agent systems in both engineering and commerce. A team of agents may be motivated by the need to achieve a common goal, and must agree on a plan of action. However, being autonomous entities, they may differ in the contributions that they are capable of making and may have varying attitudes towards the options available (e.g. preferences over who should do what). Furthermore, in general, it is not appropriate for a team plan to be imposed by a single manager agent. In this paper we investigate methods of collaborative plan construction and conflict resolution in teams of autonomous, social agents. We propose a multi-agent planning algorithm that interleaves planning with information exchange, coordination and negotiation, allowing the agents to handle potential conflicts and to promote their own preferences over the derived solutions.[1]

1. Introduction

In a team of cooperative agents pursuing a given set of goals, the agents should be able to resolve any conflicts introduced by their activities in order to achieve an overall sound solution that supports the goals of the team. A technique widely employed in the resolution of such conflicts is the generation of solutions to sub-problems by individual agents and the subsequent merging of those solutions. The merging of solutions to sub-problems results in a team plan, and can be performed by a central manager agent (see [CCPD01]) or in a decentralised manner (see [CD99b, ER94]). Alternatively, as discussed by Chang et al. [CDP93] and by Riley & Veloso [RV01], a central planning agent may be employed to generate the team plan given information about the competencies of the team members, and to distribute the team plan for enactment.
[1] 0-7803-8566-7/04/$20.00 © 2004 IEEE.

These approaches either reduce the model to that of a centralised planner with multi-agent plan enactment, or they assume that sub-problems have few inter-dependencies, or they require significant post-processing to fix conflicts that may have arisen between the solutions to sub-problems. A means of minimising this post-processing of solutions to sub-problems (or sub-plans) has been suggested by Clement & Durfee [CD99a], where summary information is extracted from a hierarchy of individual plans and communicated among the agents in order to reduce the degree of backtracking required in refining individual plans. In more recent work, Brenner [Bre03] has shown that it is beneficial for agents to interleave planning with communication, since it enables them to identify conflicts early in the planning phase, reducing the cost of backtracking.

Following the model presented by Brenner [Bre03], in this paper we present a multi-agent planner based on GraphPlan [BF97] that allows agents within a team: to interleave plan graph construction with information exchange; and to interleave solution (plan) extraction with various forms of conflict resolution. In our model, interleaving plan graph construction with information exchange allows team members to inform each other of the actions they are willing to perform at a particular stage. Interleaving plan extraction with conflict resolution performs two functions in our model: first, the agents may, during the search for a team plan, inform their teammates about the choices they make in order to eliminate (or at least minimise) backtracking; and second, it enables the agents to promote their preferences over the possible plan outcomes.

This paper is organised as follows. First, we outline the assumptions that we have made regarding shared knowledge and purpose within the team of agents and the private preferences of individual team members (section 2.1).
Then we discuss the two key extensions to GraphPlan [BF97] that we have developed to facilitate its use in team-based planning: information exchange (section 2.2) and conflict resolution (section 2.3). In section 3 we evaluate our approach using a variant of a classical problem domain, the blocks world, extended to multiple agents. Finally, we discuss avenues for future development and conclude.

2. Multi-Agent GraphPlan

2.1. The Planning Model

The team-based planning mechanism presented in this paper is based on the GraphPlan [BF97] algorithm. GraphPlan operates in two phases: graph expansion and solution extraction. During the graph expansion phase the algorithm creates a planning graph, which encodes the plan search space as a constraint satisfaction network. This graph is expanded until either the graph levels off,[2] in which case no solution can be found, or the network is expanded to a level in which the goals are present and are free from mutual exclusion (mutex) constraints (i.e. there are no constraints in the graph stating that two of the goals cannot both be true at that level). If the graph has been expanded to a level in which the goals appear and are mutex free, a solution may exist in the graph, and the algorithm moves to the solution extraction phase to search for a plan. If no plan is found, the algorithm returns to the graph expansion phase.

Our extensions to GraphPlan are detailed in the following two sections, but here we outline the approach that we have taken in developing a team-based planning mechanism. Our principal focus in this research is on maintaining a balance between an agent revealing appropriate information about its preferences in the pursuit of shared goals, and not disclosing information about activities unrelated to the shared goal, or its complete set of preferences or priorities. We therefore view agents in a team as peers, each having its own resources and qualifications, but contributing to the process of searching for a team plan.
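The two-phase control loop just described can be sketched as follows. This is a deliberately relaxed illustration in Python: mutex bookkeeping and the solution extraction phase are omitted, and the action-triple encoding is our own assumption, not the paper's implementation.

```python
def expand_graph(initial, goals, actions, max_levels=50):
    """Expand a relaxed planning graph until the goals appear in a
    propositional level, or the graph levels off (no new propositions
    can be added), in which case no solution can be found.

    `actions` is a list of (name, preconditions, effects) triples over
    ground propositions; mutex constraints are not tracked here."""
    props = set(initial)
    for level in range(max_levels):
        if set(goals) <= props:
            return level          # goals present: try solution extraction
        nxt = set(props)          # propositions persist to the next level
        for _name, pre, eff in actions:
            if set(pre) <= props:     # operator applicable at this level
                nxt |= set(eff)
        if nxt == props:
            return None           # levelled off: the goals are unreachable
        props = nxt
    return None
```

For example, with actions `('move-ab', ['at-a'], ['at-b'])` and `('move-bc', ['at-b'], ['at-c'])`, `expand_graph(['at-a'], ['at-c'], ...)` returns 2, the first level at which the goal proposition appears.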
For instance, we want agents to state which actions (and hence resources) they are willing to contribute to the team activity, not their full capabilities. An agent may, for example, decide that it will not contribute a certain resource because it has other, undisclosed, commitments with respect to that resource. We do make a number of assumptions about the information available to each agent in the team prior to the commencement of the planning process, and about the behaviour of team members. These assumptions are as follows:

All the agents share the same knowledge about the (relevant aspects of the) initial state. Otherwise, we would require a pre-planning phase during which any disagreements on matters of truth are resolved. It would be perfectly reasonable to relax this assumption, but modelling such a, potentially complex, discourse between the members of the team is outside the scope of this paper.

All the agents share the same knowledge about things that can be done; i.e. the agents have the same model of the operators that may be included in a plan. However, this does not mean that all the agents have the same set of capabilities; they are simply aware of the input constraints (preconditions) and output constraints (effects) of each action. This information is essential for the agents to create the edges within a planning graph.

The agents are in agreement about the team's goals.

We assume that the agents are cooperative. Although it is an important issue for multi-agent systems research, we do not address the issue of trust or consider the effect that an uncooperative agent would have on team activity.

The communication that takes place is error free (i.e. there is no message loss, no noise in the received messages, etc.), and the agents are not restricted by a communication time frame.

[2] The graph is said to have levelled off if the nodes that may be added at level n are the same as those at level n-1; the algorithm ensures that the operator nodes at level n appear in all levels > n.

With all this information available, why not simply select a single member of the team to search for a solution and distribute it to the other team members? There are a number of reasons why this would be inappropriate; here we consider two:

Agents differ in their capabilities and they may have commitments to act outside the context of the team activity. This means that, even if the capabilities of others are common knowledge, only the agent concerned knows when it may or may not contribute an action to the team plan. It would be inappropriate for information about an agent's extra-team commitments to be common knowledge within a team.

An execution cost is associated with each (operator, agent) pair. A particular action (or operator) may be performed by a number of team members (or associated with a number of team members in the team plan), but the costs to each member may be very different. It would be inappropriate for an agent to reveal its individual costs (or any other information that contributes to its preferences/utility function) to other team members, except for declaring preferences over a set of possible outcomes during conflict resolution.

So, we have a set of agents that are motivated to collaborate in the construction of a plan of action to solve a conjunction of shared goals: how may we adapt an algorithm
such as GraphPlan to support this team planning activity? In the following two sections we address our extensions to GraphPlan, information exchange and conflict resolution, in detail.

2.2. Information Exchange

Our first modification to the GraphPlan algorithm involves the exchange of information between team members during planning. Conceptually, the members of the team work together on a shared plan graph, each in turn adding to it. In practice, however, each agent maintains its own copy of the plan graph and shares updates as it makes its own contribution to the next plan graph level. The information sharing is designed to ensure that each team member's copy is up to date. The additions to the plan graph provided by members of the team are nodes representing actions that each team member is prepared to perform at that point in the plan. Each time the planning graph is expanded one step further, each agent collects all the possible operators that it is willing to contribute to the team plan. These operators are then added to the team plan as operator nodes, appropriate edges are added from the previous propositional level, the new propositional level is constructed, and exclusivity rules are applied. It is important to note here that once an agent has committed a resource to the team plan (i.e. once it has added an operator to the team plan) it may not withdraw that commitment. This is essential to ensure that the multi-agent planning mechanism is guaranteed to terminate.[3] The collaborative plan graph construction then progresses until the team goals appear in the current propositional level (detected by the final agent to contribute to that graph expansion step and flagged for the other members of the team), at which point the team proceeds to extract solutions from the planning graph and agree on who does what through conflict resolution.

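A single collaborative expansion step under this scheme can be sketched as follows. This is a Python illustration under our own assumptions: the action-triple format and the `willing` predicate, which stands in for an agent's private decision about what to contribute, are ours, not the paper's Prolog implementation.

```python
def expand_team_level(props, agents):
    """One collaborative expansion step.

    `agents` maps an agent name to (actions, willing): its action models
    as (name, preconditions, effects) triples, and a private predicate
    deciding which applicable actions the agent is prepared to
    contribute at this level."""
    contributed = {}
    for agent, (actions, willing) in agents.items():
        # each agent offers only the applicable actions it is willing to
        # commit to; once broadcast, a commitment cannot be withdrawn
        contributed[agent] = [a for a in actions
                              if set(a[1]) <= props and willing(a)]
    # every agent merges the broadcast contributions into its own copy
    # of the plan graph, so all copies stay identical
    nxt = set(props)
    for offered in contributed.values():
        for _name, _pre, eff in offered:
            nxt |= set(eff)
    return contributed, nxt
```

Note that willingness, not capability, controls what enters the graph: two agents with identical action models may contribute different nodes.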
2.3. Conflict Resolution

Having created the planning graph, which encapsulates the search space of possible team plans, the agents must in some way identify a plan within this search space that represents a compromise each team member finds acceptable. The search for such a compromise must take into account the preferences of each team member. In developing a conflict resolution mechanism of this kind, a typical approach is for each party to identify the set of possible outcomes and submit votes or negotiate over these outcomes. In this case, however, the outcomes are encoded in the plan graph, and extracting them all would require an exhaustive search. In contrast to this, agents in the team collaboratively guide the search for one compromise plan within the plan graph by expressing their preferences over which branch to pursue. We have investigated two mechanisms for enabling agents to guide the search for a plan in this manner, a voting scheme and a simple round-robin negotiation mechanism, discussed in the following sections.

[3] As mentioned in section 2.1, the absence of a plan is characterised by the plan graph levelling off without the goals being present in a propositional level, and this relies on operator nodes at level n appearing in all levels after n.

2.3.1. Borda Count Voting Mechanism

Our first approach is to use the Borda Count voting mechanism. This mechanism was introduced in 1770 by Borda as a way of electing a new leader of l'Académie Royale des Sciences, Paris. In this voting scheme, for x candidates, each voter awards x points to their first choice, x-1 to their second, and so on. The points for each candidate from each voter are counted and the candidate with the most points wins. It is possible for there to be a draw, in which case we select a single winner at random. This voting scheme has a number of characteristics that make it relatively fair, including reflectional and rotational symmetry.
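The scoring rule can be sketched as follows; this is an illustrative Python fragment in which the ballot encoding is our own assumption.

```python
import random

def borda_winner(ballots, rng=None):
    """ballots: complete preference orderings (best first) over the same
    x candidates. A voter's first choice receives x points, the second
    x - 1, and so on; a draw is resolved by selecting a winner at random."""
    rng = rng or random.Random()
    candidates = ballots[0]
    x = len(candidates)
    scores = {c: 0 for c in candidates}
    for ballot in ballots:
        for i, cand in enumerate(ballot):
            scores[cand] += x - i      # x points for 1st, x-1 for 2nd, ...
    top = max(scores.values())
    # draw: pick one of the tied candidates at random
    return rng.choice([c for c in candidates if scores[c] == top])
```

For instance, with ballots `[['p1','p2','p3'], ['p1','p3','p2'], ['p2','p1','p3']]` candidate `p1` scores 8 points and wins.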
The simplicity and fairness (to some extent at least) of the Borda Count voting mechanism are distinct advantages. (Having reflectional symmetry, for example, means that if one voter prefers candidate a to b and another prefers b to a, their votes cancel.) However, there are two key disadvantages of this mechanism for the team-based planning mechanism described in this paper. First, the Borda Count voting scheme requires that each voter give a complete ordering over the possible outcomes. Typically, if the vote is over possible plans (or, strictly, possible partial plans), an agent may need to place an arbitrary ordering between two outcomes that it considers equally good. Ideally, we would like a mechanism that allows agents to express a partial ordering of preference over the possible outcomes. Second, we now require the introduction of a trusted vote supervisor agent (or returning officer agent); i.e. an agent to whom the votes of each team member are submitted and who, once all votes have been received, determines the outcome using the Borda Count scheme and reports it to the team members. In the light of these limitations, we developed a simple negotiation mechanism.

2.3.2. Round-Robin Voting Mechanism

In our second approach the agents create a partial ordering of outcomes according to their preferences; that is, they order outcomes by the cost of each outcome to the agent. This ordering represents the list of all the votes of an agent, a single vote being a set of outcomes that the agent considers equally good. Having created their votes, the agents agree an ordering in which they will present their
votes. They then proceed to vote in a round-robin (RR) fashion. The first agent votes by declaring the set of outcomes that it considers best. In all subsequent turns, the agent taking that turn collects the intersection of the sets of votes of the rest of the members of the team. If this intersection is empty, the voter agent publishes its next vote. If the intersection is not empty, the voter agent finds the intersection between its next vote and the intersection of the sets of votes of the rest of the members of the team. If this new intersection is empty, the agents do not come to an agreement in this phase, so the voter agent publishes its next vote. If the new intersection is not empty, the voter agent adds the elements of the new intersection to a shared ordered set containing the elements of the final agreement. In either case, the voter agent informs the rest of the members of the team whether it has found an intersection or not. When not voting, agents keep track of the state of the vote by updating their local copies following each declaration. If the voting agent states that there is no intersection (no agreement), the others take no action. If the voter agent states that there is a new intersection (new agreement), the others remove the members of the new intersection from the sets of votes that have been made.

To make this more concrete, let us consider the following example. Let us assume agent A has the ordered set of votes {{1}, {2, 3}, {4, 5}}, agent B has {{3, 4}, {1, 2}, {5}} and agent C has {{5}, {2, 3}, {1}, {4}}. Initially the sets of published votes V_A, V_B, V_C of the three agents and the final agreement FA are all empty. If the ordering of the negotiation that the agents have agreed on is A, B, C, the negotiation progresses as follows:

1. Agent A votes {1} since V_B ∩ V_C = {}; now V_A = {1}, V_B = {}, V_C = {}, FA = {}
2. Agent B votes {3, 4} since V_A ∩ V_C = {}; now V_A = {1}, V_B = {3, 4}, V_C = {}, FA = {}
3. Agent C votes {5} since V_A ∩ V_B = {}; now V_A = {1}, V_B = {3, 4}, V_C = {5}, FA = {}
4. Agent A votes {2, 3} since V_B ∩ V_C = {}; now V_A = {1, 2, 3}, V_B = {3, 4}, V_C = {5}, FA = {}
5. Agent B votes {1, 2} since V_A ∩ V_C = {}; now V_A = {1, 2, 3}, V_B = {1, 2, 3, 4}, V_C = {5}, FA = {}
6. Agent C updates FA since {2, 3} ∩ V_A ∩ V_B = {2, 3}; now V_A = {1}, V_B = {1, 4}, V_C = {5}, FA = {2, 3}
7. Agent A votes {4, 5} since V_B ∩ V_C = {}; now V_A = {1, 4, 5}, V_B = {1, 4}, V_C = {5}, FA = {2, 3}
8. Agent B updates FA since {5} ∩ V_A ∩ V_C = {5}; now V_A = {1, 4}, V_B = {1, 4}, V_C = {}, FA = {2, 3, 5}
9. Agent C updates FA since {1} ∩ V_A ∩ V_B = {1}; now V_A = {4}, V_B = {4}, V_C = {}, FA = {2, 3, 5, 1}
10. Agent A has no votes left and gives way to agent B; V_A = {4}, V_B = {4}, V_C = {}, FA = {2, 3, 5, 1}
11. Agent B has no votes left and gives way to agent C; V_A = {4}, V_B = {4}, V_C = {}, FA = {2, 3, 5, 1}
12. Agent C updates FA since {4} ∩ V_A ∩ V_B = {4}; now V_A = {}, V_B = {}, V_C = {}, FA = [2, 3, 5, 1, 4]

At this point the agents have made all their votes, the agreement set has been formed, and the vote is complete.

3. Evaluation

For the evaluation of our work we used the blocks world domain as presented in IPP. This domain allows us to model agent interactions by associating with each operator the agent who has committed to perform the action. This scheme captures any conflicts that arise during the parallel activity of agents on non-shared resources, and maximises parallel activity in domains where conflicts between operators can be relaxed when activities are associated with different actors. We extended the domain further by adding a give operator (see figure 1) so that we can model how agents exchange resources. This modification also reduces the length of the plans found, since to achieve the same effect as the give operator in the classic blocks world domain, an agent X would have to stack or put down a block that it holds before it could be unstacked or picked up, respectively, by an agent Y.

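The round-robin protocol of section 2.3.2, as we reconstruct it from the description there, can be sketched in Python (an illustrative encoding, not the paper's Prolog implementation). Run on the worked example above, it reproduces the final agreement [2, 3, 5, 1, 4]:

```python
def round_robin(prefs):
    """prefs: agent -> ordered list of votes (sets of outcome ids,
    best first). Returns the final agreement FA as an ordered list.
    The sketch assumes, as in the text, that the agents eventually
    agree on every outcome, so all pending votes are exhausted."""
    order = list(prefs)
    pending = {a: [set(v) for v in votes] for a, votes in prefs.items()}
    published = {a: set() for a in prefs}   # votes made so far (V_A, ...)
    agreement = []                          # the shared ordered set FA
    while any(pending.values()):
        for agent in order:
            if not pending[agent]:
                continue                    # no votes left: give way
            others = [published[o] for o in order if o != agent]
            common = set.intersection(*others)
            agreed = pending[agent][0] & common
            if agreed:
                # new agreement: the others remove its members from the
                # votes they have made
                pending[agent].pop(0)
                agreement.extend(sorted(agreed))
                for other in order:
                    if other != agent:
                        published[other] -= agreed
            else:
                # no agreement in this phase: publish the next vote
                published[agent] |= pending[agent].pop(0)
    return agreement
```

Calling `round_robin({'A': [{1}, {2, 3}, {4, 5}], 'B': [{3, 4}, {1, 2}, {5}], 'C': [{5}, {2, 3}, {1}, {4}]})` returns `[2, 3, 5, 1, 4]`, matching step 12 of the example.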
The multi-agent planner is written in SICStus Prolog, and these experiments were run on an AMD Athlon 800MHz processor with 256Mb of RAM running Linux. The results presented in this section were taken after running the system on five problems (see figure 2 for a representation in IPP (http://www.informatik.uni-freiburg.de/~koehler/ipp.html) of one of our problem instances). This section is organised as follows. In section 3.1 we present the problems used for the evaluation of the system. In section 3.2 we present the evaluation of our implementation of the Graphplan algorithm, showing how our system scales up. Subsequently, in section 3.3 we present the evaluation of the two approaches to conflict resolution.

3.1. Problems

For the evaluation of the system we selected the blocks world domain (see figure 2). One of the characteristics of this domain is the high degree of inter-relationships among the several entities present in the domain. The principal reason for choosing such problems is that systems that need to fragment a given problem into independent subproblems (see [IM02, ER94]) tend to perform poorly on them.

Table 1. The number of time-steps by which the planning graph was expanded in the single and multi-agent cases.

problem   1 agent   2 agents   3 agents
3bl       8         5          5
4bl       12        7          7
5bl       16        9          9
6bl       20        11         11
7bl       24        13         13

Table 2. CPU times (sec) for the multi-agent planner with the negotiation and auction mechanisms (rows: problems 3bl-7bl; columns: 2 agents BC, 2 agents RR, 3 agents BC, 3 agents RR): 0.950 2.650 4.680 12.420 16.980 45.040 57.240 1.080 4.960 17.360 58.020 228.740 2.530 12.330 45.040 221.280 227.730 222.170.

3.2. Evaluating our Graphplan implementation

Table 1 shows the number of time-steps by which the planning graph was expanded in each of our experiments. We observe that as the degree of parallelism that the agents exploit increases, we obtain shorter plans. We also observe that with three agents there is no further reduction in the length of the plans found, since the maximum parallelism that can be exploited in the given problems is the simultaneous performance of actions by two agents.

3.3. Evaluating our conflict resolution approaches

Table 2 shows the time spent by the system on each of the five problems. We observe that there is a communication overhead during the round-robin negotiation which is influenced by the number of nodes appearing in the graph. This overhead is fairly insignificant while the graph is small, but tends to increase as the number of nodes in the graph increases, reaching 6.57 seconds in the worst case.

4. Future research

In this section, we discuss potential extensions of our work. RealPlan, presented in [SK00, Sri00], is an extension to the GraphPlan algorithm in which a separation of action selection from resource allocation has been proposed.
Instead of considering all the possible instantiations of operators over the available resources, the planner assigns a dummy value to resource variables and ignores any conflicts that would arise from the use of a resource. This allows the planner to focus on the actual problem without having to resolve any scheduling violations. Once the planner has produced an abstract plan, that plan is passed to a scheduler, which assigns resources to actions and modifies the abstract plan, adding new actions or moving actions from one time-step to another in order to resolve any conflicts. The extension of GraphPlan presented in RealPlan promises significant benefits from the introduction of additional agents. Since the number of action nodes will remain unchanged regardless of the addition of new agents, while the degree of parallelism increases as new agents are introduced, we expect to reach shorter plans without the trade-off of larger graphs. As presented in [SK00, Sri00], this approach yields significant speed-ups in comparison to the original GraphPlan algorithm.

In [IM02] an approach to the distribution of the GraphPlan algorithm is presented. IG-DGP (interaction graph-based distributed GraphPlan) starts by creating the interaction graph. Using the interaction graph and the number of resources of each kind of entity present in the domain description, the algorithm decomposes the problem into a number of independent subproblems, which are delegated to different agents. Once the agents have produced solutions to their subproblems, they merge those solutions into a complete plan. The approach presented in [IM02] promises a solution to our problem of decreasing performance as the graph grows. Studying the planning graph, we see that its size increases if there is an intersection between the sets of operators that the individual agents can perform. Given such an intersection, the algorithm is forced to propagate into the graph all the possible instantiations of the operators in that intersection to different agents. If we can distinguish these alternative instantiations of the same operator, we can decompose the problem into different subproblems and distribute those subproblems among the agents, keeping the time needed for solution extraction proportional to the problem each agent tackles rather than to the number of agents involved.

(:action give
  :parameters (?a ?ra - agent ?ob - object)
  :precondition (and (holding ?a ?ob) (arm-empty ?ra))
  :effect (and (arm-empty ?a) (holding ?ra ?ob)
               (not (holding ?a ?ob)) (not (arm-empty ?ra))))

Figure 1. Our extension of the blocks world domain defined in IPP.

(define (problem 3bl)
  (:domain blocks-world)
  (:objects a b c - object
            vega virgin andromeda - agent)
  (:init (on-table a) (on b a) (on c b) (clear c)
         (arm-empty vega) (arm-empty virgin) (arm-empty andromeda))
  (:goal (and (on-table b) (on c b) (on a c))))

Figure 2. One of the five problems used for the evaluation of our system.

5. Conclusion

In this paper we have presented a multi-agent planning system that allows agents within a team to interleave planning with information exchange, coordination and negotiation. We claim that this approach limits the degree of post-processing required to resolve conflicts, since potential interference between the concurrently executed individual plans is identified and handled during the search for a plan. We presented an information exchange mechanism that allows each agent to build a complete planning graph that encodes the abilities of all the agents involved in the planning process, and captures all the possible changes that may take place in the environment in which the agent is working. Additionally, we presented two negotiation strategies that allow the agents to coordinate their choices and to promote their own preferences: an implementation of the Borda Count voting mechanism, which ensures a small communication overhead and fairness among the agents, but ties the system to a centralised vote supervisor; and a direct negotiation mechanism, which is decentralised, but requires a greater communication overhead and offers a lesser degree of fairness.

6. Acknowledgements

The research reported in this paper has been funded by the EPSRC project The Information Exchange. The project partners are the Universities of Aberdeen, Dundee and Southampton.

References

[BF97] A. L. Blum and M. L. Furst. Fast planning through planning graph analysis. Artificial Intelligence, 90:281-300, 1997.

[Bre03] M. Brenner. Multiagent planning with partially ordered temporal plans. In Proceedings of the Eighteenth International Joint Conference on Artificial Intelligence, pages 1513-1514, 2003.

[CCPD01] J. S. Cox, B. J. Clement, P. M. Pappachan, and E. H. Durfee. Integrating multiagent coordination with reactive plan execution. In Proceedings of the Fifth International Conference on Autonomous Agents, pages 149-150, 2001.

[CD99a] B. J. Clement and E. H. Durfee. Theory for coordinating concurrent hierarchical planning agents using summary information. In Proceedings of the Sixteenth National Conference on Artificial Intelligence, pages 459-502, 1999.

[CD99b] B. J. Clement and E. H. Durfee. Top-down search for coordinating the hierarchical plans of multiple agents. In Proceedings of the Third International Conference on Autonomous Agents, pages 252-259, 1999.

[CDP93] K. H. Chang, W. B. Day, and S. Phiphobmongkol. An agent-oriented multiagent planning system. In Proceedings of the 1993 ACM Conference on Computer Science, pages 107-114, 1993.

[ER94] E. Ephrati and J. S. Rosenschein. Divide and conquer in multi-agent planning. In Proceedings of the Twelfth National Conference on Artificial Intelligence, pages 375-380, 1994.

[IM02] M. Iwen and A. D. Mali. Distributed graphplan. In Proceedings of the Fourteenth IEEE International Conference on Tools with Artificial Intelligence, pages 138-145, 2002.

[RV01] P. Riley and M. Veloso. Planning for distributed execution through use of probabilistic opponent models. In Proceedings of the IJCAI Workshop on Planning under Uncertainty and Incomplete Information, 2001.

[SK00] B. Srivastava and S. Kambhampati. Scaling up planning by teasing out resource scheduling. In S. Biundo and M. Fox, editors, Recent Advances in AI Planning: Proceedings of the 5th European Conference on Planning, volume 1809 of Lecture Notes in Computer Science, pages 172-186. Springer-Verlag, 2000.

[Sri00] B. Srivastava. RealPlan: Decoupling causal and resource reasoning in planning. In Proceedings of the Seventeenth National Conference on Artificial Intelligence, pages 812-818, 2000.