Torralba and Wahlster, Artificial Intelligence, Chapter 16: Non-Classical Planning
Artificial Intelligence
16. Non-Classical Planning — Relaxing our assumptions about the agent's environment
Álvaro Torralba, Wolfgang Wahlster
Summer Term 2018
Thanks to Prof. Hoffmann for slide sources
Agenda
1 Introduction
2 Planning with Uncertainty
3 Non-Deterministic Planning
4 Numeric Planning
5 Multi-Agent Planning
6 Temporal Planning
7 Planning in the Real World
8 Conclusion
Reminder (Chapter 3): What is an Agent in AI?
Agents:
Perceive the environment through sensors (percepts).
Act upon the environment through actuators (actions).
(Diagram: the agent receives percepts from the environment through its sensors and acts on the environment through its actuators.)
Reminder (Chapter 3): Environment of Rational Agents
Fully observable vs. partially observable: Are the relevant aspects of the environment accessible to the sensors?
Deterministic vs. stochastic: Is the next state of the environment completely determined by the current state and the selected action?
Episodic vs. sequential: Can the quality of an action be evaluated within an episode (perception + action), or are future developments decisive?
Static vs. dynamic: Can the environment change while the agent is deliberating?
Discrete vs. continuous: Is the environment discrete or continuous?
Single agent vs. multi-agent: Is there just one agent, or several of them?
Planning
Model-based sequential decision making: Given a model of the environment and the agent's capabilities, decide what action to execute next by reasoning about the possible future outcomes.
Chapter 14: Classical STRIPS Planning: A STRIPS planning task, short planning task, is a 4-tuple Π = (P, A, I, G) where:
P is a finite set of facts.
A is a finite set of actions; each a ∈ A is a triple of pre, add, and del lists.
I ⊆ P is the initial state.
G ⊆ P is the goal.
Solution (plan) = sequence of actions transforming I into a state s ⊇ G.
So, what assumptions are we making about the environment? Fully observable, deterministic, static, discrete, single agent.
Our Agenda for This Chapter
Planning with Uncertainty: Dealing with partially observable environments.
Non-Deterministic Planning: Dealing with non-deterministic environments.
Numeric Planning: Dealing with continuous environments.
Multi-Agent Planning: Dealing with environments where several agents cooperate.
Temporal Planning: Dealing with actions whose effects take some time.
Planning in the Real World: So, what is the role of classical planning in all this?
Disclaimer(s)
This lecture is intended as a general overview of many different topics. The concepts and explanations in this chapter are very broad and superficial, and you are not expected to understand all the details. Moreover, we do not cover all types of non-classical planning.
Partially Observable Environments
In the real world: we do not know everything! Example (Chapter 10): the Wumpus world.
Differences with respect to classical planning:
1 We do not have full information about the initial state.
2 (Optionally) Actions may give us new information about the current state.
Belief State
We have only partial knowledge about the initial state. How to represent our knowledge? Logic!
Initial knowledge:
You're in cell [1,1], and there is no pit there: ¬P1,1.
There's a Wumpus somewhere: W1,1 ∨ W1,2 ∨ W1,3 ∨ ...
There's gold somewhere: G1,1 ∨ G1,2 ∨ G1,3 ∨ ...
There's no stench in position [1,1]: ¬S1,1.
General knowledge: ¬S1,1 ⇒ ¬W1,1 ∧ ¬W1,2 ∧ ¬W2,1
Definition (Belief State). Let ϕ be a propositional formula that describes our knowledge about the current state. Then the belief state B is the set of states that correspond to satisfying assignments of ϕ — the set of states that are consistent with our belief.
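To make the definition concrete, here is a hedged sketch (not from the lecture) that computes a belief state by enumerating the truth assignments of a tiny knowledge formula; the facts `W12`, `W21`, `S11` are a hypothetical two-cell fragment of the Wumpus world.

```python
from itertools import product

# Hypothetical mini-example: the wumpus is in exactly one of two cells.
# A state is a truth assignment to the facts; the belief state is the set
# of states whose assignment satisfies our knowledge formula phi.
facts = ["W12", "W21", "S11"]

def phi(s):
    # Knowledge: exactly one of the two cells holds the wumpus, and
    # "no stench in [1,1]" implies "no wumpus in an adjacent cell".
    exactly_one = s["W12"] != s["W21"]
    stench_rule = s["S11"] or (not s["W12"] and not s["W21"])
    return exactly_one and stench_rule

belief_state = [s for s in (dict(zip(facts, vals))
                            for vals in product([False, True], repeat=3))
                if phi(s)]
# Two states remain: the wumpus is in [1,2] or in [2,1]; both imply a stench.
```

Enumerating assignments like this is exactly why belief states can be exponentially large in the number of unknown facts.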
Partially Observable Planning: Conformant Planning
Conformant Planning: A planning task with uncertainty in the initial state is a 4-tuple Π = (P, A, ϕ_I, G) where:
P is a finite set of facts.
A is a finite set of actions; each a ∈ A is a triple of pre, add, and del lists.
ϕ_I is the initial belief state.
G ⊆ P is the goal.
Find a sequence of actions that transforms the initial belief state ϕ_I into a belief state in which G holds (a conformant plan).
A conformant plan is a sequence of actions that works no matter which initial state we are in (among those compatible with our initial belief state).
Conformant Planning: How to Solve It?
Method 1: Heuristic search. Each search node contains a belief state, which can be represented in different ways:
Enumerate the states (possibly exponentially many).
As a logical formula ϕ_B (checking whether an action is applicable in ϕ_B requires solving a satisfiability problem, to check whether the formula entails the precondition of the action).
Method 2: Compiling it to classical planning. The compilation is either exponential in the worst case or incomplete.
Complexity: PlanEx for conformant planning is EXPSPACE-hard.
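As a concrete illustration of Method 1 with the enumerated representation, here is a hedged sketch (the helper names are assumptions, not the lecture's code) of progressing a belief state through a STRIPS action; note that the action must be applicable in every state the agent might be in.

```python
# Belief states as sets of states; states as frozensets of true facts.
def applicable(belief, pre):
    # The action must be applicable in *every* state we might be in.
    return all(pre <= s for s in belief)

def progress(belief, pre, add, delete):
    assert applicable(belief, pre)
    # Apply the STRIPS effect to each possible state individually.
    return {frozenset((s - delete) | add) for s in belief}

# We know we are at [1,1], but not whether the cell is dirty.
belief = {frozenset({"at11", "dirty"}), frozenset({"at11"})}
new_belief = progress(belief, pre={"at11"}, add={"at12"}, delete={"at11"})
```

After the move, the uncertainty about "dirty" persists: the belief state still contains two states.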
Questionnaire
Question! What's a conformant plan for the Wumpus example?
None — we need to sense the environment to discover where the wumpus is!
Partially Observable Planning: Contingent Planning
Contingent Planning: A contingent planning task is a 4-tuple Π = (P, A, ϕ_I, G) where:
P is a finite set of facts.
A is a finite set of actions; each a ∈ A is a tuple (pre, add, del, obs). The values of the facts in obs are observed after executing the action.
ϕ_I is the initial belief state.
G ⊆ P is the goal.
Modelling the Wumpus Problem
ϕ_I = {at1,1, clear1,1, clear_i,j ⇒ ¬wumpus_i,j, wumpus_i,j ⇒ stench_i,j+1, ...}
G = {have-gold}
move-x-y:
pre: {at_x, clear_y}
add: {at_y}
del: {at_x}
obs: {stench_y, breeze_y}
move-x-y has 4 different outcomes, and we need to plan for each of them individually! (Blackboard)
Contingent Planning: How to Solve It
Given a contingent planning task, find a tree of actions that transforms the initial belief state ϕ_I into a belief state in which G holds. We may need a different plan for every result of the observation actions!
Method: One option is And-Or search, similar to MinMax search, where we distinguish between Or/Max nodes (where we choose the action to apply) and And/Min nodes (where the environment chooses).
Complexity: PlanEx for contingent planning is 2EXPSPACE-hard.
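The And-Or idea can be sketched as follows (a minimal illustration over an abstract problem interface; the function names are assumptions, not the lecture's implementation): at OR nodes the agent picks one action, at AND nodes the plan must handle every possible outcome.

```python
def and_or_search(state, goal, actions, results, path=()):
    """Return a conditional plan as [(action, {outcome: subplan})], or None."""
    if goal(state):
        return []                         # goal reached: empty plan
    if state in path:
        return None                       # cycle: prune this branch
    for a in actions(state):              # OR node: choose one action...
        subplans = {}
        for outcome in results(state, a): # AND node: ...handling all outcomes
            sub = and_or_search(outcome, goal, actions, results, path + (state,))
            if sub is None:
                break                     # some outcome unsolvable: next action
            subplans[outcome] = sub
        else:
            return [(a, subplans)]
    return None

# Toy problem: states 0..2, goal is 2; action 'a' from 0 may land in 1 or 2,
# and 'b' from 1 always reaches 2.
plan = and_or_search(0, lambda s: s == 2,
                     actions=lambda s: {0: ["a"], 1: ["b"]}.get(s, []),
                     results=lambda s, a: {1, 2} if s == 0 else {2})
```

The returned plan is a tree, not a sequence: after `a` it prescribes `b` for outcome 1 and nothing for outcome 2.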
Stochastic Environments
In the real world: we cannot always anticipate the effect of our actions! (Example: a buy-lottery action with two outcomes, win and lose.)
Differences with respect to classical planning:
1 Actions can have multiple outcomes.
2 (Optionally) If the probability of each outcome is known, then we call it probabilistic planning.
Markov Decision Processes
The state space is replaced by a Markov Decision Process: A Markov Decision Process is a 6-tuple (S, A, T, R, I, S_G) where:
S is a finite set of states.
A is a finite set of actions.
T is a transition function mapping each (s, a, s') to a probability.
R is a reward function R(s, a, s') ∈ ℝ.
I is the initial state.
S_G is the set of goal states.
Differences with respect to classical search problems:
Transitions have multiple outcomes, each with a given probability.
Transitions provide a reward.
Objective: Reach a goal state? Maximize reward?
Markov Decision Processes: Objective
Find a policy: a mapping from states to actions.
There are multiple types of MDPs, depending on the objective:
1 Maximize reward: over an infinite horizon, the accumulated reward is often infinite no matter what the agent does! So:
1 Maximize reward after a finite number of steps, or
2 Maximize reward with a discount factor.
2 Reach goal:
1 Find a policy that reaches a goal state with least average cost, if one exists. But what if it is not possible to reach the goal with probability 1?
2 Find a policy that reaches a goal state with maximum probability.
Probabilistic Planning: How to Solve It
Value Iteration: Finding an optimal policy can be done in polynomial time in the size of the MDP! But: the MDP is often exponential in the size of the probabilistic planning task.
LRTDP: We can use heuristic search to explore only the relevant part of the MDP.
Complexity: PlanEx for probabilistic planning is 2EXPSPACE-hard.
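For concreteness, here is a hedged value-iteration sketch over a tiny hand-coded MDP (the transition encoding is an assumption for illustration, not a standard planner API): it repeatedly applies Bellman backups until the values stop changing.

```python
def value_iteration(T, gamma=0.9, eps=1e-6):
    """T[s][a] = list of (probability, next_state, reward); returns V*."""
    V = {s: 0.0 for s in T}
    while True:
        delta = 0.0
        for s in T:
            if not T[s]:                  # no actions: terminal state
                continue
            best = max(sum(p * (r + gamma * V[s2]) for p, s2, r in T[s][a])
                       for a in T[s])     # Bellman backup over all actions
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < eps:
            return V

# Two-state example: from 's', action 'go' reaches the terminal goal 'g'
# with probability 0.9 (reward 1) and stays in 's' with probability 0.1.
V = value_iteration({"s": {"go": [(0.9, "g", 1.0), (0.1, "s", 0.0)]}, "g": {}})
```

The loop runs in time polynomial in |S| and |A| per sweep, which is why the hard part in practice is the size of the MDP itself, not the iteration.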
Questionnaire
Question! In the real world, what do you do? Do you always plan ahead for every possibility/contingency?
No — you do not want to spend an hour thinking about whether it is best to take the 102 or the 124 back to the city center. You may consider the possibility of the bus being delayed, but not of the bus crashing.
Online vs. Offline Planning
Given a contingent/probabilistic planning task:
Offline Planning: Plan ahead for every possibility. Find a contingent plan. Find an (optimal) policy.
Online Planning: Decide what to do next: spend some time deciding what action to execute, execute it, observe the result, and re-plan if necessary.
FF-Replan: Given a probabilistic planning task:
1 Drop the probabilities (assume that you get to choose the outcome).
2 Use a classical planner to solve the simplified task.
3 Execute the action recommended by the planner.
4 If the outcome is the expected one, continue. Otherwise, re-plan from the new state.
A very effective online probabilistic planner, especially in tasks where probabilities model that actions have a (low) probability of failure.
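The FF-Replan loop above can be sketched as follows (a simplified illustration; `classical_plan`, `execute`, and `expected` are assumed interfaces to a determinizing planner and the environment, not FF-Replan's actual code):

```python
def ff_replan(state, goal, classical_plan, execute, expected, max_steps=100):
    plan = list(classical_plan(state, goal))   # plan on the determinized task
    for _ in range(max_steps):
        if goal(state):
            return True
        if not plan:
            return False                       # determinized task unsolvable
        action = plan.pop(0)
        observed = execute(state, action)      # the environment decides
        if observed != expected(state, action):
            plan = list(classical_plan(observed, goal))  # surprise: re-plan
        state = observed
    return False

# Toy chain 0 -> 1 -> 2 -> 3; here 'inc' always behaves as expected,
# so no re-planning is triggered.
ok = ff_replan(0, lambda s: s == 3,
               classical_plan=lambda s, g: ["inc"] * (3 - s),
               execute=lambda s, a: s + 1,
               expected=lambda s, a: s + 1)
```

The key design choice is step 1: by pretending the agent chooses the outcome, every call to the planner is an ordinary classical planning problem.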
Continuous Environments
In the real world: we have numbers! Examples: fuel levels, voltage control.
Numeric STRIPS
Numeric STRIPS Planning: Extends STRIPS by introducing numeric variables V_n with rational values in ℚ.
Numeric expressions: We can do simple arithmetic (+, −, ×, ÷) with the values of variables and/or constants.
Numeric conditions: Compare numeric expressions with {<, ≤, =, ≥, >}.
Numeric effects: Assign a numeric expression to a numeric variable in V_n.
Example: drive(x,y): pre: {at(x), fuel > 0}, add: {at(y)}, del: {at(x)}, assign: {fuel := fuel − 1}
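Here is a hedged sketch of how a numeric state and the drive action from the example might look (hypothetical representation, not a standard planner API): propositional facts plus a rational-valued fuel variable.

```python
from fractions import Fraction

def drive(state, x, y):
    """Apply drive(x,y) if the pre holds: at(x) and fuel > 0; else None."""
    if f"at_{x}" not in state["facts"] or not state["fuel"] > 0:
        return None
    return {"facts": (state["facts"] - {f"at_{x}"}) | {f"at_{y}"},
            "fuel": state["fuel"] - 1}         # assign: fuel := fuel - 1

s0 = {"facts": frozenset({"at_A"}), "fuel": Fraction(2)}
s1 = drive(s0, "A", "B")    # ok: fuel 2 -> 1
s2 = drive(s1, "B", "A")    # ok: fuel 1 -> 0
s3 = drive(s2, "A", "B")    # fails: the numeric condition fuel > 0 is violated
```

Using exact rationals (`Fraction`) mirrors the ℚ-valued variables of the definition; discretizing them is exactly where Method 1 below loses precision.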
Numeric STRIPS: How to Solve It?
Method 1: Compiling it to classical planning. Each number is discretized into a finite set of values: the loss of precision requires some rounding, causing these methods to be either unsound or incomplete.
Method 2: Heuristic search: Similar to classical planning, only that now heuristics must take the numbers into account (e.g. delete relaxation based on intervals). But the state space is not finite anymore!
Complexity: PlanEx for numeric STRIPS is undecidable.
Multi-Agent Environments
In the real world: we are not the single and only agent!
Competitive environments: General Game Playing (see Chapter 07).
Collaborative environments: multi-agent planning.
Multi-Agent Planning
Multi-agent planning: Several agents must collaborate to achieve a common goal.
Key: There is some global information known by all agents, but each agent also has its own private facts, which it does not want to share with the rest.
Multi-Agent STRIPS Planning: A multi-agent STRIPS planning task is a 4-tuple Π = (P, A, I, G) where:
P is a finite set of facts, divided into private and public facts.
A is a finite set of actions; each a ∈ A is a triple of pre, add, and del lists, divided into private and public actions.
I ⊆ P is the initial state.
G ⊆ P is the goal.
Agents must communicate during the planning process to share information about how they will achieve the goal.
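One common ingredient of such communication, sketched here with an assumed encoding (hypothetical helper, not any specific multi-agent planner): before sharing an action with the other agents, an agent projects away its private facts, exposing only the public part.

```python
def public_projection(action, public_facts):
    # Keep only the public part of each action component; private facts
    # (here "secret_code") are stripped before the action is shared.
    return {k: set(action[k]) & public_facts for k in ("pre", "add", "del")}

# The truck agent's load action mentions the private fact "secret_code".
load = {"pre": {"at_depot", "secret_code"}, "add": {"loaded"}, "del": set()}
shared = public_projection(load, public_facts={"at_depot", "loaded"})
```

Other agents can now plan with the shared version, never learning that a secret code was involved.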
Hybrid Environments
In the real world: events do not happen instantaneously!
Durative Actions
In classical planning, action effects happen immediately. However, in the real world, actions take time to execute. When does the precondition need to hold? When is the effect applied?
Preconditions can be required at-start (when the action begins), at-end (when it finishes), or over-all (throughout the execution of the action); effects can be applied at-start or at-end.
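A hedged sketch of one possible encoding of these annotations (an illustrative data structure only, loosely following the at-start/at-end/over-all conventions of PDDL's durative actions):

```python
# A durative "move A -> B" taking 5 time units: the agent leaves A at the
# start, must keep the road clear throughout, and arrives at B at the end.
durative_move = {
    "duration": 5,
    "conditions": {
        "at-start": {"at_A"},         # must hold when the action starts
        "over-all": {"road_clear"},   # must hold during the whole execution
        "at-end":   set(),            # must hold when the action ends
    },
    "effects": {
        "at-start": {"add": set(), "del": {"at_A"}},
        "at-end":   {"add": {"at_B"}, "del": set()},
    },
}
```

Splitting effects between start and end is what allows other actions to run concurrently during the 5 time units, as long as they respect the over-all condition.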
Planning in the Real World
Question! If most real-world environments are not deterministic, not fully observable, not discrete, not single-agent, and temporal, what is classical planning good for?
The model does not try to simulate the environment; it is just a tool for making good decisions. Oftentimes, reasoning with a simplified model can still lead to intelligent decisions, and solutions are easier to compute than with more complex models.
My two cents: Ideally, we should always provide an accurate description of the environment, so that the AI simplifies it when necessary. However, automatic simplification methods are not powerful enough in all cases yet.
Summary
There are many planning sub-areas that study model-based sequential decision making.
Non-classical planning models reason about more complex environments: non-deterministic, partially observable, continuous, temporal, etc.
Solving these problems by computing a complete offline policy is hard (though many non-classical planners are able to do this satisfactorily in some domains).
Many approaches are online, planning to decide the next action by looking into the future but without considering all alternatives.
Classical planning and heuristic search techniques are still an important ingredient of many approaches that deal with complex environments.