Artificial Intelligence


Chapter 16: Non-Classical Planning
Relaxing our assumptions about the agent's environment
Álvaro Torralba, Wolfgang Wahlster. Summer Term 2018.
Thanks to Prof. Hoffmann for slide sources.

Agenda
1. Introduction
2. Planning with Uncertainty
3. Non-Deterministic Planning
4. Numeric Planning
5. Multi-Agent Planning
6. Temporal Planning
7. Planning in the Real World
8. Conclusion

Reminder (Chapter 3): What is an Agent in AI?
Agents:
- Perceive the environment through sensors (percepts).
- Act upon the environment through actuators (actions).
[Diagram: an agent connected to its environment, receiving percepts through its sensors and issuing actions through its actuators.]

Reminder (Chapter 3): Environment of Rational Agents
- Fully observable vs. partially observable: Are the relevant aspects of the environment accessible to the sensors?
- Deterministic vs. stochastic: Is the next state of the environment completely determined by the current state and the selected action?
- Episodic vs. sequential: Can the quality of an action be evaluated within a single episode (perception + action), or are future developments decisive?
- Static vs. dynamic: Can the environment change while the agent is deliberating?
- Discrete vs. continuous: Is the environment discrete or continuous?
- Single-agent vs. multi-agent: Is there just one agent, or several of them?

Planning
Model-based sequential decision making: Given a model of the environment and the agent's capabilities, decide what action to execute next by reasoning about the possible future outcomes.
Chapter 14, Classical STRIPS Planning: A STRIPS planning task, short planning task, is a 4-tuple Π = (P, A, I, G) where:
- P is a finite set of facts.
- A is a finite set of actions; each a ∈ A is a triple (pre_a, add_a, del_a).
- I ⊆ P is the initial state.
- G ⊆ P is the goal.
Solution (plan) = a sequence of actions transforming I into a state s with G ⊆ s.
So, what assumptions are we making about the environment? Fully observable, deterministic, static, discrete, single-agent.
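To make the 4-tuple concrete, here is a minimal Python sketch (my own illustration; the names Action, apply and validate_plan are made up, not from the lecture) of a STRIPS task and a plan check on a toy one-action problem:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Action:
    name: str
    pre: frozenset   # facts that must hold before the action can be applied
    add: frozenset   # facts added by the action
    dele: frozenset  # facts deleted by the action ("del" is a Python keyword)

def apply(state, a):
    """Apply a STRIPS action: result = (state minus del_a) union add_a, provided pre_a is a subset of state."""
    assert a.pre <= state, f"{a.name} is not applicable"
    return (state - a.dele) | a.add

def validate_plan(init, goal, plan):
    """Check that executing `plan` from `init` reaches a state s with goal as a subset of s."""
    state = frozenset(init)
    for a in plan:
        state = apply(state, a)
    return goal <= state

# Toy task: a single action moving the agent from A to B.
move_ab = Action("move-A-B", frozenset({"at-A"}), frozenset({"at-B"}), frozenset({"at-A"}))
print(validate_plan({"at-A"}, frozenset({"at-B"}), [move_ab]))  # True
```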

Our Agenda for This Chapter
- Planning with Uncertainty: Dealing with partially observable environments.
- Non-deterministic Planning: Dealing with non-deterministic environments.
- Numeric Planning: Dealing with continuous environments.
- Multi-agent Planning: Dealing with environments where several agents cooperate.
- Temporal Planning: Dealing with actions whose effects take some time.
- Planning in the Real World: So, what is the role of classical planning in all this?

Disclaimer(s)
This lecture is intended as a general overview of many different topics. The concepts and explanations in this chapter are broad and superficial, and you are not expected to understand all the details. Moreover, we do not cover all types of non-classical planning.

Partially Observable Environments
In the real world: we do not know everything!
Example (Chapter 10): the Wumpus world.
Differences with respect to classical planning:
1. We do not have full information about the initial state.
2. (Optionally) Actions may give us new information about the current state.

Belief State
We have only partial knowledge about the initial state. How to represent our knowledge? Logic!
Initial knowledge:
- You're in cell [1,1]: P_{1,1}
- There's a Wumpus: (W_{1,1} ∨ W_{1,2} ∨ W_{1,3} ∨ ...)
- There's gold: (G_{1,1} ∨ G_{1,2} ∨ G_{1,3} ∨ ...)
- There's no stench in position [1,1]: ¬S_{1,1}
General knowledge: S_{1,1} ⇔ (W_{1,1} ∨ W_{1,2} ∨ W_{2,1})
Definition (Belief State). Let ϕ be a propositional formula that describes our knowledge about the current state. Then the belief state B is the set of states that correspond to satisfying assignments of ϕ, i.e., the set of states that are consistent with our belief.
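As a tiny illustration of this definition (my own sketch, not from the slides; the variable names are hypothetical), we can enumerate the belief state of a 2x2 Wumpus grid by brute force over all truth assignments:

```python
from itertools import product

# Hypothetical propositional variables for a toy 2x2 Wumpus grid.
VARS = ["W11", "W12", "W21", "W22", "S11"]

def phi(m):
    """Knowledge: there is a Wumpus somewhere, no stench in [1,1],
    and stench in [1,1] iff the Wumpus is in [1,1], [1,2] or [2,1]."""
    return ((m["W11"] or m["W12"] or m["W21"] or m["W22"])
            and not m["S11"]
            and (m["S11"] == (m["W11"] or m["W12"] or m["W21"])))

# Belief state B = all assignments (world states) that satisfy phi.
assignments = (dict(zip(VARS, vals)) for vals in product([False, True], repeat=len(VARS)))
belief = [m for m in assignments if phi(m)]
print(len(belief), belief)
# Exactly 1 state: the Wumpus must be in [2,2] -- the belief has collapsed to full knowledge.
```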

Partially-Observable Planning: Conformant Planning
Conformant Planning: A planning task with uncertainty in the initial state is a 4-tuple Π = (P, A, ϕ_I, G) where:
- P is a finite set of facts.
- A is a finite set of actions; each a ∈ A is a triple (pre_a, add_a, del_a).
- ϕ_I is the initial belief state.
- G ⊆ P is the goal.
Find a sequence of actions that transforms the initial belief state ϕ_I into a belief state in which the goal G is guaranteed to hold (a conformant plan).
A conformant plan is a sequence of actions that works no matter which initial state we are in (among those compatible with our initial belief state).

Conformant Planning: How to Solve It?
Method 1: Heuristic search. Each search node contains a belief state, which can be represented in different ways:
- Enumerate the states (possibly exponentially many).
- As a logical formula ϕ_B (checking whether an action is applicable in ϕ_B requires solving a satisfiability problem, to check whether the formula entails the precondition of the action).
Method 2: Compiling it to classical planning. The compilation is either exponential in the worst case or incomplete.
Complexity: PlanEx for conformant planning is EXPSPACE-hard.
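Method 1 with explicitly enumerated states might look like the following sketch (my own illustration with made-up helper names): an action may only be applied if its precondition holds in every state of the belief, and progression applies it to every state:

```python
from collections import namedtuple

Action = namedtuple("Action", ["name", "pre", "add", "dele"])  # all fields are frozensets

def applicable_in_belief(belief, a):
    """Conformant applicability: the precondition must hold in EVERY possible state."""
    return all(a.pre <= s for s in belief)

def progress_belief(belief, a):
    """Apply a deterministic STRIPS action to every state in the belief."""
    return {(s - a.dele) | a.add for s in belief}

def is_conformant_plan(belief, goal, plan):
    """A conformant plan must be applicable and reach the goal from every possible initial state."""
    for a in plan:
        if not applicable_in_belief(belief, a):
            return False
        belief = progress_belief(belief, a)
    return all(goal <= s for s in belief)

# Two possible initial states (we do not know whether the light is on or off);
# pressing the "off" switch works in both, so [press_off] is a conformant plan.
press_off = Action("press-off", frozenset(), frozenset({"off"}), frozenset({"on"}))
belief0 = {frozenset({"on"}), frozenset({"off"})}
print(is_conformant_plan(belief0, frozenset({"off"}), [press_off]))  # True
```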

Questionnaire
Question! What is a conformant plan for the Wumpus example?
None; we need to sense the environment to discover where the Wumpus is!

Partially-Observable Planning: Contingent Planning
Contingent Planning: A contingent planning task is a 4-tuple Π = (P, A, ϕ_I, G) where:
- P is a finite set of facts.
- A is a finite set of actions; each a ∈ A is a tuple (pre_a, add_a, del_a, obs_a). The values of the facts in obs_a are observed after executing the action.
- ϕ_I is the initial belief state.
- G ⊆ P is the goal.

Modelling the Wumpus Problem
ϕ_I = at_{1,1} ∧ clear_{1,1} ∧ (clear_{i,j} ∨ wumpus_{i,j}) ∧ (wumpus_{i,j} ⇒ stench_{i,j+1}) ∧ ...
G = {have-gold}
move-x-y:
  pre: {at_x, clear_y}
  add: {at_y}
  del: {at_x}
  obs: {stench_y, breeze_y}
move-x-y has 4 different outcomes, and we need to plan for each of them individually! (Blackboard)

Contingent Planning: How to Solve It
Given a contingent planning task, find a tree of actions that transforms the initial belief state ϕ_I into a belief state in which the goal G holds. We may need a different sub-plan for every result of the observation actions!
Method: One option is AndOr search, similar to MinMax search, where we distinguish between Or/Max nodes (where we choose the action to apply) and And/Min nodes (where the environment chooses the observation outcome).
Complexity: PlanEx for contingent planning is 2EXPSPACE-hard.
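A compact AndOr-search sketch under strong simplifying assumptions (deterministic action effects plus observation variables; all names such as CAction and split_by_observation are mine, not from the lecture). The returned contingent plan is a nested structure that branches on the observed values:

```python
from collections import namedtuple

# Hypothetical contingent action: a STRIPS triple plus the variables observed after executing it.
CAction = namedtuple("CAction", ["name", "pre", "add", "dele", "obs"])

def split_by_observation(states, obs_vars):
    """Group successor states by the observed truth values of the variables in obs_vars."""
    groups = {}
    for s in states:
        key = frozenset(v for v in obs_vars if v in s)   # which observed variables are true in s
        groups.setdefault(key, set()).add(s)
    return groups

def and_or_search(belief, goal, actions, depth=8):
    """Return a contingent plan as a nested [(action, {observation: subplan})] structure,
    or None if no plan is found within the depth bound."""
    if all(goal <= s for s in belief):
        return []                                   # goal already guaranteed: empty plan
    if depth == 0:
        return None
    for a in actions:                               # OR node: we choose which action to apply
        if not all(a.pre <= s for s in belief):
            continue
        successors = {(s - a.dele) | a.add for s in belief}
        outcomes = {}                               # AND node: every observation outcome needs a sub-plan
        for obs, sub_belief in split_by_observation(successors, a.obs).items():
            sub_plan = and_or_search(sub_belief, goal, actions, depth - 1)
            if sub_plan is None:
                break
            outcomes[obs] = sub_plan
        else:
            return [(a.name, outcomes)]
    return None

# Toy task: the door may be locked or unlocked; check it, then unlock if needed, then open.
check  = CAction("check-door", frozenset(), frozenset(), frozenset(), frozenset({"locked"}))
unlock = CAction("unlock", frozenset({"locked"}), frozenset({"unlocked"}), frozenset({"locked"}), frozenset())
open_  = CAction("open", frozenset({"unlocked"}), frozenset({"door-open"}), frozenset(), frozenset())
belief0 = {frozenset({"locked"}), frozenset({"unlocked"})}
print(and_or_search(belief0, frozenset({"door-open"}), [unlock, open_, check]))
```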

Stochastic Environments
In the real world: we cannot always anticipate the effect of our actions!
[Diagram: action buy-lottery with two possible outcomes, win and lose.]
Differences with respect to classical planning:
1. Actions can have multiple outcomes.
2. (Optionally) If the probability of each outcome is known, then we call it probabilistic planning.

Markov Decision Processes
The state space is replaced by a Markov Decision Process.
A Markov Decision Process is a 6-tuple (S, A, T, R, I, S_G) where:
- S is a finite set of states.
- A is a finite set of actions.
- T is a transition function mapping each (s, a, s') to a probability.
- R is a reward function R(s, a, s') ∈ ℝ.
- I is the initial state.
- S_G is the set of goal states.
Differences with respect to classical search problems:
- Transitions have multiple outcomes, each with a given probability.
- Transitions provide a reward.
Objective: Reach a goal state? Maximize reward?

Markov Decision Processes: Objective
Find a policy: a mapping from states to actions. There are multiple types of MDPs, depending on the objective:
1. Maximize reward (over an unbounded horizon the accumulated reward is often infinite no matter what the agent does!):
   1. Maximize reward after a finite number of steps.
   2. Maximize reward with a discount factor.
2. Reach a goal:
   1. Find a policy that reaches a goal state with least average cost, if one exists. But what if it is not possible to reach the goal with probability 1?
   2. Find a policy that reaches a goal state with maximum probability.
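For the discounted case, the standard textbook formulation (not spelled out on the slide) is to find a policy π maximizing the expected discounted reward:

```latex
V^{\pi}(s) \;=\; \mathbb{E}\left[\sum_{t=0}^{\infty} \gamma^{t}\, R\bigl(s_t, \pi(s_t), s_{t+1}\bigr) \,\middle|\, s_0 = s\right],
\qquad 0 \le \gamma < 1, \qquad \pi^{*} \in \arg\max_{\pi} V^{\pi}(s).
```

The discount factor γ < 1 makes the infinite sum finite, which is exactly what resolves the "infinite reward" issue above.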

Probabilistic Planning: How to Solve It
Value Iteration: Finding an optimal policy can be done in polynomial time in the size of the MDP! But: the MDP is often exponential in the size of the probabilistic planning task.
LRTDP: We can use heuristic search to approximate the relevant part of the MDP.
Complexity: PlanEx of contingent planning is 2EXPSPACE-hard.
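A minimal, self-contained value-iteration sketch on a toy MDP inspired by the buy-lottery example (my own illustration; the states, probabilities and rewards are made up):

```python
def value_iteration(S, A, T, R, gamma=0.9, eps=1e-6):
    """Compute the optimal value function V and a greedy policy for an MDP.
    T[s][a] is a list of (s_next, prob); R(s, a, s_next) is the reward."""
    V = {s: 0.0 for s in S}
    while True:
        delta = 0.0
        for s in S:
            q = [sum(p * (R(s, a, s2) + gamma * V[s2]) for s2, p in T[s][a])
                 for a in A if a in T[s]]
            v_new = max(q) if q else 0.0
            delta = max(delta, abs(v_new - V[s]))
            V[s] = v_new
        if delta < eps:
            break
    policy = {s: max((a for a in A if a in T[s]),
                     key=lambda a: sum(p * (R(s, a, s2) + gamma * V[s2]) for s2, p in T[s][a]),
                     default=None)
              for s in S}
    return V, policy

# Toy MDP: buying a lottery ticket wins (reward 100) with prob 0.01 and loses (reward -1) otherwise.
S = ["start", "win", "lose"]
A = ["buy-lottery", "keep-money"]
T = {"start": {"buy-lottery": [("win", 0.01), ("lose", 0.99)],
               "keep-money": [("start", 1.0)]},
     "win": {}, "lose": {}}
R = lambda s, a, s2: {"win": 100.0, "lose": -1.0}.get(s2, 0.0)
V, pi = value_iteration(S, A, T, R)
print(V["start"], pi["start"])   # expected value ~0.01, so the greedy policy buys the ticket
```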

Questionnaire
[Diagram: action buy-lottery with outcomes win and lose.]
Question! In the real world, what do you do? Do you always plan ahead for every possibility/contingency?
No. You do not want to spend an hour thinking about whether it is best to take the 102 or the 124 back to the city center. You may consider the possibility that the bus is delayed, but not that the bus crashes.

Online vs. Offline Planning
Given a contingent/probabilistic planning task:
- Offline Planning: Plan ahead for every possibility. Find a contingent plan / find an (optimal) policy.
- Online Planning: Decide only what to do next: spend some time deciding what action to execute, execute it, observe the result, and re-plan if necessary.
FF-Replan. Given a probabilistic planning task:
1. Drop the probabilities (assume that you get to choose the outcome).
2. Use a classical planner to solve the simplified task.
3. Execute the action recommended by the planner.
4. If the outcome is the expected one, continue. Otherwise, re-plan from the current state.
A very effective online probabilistic planner, especially in tasks where the probabilities model actions that have a (low) probability of failure.
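The FF-Replan loop, sketched as runnable Python under strong simplifications (all-outcomes determinization, breadth-first search standing in for a classical planner such as FF, and a simulated environment; every name below is mine, not from the FF-Replan system):

```python
import random
from collections import deque, namedtuple

# A probabilistic STRIPS-style action: precondition plus a list of (probability, add, delete) outcomes.
PAction = namedtuple("PAction", ["name", "pre", "outcomes"])

def determinize(actions):
    """All-outcomes determinization: every probabilistic outcome becomes its own deterministic action."""
    return [(f"{a.name}#{i}", a.pre, add, dele)
            for a in actions for i, (_, add, dele) in enumerate(a.outcomes)]

def classical_plan(det_actions, state, goal):
    """Breadth-first search in the determinized model (stand-in for a classical planner)."""
    frontier, seen = deque([(state, [])]), {state}
    while frontier:
        s, plan = frontier.popleft()
        if goal <= s:
            return plan
        for act in det_actions:
            _, pre, add, dele = act
            if pre <= s:
                s2 = (s - dele) | add
                if s2 not in seen:
                    seen.add(s2)
                    frontier.append((s2, plan + [act]))
    return None

def execute(state, actions, base_name):
    """Simulate the environment: sample one outcome of the probabilistic action."""
    a = next(a for a in actions if a.name == base_name)
    r, acc = random.random(), 0.0
    for p, add, dele in a.outcomes:
        acc += p
        if r <= acc:
            return (state - dele) | add
    return state

def ff_replan(actions, state, goal):
    """FF-Replan sketch: plan in the determinized model, execute, and re-plan on unexpected outcomes."""
    det = determinize(actions)
    while not goal <= state:
        plan = classical_plan(det, state, goal)
        if plan is None:
            return None                          # dead end in the determinized model
        for name, _, add, dele in plan:
            expected = (state - dele) | add
            state = execute(state, actions, name.split("#")[0])
            if state != expected:
                break                            # surprise: re-plan from the new state
    return state

# Toy task: "move" succeeds with probability 0.8, otherwise nothing changes.
move = PAction("move", frozenset({"at-A"}),
               [(0.8, frozenset({"at-B"}), frozenset({"at-A"})),
                (0.2, frozenset(), frozenset())])
print(ff_replan([move], frozenset({"at-A"}), frozenset({"at-B"})))  # frozenset({'at-B'})
```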

Continuous Environments
In the real world: we have numbers! (Examples: fuel levels, voltage control.)

Numeric STRIPS
Numeric STRIPS Planning extends STRIPS by introducing numeric variables V_n with rational values in ℚ.
- Numeric expressions: we can do simple arithmetic (+, −, ×, ÷) with the values of variables and/or constants.
- Numeric conditions: compare numeric expressions with {<, ≤, =, ≥, >}.
- Numeric effects: assign a numeric expression to a numeric variable in V_n.
Example: drive(x, y):
  pre: {at(x), fuel > 0}
  add: {at(y)}
  del: {at(x)}
  assign: {fuel := fuel − 1}
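A sketch of how a numeric state and the drive action above could be represented (my own illustration, not any real planner's data structures):

```python
# A state has Boolean facts plus numeric variables (here just: fuel).
state = {"facts": {"at-A"}, "fuel": 3.0}

def drive(state, x, y):
    """Numeric STRIPS action drive(x, y): pre {at(x), fuel > 0},
    add {at(y)}, del {at(x)}, assign {fuel := fuel - 1}."""
    if f"at-{x}" not in state["facts"] or not state["fuel"] > 0:
        return None                     # precondition (including the numeric condition) fails
    facts = (state["facts"] - {f"at-{x}"}) | {f"at-{y}"}
    return {"facts": facts, "fuel": state["fuel"] - 1}

print(drive(state, "A", "B"))                               # {'facts': {'at-B'}, 'fuel': 2.0}
print(drive({"facts": {"at-B"}, "fuel": 0.0}, "B", "C"))    # None: out of fuel
```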

Numeric STRIPS: How to Solve It?
Method 1: Compiling it to classical planning. Each number is discretized into a finite set of values; the loss of precision requires some rounding, causing these methods to be either unsound or incomplete.
Method 2: Heuristic search. Similar to classical planning, except that the heuristics must now take the numbers into account (e.g., delete relaxation based on intervals). But the state space is not finite anymore!
Complexity: PlanEx for Numeric STRIPS is undecidable.

Multi-agent Environments
In the real world: we are not the one and only agent!
- Competitive environments: General Game Playing (see Chapter 07)
- Collaborative environments

Multi-agent Environments
Multi-agent planning: Several agents must collaborate to achieve a common goal.
Key: There is some global information known by all agents, but each agent also has its own private facts, which it does not want to share with the rest.
Multi-Agent STRIPS Planning: A multi-agent STRIPS planning task is a 4-tuple Π = (P, A, I, G) where:
- P is a finite set of facts, divided into private and public facts.
- A is a finite set of actions; each a ∈ A is a triple (pre_a, add_a, del_a); actions are divided into private and public actions.
- I ⊆ P is the initial state.
- G ⊆ P is the goal.
Agents must communicate during the planning process to share information about how they will achieve the goal.
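A toy illustration of the public/private split (my own sketch, with made-up agent and fact names): each agent may only use facts that are visible to it, i.e., public facts or its own private facts.

```python
# Facts partitioned into public facts (shared) and private facts per agent.
public = {"package-at-depot", "package-delivered"}
private = {"truck": {"truck-at-A", "truck-fuel-ok"},
           "drone": {"drone-charged"}}

def visible_to(agent, fact):
    """An agent only sees public facts and its own private facts."""
    return fact in public or fact in private[agent]

def action_ok(agent, pre, add, dele):
    """An agent's action may only mention facts visible to that agent."""
    return all(visible_to(agent, f) for f in pre | add | dele)

print(action_ok("truck", {"truck-at-A"}, {"package-at-depot"}, set()))   # True
print(action_ok("truck", {"drone-charged"}, set(), set()))               # False: private to the drone
```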

Hybrid Environments
In the real world: events do not happen instantaneously!

Durative Actions
In classical planning the action effects happen immediately. However, in the real world, actions take time to execute. When does the precondition need to hold? When is the effect applied?
- at-start: preconditions checked and effects applied when the action starts.
- at-end: preconditions checked and effects applied when the action ends.
- over-all: preconditions that must hold during the whole execution of the action.
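One possible way to write such an action down, as an illustrative Python structure with names I made up (temporal planners typically express this with PDDL durative actions):

```python
from dataclasses import dataclass, field

@dataclass
class DurativeAction:
    name: str
    duration: float
    cond_at_start: set = field(default_factory=set)   # must hold when the action starts
    cond_over_all: set = field(default_factory=set)    # must hold during the whole execution
    cond_at_end: set = field(default_factory=set)      # must hold when the action ends
    eff_at_start: set = field(default_factory=set)     # effects applied at the start
    eff_at_end: set = field(default_factory=set)       # effects applied at the end

# Example: flying takes 3 time units; the plane must stay airworthy over the whole flight.
fly = DurativeAction("fly-A-B", 3.0,
                     cond_at_start={"at-A"}, cond_over_all={"airworthy"},
                     eff_at_start={"not-at-A"}, eff_at_end={"at-B"})
print(fly)
```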

Planning in the Real World
Question! If most real-world environments are not deterministic, not fully observable, not discrete, not single-agent, and are temporal, what is classical planning good for?
[Diagram: the agent-environment loop of sensors/percepts and actuators/actions.]
The model does not try to simulate the environment; it is just a tool to make good decisions. Oftentimes, reasoning with a simplified model can still lead to intelligent decisions, and solutions are easier to compute than with more complex models.
My two cents: Ideally, we should always provide an accurate description of the environment, so that the AI simplifies it when necessary. However, automatic simplification methods are not yet powerful enough in all cases.

Summary
- There are many planning sub-areas that study model-based sequential decision making.
- Non-classical planning models reason about more complex environments: non-deterministic, partially observable, continuous, temporal, etc.
- Solving these problems by computing a complete offline policy is hard (though many non-classical planners are able to do this satisfactorily in some domains).
- Many approaches are online, planning to decide the next action by looking into the future, but without considering all alternatives.
- Classical planning and heuristic search techniques remain an important ingredient of many approaches that deal with complex environments.
