Behavioral Animation of Autonomous Virtual Agents Helped by Reinforcement Learning

Toni Conde, William Tambellini, and Daniel Thalmann
Virtual Reality Lab, Swiss Federal Institute of Technology (EPFL), CH-1015 Lausanne, Switzerland
{Toni.Conde, Daniel.Thalmann}@epfl.ch
http://vrlab.epfl.ch

Abstract. Our research focuses on the behavioral animation of virtual humans capable of taking actions by themselves. In this paper we deal more specifically with Reinforcement Learning (RL) methodologies, integrating the RL agent and the Autonomous Virtual Agent in a Virtual Environment in an original way. With the help of a Virtual Environment in the form of a town, we demonstrate that what the AVAs exploit is indeed the learning process of RL and not its optimization.

1. Introduction

Our research is mainly focused on the modeling and behavioral animation of Virtual Humans, and more specifically on the simulation of Autonomous Virtual Agents (AVAs) capable of undertaking actions by themselves. Early work concentrated on graphics animation, but it has since evolved toward methodologies coupling computer graphics with classical Artificial Intelligence [1], Artificial Life (modeling of the motivational level for action selection) [2, 11], behavioral animation (storytelling with the help of a rule-based inference engine) [3] and sociology (group movement and social life) [2]. It is in this context that we are trying to implement humanoids with complex behavior.

In this paper we present research work in the domain of behavioral animation using Reinforcement Learning methodologies.

Main contribution: two well-known Reinforcement Learning algorithms are applied to a virtual environment as a behavioral engine for exploring, learning and visiting that environment. Contrary to the common use of reinforcement learning algorithms, our interest here lies more in the learning itself than in the exploitation of what has been learned. Thus, it is the learning rather than the optimization that allows us to simulate the behavior of Autonomous Virtual Agents (AVAs), and this constitutes a new use of reinforcement learning.

2. Background

Humans are always situated in an environment with which they interact continuously by means of their sensors and effectors. Classical AI techniques quickly showed their limits here, as they are mainly based on behavioral rules installed beforehand by the designer. Situated AI (SAI) can make up for these limitations: its objective is to design adaptive artificial systems evolving in an environment that is not entirely predictable. The associated methodologies are inspired by biology and can be applied to AVAs capable of interacting with their Virtual Environment (VE), in which they may pursue several, possibly conflicting, goals. In order to evolve, these AVAs must use the information provided by their sensors: they must actively search for this information by means of their effectors and interpret it according to the environment encountered and the goal pursued [10].

In this context, by behavioral animation we mean the methodologies that make every AVA intelligent and autonomous, reacting to its environment and taking decisions based on its perceptive, memory and logic systems. By intelligence, we mean the capacity to plan and carry out tasks based on a model of the current state of the VE. By autonomy, we mean the capacity to visit and memorize any given VE without the intervention of an Avatar. The objective is to allow the AVA to explore a hitherto unknown VE and to build structures such as cognitive models or cognitive maps from this exploration. Once its representation has been constructed, the AVA could then, for example, easily communicate its knowledge to other naive AVAs.

3. Our Novel Technique

Reinforcement learning (RL) is one of the methodologies [6, 7] of machine learning and the cognitive sciences. RL algorithms allow one or several agents to learn a series of optimal actions in a given environment thanks to reward/penalty techniques. Through the repetition of pertinent and non-pertinent trials, these agents learn the requested task. The precision of the learning depends on the time allocated: fast learning gives a poor representation of the requested task, while longer learning gives a more satisfying result concerning what the agent has to carry out.

All RL methodologies require a balance between the search for new strategies and the use of already acquired knowledge. Let us call memory the structure storing the preceding actions with their scores. When the agent uses its memory to choose an action, we speak of exploitation; whenever it looks for new ways, we speak of exploration. Good learning requires the combination of both strategies (see the sketch at the end of this section).

The objective of our research is to use RL methodologies with our Virtual Reality platform [8] in order to obtain behavioral animation of AVAs discovering a VE.
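As an illustration of this exploration/exploitation balance, the following minimal C++ sketch shows an epsilon-greedy choice between exploiting memorized state-action scores and exploring random actions. The names (Memory, chooseAction, epsilon) are hypothetical and are not taken from the engine described in this paper.

    #include <cstdlib>
    #include <map>
    #include <utility>
    #include <vector>

    using State  = int;
    using Action = int;

    // "Memory": score learned so far for each (state, action) pair.
    using Memory = std::map<std::pair<State, Action>, double>;

    // Epsilon-greedy choice: explore with probability epsilon, otherwise
    // exploit the memorized scores for the current state.
    Action chooseAction(const Memory& memory, State s,
                        const std::vector<Action>& actions, double epsilon)
    {
        double u = static_cast<double>(std::rand()) / RAND_MAX;
        if (u < epsilon || memory.empty())                  // exploration
            return actions[std::rand() % actions.size()];

        Action best = actions.front();                      // exploitation
        double bestScore = -1e9;
        for (Action a : actions) {
            auto it = memory.find({s, a});
            double score = (it != memory.end()) ? it->second : 0.0;
            if (score > bestScore) { bestScore = score; best = a; }
        }
        return best;
    }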

4. Integration

Other learning methodologies, such as Artificial Neural Networks or Genetic Algorithms, have library implementations readily available, but for our work we had to design an engine carrying out reinforcement learning (RL) ourselves; we took inspiration from the C++ interface [5] entitled RLI (Reinforcement Learning Interface).

4.1 Implementation of an RL Engine

Although the RLI interface, in its C++ version, proposes a complete architecture as well as a set of objects fairly compatible with different RL problems, the vhdrlservice engine does not use them directly (see [8] for a description of our middleware platform).

Fig. 1. The simplified UML diagram of the RL engine classes (vhdrlservice), showing VhdRLEngine, VhdRLAgent, VhdRLWorld, State, StateRep, StateAction and ATQ and their relationships (owns, represents, memorizes, perceives, is composed of).

In fact, the RLI interface proposes a high-level architecture that would have made the engine more complex; it also lacks support for simulations composed of several agents. Fig. 1 shows the simplified UML diagram of the vhdrlservice classes. Globally, the Reinforcement Learning engine is composed of three main elements, exactly like the RLI interface: VhdRLWorld, VhdRLAgent and VhdRLEngine.

4.2 Choice of Learning Algorithms

Two RL algorithms have been used. Contrary to the common use of RL, the objective here is not to find the best algorithm and parameters in order to obtain the fastest learning, as one would, for example, in a maze. Both methods have therefore been implemented and used in their simplest version.

For Q-Learning, the update is Q(s, a) ← Q(s, a) + α [ r + γ·Q(s', a') − Q(s, a) ], following [6]. In our case, the reinforcement r is of the shortest-path type, that is, −1 for all actions. This allows the distance necessary (in number of actions) to reach the closest Terminal State (red in fig. 2) to be evaluated. This distance, a negative value, is shown at the bottom right of each State in fig. 2 (in black).
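A minimal tabular sketch of this update in C++, with the shortest-path reinforcement r = −1 as described above; the names QTable and qLearningStep are hypothetical and this is not the actual vhdrlservice code.

    #include <map>
    #include <utility>

    using State  = int;
    using Action = int;
    using QTable = std::map<std::pair<State, Action>, double>;

    // One learning step:
    //   Q(s,a) <- Q(s,a) + alpha * [ r + gamma * Q(s',a') - Q(s,a) ]
    // with r = -1 for every action, so that -Q(s,a) estimates the number of
    // actions needed to reach the closest Terminal State.
    void qLearningStep(QTable& Q, State s, Action a, State sNext, Action aNext,
                       bool terminal, double alpha, double gamma)
    {
        const double r = -1.0;                   // shortest-path reinforcement
        double target = terminal ? r : r + gamma * Q[{sNext, aNext}];
        double& q = Q[{s, a}];
        q += alpha * (target - q);
    }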

Typically, the learning rate decreases with the length of the trial. In our engine version, however, it remains constant (α = 1) during the whole learning, and the γ coefficient also remains at 1.

For TD-Learning, the update is V(s_t) ← V(s_t) + α [ r_{t+1} + γ·V(s_{t+1}) − V(s_t) ], following [6]. Note that this algorithm is used in such a way that the values of the network are not updated until the end of the trial: the Agent updates its connections only when it reaches a Terminal State, thanks to the memorization of the route taken (a code sketch of this deferred update is given after the experimental results below). For example, as indicated in fig. 2, in order to construct this network the Agent passed through Terminal State number 0 three times; therefore it updated its value function only 3 times.

Fig. 2. Network after learning with Q-Learning.

5. Experimental Results

In [1, 2] the AVAs navigate inside open virtual environments (e.g. public places, streets). The environment targeted here corresponds to any type of virtual environment imposing physical constraints on navigation (e.g. a city, public buildings, houses, a bank, an airport, streets). Indeed, as RL aims at finding a series of optimal actions in a given environment, this service is unnecessary in an open environment where all States would be interconnected. Finally, the virtual environment must contain Terminal States (defined by the service user).

We have tested our new approach with a virtual environment representing a city made up of a dozen buildings and some streets. The numbers in fig. 3 represent the States of the RL problem.

Q-Learning: following trials carried out with the environment defined in fig. 3, and measuring the number of iterations needed to reach the representation of the best path (see fig. 2), on average an AVA completes the learning in 15 trials with a random action-choice strategy.

TD-Learning: the AVA completes the learning in 4 trials, also with a random action-choice strategy.
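The deferred TD update described in Sect. 4.2 could be sketched as follows in C++: the route is memorized and the value function V is only updated once the agent reaches a Terminal State. The Step structure, the updateAtTerminal function and the route representation are assumptions made for illustration, not the actual engine code.

    #include <cstddef>
    #include <map>
    #include <vector>

    using State = int;

    // One memorized step of the route: the state visited and the reward
    // received when leaving it.
    struct Step { State s; double r; };

    // Deferred TD(0): replay the memorized route once a Terminal State is
    // reached, applying
    //   V(s_t) <- V(s_t) + alpha * [ r_{t+1} + gamma * V(s_{t+1}) - V(s_t) ]
    void updateAtTerminal(std::map<State, double>& V,
                          const std::vector<Step>& route, State terminalState,
                          double alpha, double gamma)
    {
        for (std::size_t t = 0; t < route.size(); ++t) {
            State sNext = (t + 1 < route.size()) ? route[t + 1].s : terminalState;
            double& v = V[route[t].s];
            v += alpha * (route[t].r + gamma * V[sNext] - v);
        }
    }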

Fig. 3. State graph (Reinforcement Learning) of the Virtual City simulation.

6. Discussion and Improvement Proposals

The AVAs can be considered as visitors. By tuning the learning parameters we can produce good or bad representations of the environment, which then allows us to simulate the behavior of lost AVAs (fig. 4) and of expert AVAs, depending on the length of the learning.

Fig. 4. AVA visitors in a Virtual City.

Fig. 5. AVAs lost in a Virtual City.

Reinforcement learning seems well adapted to simulating the task of a visit in a virtual environment, following a precise behavior:
- An untargeted visit for an exploration strategy: the goal of such AVAs is the discovery of the virtual environment without a precise objective (fig. 5).
- A targeted search for an exploitation strategy: such AVAs have only one objective, to reach their goal as quickly as possible. They would only use exploration during their first steps in the virtual environment.

Reinforcement learning is not used here in its classical way; rather, it is introduced into a virtual environment as a behavioral engine for exploring, learning and visiting that environment. Contrary to the common use of reinforcement learning algorithms, our interest lies more in the learning than in the exploitation of what has been learned. Thus, it is the learning rather than the optimization that allows us to simulate the behavioral

animation of Autonomous Virtual Agents (AVAs); this truly constitutes a new use of the reinforcement learning methodology.

In the current engine, the network containing the RL values encodes the policy that allows the closest goal to be reached. The user therefore cannot ask an AVA to reach a specific goal, as the AVA heads for the closest Terminal State. This drawback can be resolved by using reinforcement learning with multiple goals [4]. With this technique, the engine has as many networks as there are goals; the AVA can therefore reach any goal, wherever it may be. However, this technique uses more memory, since it requires one network per goal.

The approach presented here is part of a more complex model, which is the object of our research. The goal is to realize a Virtual Life Environment for an Autonomous Virtual Agent, including different interfaces and sensory modalities coupled with different learning methodologies that can evolve.

Acknowledgments. This research has been partially funded by the Swiss National Science Foundation.

References

1. A. Guye-Vuillème and D. Thalmann, "A High-level Architecture for Believable Social Agents", VR Journal, Springer, 5, pp. 95-106, 2001.
2. M. Kallmann, E. de Sevin and D. Thalmann, "Constructing Virtual Human Life Simulations", AVATARS Workshop, Lausanne, Switzerland, 2000.
3. J. S. Monzani, A. Caicedo and D. Thalmann, "Integrating Behavioural Animation Techniques", Proc. Eurographics 2001.
4. S. Whitehead, J. Karlsson and J. Tenenberg, "Learning Multiple Goal Behavior via Task Decomposition and Dynamic Policy Merging", in Connell and Mahadevan, editors, Robot Learning, Kluwer Academic Publishers, 1993.
5. R. S. Sutton and J.-C. Santamaria, "A Standard Interface for Reinforcement Learning Software in C++", version 1.1.
6. R. S. Sutton and A. G. Barto, "Reinforcement Learning: An Introduction", MIT Press, 1998.
7. L. P. Kaelbling, M. Littman and A. Moore, "Reinforcement Learning: A Survey", JAIR, volume 4, pp. 237-285, May 1996.
8. M. Ponder, G. Papagiannakis, T. Molet, N. Magnenat-Thalmann and D. Thalmann, "VHD++ Development Framework: Towards Extendible, Component Based VR/AR Simulation Engine Featuring Advanced Virtual Character Technologies", IEEE Virtual Reality, 2003.
9. E. André, "Employing AI Methods to Control the Behavior of Animated Interface Agents", Applied Artificial Intelligence, 13(4-5): 45-448, 1999.
10. C. Langton, Artificial Life, Addison-Wesley, 1989.
11. L. Steels and R. A. Brooks, The Artificial Life Route to Artificial Intelligence: Building Embodied, Situated Agents, Lawrence Erlbaum Associates, 1995.