Behavioral Animation of Autonomous Virtual Agents Helped by Reinforcement Learning

Toni Conde, William Tambellini, and Daniel Thalmann
Virtual Reality Lab, Swiss Federal Institute of Technology (EPFL), CH-1015 Lausanne, Switzerland
{Toni.Conde, Daniel.Thalmann}@epfl.ch
http://vrlab.epfl.ch

Abstract. Our research focuses on the behavioral animation of virtual humans capable of taking actions by themselves. In this paper we deal more specifically with Reinforcement Learning (RL) methodologies, which integrate in an original way the RL agent and the Autonomous Virtual Agent (AVA) in a Virtual Environment. With the help of a Virtual Environment in the form of a town, we demonstrate that it is indeed the learning process of RL, and not its optimization, that is used by the AVAs.

1. Introduction

Our research is mainly focused on the modeling and behavioral animation of Virtual Humans, and more specifically on the simulation of Autonomous Virtual Agents (AVAs) capable of undertaking actions by themselves. Early research work concentrated on graphics animation, but this has evolved towards methodologies coupling computer graphics with classical Artificial Intelligence [1], Artificial Life (modeling of the motivational level for action selection) [2, 11], behavioral animation (story-telling with the help of a rule-based inference engine) [3] and sociology (group movement and social life) [2]. It is in this context that we are trying to implement humanoids with complex behavior. In this paper we present research work in the domain of behavioral animation using Reinforcement Learning methodologies.

Main contribution: two well-known Reinforcement Learning algorithms are applied to a virtual environment as a behavioral engine for exploring, learning and visiting that environment.
Therefore, contrary to the usual use of reinforcement learning algorithms, our interest here lies more in the learning itself than in the exploitation of this learning. Thus, it is indeed the learning rather than the optimization that allows us to simulate the behavior of Autonomous Virtual Agents (AVAs), and this constitutes a new use of reinforcement learning.

T. Rist et al. (Eds.): IVA 2003, LNAI 2792, pp. 175-180, 2003. Springer-Verlag Berlin Heidelberg 2003
2. Background

Humans are always situated in an environment with which they interact permanently by means of their sensors and effectors. Classical AI techniques quickly demonstrated their limits, as they are mainly based on behavioral animation rules installed beforehand by the designer. Situated AI (SAI) can make up for these limitations: SAI's objective is to conceive adaptive artificial systems evolving in an environment that is not entirely predictable. The methodologies of this field are inspired by biology and can be applied to AVAs capable of interacting with their Virtual Environment (VE), in which they may pursue several, possibly conflicting, goals. In order to evolve, these AVAs must use the information furnished by their sensors: they must actively search for this information by means of their effectors and interpret it as a function of the environment encountered and the goal pursued [10].

In this context, by behavioral animation we mean the methodologies that make every AVA intelligent and autonomous, reacting to its environment and taking decisions based on its perceptive, memory and logic systems. By intelligence, we mean the capacity to plan and carry out tasks based on a model of the actual state of the VE. By autonomy, we mean the capacity to visit and memorize any given VE without the intervention of an Avatar. The objective is to allow the AVA to explore a hitherto unknown VE and, from this exploration, to build structures such as cognitive models or cognitive maps. Once its representation has been constructed, the AVA could then, for example, easily communicate its knowledge to other naive AVAs.

3. Our Novel Technique

Reinforcement learning (RL) is one of the methodologies [6, 7] of machine learning and cognitive science.
RL algorithms allow one or several agents to learn a series of optimal actions in a given environment thanks to reward/penalty techniques. Through the repetition of pertinent and non-pertinent trials, these agents learn the requested task. The precision of learning depends on the time allocated: fast learning gives a poor representation of the task, whereas long-term learning gives a more satisfying result concerning what the agent has to carry out. All RL methodologies require a balance between the search for new strategies and the use of already acquired knowledge. Let us call memory the structure storing the preceding actions together with their scores. When the agent uses its memory to choose an action, we speak of exploitation; when it is looking for new ways, we speak of exploration. Good learning requires a combination of both strategies. The objective of our research is to use RL methodologies with our Virtual Reality platform [8] in order to obtain behavioral animation of AVAs in the discovery of a VE.
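The exploration/exploitation balance described above can be sketched with a simple epsilon-greedy action selection. This is only an illustration: the paper does not specify which selection strategy the engine uses, and the function name, dictionary-based memory and epsilon parameter are assumptions.

```python
import random

def choose_action(q_values, actions, epsilon=0.1):
    """Epsilon-greedy selection: explore with probability epsilon, else exploit.

    q_values is the 'memory' storing a score per action; actions is the list
    of actions available in the current state.
    """
    if random.random() < epsilon:
        return random.choice(actions)  # exploration: look for new ways
    # exploitation: use the memorized scores to pick the best-known action
    return max(actions, key=lambda a: q_values.get(a, 0.0))
```

With epsilon = 0 the agent always exploits its memory; with epsilon = 1 it always explores, which matches the "random action" strategy used in the experiments below.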
4. Integration

Other learning methodologies, such as Artificial Neural Networks or Genetic Algorithms, have program libraries at their disposal; for our work, however, we had to conceive an engine carrying out reinforcement learning ourselves. We took inspiration from the C++ interface entitled RLI (Reinforcement Learning Interface) [5].

4.1 Implementation of an RL Engine

Although the RLI interface, in its C++ version, proposes a complete architecture as well as a set of objects fairly compatible with different RL problems, the vhdrlservice engine doesn't use them directly (see [8] for a description of our middleware platform).

Fig. 1. The simplified UML diagram of the RL engine classes (vhdrlservice).

In fact, the RLI interface proposes a high-level architecture, which would have made the engine more complex; it also lacks support for simulations made up of several agents. Fig. 1 shows the simplified UML diagram of the vhdrlservice classes. Globally speaking, the Reinforcement Learning engine is composed of three main elements, exactly like the RLI interface: VhdRLWorld, VhdRLAgent and VhdRLEngine.

4.2 Choice of Learning Algorithms

Two RL algorithms have been used. Contrary to the common use of RL, the objective here isn't to find the best algorithm and parameters in order to obtain the fastest learning process, as for example in a maze. Both methods have therefore been implemented and used in their simplest versions.

For Q-Learning, according to [6]:

Q(s, a) <- Q(s, a) + alpha [r + gamma Q(s', a') - Q(s, a)]

In our case, the reinforcement r is of the shortest-path type, that is, -1 for all actions.
This allows the distance needed (in number of actions) to reach the closest Terminal State (red in fig. 2) to be evaluated. This distance, a negative value, is shown in fig. 2 at the bottom right of each State (in black).
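As a hedged sketch of the update above, tabular Q-Learning can be reproduced on a tiny state graph. The four-state chain below is invented for illustration (it is not the paper's city environment); the parameters alpha = gamma = 1 match the engine's settings described later, and the reward of -1 per action is the shortest-path reinforcement consistent with the negative distances in fig. 2.

```python
import random

# States 0-1-2-3, where 3 is the Terminal State; actions move left/right.
# This tiny chain graph is an illustrative assumption, not the paper's city.
TRANSITIONS = {
    (0, "right"): 1,
    (1, "left"): 0, (1, "right"): 2,
    (2, "left"): 1, (2, "right"): 3,
}
TERMINAL = 3

def q_learning(trials=50, alpha=1.0, gamma=1.0, seed=0):
    rng = random.Random(seed)
    q = {sa: 0.0 for sa in TRANSITIONS}  # the 'memory' of state-action scores
    for _ in range(trials):
        s = 0
        while s != TERMINAL:
            actions = [a for (st, a) in TRANSITIONS if st == s]
            a = rng.choice(actions)          # random action-choice strategy
            s_next = TRANSITIONS[(s, a)]
            r = -1.0                         # shortest-path reinforcement
            best_next = max((q[(s_next, a2)] for (st, a2) in TRANSITIONS
                             if st == s_next), default=0.0)  # 0 at Terminal
            q[(s, a)] += alpha * (r + gamma * best_next - q[(s, a)])
            s = s_next
    return q
```

After learning, each value converges to minus the number of actions needed to reach the Terminal State, e.g. q[(0, "right")] becomes -3: exactly the negative distances displayed next to each State in fig. 2.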
Typically, the learning rate decreases with the length of the trial. However, in our engine version it remains constant (alpha = 1) during the whole learning, and the gamma coefficient also remains at 1.

For TD-Learning, according to [6]:

V(s_t) <- V(s_t) + alpha [r_{t+1} + gamma V(s_{t+1}) - V(s_t)]

It should be noted that this algorithm is used in such a way that the weights of the network aren't updated until the end of the trial. In other words, the Agent updates its connections only when it reaches a Terminal State, thanks to the memorization of the route taken. For example, as indicated in fig. 2, in order to construct this network the Agent passed through Terminal State number 0 three times; it therefore updated its value function only 3 times.

Fig. 2. Network after learning with Q-Learning.

5. Experimental Results

In [1, 2] the AVAs navigate inside open virtual environments (e.g. public places, streets). The environment targeted here is any type of virtual environment imposing physical constraints on navigation (e.g. a city, public buildings, houses, a bank, an airport, streets). In fact, as RL proposes finding a series of optimal actions in a given environment, this service would be unnecessary in an open environment where all States are interconnected. Finally, the virtual environment must contain Terminal States (defined by the service user).

We have tested our new approach with a virtual environment representing a city made up of a dozen buildings and some streets. The numbers in fig. 3 represent the States of RL.

Q-Learning: Following trials carried out with the environment defined in fig. 3, and by measuring the number of iterations needed to reach the representation of the best path (see fig. 2), we can say that on average an AVA carries out the learning in 15 trials with a random action-choice strategy.
TD-Learning: With the same random action-choice strategy, the AVA carries out the learning in 4 trials on average.
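The end-of-trial TD update described above can be sketched as follows. This is a minimal illustration under stated assumptions: the function name and the example route are invented, the update is applied only once the Terminal State is reached (replaying the memorized route backwards), and alpha = gamma = 1 with a shortest-path reward of -1 as in the engine described earlier.

```python
def update_at_terminal(values, route, reward=-1.0, alpha=1.0, gamma=1.0):
    """Apply TD(0)-style updates only at the end of a trial.

    values: dict mapping state -> V(s); route: memorized list of visited
    states, ending in a Terminal State (whose value stays 0).
    """
    # Replay the memorized route backwards so each state sees its successor's
    # freshly updated value.
    for s, s_next in zip(reversed(route[:-1]), reversed(route[1:])):
        v = values.get(s, 0.0)
        v_next = values.get(s_next, 0.0)
        values[s] = v + alpha * (reward + gamma * v_next - v)
    return values
```

For a memorized route ["s0", "s1", "s2", "goal"], a single end-of-trial update already yields V(s2) = -1, V(s1) = -2, V(s0) = -3, i.e. minus the distance to the goal along the route, which is why so few trials suffice in this setting.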
Fig. 3. State Graph (Reinforcement Learning) of the Virtual City simulation.

6. Discussion and Improvement Proposals

The AVAs can be considered as visitors. Thanks to the learning parameters, we can produce good or bad representations of the environment, which then allow us to simulate the behavior of lost AVAs (fig. 5) and of expert AVAs, depending on the learning length.

Fig. 4. AVA visitors in a Virtual City. Fig. 5. AVAs lost in a Virtual City.

Reinforcement learning seems well adapted to simulating the task of a visit in a virtual environment, following a precise behavior:
- An untargeted visit, using an exploration strategy: the goal of such AVAs is the discovery of the virtual environment without a precise objective (fig. 4).
- A targeted visit, using an exploitation strategy: such AVAs have only one objective, to reach their goal as quickly as possible. The latter would only use exploration during their first steps in the virtual environment.

Reinforcement learning isn't used here in its classical approach, but is introduced into a virtual environment as a behavioral engine for exploring, learning and visiting that environment. Therefore, contrary to the usual use of reinforcement learning algorithms, our interest lies more in the learning than in the exploitation of this learning. Thus, it is indeed the learning rather than the optimization that allows us to simulate the behavioral
animation of Autonomous Virtual Agents (AVAs); this really constitutes a new use of the reinforcement learning methodology.

In the current engine, the network containing the RL values encodes the policy that allows the closest goal to be reached. The user therefore cannot request an AVA to reach a specific goal, as the AVA directs itself to the closest Terminal State. This drawback can be resolved by using reinforcement learning with multiple goals [4]. With this technique, the engine has as many networks at its disposal as there are goals; the AVA can therefore reach any goal, wherever it may be. However, this technique would use more memory, since it requires a network for each goal.

The approach presented here is part of a more complex model, which is the object of our research. The goal is to realize a Virtual Life Environment for an Autonomous Virtual Agent, including different interfaces and sensorial modalities coupled with different learning methodologies that can evolve.

Acknowledgments. This research has been partially funded by the Swiss National Science Foundation.

References
1. A. Guye-Vuillème and D. Thalmann, "A High-level Architecture for Believable Social Agents", VR Journal, Springer, 5, pp. 95-106, 2001.
2. M. Kallmann, E. de Sevin and D. Thalmann, "Constructing Virtual Human Life Simulations", AVATARS Workshop, Lausanne, Switzerland, 2000.
3. J. S. Monzani, A. Caicedo and D. Thalmann, "Integrating Behavioural Animation Techniques", Proc. Eurographics 2001.
4. S. Whitehead, J. Karlsson and J. Tenenberg, "Learning Multiple Goal Behavior via Task Decomposition and Dynamic Policy Merging", in Connell and Mahadevan, editors, Robot Learning, Kluwer Academic Publishers, 1993.
5. R. S. Sutton and J.-C. Santamaria, "A Standard Interface for Reinforcement Learning Software in C++", version 1.1.
6. R. S. Sutton and A. G. Barto, "Reinforcement Learning: An Introduction", MIT Press, 1998.
7. L. P. Kaelbling, M. Littman and A. Moore, "Reinforcement Learning: A Survey", JAIR, volume 4, pp. 237-285, May 1996.
8. M. Ponder, G. Papagiannakis, T. Molet, N. Magnenat-Thalmann and D. Thalmann, "VHD++ Development Framework: Towards Extendible, Component Based VR/AR Simulation Engine Featuring Advanced Virtual Character Technologies", IEEE Virtual Reality, 2003.
9. E. André, "Employing AI Methods to Control the Behavior of Animated Interface Agents", Applied Artificial Intelligence, 13(4-5): 45-448, 1999.
10. C. Langton, "Artificial Life", Addison-Wesley, 1989.
11. L. Steels and R. A. Brooks, "The Artificial Life Route to Artificial Intelligence: Building Embodied, Situated Agents", Lawrence Erlbaum Associates, 1995.