Improved Multi-Agent Reinforcement Learning for Minimizing Traffic Waiting Time


Vijay Kumar, M.T.U., India; B. Kaushik, K.E.C., M.T.U., India; H. Banka, ISM, India

ABSTRACT
This paper depicts the use of a multi-agent reinforcement learning (MARL) algorithm for learning traffic patterns in order to minimize travelling time, maximize safety, and optimize the traffic pattern (OTP). The model provides a description of, and a solution to, traffic pattern optimization using multi-agent based reinforcement learning algorithms. MARL uses a multi-agent structure in which vehicles and traffic signals act as agents. In this model the traffic area is divided into different traffic zones. Each zone has its own distributed agent, and these agents pass information from one zone to another through the network. The optimization objectives include the number of vehicle stops, the average waiting time, and the maximum queue length at the next (node) intersection. In addition, this research introduces priority control of buses and emergency vehicles into the model. The expected outcome of the algorithm is compared with the performance of Q-learning and temporal-difference learning. The results show a significant reduction in waiting time compared with those algorithms, so the proposed system works more efficiently than other traffic systems.

General Terms
Learning Algorithm, Artificial Intelligence, Agent-based Learning.

Keywords
Agent Based System, Intelligent Traffic Signal Control, Multi Objective Scheme, Optimization Objectives, RL, Multi-Agent System (MAS).

1. INTRODUCTION
Managing traffic in high-traffic areas is a big problem. An increasing population requires more efficient transportation systems and hence better traffic control systems. Even developed countries suffer high costs because of increasing road congestion levels. In the European Union (EU) alone, congestion costs 0.5% of the member countries' Gross Domestic Product (GDP) [11], [8], and this was expected to increase to roughly 1% of the EU's GDP by 2009 if the problem is not dealt with properly. In 2002, the number of vehicles per thousand persons had reached 460, nearly double the number (232) in 1974. High traffic and bad driving in the EU account for up to 50% of fuel consumption on road networks, resulting in harmful emissions that could otherwise be diminished. Heavy road transport contributes 41% of the carbon dioxide given out by road traffic in the EU, resulting in serious health and safety problems. To avoid the high costs caused by these threats, urban traffic control (UTC) has to provide solutions to the problem of traffic management [11], [8]. To achieve the global goal of UTC optimization in the face of such growing threats, communication between vehicles and infrastructure systems can provide extra detail; this detail helps to build a local view of the traffic conditions. In medium traffic conditions, Wiering's method reduces the overall waiting time for vehicles and optimizes towards the goal. In a real traffic system, the model should consider different optimization objectives in different traffic situations, which is called the multi-agent control scheme in this paper. In the free traffic situation the presented model tries to minimize the overall number of stops of vehicles in the traffic network. In the medium traffic situation this research tries to minimize the waiting time towards the optimal goal. In the congested traffic condition the main focus is the queue length. So the multi-agent control scheme can adapt to different traffic conditions and makes for a more intelligent traffic control system. Therefore this paper proposes a multi-agent control strategy using MARL.
Multi-objective control and parametric simulation models both have some problems: the first node's traffic situation is passed on to all following nodes, so if the first node has free traffic this condition is propagated to all following nodes, which does not reflect real traffic well. This model therefore calculates the traffic situation individually for each node. In the congested traffic situation, queue spillovers must be avoided to keep the network from large-scale congestion, thus the queue length must be focused on [6]. In this model cycling is prevented. The value of the priority factor is not fixed at 3; it depends on the traffic control administrator and may be 4, 5, etc. Based on this value the model manages the green light for emergency vehicles in the traffic network. In this model data exchange between vehicles and roadside traffic equipment is necessary, thus a vehicular ad hoc network is utilized to build a wireless traffic information system; a distributed network is therefore helpful for developing such a system. Different researchers have chosen various types of artificial intelligence algorithms and methods for the optimization of traffic flow in real traffic conditions. The genetic (evolutionary) algorithm is one of the most common methods introduced into traffic control systems, and routing of traffic flow using genetic algorithms has shown some improvement in traffic control. Fuzzy logic control is also useful in traffic light systems for better control of traffic flow. The performance of a real traffic light system can be increased with ideas such as extending the green light period for vehicles. Another approach to improving traffic control is to use wireless network communication between vehicles and traffic control systems to obtain information about the traffic flow; this information can be used for optimization of the traffic system in medium and high traffic conditions. Reinforcement learning techniques have been used in certain research studies for traffic flow control and optimization.

Reinforcement learning can thus be applied effectively in traffic signal control to respond to frequent changes of traffic flow and can outperform traditional traffic control algorithms, which helps with optimality, reducing traffic delay, and building a better traffic light system. This model aims at minimizing travel time or maximizing safety: minimizing vehicle travel time, reducing traffic delay, increasing vehicle velocity, and prioritizing emergency traffic. Since designing OTP controllers by hand is a complex and tedious task, this research studies how multi-agent reinforcement learning (MARL) algorithms can be used for this goal.

2. AGENT BASED MODEL OF TRAFFIC SYSTEM
This model uses an agent-based representation to describe the practical traffic system. On the road there are two types of agents: vehicles and traffic signal controllers, referred to as distributed agents. Traffic information is exchanged between these agents. There are several possibilities for each traffic controller to prevent traffic threats and accidents: two traffic lights from opposing directions allow vehicles to go straight ahead or turn right, while two traffic lights at the same direction of the intersection allow the vehicles there to go straight ahead, turn right, or turn left. When a new vehicle has been added, the traffic light decisions are made and each vehicle moves to the next cell if that cell is not occupied; this decision is controlled by the traffic system according to the traffic conditions. Each vehicle is therefore at a traffic node (node), at a direction at the node (dir), at a position in the queue (place), and has a particular destination (des). This model uses [node, dir, place, des] to denote the state of each vehicle [7]. The main objective is optimization by reducing the waiting time, the number of stops and the traffic queue length.

Fig 1: Agent Based Model.

In this model Q([node, dir, place, des], action) represents the total expected value of the optimized indices over all traffic lights for each vehicle. This process continues until the vehicles arrive at their destination. In Wiering's model, the first node's traffic situation is passed on to all following nodes: if the first node has free traffic, this condition is propagated to all following nodes. This model instead calculates the traffic situation individually for each node, which is the most important difference between this model and Wiering's model.

3. REINFORCEMENT LEARNING FOR TRAFFIC CONTROL
Previously, several methods for learning traffic control have been developed, such as Sarsa and Q-learning. These techniques all suffer from the same problem in high traffic conditions: in urban or congested traffic they do not scale to multi-agent reinforcement learning, and Q-learning and Sarsa have been applied only to small networks. In urban traffic it is possible that traffic grows dynamically, so a dynamic method is needed to handle it. Reinforcement learning supports such dynamic environments using dynamic programming; a more popular approach is model-based reinforcement learning, in which the transition and reward functions are estimated from experience and then used to find a policy via planning methods like dynamic programming.

3.1 Simple model
Figure 2 shows the learning process of an agent. At each time step the agent receives reinforcement feedback from the environment along with the current state. The goal of the agent is to create an optimal action selection policy π that maximizes the reward. In many cases, not only the immediate reward but also the subsequent (delayed) rewards should be considered when actions are taken.

Fig 2: Agent with state and action

The agent and the environment interact at discrete time steps t = 0, 1, 2, ..., k: at step t the agent observes the state s_t ∈ S, produces an action a_t ∈ A(s_t), gets the resulting reward r_{t+1} ∈ R, and observes the resulting next state s_{t+1}.
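To make this interaction loop concrete, the following is a minimal sketch, not the authors' implementation: it assumes a hypothetical TrafficEnv whose reset() returns an initial state such as a (node, dir, place, des) tuple and whose step(action) returns the next state, a reward and a done flag, and it shows a tabular Q-learning agent running the s_t, a_t, r_{t+1}, s_{t+1} cycle described above.

```python
import random
from collections import defaultdict

class QLearningAgent:
    """Tabular Q-learning agent; states can be (node, dir, place, des) tuples."""

    def __init__(self, actions=("red", "green"), alpha=0.1, gamma=0.9, epsilon=0.1):
        self.q = defaultdict(float)   # Q[(state, action)] -> expected return
        self.actions = actions
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon

    def act(self, state):
        # epsilon-greedy exploration over the action set
        if random.random() < self.epsilon:
            return random.choice(self.actions)
        return max(self.actions, key=lambda a: self.q[(state, a)])

    def update(self, state, action, reward, next_state):
        # one-step Q-learning backup: Q <- Q + alpha * (r + gamma * max_a' Q' - Q)
        best_next = max(self.q[(next_state, a)] for a in self.actions)
        target = reward + self.gamma * best_next
        self.q[(state, action)] += self.alpha * (target - self.q[(state, action)])

def run_episode(env, agent, max_steps=1000):
    """One episode of the agent-environment loop: observe s_t, act a_t, learn."""
    state = env.reset()
    for _ in range(max_steps):
        action = agent.act(state)
        next_state, reward, done = env.step(action)   # hypothetical env interface
        agent.update(state, action, reward, next_state)
        state = next_state
        if done:
            break
```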
Fig 3: A general process model of RL [8]

3.2 Basic Elements of Reinforcement Learning
1. Model of the process
2. Reward functions
3. Learning objective
4. Controllers
5. Exploration

3.3 Multi-agent Framework
The multi-agent framework is based on the same idea as Figure 2 but, this time, there are several agents deciding on actions over the environment. The big difference is that each agent probably has some effect on the environment, so actions can have different outcomes depending on what the other agents are doing. Figure 4 shows the multi-agent model (framework).

Fig 4: Multi-Agent Model

In addition to the benefits owing to the distributed nature of the multi-agent solution, such as the speedup made possible by parallel computation, multiple RL agents can harness new benefits from sharing experience, e.g., by communication or teaching. Conversely, they also face challenges inherited from single-agent RL, including the curse of dimensionality.
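As a rough illustration of this multi-agent framework, the sketch below reuses the hypothetical environment style and QLearningAgent from the previous snippet and gives each traffic-light node its own independent learner; the joint action of all agents drives the shared environment, so each agent's outcome depends on what the other agents do. The per-node interfaces are assumptions for illustration, not the paper's code.

```python
def run_multi_agent_episode(env, agents, max_steps=1000):
    """Independent learners over a shared traffic environment.

    agents: dict mapping node_id -> QLearningAgent (one per traffic light).
    env (hypothetical): reset() -> {node_id: local_state},
                        step(actions) -> ({node_id: next_state},
                                          {node_id: reward}, done).
    """
    states = env.reset()
    for _ in range(max_steps):
        # each agent chooses from its own local observation
        actions = {node: ag.act(states[node]) for node, ag in agents.items()}
        # the environment applies the joint action; outcomes are coupled
        next_states, rewards, done = env.step(actions)
        for node, ag in agents.items():
            ag.update(states[node], actions[node], rewards[node], next_states[node])
        states = next_states
        if done:
            break
```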

4. MULTI-AGENT CONTROL ALGORITHM BASED ON REINFORCEMENT LEARNING
The multi-agent control algorithm considers three types of traffic situations: the free (low) traffic situation, the medium traffic situation, and the congested traffic situation.

4.1 Free traffic condition
The number of stops increases when a vehicle moving at a green light in the current time step meets a red light in the next time step. In the free traffic condition the main goal is to minimize the number of stops, so Q([node, dir, pos, des], Green) is used as the expected cumulative number of stops. The formulation of Q([node, dir, pos, des], Green) is as follows:

Q([node, dir, pos, des], Green) = Σ_(dir', pos') P_d([node, dir', pos']) · ( R([node, dir, pos], [node, dir', pos']) + γ · Q([node, dir', pos', des], Green) )    (1)

where [node, dir', pos'] is the state of the vehicle in the next time step; P_d([node, dir', pos']) gives the probability that the traffic light turns red in the next time step; R([node, dir, pos], [node, dir', pos']) is a reward function: if a vehicle stays at the same traffic light then R = 1, otherwise R = 0 (the vehicle gets through this intersection and enters the next one); and γ is the discount factor (0 < γ < 1), which ensures that the Q-values are bounded. The probability that a traffic light turns red is calculated as follows:

P_d([node, dir, pos]) = C([node, dir, pos], Red) / C([node, dir, pos])    (2)

where C([node, dir, pos]) is the number of times a vehicle is in the state [node, dir, pos] and C([node, dir, pos], Red) is the number of times the light turns red in that state.

4.2 Medium traffic condition
In the medium traffic condition the main goal of this model is to minimize the overall waiting time of vehicles. If the number of vehicles is larger than 100 but less than 150, it is considered medium traffic.

V([node, dir, pos, des]) = Σ_L P(L | [node, dir, pos, des]) · Q([node, dir, pos, des], L)    (3)

Q([node, dir, pos, des], L) = Σ_(dir', pos') P_L([node, dir', pos']) · ( R([node, dir, pos], [node', dir', pos', des]) + γ · V([node', dir', pos', des]) )    (4)

where L is the traffic light state (red or green), P(L | [node, dir, pos, des]) is calculated in the same way as in equation (2), and R([node, dir, pos], [node', dir', pos', des]) is defined as follows: if a vehicle stays at the same traffic light then R = 1, otherwise R = 0; a value of 10 is used when the light is forced to be green.
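A small sketch of how equations (2)-(4) could be computed follows; it is illustrative only, since the state encoding, the way successor states are enumerated, and all helper names are assumptions rather than the paper's code. It keeps the counters C([node, dir, pos]) and C([node, dir, pos], Red) behind equation (2) and evaluates the V and Q expressions from them.

```python
from collections import defaultdict

GAMMA = 0.9                         # discount factor, 0 < gamma < 1

count_state = defaultdict(int)      # C([node, dir, pos])
count_state_red = defaultdict(int)  # C([node, dir, pos], Red)

def observe(state, light):
    """Update the counters behind equation (2) each time a vehicle is seen."""
    count_state[state] += 1
    if light == "red":
        count_state_red[state] += 1

def p_red(state):
    """Equation (2): empirical probability that the light is red in this state."""
    if count_state[state] == 0:
        return 0.5                  # uninformed prior before any observation
    return count_state_red[state] / count_state[state]

def value(state, Q):
    """Equation (3): V(s) = sum over lights L of P(L | s) * Q(s, L)."""
    p = p_red(state)
    return p * Q[(state, "red")] + (1.0 - p) * Q[(state, "green")]

def q_backup(state, light, successors, Q):
    """Equation (4) (and, with Green fixed, equation (1)).

    successors: list of (next_state, prob, reward) triples, where reward is 1
    if the vehicle stays at the same traffic light and 0 if it passes through.
    """
    return sum(prob * (reward + GAMMA * value(nxt, Q))
               for nxt, prob, reward in successors)
```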

4.3 Congested traffic condition
In this condition, spillovers of the queue must be avoided, since they would diminish the traffic control effect and probably cause large-scale traffic congestion.

Q([node, dir, pos, des], Green) = Σ_(dir', pos') P_Green([node, dir', pos']) · ( R([node, dir, pos], [node, dir', pos']) + R'([node, dir, pos], [node', dir', pos']) + γ · V([node', dir', pos', des']) )    (5)

Q([node, dir, pos, des], Red) = Σ_(dir', pos') P_Red([node, dir', pos']) · ( R([node, dir, pos], [node, dir', pos']) + γ · V([node, dir', pos', des]) )    (6)

where Q([·], ·) and V([·]) have the same meanings as under the medium traffic condition. Compared with equation (4), another reward function R'([node, dir, pos], [node', dir', pos']) is added in equation (5) to indicate the influence of the traffic condition at the next node, and a value of 10 is again used when the light is forced to be green. R([node, dir, pos], [node, dir', pos']) is the reward related to the vehicles' waiting time, while R'([node, dir, pos], [node', dir', pos']) indicates the reward from the change of the queue length at the next traffic node. The queue length is taken into account when designing the Q-learning procedure: let K_l' denote the maximum queue length at the next traffic light, let L be the capacity of the lane at the next traffic light, and let α be the adjusting factor that determines the influence of the queue length K_l':

R'(K_l') = 0                       if K_l' ≤ 0.8·L
R'(K_l') = α · (K_l' - 0.8·L)      if 0.8·L < K_l' < L    (7)
R'(K_l') = 0.2                     if K_l' ≥ L

The largest value is set to 0.2 in this model.

4.4 Priority Control for Emergency Vehicles
Emergency vehicles such as fire trucks, ambulances, Prime Minister vehicles, etc., require the traffic lights to be managed when such conditions arise, so these types of vehicles are given high priority. The traffic administrator can manage the traffic lights according to the traffic conditions: if an emergency condition arises, the traffic control admin can adjust the green light time, i.e., set the priority for the green light according to the type of vehicle. In the priority condition the main focus is managing the green light; on this basis the presented model can reduce the waiting time for emergency vehicles.

Q([node, dir, pos, des], Green) = Σ_(dir', pos') P_Green([node, dir', pos', des']) · ( R([node, dir, pos], [node, dir', pos', des']) + γ · V([node, dir', pos', des']) )    (8)

5. RESULT
In this research 1000 time steps were used per simulation. For the learning process 2000 steps were used, and 2000 steps were also used for the simulation results. The discount factor γ is set to 0.9 in this model. The priority factor is set according to the emergency vehicle situation, i.e., the green light priority may differ for a fire truck and an ambulance; it is not fixed at 3. If up to 100 vehicles per minute enter the traffic network, it is considered free traffic; if the number of vehicles is larger than 100 but less than 150, it is considered medium traffic; and if the number of vehicles is larger than 150, it is considered the congested (high traffic) condition.

5.1 Comparison of average waiting time
The comparison of average waiting time with respect to rapidly increasing traffic volume is shown in Figure 5. TD means temporal difference, QL means the Q-learning algorithm, and MARL means the multi-agent reinforcement learning algorithm, i.e., the model proposed in this paper. The following tables show the data set used for TD, QL, and MARL.

Table 1. Visiting points with q-capacity and q-length

Visiting Point   q-capacity   q-length
Lambeth          1000         50
Watford          500          150
West Drayton     800          100
Leatherhead      900          200
Otford           800          700
Dartford         950          200
Loughton         600          105
Aylesford        800          600

Table 2. Visitor distances (-1 means there is no path between the two visitor nodes)

               Lambeth  Watford  West Drayton  Leatherhead  Otford  Dartford  Loughton  Aylesford
Lambeth            0       25        30            28         -1       27        22        -1
Watford           25        0        40            -1         -1       -1        52        -1
West Drayton      30       40         0            45         -1       -1        -1        -1
Leatherhead       28       -1        45             0         47       -1        -1        -1
Otford            -1       -1        -1            47          0       22        -1        35
Dartford          27       -1        -1            -1         22        0        32        33
Loughton          22       52        -1            -1         -1       32         0        -1
Aylesford         -1       -1        -1            -1         -1       33        -1         0

The number of stops under the multi-agent RL control is less than under other control strategies such as TD and Q-learning. The proposed reinforcement learning also minimizes the number of stops compared with the TD and Q-learning techniques in the medium traffic and congested traffic conditions.
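The queue-spillover reward of equation (7) and the traffic-volume classification used in Section 5 can be expressed directly. The sketch below mirrors the piecewise definition, with alpha as the adjusting factor and lane_capacity standing for the lane capacity L of the next traffic light; the function names and the exact behaviour at the band boundaries are assumptions based on the reconstruction above.

```python
def queue_reward(queue_len, lane_capacity, alpha):
    """Piecewise reward R' from equation (7): no penalty while the queue at the
    next light is short, a ramp above 80% of capacity, capped at 0.2."""
    if queue_len <= 0.8 * lane_capacity:
        return 0.0
    if queue_len >= lane_capacity:
        return 0.2                  # largest value used in this model
    return alpha * (queue_len - 0.8 * lane_capacity)

def traffic_level(vehicles_per_minute):
    """Traffic classification from Section 5 (vehicles entering per minute)."""
    if vehicles_per_minute <= 100:
        return "free"
    if vehicles_per_minute < 150:
        return "medium"
    return "congested"
```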
6. CONCLUSION
This paper presented a multi-agent control algorithm based on reinforcement learning. The simulation indicated that MARL achieves the minimum waiting time under free traffic compared with QL and TD, and that MARL can effectively prevent queue spillovers and so avoid large-scale traffic jams. There are still some system parameters that should be carefully determined by hand. For example, the adjusting factor α indicates the influence of the queue at the next traffic node on the waiting time of vehicles at the current light under the congested traffic condition. This is a very important parameter, and further research should determine it using a fuzzy logic approach, such as crisp-to-fuzzy conversion with lambda cuts, for minimizing the traffic pattern. Neural networks can also be used as a tool for detecting trends in traffic patterns and for predicting the minimal waiting time for traffic.

Fig 5: Simulation between TD, QL and MARL by increasing the opposing traffic length.

7. ACKNOWLEDGMENTS
First and foremost, I would like to express my sincere thanks to my paper advisor, Associate Prof. Baijnath Kaushik, for providing me with their precious advice and suggestions. This work would not have been a success for me without their cooperation and valuable comments and suggestions. I also want to express my gratitude to Prof. P. S. Gill (H.O.D.) and Associate Prof. Sunita Tiwari (M.Tech. Coordinator) for their support, kind help, continued interest and inspiration during this work.

8. REFERENCES
[1] Bowling, M.: Convergence and no-regret in multiagent learning. In: L. K. Saul, Y. Weiss, L. Bottou (eds.) Advances in Neural Information Processing Systems 17, pp. 209-216. MIT Press (2005).
[2] Busoniu, L., De Schutter, B., Babuška, R.: Multiagent reinforcement learning with adaptive state focus. In: Proceedings 17th Belgian-Dutch Conference on Artificial Intelligence (BNAIC-05), pp. 35-42. Brussels, Belgium (2005).
[3] Chalkiadakis, G.: Multiagent reinforcement learning: Stochastic games with multiple learning players. Tech. rep., Dept. of Computer Science, University of Toronto, Canada (2003).
[4] Guestrin, C., Lagoudakis, M. G., Parr, R.: Coordinated reinforcement learning. In: Proceedings 19th International Conference on Machine Learning (ICML-02), pp. 227-234. Sydney, Australia (2002).
[5] Hu, J., Wellman, M. P.: Nash Q-learning for general-sum stochastic games. Journal of Machine Learning Research 4, 1039-1069 (2003).
[6] Wiering, M., et al. (2004). Intelligent Traffic Light Control. Technical Report UU-CS-2004-029, Utrecht University.
[7] Wiering, M. (2000). Multi-Agent Reinforcement Learning for Traffic Light Control. Machine Learning: Proceedings of the 17th International Conference (ICML 2000), 1151-1158.
[8] Mitchell, T. M. (1997). Machine Learning. McGraw-Hill International Editions.
[9] Nunes, L., and Oliveira, E. C. Learning from multiple sources. In Proceedings of the 3rd International Joint Conference on Autonomous Agents and Multi Agent Systems, AAMAS (New York, USA, July 2004), vol. 3, New York, IEEE Computer Society, pp. 1106-1113.
[10] Oliveira, D., Bazzan, A. L. C., and Lesser, V. Using cooperative mediation to coordinate traffic lights: a case study. In Proceedings of the 4th International Joint Conference on Autonomous Agents and Multi Agent Systems (AAMAS) (July 2005), New York, IEEE Computer Society, pp. 463-470.
[11] Price, B., Boutilier, C.: Accelerating reinforcement learning through implicit imitation. Journal of Artificial Intelligence Research 19, 569-629 (2003).
[12] Tan, M.: Multi-agent reinforcement learning: Independent vs. cooperative agents. In: Proceedings 10th International Conference on Machine Learning (ICML-93), pp. 330-337. Amherst, US (1993).