A Leader-Follower Computational Learning Approach to the Study of Restructured Electricity Markets: Investigating Price Caps


Kurian Tharakunnel and Siddhartha Bhattacharyya
Department of Information and Decision Sciences, University of Illinois at Chicago
kthara1@uic.edu, sidb@uic.edu

Abstract

This paper discusses the use of a computational learning approach based on a leader-follower multiagent framework in the study of regulation of restructured electricity markets. In a leader-follower multiagent system (LFMAS), a leader (the regulator) determines an appropriate incentive, which motivates a set of self-interested followers (the generators, in this case) to act such that some measure of overall performance is maximized. In the computational learning approach presented, the models of the followers as well as the leader incorporate reinforcement learning, allowing both the exploration of outcomes under different incentives and the learning of the optimal incentive given some measure of desired overall performance. The approach is demonstrated by studying the effect of price caps on the outcome of electricity auctions (uniform and discriminatory) in oligopoly settings for which analytical treatments do not exist.

1. Introduction

The search for effective market and non-market mechanisms that address market weaknesses, such as the market power associated with the operation of restructured electricity markets, has been ongoing ever since restructuring efforts were undertaken by various governments. This has motivated several studies that examine the various facets of this issue using a variety of methods, including analytical, empirical, and simulation approaches. Several features unique to electricity markets, such as the inelasticity of demand and the inability to store electricity, make their study a complex task. Thus, most studies of electricity markets rely on stylized models that abstract away several of these complexities and focus on one or more of the issues. A promising recent approach that allows for incorporating better realism in electricity market models is the agent-based computational learning approach. In an agent-based model, the electricity market is modeled as a multiagent system consisting of autonomous agents with learning capabilities. Several studies in the past have employed this approach in the study of electricity markets [3], [4]. However, most agent-based computational studies of electricity markets focus on market design aspects and do not include non-market mechanisms like price caps. Studies of restructured electricity markets in general have ignored non-market mechanisms or have failed to consider them in a comprehensive fashion [10].

In this work we present a multiagent model for the study of non-market mechanisms (also known as regulatory mechanisms) in restructured electricity markets. From a multiagent perspective, there is a fundamental difference between the structure of a market design problem and that of a regulation problem. In a market design problem, the strategic interaction is among a set of generators that have similar roles and objectives. In a regulation problem, on the other hand, a regulator interacts with a set of competing generators. While the individual generators have profit maximization as their objective, the regulator's objective is often the maximization of some measure of overall performance such as social welfare. Regulation problems thus have a hierarchical structure. We present a multiagent model that is especially suited to the study of regulation problems. In this model, the regulated market is

modeled as a leader-follower multiagent system (LFMAS). This model has a leader-agent representing the regulator, which designs the regulatory input, and follower-agents representing the generators, who take this input into consideration while determining their own actions. The followers are self-interested, and the leader's regulatory action provides an incentive that affects the followers' payoffs from different actions, such that their combined actions lead to the maximization of some measure of overall performance (e.g., social welfare). Leader-follower problems [1] are models of hierarchical decision problems with applications in regulation and control. For example, Keyhani [11] proposed a leader-follower framework for the control of electricity markets.

In the computational learning approach that this paper presents, both the leader's and the followers' models incorporate reinforcement learning. The followers jointly learn to act such that their own self-interested goals are maximized, while the leader's learning seeks the optimal regulatory control with respect to the leader's goal. We show how this approach can be useful by examining some recent studies pertaining to price-cap regulation in electricity markets.

Most restructured electricity markets in operation use price caps as part of their regulatory mechanism. For example, the Pennsylvania-New Jersey-Maryland (PJM) market has elaborate rules for setting price caps in its operation. While price caps are pervasive in electricity markets, there are differing views about their effectiveness as a regulatory tool. Price caps have been employed as a tool for preventing excessive bidding by generating companies. Theoretically, an appropriately set price cap has been shown to lower prices and increase aggregate output from the firms in the market [2]. On the other hand, it has been shown that the use of price caps may, in the long run, create a disincentive for investment and thus lead to a shortage of capacity [10]. We emphasize that in this paper our focus is on the short-run effects of price caps in controlling market power.

We consider price caps in the context of markets operating under uniform or discriminatory (pay-as-bid) auction formats. Specifically, we examine the effect of price caps when the generators face uncertain demand. Random demand can arise from uncertain events such as plant or transmission line outages and extreme weather, and is especially relevant in day-ahead markets, where the suppliers submit bids that remain valid for 24 hours [7]. A recent study [6] showed that, under certain assumptions, the results about price caps in the certainty case do not hold in general when demand is uncertain. Specifically, using a Cournot model of the market, this study showed that average prices might not be non-decreasing in the price cap in the presence of demand uncertainty. We examine this phenomenon in the context of uniform and discriminatory auctions in an oligopolistic setting. The results from our experiments did not indicate this behavior in the auction settings we considered. However, our results show that, with uncertain demand, the strategic behavior of generators in an oligopolistic setting is quite different from that in a duopoly setting under both auction formats.

The rest of the paper is organized as follows. The leader-follower multiagent model is introduced in Section 2, and the computational learning approach based on this model is described in Section 3. Section 4 describes electricity auctions. Section 5 presents the results from experiments using the proposed computational learning approach. A discussion of contributions, limitations of this study, and future work appears in Section 6.

2. Leader-Follower Multiagent Model

[Figure 1. A Leader-Follower Multiagent System (LFMAS): a leader (action u) over followers 1, 2, ..., N (actions a^1, a^2, ..., a^N).]
A Leader-Follower Multiagent System (LFMAS) (Figure 1) consists of a single leader agent and $N$ ($N > 1$) follower agents. Let $u \in U \subseteq \mathbb{R}$ and $a^n \in A^n \subseteq \mathbb{R}$, $n = 1, 2, \ldots, N$, denote the actions available to the leader and the followers, respectively. The leader and the

followers make their decisions sequentially, with the leader deciding first and announcing its decision to the followers. The followers, knowing the leader's decision, then make their individual decisions concurrently and competitively. In other words, the followers play a Nash game after learning the leader's decision. The payoffs of the leader and the followers are interrelated in the sense that the followers' payoffs are contingent on the leader's decision, while the leader's payoff is a function of the followers' actions. In game theory, the strategic interaction described above is known as a Stackelberg game, and the associated equilibrium solution is known as a Stackelberg equilibrium [1].

Let the payoff functions of the leader and the followers be $V^l(u, a^1, a^2, \ldots, a^N)$ and $V^{f_1}(u, a^1, a^2, \ldots, a^N)$, $V^{f_2}(u, a^1, a^2, \ldots, a^N)$, $\ldots$, $V^{f_N}(u, a^1, a^2, \ldots, a^N)$, respectively. For a given incentive $u$ announced by the leader, let the $N$-tuple $(a^1_u, a^2_u, \ldots, a^N_u)$ be the unique Nash equilibrium of the followers' subgame, so that

$V^{f_1}(u, a^1_u, a^2_u, \ldots, a^N_u) \geq V^{f_1}(u, a^1, a^2_u, \ldots, a^N_u) \quad \forall a^1 \in A^1$
$V^{f_2}(u, a^1_u, a^2_u, \ldots, a^N_u) \geq V^{f_2}(u, a^1_u, a^2, \ldots, a^N_u) \quad \forall a^2 \in A^2$
$\vdots$
$V^{f_N}(u, a^1_u, a^2_u, \ldots, a^N_u) \geq V^{f_N}(u, a^1_u, a^2_u, \ldots, a^N) \quad \forall a^N \in A^N$

The leader's problem is to determine the optimal incentive $u^*$ such that

$u^* = \arg\max_u V^l(u, a^1_u, a^2_u, \ldots, a^N_u)$  (1)

The $(N+1)$-tuple $(u^*, a^1_{u^*}, a^2_{u^*}, \ldots, a^N_{u^*})$ is then called a Stackelberg equilibrium. It is assumed that the followers' subgame has a unique Nash equilibrium for every incentive decision announced by the leader.

This approach to the solution of Stackelberg games is analytically intractable when there are more than two followers. Further, this solution approach assumes that the leader knows the payoff functions of every follower and can compute the corresponding equilibrium of the followers' subgame for each of its actions, thereby arriving at its optimal decision. Also, the followers are assumed to have common knowledge of their payoff functions in order to play the corresponding equilibrium actions. This approach thus places strong assumptions on the informational and computational capabilities of the leader and the followers. In contrast, the computational learning approach we propose in this work assumes limited information requirements for the players. In particular, we assume that the leader and the followers observe only the rewards they receive for their actions. The next section presents this approach.
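To make the solution concept concrete, the following minimal sketch computes a Stackelberg equilibrium by brute-force enumeration over small, hypothetical discrete action sets; the action sets and payoff functions are illustrative placeholders, not taken from the paper. The sketch relies on exactly the strong informational assumptions just discussed: all payoff functions are known, and the followers' equilibrium can be computed for every leader action.

```python
import itertools

# Hypothetical discrete action sets and payoffs, purely for illustration.
U = [0.0, 0.5, 1.0]               # leader's incentive levels
A = [[1, 2, 3], [1, 2, 3]]        # follower action sets (N = 2)

def V_l(u, a):                    # leader's payoff V^l(u, a^1, ..., a^N)
    return sum(a) - u * len(a)

def V_f(n, u, a):                 # follower n's payoff V^{f_n}(u, a^1, ..., a^N)
    return u * a[n] - 0.1 * a[n] ** 2

def is_nash(u, a):
    """No follower can gain by a unilateral deviation from profile a."""
    for n in range(len(a)):
        for alt in A[n]:
            dev = list(a)
            dev[n] = alt
            if V_f(n, u, dev) > V_f(n, u, a):
                return False
    return True

def stackelberg():
    best = None
    for u in U:
        # Equilibrium of the followers' subgame given u (assumed unique in the
        # text; here we take the first profile that passes the Nash check).
        eq = next(a for a in itertools.product(*A) if is_nash(u, a))
        if best is None or V_l(u, eq) > V_l(*best):
            best = (u, eq)
    return best   # (u*, (a^1_{u*}, ..., a^N_{u*})): a Stackelberg equilibrium

print(stackelberg())
```

With more than two followers, or with continuous action sets, this enumeration becomes intractable, which is precisely what motivates the learning approach of the next section.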
3. Computational Learning Approach

In the computational learning approach we use, the leader and the followers are adaptive learners that use simple reinforcement learning (RL) schemes to learn their respective optimal strategies: the leader learns the optimal incentive, and the followers learn their corresponding equilibrium responses.

3.1. Reinforcement learning preliminaries

Reinforcement learning [15] is a popular model of learning frequently employed in agent-based models to represent agent behavior. In this model of learning, agents use the rewards received from past actions to learn optimal actions. One of the most successful and popular RL schemes is Q-learning, proposed by Watkins [16]. In what follows, we use a very simple form of Q-learning to describe the basic approach of RL. Q-learning maintains a set of quantities (one for each admissible action of the agent) called Q-values, which are estimates of the expected rewards of the different actions. The algorithm starts with some initial estimates for the Q-values. Subsequently, at every time step $t$, the Q-value corresponding to the current action $a$ is updated as follows:

$Q_{t+1}(a) = Q_t(a) + \lambda_t \big( R_t(a) - Q_t(a) \big)$

where $R_t(a)$ is the reward received for taking action $a$ at time $t$, and $\lambda_t$ $(0 < \lambda_t < 1)$ is the learning rate, which controls the magnitude by which the Q-values are modified at each updating step. The learning rate is gradually decayed so that $\sum_t \lambda_t = \infty$ and $\sum_t \lambda_t^2 < \infty$.
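Written directly from the update rule above, a minimal stateless sketch looks as follows; the action set and reward function are placeholders, and the harmonic decay shown is one common schedule satisfying the stated conditions:

```python
import random

actions = ["low", "mid", "high"]        # placeholder action set
Q = {a: 0.0 for a in actions}           # initial Q-value estimates

def reward(a):
    # Placeholder stochastic reward with an action-dependent mean.
    return {"low": 1.0, "mid": 2.0, "high": 1.5}[a] + random.gauss(0.0, 0.1)

for t in range(1, 10001):
    lam = 1.0 / t                 # harmonic decay: sum(lam) = inf, sum(lam^2) < inf
    a = random.choice(actions)    # crude exploration: every action tried infinitely often
    Q[a] += lam * (reward(a) - Q[a])   # Q_{t+1}(a) = Q_t(a) + lam_t (R_t(a) - Q_t(a))

print(Q)   # the estimates approach the expected rewards of the actions
```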

It can be proved that (when the Q-values are stored in tabular format) Q-learning converges to the optimal Q-values, under the condition that each admissible action is performed in each state infinitely often over an infinite number of decision epochs. In practice, this condition is met by implementing an explore/exploit action selection scheme for the agent, under which, at every time step, the agent selects a random action with some small probability. One widely used technique for this purpose is the Boltzmann action selection scheme [15]. Under this scheme, at any time step, the probability of selecting action $a$ when the state is $s$ is

$\dfrac{e^{Q(s,a)/T}}{\sum_{a'} e^{Q(s,a')/T}}$

where $Q(s, a)$ is the current estimate of the Q-value for the state-action pair $(s, a)$, and $T$ is a temperature parameter that controls the degree of randomness in action selection. The temperature $T$ is gradually reduced from a predetermined maximum to a predetermined minimum by an appropriate decay scheme.

There have been several attempts to extend the Q-learning approach to multiagent systems [4], [9], [13]. These efforts focus on RL schemes for multiagent systems in which the agents have symmetric roles and the associated game-theoretic solution concept is a Nash equilibrium. In contrast, the RL approach we present in this work is for leader-follower multiagent systems, where the agents have asymmetric roles and, as shown in the previous section, the associated game-theoretic solution concept is a Stackelberg equilibrium.
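The Boltzmann rule described above is straightforward to implement; here is a minimal sketch for a single fixed state, with hypothetical Q-values:

```python
import math
import random

def boltzmann_select(q_values, T):
    """Sample an action with probability proportional to exp(Q(s, a) / T)."""
    m = max(q_values.values())    # subtract the max for numerical stability;
                                  # the shift cancels in the normalized ratio
    weights = {a: math.exp((q - m) / T) for a, q in q_values.items()}
    r = random.uniform(0.0, sum(weights.values()))
    for a, w in weights.items():
        r -= w
        if r <= 0.0:
            return a
    return a   # guard against floating-point round-off

Q = {"bid_low": 1.2, "bid_mid": 0.8, "bid_high": 0.5}   # hypothetical values
print(boltzmann_select(Q, T=2.0))    # high T: close to uniform exploration
print(boltzmann_select(Q, T=0.05))   # low T: nearly greedy exploitation
```

Holding $T$ constant at a small value turns this same routine into the smooth best response scheme used in Section 3.2.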
3.2. An RL approach for LFMAS

The proposed RL approach consists of the leader's learning scheme, modeled as single-agent RL, and the followers' learning scheme, modeled as multiagent RL. A crucial point, though, is the coupling of these two learning processes so that condition (1) is achieved. This coupling is obtained by making the leader's learning process proceed at a slower rate than the followers'. We assume that the leader makes a new decision every $m$ periods, while the followers repeatedly play the game during those $m$ periods. The leader's learning scheme then resembles single-agent Q-learning, except that the immediate reward for the leader is the reward accrued over $m$ periods. The followers' learning scheme is an adaptation of a Q-learning algorithm for repeated games proposed recently by Leslie and Collins [12]. Two features make this algorithm especially attractive for our purposes. First, it uses player-dependent learning rates for the update of the Q-values of the individual agents, which under certain assumptions have good convergence properties. Second, it uses a smooth best response (SBR) [8] scheme for action selection, which enables the agents to learn mixed strategies. This is important in many applications, especially the regulation problem we address in this work, where the generators have a mixed-strategy equilibrium. The original Q-learning algorithm, being an optimum-seeking algorithm, can learn only pure strategies; the use of SBR action selection addresses this issue. An SBR scheme maintains a positive probability of selection for every action in an agent's action set. One way to implement an SBR action selection scheme is to use the Boltzmann action selection scheme described earlier with the temperature parameter held constant at a very small value. When $T$ is held constant, every action in the action set always has a positive (though small) probability of being selected. Leslie and Collins [12] show that, with smooth best responses, an agent's strategy converges towards a Nash distribution, which is an approximation of the mixed-strategy Nash equilibrium of the game.

The proposed RL algorithm for LFMAS is as follows.

1. The leader starts with an incentive $u$.
2. The followers play a game: each follower $n$ selects an action $a^n$ according to the SBR scheme, receives a reward $r^n$, and updates its Q-value $Q^n$ using the following update scheme:
   $Q^n(u, a^n) \leftarrow Q^n(u, a^n) + \lambda^n_t \big( r^n - Q^n(u, a^n) \big)$
3. Step 2 is repeated $m$ times.
4. The leader receives the aggregate reward $r^l$ accrued since its last incentive decision and updates its Q-value $Q^l$ using the following update scheme:
   $Q^l(u) \leftarrow Q^l(u) + \lambda^l \big( r^l / m - Q^l(u) \big)$
5. If the termination criterion is not met, the leader selects a new incentive $u$ using the explore/exploit action selection scheme and returns to Step 2.
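Under the assumptions above, the coupled two-timescale loop can be sketched as follows. All market parameters and payoff functions here are placeholders, `boltzmann_select` is the routine sketched in Section 3.1, and the player-dependent learning rates anticipate the definition given just below:

```python
U = [8, 10, 12]            # leader incentive (e.g., price-cap) levels -- placeholder
BIDS = [4, 6, 8, 10]       # follower bid levels -- placeholder
N, M = 3, 200              # number of followers; follower plays per leader update
THETA = [0.6, 0.7, 0.8]    # player-dependent exponents, theta^n in (0.5, 1]
C, T_SBR = 10.0, 0.05      # learning-rate constant; fixed small SBR temperature

Q_leader = {u: 0.0 for u in U}
# Each follower keeps a separate set of Q-values for every leader incentive u.
Q_f = [{u: {b: 0.0 for b in BIDS} for u in U} for _ in range(N)]

def follower_reward(n, u, bids):
    # Placeholder payoff: a bid at or below the incentive level earns its price.
    return bids[n] if bids[n] <= u else 0.0

def leader_reward(u, bids):
    # Placeholder performance measure: negative total cost of accepted bids.
    return -sum(b for b in bids if b <= u)

step = 0
for k in range(1000):                            # leader decision epochs
    T_lead = max(0.05, 2.0 * 0.995 ** k)         # decaying leader temperature
    u = boltzmann_select(Q_leader, T_lead)       # Steps 1/5: explore/exploit incentive
    total = 0.0
    for _ in range(M):                           # Steps 2-3: m follower plays
        step += 1
        bids = [boltzmann_select(Q_f[n][u], T_SBR) for n in range(N)]   # SBR play
        for n in range(N):
            lam = (C + step) ** (-THETA[n])      # lambda^n_t = (C + t)^(-theta^n)
            r = follower_reward(n, u, bids)
            Q_f[n][u][bids[n]] += lam * (r - Q_f[n][u][bids[n]])
        total += leader_reward(u, bids)
    lam_lead = 1.0 / (k + 1)                     # Step 4: leader update from the
    Q_leader[u] += lam_lead * (total / M - Q_leader[u])   # reward accrued over m plays
```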

The parameter $m$ determines the number of times the followers play the game before the leader updates its Q-value. Note that the updates of the leader and the followers happen at different time scales: the follower Q-values are updated in every time period, whereas for the leader the update happens only once every $m$ periods. The followers maintain a separate set of Q-values for each incentive decision of the leader. The player-dependent learning rates for the followers' learning are implemented as follows. The learning rate for follower $n$ at time step $t$ is set to

$\lambda^n_t = (C + t)^{-\theta^n}$

where $\theta^n \in (0.5, 1]$ and $C$ is a constant. By selecting $\theta^n$ differently for each follower, the required sequence of learning rates is obtained. Each follower agent employs an SBR action selection scheme that uses the Boltzmann function with the temperature parameter $T$ set to a very small value; as discussed earlier, this enables the followers to learn mixed strategies. We also use the Boltzmann function for the leader's explore/exploit action selection, but with a decaying $T$, as in single-agent Q-learning.

4. Electricity auctions

Most restructured wholesale electricity markets use auctions as the primary market mechanism for electricity trading. While several forms of auctions are in existence, the most widely used format in electricity trading is the uniform auction, also known as the first-price auction. However, some markets, such as England and Wales, have recently switched to a discriminatory auction (also known as pay-as-bid) format, and several others are considering this change. A recent debate in the study of restructured electricity markets concerns the relative merits and drawbacks of these two auction formats in electricity trading. While our interest in this paper is not to compare the two formats, we undertake our study using both, because of their importance in electricity trading.

In both auction formats, the suppliers submit their bids for a period by specifying their minimum offer prices and the capacities available at those prices. The auctioneer considers these bids and the forecasted demand for the period, and then decides the dispatch quantities for the period by allocating to the least-cost supplier first until all demand is met. The main difference between the two formats is in the payments to the suppliers. In the uniform auction, all allocated units are paid the price of the marginal accepted unit. In the discriminatory auction, the suppliers are paid their offer prices for the quantities allocated.
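The two settlement rules can be stated compactly in code. A minimal single-period sketch with hypothetical bid data (each bid is an offer price and a capacity):

```python
def clear(bids, demand, fmt):
    """Dispatch least-cost offers first, then settle per the auction format.

    bids: list of (offer_price, capacity); fmt: 'uniform' or 'discriminatory'.
    Returns a list of (price_paid, quantity) for the accepted offers.
    """
    accepted, remaining, marginal_price = [], demand, 0.0
    for price, cap in sorted(bids):        # least-cost suppliers dispatched first
        if remaining <= 0:
            break
        q = min(cap, remaining)
        remaining -= q
        marginal_price = price             # price of the marginal accepted unit
        accepted.append((price, q))
    if fmt == "uniform":
        # Uniform: every accepted quantity is paid the marginal accepted price.
        return [(marginal_price, q) for _, q in accepted]
    # Discriminatory (pay-as-bid): accepted quantities are paid their offer prices.
    return accepted

offers = [(3.0, 15), (3.5, 10), (4.0, 5)]   # hypothetical (price, capacity) offers
print(clear(offers, demand=20, fmt="uniform"))          # [(3.5, 15), (3.5, 5)]
print(clear(offers, demand=20, fmt="discriminatory"))   # [(3.0, 15), (3.5, 5)]
```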
Several studies have analyzed uniform and discriminatory auctions in the context of restructured electricity markets. Son et al. [14] compare the equilibrium characteristics of uniform and discriminatory auctions in a duopoly setting. A more recent study by Fabra et al. [7] examines bidding behavior and market outcomes in uniform and discriminatory auctions under a variety of market situations. However, most of these studies are limited to duopoly situations, as oligopoly settings are analytically hard to solve. Our focus in this study is on the effect of price caps on the market outcome when the auction format used is uniform or discriminatory. The next section discusses the results of experiments using the computational learning approach described in Section 3.

5. Experiments and results

In these experiments we consider a market with multiple generators operating under either the uniform or the discriminatory auction format. The demand faced by the market is uniformly distributed. As in [7], we assume that the generators offer their entire capacity for bid. While this is a simplification, it turns out to be a reasonable assumption in many situations. For example, a recent study of bidder behavior at the New York ISO [17] shows that generators submit most of their installed capacity once they choose to bid into the market. We implemented the agent model using the Swarm 2.2 simulation kit.

5.1. Strategic behavior under demand uncertainty

We begin by studying the equilibrium bidding behavior of the generators under duopoly and oligopoly settings with a fixed price cap (the leader's learning turned off). In the duopoly case we consider one large-capacity generator with capacity 15 and a fixed marginal cost of 3, and a small-capacity generator with capacity 5 and marginal cost 4. The demand in this case is uniformly distributed over the range [6, 15].

In the oligopoly setting there are five generators: one large-capacity generator, three small-capacity generators, and one medium-capacity generator. The capacities and costs of the large- and small-capacity generators are as in the duopoly case, whereas the medium-capacity generator has a capacity of 10 and marginal cost 3.5. The demand in this case varies over the range [15, 30]. The price cap is set at 12 in all cases.

[Figure 2. Bidding strategies in uniform auction (duopoly): probability versus bid price for Generators 1 and 2.]

[Figure 3. Bidding strategies in uniform auction (oligopoly): probability versus bid price for Generators 1-5.]

Figure 2 shows the bidding strategies of the generators in the duopoly case under the uniform auction. As expected, both generators employ mixed strategies, with the large-capacity generator more often bidding close to the price cap and the smaller-capacity generator bidding close to its marginal cost. Figure 3 shows the bidding strategies of the generators in the oligopoly case under the uniform auction. In this case, all generators have mixed strategies that concentrate bids close to their respective marginal costs. This is because, as the number of generators increases, competition forces the generators to bid closer to their marginal costs. Figures 4 and 5 show the bidding strategies of the generators in the duopoly and oligopoly settings, respectively, under the discriminatory auction format.

[Figure 4. Bidding strategies in discriminatory auction (duopoly): probability versus bid price for Generators 1 and 2.]

[Figure 5. Bidding strategies in discriminatory auction (oligopoly): probability versus bid price for Generators 1-5.]

5.2. Effect of price cap

In these experiments we let the leader vary the price-cap setting. Figure 6 plots the cost of supply for different price caps under the uniform and discriminatory auction settings in the duopoly, while Figure 7 shows the same for the oligopoly. It can be observed that the cost of supply is non-decreasing in the price cap in all cases here. This is in contrast to the recent finding that the average price may not be non-decreasing in the price cap under demand uncertainty. Another interesting observation is that the cost of supply under the discriminatory auction is lower than that under the uniform auction across the different price-cap settings, in both the duopoly and the oligopoly. This is consistent with the analytical results of [7] and [14].

[Figure 6. Cost of supply under different price caps (duopoly): total cost versus price cap, uniform and discriminatory.]

[Figure 7. Cost of supply under different price caps (oligopoly): total cost versus price cap, uniform and discriminatory.]

6. Discussion

The main contribution of this paper is a new multiagent model and the associated computational learning approach for the study of regulation problems. The computational approach is very general and can be employed to study a variety of regulation problems, not only in electricity markets but also in many other regulation settings, such as emission control and the pricing of communication networks. The results in this study provide interesting observations about bidder strategies and price-cap effectiveness in oligopolistic settings of electricity auctions, which are not easily amenable to theoretical analysis. In future work we would like to extend this study to cases that include elastic demand and multiple bids. Another interesting avenue for further work would be to employ this approach to model an existing market using actual market parameters. One weakness of the present study is the unknown convergence properties of the computational learning approach; establishing convergence conditions for the learning scheme presented is another important piece of future work.

7. Acknowledgment

This work was supported by National Science Foundation grant ECS-0601590.

8. References

[1] Basar, T., & Olsder, G. J. (1995). Dynamic noncooperative game theory. London: Academic Press.
[2] Borenstein, S. (2002). The trouble with electricity markets: Understanding California's restructuring disaster. Journal of Economic Perspectives, 16(1), 191-211.
[3] Bower, J., & Bunn, D. (2001). Experimental analysis of the efficiency of uniform-price versus discriminatory auctions in the England and Wales electricity market. Journal of Economic Dynamics and Control, 25, 561-592.
[4] Bowling, M., & Veloso, M. (2002). Multiagent learning using a variable learning rate. Artificial Intelligence, 136, 215-250.
[5] Bunn, D. W., & Oliveira, F. S. (2003). Evaluating individual market power in electricity markets via agent-based simulation. Annals of Operations Research, 121, 57-77.
[6] Earle, R., Schmedders, K., & Tatur, T. (2007). On price caps under uncertainty. Review of Economic Studies, 74, 93-111.
[7] Fabra, N., von der Fehr, N. H., & Harbord, D. (2006). Designing electricity auctions. The RAND Journal of Economics, 37(1), 23-46.
[8] Fudenberg, D., & Levine, D. K. (1998). The theory of learning in games. Cambridge, MA: MIT Press.
[9] Hu, J., & Wellman, M. P. (2003). Nash Q-learning for general-sum stochastic games. Journal of Machine Learning Research, 4, 1039-1069.
[10] Joskow, P., & Tirole, J. (2006). Reliability and competitive electricity markets. The RAND Journal of Economics (to appear).
[11] Keyhani, A. (2003). Leader-follower framework for control of energy services. IEEE Transactions on Power Systems, 18(2), 837-841.
[12] Leslie, D. S., & Collins, E. J. (2004). Individual Q-learning in normal form games. Submitted to SIAM Journal on Control and Optimization.
[13] Littman, M. (1994). Markov games as a framework for multi-agent reinforcement learning. In Proceedings of the Eleventh International Conference on Machine Learning (pp. 157-163).
[14] Son, Y. S., Baldick, R., Lee, K.-H., & Siddiqi, S. (2004). Short-term electricity market auction game

analysis: Uniform and pay-as-bid pricing. IEEE Transactions on Power Systems, 19(4), 1990-1998.
[15] Sutton, R. S., & Barto, A. (1998). Reinforcement learning: An introduction. Cambridge, MA: MIT Press.
[16] Watkins, C. J. C. H., & Dayan, P. (1992). Q-learning. Machine Learning, 8(3/4), 279-292.
[17] Zhang, N., Mount, T., & Boisvert, R. (2007). Generators' bidding behavior in the NYISO day-ahead wholesale electricity market. In 40th Hawaii International Conference on System Sciences (HICSS-2007) (pp. 11-17). Hawaii.