ECONOMICS OF PAIR PROGRAMMING REVISITED

Association for Information Systems
AIS Electronic Library (AISeL)
AMCIS 2012 Proceedings

ECONOMICS OF PAIR PROGRAMMING REVISITED

Wenying Sun, Computer Information Sciences, Washburn University, Topeka, KS, United States, nan.sun@washburn.edu
Miguel Aguirre-Urreta, School of Accountancy and Management Information Systems, DePaul University, Chicago, IL, United States, maguirr6@depaul.edu
George Marakas, Accounting Information Systems, University of Kansas, Lawrence, KS, United States, gmarakas@ku.edu

Follow this and additional works at: http://aisel.aisnet.org/amcis2012

Recommended Citation
Sun, Wenying; Aguirre-Urreta, Miguel; and Marakas, George, "ECONOMICS OF PAIR PROGRAMMING REVISITED" (2012). AMCIS 2012 Proceedings. 2. http://aisel.aisnet.org/amcis2012/proceedings/systemsanalysis/2

This material is brought to you by the Americas Conference on Information Systems (AMCIS) at AIS Electronic Library (AISeL). It has been accepted for inclusion in AMCIS 2012 Proceedings by an authorized administrator of AIS Electronic Library (AISeL). For more information, please contact elibrary@aisnet.org.

ECONOMICS OF PAIR PROGRAMMING REVISITED

Wenying Sun, Department of Computer Information Sciences, Washburn University, nan.sun@washburn.edu
Miguel I. Aguirre-Urreta, School of Accountancy and MIS, DePaul University, maguirr6@depaul.edu
George M. Marakas, Department of Accounting and Information Systems, University of Kansas, gmarakas@ku.edu

ABSTRACT

This study aimed to answer two research questions. First, is pair programming more cost effective than solo programming in all situations? Second, in what situations is pair programming more cost effective than solo programming? We adopted and extended economic models specified by prior researchers. We examined two different scenarios and conducted simulations in which we varied the model parameters across a wide range of possible values. Two conclusions were drawn from the study. First, across the ranges of parameters studied, pair programming is the more economical choice in only a limited number of instances. Second, to achieve an economic benefit, pair programming either needs to have advantages in all three areas (speed, defect rate, and defect removal) or needs substantial advantages in two areas if the third is roughly equivalent to solo programming. To address the second research question, we identified specific parameter ranges for situations where a) pair programming is more economical, b) solo programming is more economical, and c) the two programming methods are equivalent.

Keywords

pair programming, solo programming, net present value, simulation, classification tree

INTRODUCTION

Solo programming is the traditional programming method where one programmer works on the programming task alone. Pair programming is a programming method where two programmers work on the same programming task side by side in front of one computer (Beck, 2000; Williams, Kessler, Cunningham, and Jeffries, 2000; Arisholm, Gallis, Dybå, and Sjøberg, 2007). In pair programming, one programmer is the driver, and the other is the navigator.
The driver sits in front of the computer screen, types the code, and pays close attention to the coding details. The navigator sits beside the driver, reviews the code, and takes the lead in developing alternative strategies in the event of a problem. The programmers change roles periodically during the project to avoid role fatigue.

Pair programming has recently emerged as an attractive alternative to solo programming. It is used worldwide (Cusumano, MacCormack, Kemerer, and Crandall, 2003), and its adoption is increasing: according to the annual survey conducted by VersionOne, pair programming adoption has increased by nine percent since 2008 (VersionOne, 2011).

Despite growing interest in pair programming, issues remain that prevent the majority of organizations from adopting it. Concern about increased overall project cost is a major obstacle. Two major studies attempted to address the cost of pair programming but reached different conclusions. Erdogmus and Williams (2003) present a positive economic picture for pair programming in all situations, while Padberg and Müller (2003) suggest that the economic benefit of pair programming depends on factors such as pair speed advantage. Because the existing literature provides conflicting answers on cost, corporate decision makers have no guidelines to follow in resolving this bottom-line issue. Without clear cost benefits, and given that the majority of software development companies are cost conscious, a transition from solo to pair programming is difficult to argue for.

Proceedings of the Eighteenth Americas Conference on Information Systems, Seattle, Washington, August 9-12, 2012.

To fill this gap, this research extends findings from the two studies and aims to answer the following research questions: 1) Is pair programming more cost effective than solo programming in all situations? 2) If not, in what situations is pair programming more cost effective than solo programming?

Several steps were taken to address these questions. First, results from the two studies were examined and replicated using the same models and parameters. Second, for each of the models, a wide range of parameter values based on distributions was adopted. Third, parameter ranges for three situations were identified: 1) pair programming is more cost effective than solo; 2) solo programming is more cost effective than pair; 3) there is no difference between the two programming approaches.

This study makes two major contributions. First, it synthesizes the existing literature and extends its findings through replication. Second, it provides practical guidance to industry on which programming method to adopt.

The paper is organized as follows. The first section provides a literature review on the economics of pair programming. Next, we discuss the theoretical foundation. We then propose the research hypotheses and research model, followed by the research methods and analysis results. Finally, we note some limitations of our approach and discuss directions for future research.

LITERATURE

Since the focus of this paper is on the economics of pair programming, only literature related to this specific topic is presented. For a comprehensive literature review on pair programming, please see Sun (2011). Studies are split in their conclusions regarding whether pair programming reduces the overall cost of a software development project compared to solo.
Some anecdotal evidence suggests pair programming will reduce the overall cost of a project, while other practitioners believe the benefits of pair programming do not justify the increased expense of the second programmer. Stephen Hutchinson, senior technical architect at Royal & Sun Alliance Insurance Group, claims pairing two developers on each assignment helped the company come in 15% below the projected budget (Copeland, April 2001). An application development manager at a major U.S. bank commented that the cost issue was moot because pair programming would produce fewer defects and less time would be spent on bug fixing (Radding, 2002). In contrast, Larry Zucker, executive director of application development at Dollar Rent-a-Car Systems in Tulsa, Oklahoma, said that while he appreciated the benefits of having two programmers on one task, the gains did not justify the doubled expense. He also expressed the fear that the programming process could turn into a social event (Copeland, 2001a). This view was echoed by several other reports. For example, Stephens and Rosenberg (2003) identified cost as the major issue facing a decision to employ pair programming; Aiken's (2004) interview of three developers surfaced the same view, namely that there are surely additional development costs, especially because productivity might suffer at first while people are adjusting; and Luck (2004) reported a 15% extra cost from an industrial experience.

As with the practitioner community, conclusions drawn from the academic literature are also mixed. Müller (2006) found no difference in development cost between a pair and a solo implementation when programs of a similar level of correctness were compared, while Rostaher and Hericko (2002) found that the average time solo and pair programmers spent to complete all three tasks was very similar, which means pairs needed almost twice as much total effort (person-hours) and basically doubled the cost.
Furthermore, two major studies attempted to address the economics of pair programming and yielded markedly different conclusions. In one study pair programming was more cost effective than solo programming in all situations (Erdogmus and Williams, 2003), whereas in the other the economic benefit of pair programming depended on several factors (Padberg and Müller, 2003). Given that these two studies represent major efforts in addressing the cost benefits of pair programming vs. solo, we present a summary of the studies below. It should be noted that Müller and Padberg (2002) and Müller and Padberg (2003) appear to be earlier reports of studies similar to Padberg and Müller (2003). Since Padberg and Müller (2003) provided a more comprehensive discussion of the study, only Padberg and Müller (2003) is presented here.

Erdogmus and Williams (2003) conducted a major research effort on the economics of pair programming. Three empirical parameters were crucial to their model: productivity (LOC/hour), defect rate (defects/LOC), and rework speed (defects fixed/hour). The abstract models for solo and pair used by the researchers are shown below (where π is productivity, β is defect rate, and p is rework speed):

Solo = {N=1, π=25.0, β=0.00585, p=0.0303}
Pair = {N=2, π=43.478, β=0.00351, p=0.0527}
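As a quick arithmetic check on the two parameter sets quoted above, the following minimal Python sketch derives the pair-to-solo ratios they imply. The dictionary keys and helper names (`time_reduction`, `defect_reduction`) are illustrative assumptions, not notation from Erdogmus and Williams (2003); only the numeric values come from the text.

```python
# Parameter values quoted in the text from Erdogmus and Williams (2003).
# Key names are illustrative assumptions, not the original notation.
SOLO = {"N": 1, "productivity": 25.0, "defect_rate": 0.00585, "rework_speed": 0.0303}
PAIR = {"N": 2, "productivity": 43.478, "defect_rate": 0.00351, "rework_speed": 0.0527}


def time_reduction(speed_ratio):
    """Fractional cut in elapsed time when work proceeds `speed_ratio` times as fast."""
    return 1.0 - 1.0 / speed_ratio


def defect_reduction(pair_rate, solo_rate):
    """Fraction by which the pair defect rate is below the solo defect rate."""
    return 1.0 - pair_rate / solo_rate


productivity_ratio = PAIR["productivity"] / SOLO["productivity"]
rework_ratio = PAIR["rework_speed"] / SOLO["rework_speed"]
fewer_defects = defect_reduction(PAIR["defect_rate"], SOLO["defect_rate"])

print(f"productivity ratio (pair/solo): {productivity_ratio:.3f}")
print(f"elapsed-time reduction:         {time_reduction(productivity_ratio):.1%}")
print(f"rework speed ratio (pair/solo): {rework_ratio:.3f}")
print(f"defect reduction:               {fewer_defects:.1%}")
```

The same `time_reduction` conversion turns a speed advantage of 1.3 or 1.8 into roughly 23 or 44 percent less elapsed time, which is the range quoted for Padberg and Müller (2003).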

Based on these parameters, the authors compared solo and pair on three measures (efficiency, unit effort, and unit time) and found that pair was better on all three: nearly 100% improvement in efficiency, over 40% reduction in unit effort, and over 70% reduction in unit time. The authors then considered two value realization models: single-point delivery (value realized at the end) and incremental delivery (value realized incrementally on a continuous basis). The comparison of solo and pair based on the breakeven unit value ratio (solo breakeven unit value/pair breakeven unit value) suggested that pair was better in both situations.

Padberg and Müller (2003) constructed a mathematical model and applied it in two different scenarios: solo development and pair development. To realize the models, the authors adopted several parameter values: pair speed advantage ranged from 1.3 to 1.8 (that is, pairs needing 23 to 44 percent less time than solo programmers), and pair defect advantage was set to 15% (meaning pairs produced 15 percent fewer defects than solo programmers). Based on the model analysis, several conclusions were drawn. First, the pair speed advantage and pair defect advantage have a strong impact on the value of a pair programming project. Second, pair programming appears beneficial when market pressure is very strong and programmers are much faster when working in pairs than when working alone. Third, if the workforce is limited, a pair programming project needs very strong market pressure, a large pair speed advantage, and a significant pair defect advantage to break even with the solo programming project.

To summarize, there is little consensus regarding the effect of pair programming on overall project cost. In particular, the two main economic studies took similar approaches yet reached different conclusions. Both of the models were severely restricted by the lack of reliable parameter values.
Erdogmus and Williams (2003) relied heavily on productivity, defect rate, and rework speed, and Padberg and Müller (2003) depended on data for pair speed advantage and pair defect advantage. However, empirical evidence for these data items was very limited (Erdogmus and Williams, 2003; Padberg and Müller, 2003). The research reported here attempts to mitigate this limitation by exploring a wide range of parameter values for pair and solo programming.

THEORETICAL BACKGROUND

Several methods exist to measure the economic feasibility of a software project. One is net present value (NPV), which measures the difference between the present value of the benefits derived from the project and the present value of the costs incurred in its development. Two different formulations have been employed. One is used by Erdogmus (1999):

\[ \mathrm{NPV} = \frac{\text{Asset value} - \text{Operation cost}}{(1 + \text{Product risk})^{\text{Development time}}} - \text{Development cost} + \text{Flexibility value} \]

The other is applied by Padberg and Müller (2003):

\[ \mathrm{NPV} = \frac{\text{Asset value}}{(1 + \text{Discount rate})^{\text{Dev Time}}} - \text{Dev Cost} \]

Another method to examine economic feasibility is breakeven analysis. Erdogmus and Williams (2003) use the following formula to compare two development methods: Breakeven Unit Value Ratio (BUVR) = BUV (solo) / BUV (pair). BUV is the threshold value of V above which NPV is positive; V is measured in $/LOC and represents the fixed increase in earned value for each additional unit of output produced. They state that as the ratio increases, the advantage of pair over solo also increases.

In this study, we adopted the mathematical formulation developed by Padberg and Müller (2003). Future studies will investigate other alternatives.

RESEARCH MODEL

In this section, we discuss the basis of our parameter assumptions and how they differ from the ones made by the previous two economic studies. We also derive our hypothesis based on the assumptions.
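The Padberg and Müller (2003) formulation adopted above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function and argument names are hypothetical, the discount rate and development time are assumed to share the same time unit, and the numbers in the usage lines are invented for demonstration rather than taken from either study.

```python
def npv(asset_value, discount_rate, dev_time, dev_cost):
    """Net present value: asset value discounted over development time, minus cost.

    Assumes `discount_rate` is expressed per unit of `dev_time`
    (for example, a yearly rate with development time in years).
    """
    return asset_value / (1.0 + discount_rate) ** dev_time - dev_cost


# Hypothetical project: the values below are illustrative only.
low_pressure = npv(asset_value=1_000_000, discount_rate=0.10, dev_time=2, dev_cost=500_000)
high_pressure = npv(asset_value=1_000_000, discount_rate=0.50, dev_time=2, dev_cost=500_000)

print(f"NPV at 10% discount rate: {low_pressure:,.2f}")
print(f"NPV at 50% discount rate: {high_pressure:,.2f}")
```

With everything else held fixed, raising the discount rate shrinks the discounted asset value, so the project's NPV falls. A shorter development time has the opposite effect, which is why a pair speed advantage can offset the cost of the second programmer.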
Defect Rate

Numerous stories from industry indicate that pair programming has led to lower defect rates (e.g. Jensen 2003; Anthes 2004; Fitzgerald and Hartnett 2005). This anecdotal claim has been supported by academic empirical studies involving college students as well as practitioners (e.g. Nosek 1998; Lui and Chan 2003; McDowell et al., 2003; 2006; Williams et al., 2003; Vanhanen and Lassenius 2007). Despite the overwhelming support for the relationship between pair programming and a reduction in defect rate, several studies yielded mixed results for this assertion. Studies found defect density was not affected by development methods (e.g.

Hulkko and Abrahamsson 2005), and reduction in defect rate was not consistent within a task nor across tasks or projects (e.g. Vanhanen and Lassenius 2005; Vanhanen and Korpi 2007). Some studies noted that the defect advantage gained by pair programming might depend on the complexity of the task (Al-Kilidar, Parkin, Aurum, and Jeffery 2005; Arisholm et al. 2007). Additionally, Balijepally et al. (2009) concluded that a pair was not necessarily better than a solo programmer. They compared the performance of pairs with that of the best performers and the second best performers and found that pairs performed above the level of the second best performers but no better than the best performers.

Based on findings from previous studies, we make the following assumption: Pair programming does not guarantee a lower defect rate than solo programming in all situations. This is different from the assumption adopted by Padberg and Müller (2003) and Erdogmus and Williams (2003). Both of the previous economic studies assumed pair programming produces fewer defects than solo programming. In Padberg and Müller (2003), pair programming had 15 percent fewer defects than solo; in Erdogmus and Williams (2003), the parameters for defect rates were 5.85/KLOC for solo and 3.51/KLOC for pair, suggesting pair had 40 percent fewer defects than solo.

Productivity

As with defect rate, results on which programming method leads to faster delivery are mixed. Some studies reported pair programming was faster than solo programming (e.g. VanDegrift 2004; Williams, Shukla, and Anton 2004; Nedland 2005), while others noted pair programming had no speed advantage at all and in some cases solo was faster than pair due to parallelism (e.g. Parrish et al. 2004; Hulkko and Abrahamsson 2005; Vanhanen and Lassenius 2005). Therefore, we make the following assumption: Pair programming is not faster than solo programming in all situations.
This, once again, is different from what was adopted by Padberg and Müller (2003) and Erdogmus and Williams (2003). In the simulations conducted by Padberg and Müller, values ranging from 1.3 to 1.8 were used for the pair speed advantage, meaning pair programming requires 23 to 44 percent less time to complete tasks than solo programming. In Erdogmus and Williams (2003), the parameters for productivity were 25 LOC/hour for solo and 43.478 LOC/hour for pair, suggesting pair programming needed 42.5 percent less time than solo.

Rework Speed

Empirical evidence regarding rework speed is very limited. When deriving rework speed for pair programming, Erdogmus and Williams (2003) assumed pairs could achieve rework productivity gains comparable to those reported for the initial development activities. Following the same argument, given our earlier statement on productivity, we make the following assumption: Pair programming is not faster than solo programming regarding rework in all situations. This is different from the assumption Erdogmus and Williams (2003) made. In their model, Erdogmus and Williams used 0.0303/hour for solo and 0.0527/hour for pair, implying pair programming was about 1.74 times as fast as solo programming at fixing defects (0.0527/0.0303 ≈ 1.74). Padberg and Müller (2003) did not use this parameter in their model.

Market Pressure and Number of Developers

Market pressure is measured by the discount rate. The higher the discount rate, the higher the market pressure. Net present value is calculated by using the following formula:

\[ \mathrm{NPV} = \frac{\text{Asset value}}{(1 + \text{Discount rate})^{\text{Dev Time}}} - \text{Dev Cost} \]

Therefore, the higher the discount rate, the lower the net present value. The number of developers has an effect on net present value as well. As suggested by the model (presented in the Analysis section), a higher number of developers leads to a lower net present value through its effect on development cost.

Net Present Value

Based on the above discussion and the nature of the mathematical formula, we hypothesize: Pair programming does not necessarily have a higher NPV than solo programming. Whether pair programming is more cost effective than solo programming depends on the ranges of parameters used in the NPV calculations. Figure 1 below is the research model.

Figure 1. Research model for this study

DATA ANALYSIS

The results from the previous two economic studies were examined and replicated in order to make sure we understood their procedures accurately. The results reported by Padberg and Müller (2003) were replicated with ease thanks to their clear presentation of the analysis steps and results. Therefore, the following discussion focuses largely on the work we did based on Padberg and Müller (2003).

In this research we conducted a simulation where we varied variables across a wide range of possible values. We adopted the economic model specified by Padberg and Müller (2003) and modified it to incorporate an additional variable, representing a potential advantage of pair programming when it comes to removing defects, which was included by Erdogmus and Williams (2003). We also examined two scenarios identified by Padberg and Müller (2003) with regard to the relationship between the number of solo developers and the number of pairs. These modifications are discussed next.
In the base model by Padberg and Müller (2003), a quality assurance phase was included only for the solo programming approach, as the formula for this phase was developed to represent the differential advantage that pair programming would have over solo programming in the number of defects left in the code after original development was finished. However, Erdogmus and Williams (2003) argue that pair programming could also exhibit improved performance in removing defects, in addition to having fewer defects in the original code. In order to incorporate this variable into the base model by Padberg and Müller (2003), we developed separate formulas for quality assurance that represent the time both solo and pair programming approaches would take to complete this task, not just the difference between the two, as in Padberg and Müller (2003).

Additionally, two alternative scenario specifications were examined. In the first scenario, which represents a workforce that is limited to developers currently employed by the organization, the number of pairs equals half the number of developers. This scenario represents the case of a manager who is tasked with choosing the best approach to organize a fixed number of developers (solo or in pairs) for a project. In the second scenario, we assumed that a manager has a fixed number of tasks that need to be carried out and can thus choose to assign either a single developer or a pair of developers to each task. This is consistent with the possibility of augmenting the number of available developers if a pair programming approach is chosen. In this scenario, the number of solo developers is constrained to equal the number of pairs.

Our approach builds on the base formulation of the economic model by Padberg and Müller (2003) and extends it by incorporating an additional variable that represents a potential difference between developers working alone and in pairs: whether pairs have an advantage in the time taken to remove defects (note that this is different from producing fewer defects to begin with). In addition, our analysis is based on a wide range of possible values for the independent variables, whereas the work of Padberg and Müller (2003) presented results only for selected values, which limits the generalizability of their conclusions.

The formulas employed for the economic model are as follows, where the subscript SP indicates solo programming, whereas PP indicates pair programming:

Inputs to these equations were either kept fixed or varied across a range of plausible conditions (see Table 1 for the input parameters used in the simulation). Fixed variable values are adopted from Padberg and Müller (2003) to ease the comparability of results (even though other values are possible).
The remaining inputs, which represent the variables of interest in the simulation, were varied in increments of 0.05, except for the number of developers, which was varied in increments of 2. This yielded a total of 15 × 33 × 33 × 33 × 5 = 2,695,275 possible combinations. Using the same inputs, the simulation was conducted under each of the two alternative constraints on the relationship between the number of solo developers and the number of pairs.
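The size of this full-factorial design is easy to reproduce. The sketch below is a minimal illustration, not the authors' simulation code: the level counts come from the text, but which count belongs to which input is not stated here, and the `levels` helper with its starting value of 1.0 is purely hypothetical.

```python
from itertools import islice, product
from math import prod

# Number of levels per varied input, as given in the text.
# The mapping of counts to specific inputs is not assumed here.
LEVEL_COUNTS = (15, 33, 33, 33, 5)


def grid_size(level_counts):
    """Total number of combinations in a full-factorial design."""
    return prod(level_counts)


def grid_indices(level_counts):
    """Lazily yield one tuple of level indices per combination."""
    return product(*(range(n) for n in level_counts))


def levels(start, step, count):
    """Evenly spaced values built from integer steps to limit float drift."""
    return [round(start + i * step, 10) for i in range(count)]


print(grid_size(LEVEL_COUNTS))                  # total combinations
print(list(islice(grid_indices(LEVEL_COUNTS), 3)))  # first few index tuples
print(levels(1.0, 0.05, 33)[-1])                # hypothetical range end
```

Iterating over index tuples and mapping them to values, rather than repeatedly adding 0.05, keeps each grid point exact to rounding and lets all 2.7 million combinations stream through the model without being held in memory.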

Table 1. Input Parameters

The dependent variable of interest was the difference between the net present values of the two alternative approaches to programming (i.e., solo vs. pair). Main results were obtained with an indicator variable coded to indicate whether the net present values for the two approaches were within five percent of each other (in which case they were considered equally acceptable), whether the net present value of the pair approach was more than five percent larger than that of solo, or the other way around, resulting in three different values for this variable. The resulting dataset was then subjected to a classification tree analysis, in which the data are split into segments that are as homogeneous as possible with regard to the dependent variable by choosing appropriate cutoff values for the independent variables. In this analysis, the independent variables were those that were varied in the simulation, and the dependent variable was the indicator variable just discussed. The analysis was conducted separately for the two scenarios depicting the relationship between the number of solo developers and the number of pairs.

RESULTS

As stated in the Analysis section, two scenarios were considered. In the first scenario, the number of pairs equals half the number of developers, in which case the total number of developers assigned to the programming tasks is the same for both programming methods. In the second scenario, the number of solo developers is constrained to equal the number of pairs; therefore pair programming has twice as many developers as solo programming. Simulation results from both scenarios are presented below.
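The three-valued indicator described in the Data Analysis section can be sketched as a small Python function. The text does not say which NPV serves as the baseline for the five percent band, so this sketch assumes the larger of the two magnitudes; the function name, labels, and sample values are likewise hypothetical.

```python
def classify(npv_solo, npv_pair, tolerance=0.05):
    """Label one simulated case as 'equal', 'pair', or 'solo'.

    Assumption: the two NPVs count as equal when their difference is at most
    `tolerance` times the larger absolute NPV. The source text does not
    specify the baseline, so this choice is illustrative only.
    """
    baseline = max(abs(npv_solo), abs(npv_pair))
    if baseline == 0 or abs(npv_pair - npv_solo) <= tolerance * baseline:
        return "equal"
    return "pair" if npv_pair > npv_solo else "solo"


# Illustrative values only.
for solo, pair in [(100.0, 103.0), (100.0, 110.0), (110.0, 100.0)]:
    print(solo, pair, classify(solo, pair))
```

Using a symmetric baseline means swapping the solo and pair values never turns an "equal" case into a non-equal one. A different baseline, such as the solo NPV alone, would shift a small number of borderline cases.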
Scenario 1 - Fixed Number of Total Developers

When there is a fixed number of developers who can be organized either as solo programmers or in pairs, and the dependent variable takes three values (solo is better, pair is better, or the two are within 5% of each other and thus considered equally acceptable ways to organize programmers), the classification tree algorithm converges to a solution with 23 terminal nodes. In this scenario, only three of the independent variables were needed to obtain the classification (PairSpeedAdvantage, PairDefectRemovalAdvantage, and PairDefectAdvantage), whereas the other two variables (DiscountRate and NumOfDevelopers) did not influence the results. This is consistent with the findings of Padberg and Müller (2003), who note that in this scenario the discount rate employed did not seem to affect the results (the authors did not vary the number of developers in their models). Table 2 shows the resulting nodes, the cutoff values of the three independent variables, the number of cases in each node, and the expected outcome. Of all possible combinations, 91.8% (or 2,473,438 cases) were classified as favoring solo programming, and 4.5% (or 120,363 cases) as favoring pair programming, with the remainder classified as equally acceptable.

The accuracy of the classification is not shown because, with a three-valued dependent variable, the algorithm provides a mean expected value for each node that cannot be meaningfully converted into an accuracy figure, as was possible when the dependent variable was binary. However, the mean value within each node is used to split each branch into nodes whose mean values are significantly different, even if they lead to the same qualitative conclusion. For example, two contiguous nodes may both lead to the conclusion that solo programming is more beneficial than working in pairs, but with significantly different levels of accuracy in that conclusion, which leads the algorithm to create two nodes instead of merging them into a single one.
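To illustrate how a tree algorithm can pick a cutoff that makes the resulting segments as homogeneous as possible, the following sketch searches for a single split that minimizes the within-segment squared deviation from the segment mean. It is not the algorithm or software used in the study, which is not named here. The numeric coding of the outcome (-1, 0, 1), the toy data, and the function names are assumptions made for illustration.

```python
def sse(values):
    """Sum of squared deviations from the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_cutoff(xs, ys):
    """Find the cutoff on xs that makes the two segments of ys most homogeneous.

    Returns (cutoff, within_segment_sse); cutoff is None if no split helps.
    """
    best = (None, sse(ys))
    for cut in sorted(set(xs))[1:]:
        left = [y for x, y in zip(xs, ys) if x < cut]
        right = [y for x, y in zip(xs, ys) if x >= cut]
        total = sse(left) + sse(right)
        if total < best[1]:
            best = (cut, total)
    return best


# Toy data, invented for illustration only.
# Assumed coding: -1 = solo better, 0 = equally acceptable, 1 = pair better.
speed_advantage = [0.10, 0.20, 0.30, 0.40, 0.50, 0.60]
outcome = [-1, -1, -1, 0, 1, 1]

cutoff, within_sse = best_cutoff(speed_advantage, outcome)
print(cutoff)  # 0.4
```

A full tree repeats this search within each resulting segment and across all independent variables, which is why two neighboring nodes can share the same qualitative conclusion while having different mean values.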

Table 2. Scenario 1 - Simulation Results

In our results, 4.5% (or 120,363 cases) resulted in the net present value of the pair programmers being higher than that of the same number of programmers working solo (nodes 8, 15, and 21-23; see Table 3). A further 1.2% (or 32,700 cases) resulted in the net present values of the two programming approaches being about the same (nodes 7 and 19; see Table 4).

Table 3. Scenario 1 - Pair Better than Solo

Table 4. Scenario 1 - Pair and Solo Equally Acceptable

Scenario 2 - Fixed Number of Tasks

The second part of this simulation covers those conditions where the number of tasks in a development project is fixed and managers face the choice of assigning each task to a solo developer or to a pair, with the consequent increase in personnel expense associated with the second alternative. Furthermore, it is assumed here that managers can either hire qualified personnel or reallocate existing developers from other projects to the focal one. This is consistent with the scenario presented by Padberg and Müller (2003) as well. Thus, in these results, the simulation described above is constrained such that NumOfDevelopers = NumOfPairs. Using the same ranges for the variable parameters as before, the number of combinations again totaled 2,695,275. As before, a three-valued dependent variable was employed (solo is better, pair is better, or the two methods are equally acceptable when their net present values are within 5% of each other).

The classification tree algorithm employed four independent variables to classify the outcomes of the simulation, in order of importance: PairSpeedAdvantage, PairDefectRemovalAdvantage, PairDefectAdvantage, and DiscountRate. Out of all cases, 80.5% (or 2,168,381 cases) were classified as solo programming being superior to pair programming, 5% (or 135,170 cases) as equally acceptable, and the remaining 14.5% (or 391,724 cases) as pair programming being more profitable than solo programming. Table 5 shows the results of the classification tree analysis.

Table 5. Scenario 2 - Simulation Results
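The Scenario 2 shares follow directly from the reported case counts. The short check below recomputes them. The dictionary keys are labels chosen for this sketch, while the counts are those reported in the text.

```python
# Case counts reported for Scenario 2; keys are illustrative labels.
counts = {"solo": 2_168_381, "equal": 135_170, "pair": 391_724}

total = sum(counts.values())
shares = {label: round(100 * n / total, 1) for label, n in counts.items()}

print(f"{total:,}")  # 2,695,275
print(shares)        # {'solo': 80.5, 'equal': 5.0, 'pair': 14.5}
```

The three counts sum to the full grid of 2,695,275 combinations, so every simulated case falls into exactly one of the three classes.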

Table 6 lists the nodes where pair programming is more profitable than solo programming, whereas Table 7 shows those combinations of the independent variables for which both alternatives are considered equivalent (i.e., within 5% of each other).

Table 6. Scenario 2 - Pair Better than Solo

Table 7. Scenario 2 - Pair and Solo Equally Acceptable

DISCUSSIONS

This study aimed to answer two research questions. First, is pair programming more cost effective than solo programming in all situations? Second, in what situations is pair programming more cost effective than solo programming? We adopted the economic model specified by Padberg and Müller (2003) and modified it to incorporate rework speed, which was included by Erdogmus and Williams (2003). We examined two different scenarios and conducted simulations in which we varied the parameters across a wide range of possible values.

From the first scenario (fixed number of total developers), by focusing on those nodes where the results were classified as favoring the pair programming approach (that is, nodes 8, 15, and 21-23 in Table 3), the following conclusions can be drawn. First, across the ranges of parameters studied here, pair programming emerges as more economically beneficial in only a limited number of instances. Second, those instances occur only when the speed advantage of pair over solo is quite large, on the order of pairs being 40% to 50% faster than solo developers or more in their development work, while also producing significantly fewer defects in their code. Third, the discount rate applicable to the project does not seem to have a major effect on the outcome; this is consistent with the conclusions reached by Padberg and Müller (2003) in this regard.

From the second scenario (fixed number of tasks), several conclusions can be reached. First, although pair programming is favored more often than in the first scenario, solo programming still appears superior to pair programming in a wide range of cases.
Second, though the discount rate was included by the classification tree algorithm as valuable in describing the data resulting from the simulation, its explanatory power is somewhat limited, as there was only a single combination of parameters where the discount rate provided value in addition to the other variables in the model. Third, the ranges of the independent parameters in which pair programming is considered better are broader than those for the first scenario, which is also consistent with the limited results from Padberg and Müller (2003). In particular, these indicate that there are a number of conditions where solo programming may be ahead in specific aspects of development, such as introducing fewer defects than when working in pairs, and pair programming could still come out ahead nonetheless. This emphasizes the importance of pair speed advantage as a central variable in the comparison between the two approaches, and underscores the need for further research to understand its behavior in more detail.

In conclusion, our answer to the first research question is that pair programming is not more cost effective than solo programming in all situations. As a matter of fact, across the ranges of parameters studied, pair programming is more economically feasible in only a limited number of instances. Our answer to the second research question is that, in order to achieve an economic benefit, pair programming needs either to have advantages in all three areas (speed, defects, and defect removal) or to have substantial advantages in two areas if the third is roughly equivalent to solo programming. To address the second research question, we identified the parameter ranges where pair is better than solo, pair is equivalent to solo, and solo is better than pair. The specific parameter ranges were stated explicitly in the Results section.

This study makes several contributions. First, it synthesizes the existing literature and extends those findings through replication and extension. Second, our results suggest that whether pair programming is more economically feasible than solo programming depends on a combination of multiple factors. Focusing on isolated factors, or making overly restrictive assumptions about the parameter values, tends to produce an incomplete picture of the two programming methods. This reveals the importance of considering multiple factors simultaneously when building economic theories of pair programming. Finally, since our study provides specific parameter ranges in which one programming method is more economical than the other, organizations can use these as guidelines to decide which approach to take based on data collected from their own projects.

LIMITATION AND FUTURE RESEARCH

This study has a couple of limitations. First, we did not vary several important variables, e.g., project size and salary. However, we did this so that we could more easily compare the results of our approach to Padberg and Müller's. Second, in our simulation we adopted a static mathematical model, in which the complex dynamics of software development are not reflected. This was partly because the primary goal of this study was to replicate and extend previous studies; therefore, our choice was limited to what was previously done. In addition, the mathematical model is a legitimate approach and provides insights into the economics of software development from its own unique angle.

There are several areas for future research. One is to complete the data analysis using breakeven points as outlined by Erdogmus and Williams (2003) and report the results. Another is to supplement the current approach with a different way to calculate software development cost (e.g., the total cost as the sum of development cost, defect cost, etc.) by using data collected from other studies.
We would also like to dynamically simulate the software development process, in the hope of gaining a more complete picture of how programming methods such as solo vs. pair influence the different phases of software development activities.

REFERENCES (PARTIAL LISTING)

1. Arisholm, E., Gallis, H., Dybå, T., and Sjøberg, D.I.K. (2007) Evaluating Pair Programming with Respect to System Complexity and Programmer Expertise, IEEE Transactions on Software Engineering, 33, 2, 65-86.
2. Balijepally, Mahapatra, Nerur, and Price (2009) Are Two Heads Better than One for Software Development? The Productivity Paradox of Pair Programming, MIS Quarterly, 33, 1, 91-118.
3. Cusumano, M., MacCormack, A., Kemerer, C.F., and Crandall, B. (2003) Software Development Worldwide: The State of the Practice, IEEE Software, 20, 6, 28-34.
4. Erdogmus, H., and Williams, L. (2003) The Economics of Software Development by Pair Programmers, Engineering Economist, 48, 4, 283-319.
5. Lui, K., and Chan, K.C.C. (2003) When Does a Pair Outperform Two Individuals?, Lecture Notes in Computer Science, 2675, 225-233.
6. Nosek, J.T. (1998) The Case for Collaborative Programming, Communications of the ACM, 41, 3, 105-108.
7. Padberg, F., and Müller, M.M. (2003) Analyzing the Cost and Benefit of Pair Programming, Proceedings of the Ninth International Software Metrics Symposium.
8. Williams, L., McDowell, C., Nagappan, N., Fernald, J., and Werner, L. (2003) Building Pair Programming Knowledge through a Family of Experiments, International Symposium on Empirical Software Engineering.