Empirical Validation of Pair Programming

Corrado Aaron Visaggio (visaggio@unisannio.it)
Research Centre on Software Technology (RCOST), University of Sannio, Benevento, Italy
PhD Symposium

Motivation

Plan-driven approaches to software development can fail in contexts where:
- the availability of resources varies unpredictably;
- the time pressure is much stronger than expected;
- the requirements of the system under development are emerging or unstable.

Several alternatives have been explored to cope with such situations while preserving the quality of process and product: Boehm's spiral process, Rapid Application Development, the Rational Unified Process, and others.

There was an urgent need for greater flexibility than these processes offered. In recent decades Agile Methods for software development burst onto the scene, proposing a radically different way to manage the software process.

The Problem (1/2)

In 2001 the Agile Manifesto was published, defining the new agile approach to software production with four principles:
- Individuals and interactions over processes and tools
- Working software over comprehensive documentation
- Customer collaboration over contract negotiation
- Responding to change over following a plan

The doubt: may agile methods erode the engineering rigour and discipline achieved with the plan-driven approach?
- Individuals and interactions over processes and tools: does the process remain repeatable?
- Working software over comprehensive documentation: does the process remain repeatable? Is the process measurable?
- Customer collaboration over contract negotiation: what happens to the product's quality when the architecture emerges from the process?
- Responding to change over following a plan: is it possible to produce dependable estimates for the process?

The Problem (2/2) and the Research Goals

There is no broad consensus on one relevant issue: is it worth adopting agile methods when developing software, or is it too risky, given that they conflict with some good practices of software engineering?

It was not feasible to deal with the entire set of agile practices within a single thesis: Pair Programming (2P) was selected as the focus of my investigation.

Pair Programming was analysed along three dimensions:
- Suitable contexts: is 2P suitable for distributed processes?
- Costs/benefits: is 2P advantageous in terms of Return on Investment?
- Specific benefits: is 2P helpful for knowledge leveraging?

The Research Plan and Method

Purpose of the research: to validate pair programming, by empirical investigation, along three dimensions: suitable contexts, cost/benefit ratio, and specific benefits.

Research plan (flowchart):
- Establish research questions: success factors
- Controlled experiments with students: defect removal
- Decision point: bugs? (yes / no)
- Controlled experiments with professionals: confidence of industry
- Field experiments: dependable results
- Technological transfer
Phase labels in the diagram: thesis, post-doc.

The First Dimension

[Diagram: the three dimensions of the analysis of Pair Programming: suitable contexts of the practice, specific benefits of the practice, cost/benefit ratio.]

Warning: this research activity is still ongoing!

- Does Pair Programming cost more than Solo Programming?
- Is Pair Programming more beneficial than Solo Programming in terms of the quality achieved?

Productivity and Quality

Research objects:
- Productivity: pair programming is supposed to speed up production cycles.
- Quality: pair programming is supposed to increase the quality of code modules and of the overall system architecture.

Research question: can pair programming improve the performance of project teams, in terms of productivity and quality?

Experiments: an experiment at the University of Sannio, Benevento, Italy.

Hypotheses:
- H0a: pair programming does not affect the speed of programming.
- H0b: pair programming does not affect the quality of code and architecture.

The Experiment Outlook

An initial experiment on the productivity of pair programming suggests that pair programming can speed up production cycles.

60 subjects (graduate students of Computer Engineering) are grouped into teams of two kinds: paired-programmer teams and solo-programmer teams. Each team is responsible for developing a system for software requirements traceability.

The teams follow an incremental process: at each iteration they receive a new group of features to implement, and each iteration corresponds to a point of observation. This experimentation is still ongoing.

[Timeline: kick-off, followed by four iterations (1st to 4th), each with its own group of features (1st to 4th); the points of data collection are the demos of the teams' products.]

The Second Dimension

[Diagram: the three dimensions of the analysis of Pair Programming: suitable contexts of the practice, specific benefits of the practice, cost/benefit ratio.]

Is Pair Programming an effective means for diffusing and enforcing design knowledge in a project team?

Knowledge Transfer

One of the expected benefits of pair programming is fostering knowledge transfer.

Software design requires efficient management of knowledge at team level, and documentation is not enough because:
- strategies for problem solving are scarcely captured;
- it is necessary to deal with different levels of abstraction: implementation, database, business logic, presentation, deployment, interaction with other systems, and communication protocols;
- documentation has a very low bandwidth: face-to-face communication can be more effective and time-saving.

Could pair designing be an appropriate alternative for diffusing and enforcing knowledge of the software system among the members of a project team?

The Experimentation

Research objects:
- Diffusing knowledge: disseminating knowledge within the project team, in the initial phases of the project.
- Enforcing knowledge: improving the individual knowledge of project participants, in the advanced phases of the project.

Research question: is pair designing effective for diffusing and improving knowledge within project teams?

Experiments:
- An explorative experiment (demonstrating that pair design can foster knowledge leveraging).
- One experiment at the University of Sannio, Benevento, Italy.
- A replica at the University of Castilla-La Mancha, Ciudad Real, Spain.

Hypotheses:
- H0a: pair designing does not affect the diffusion of design knowledge when performing evolution tasks.
- H0b: pair designing does not affect the improvement of design knowledge when performing evolution tasks.

Experimental Design: Experiment #1 (Italy)

Subjects | Treatment
5 MUTEGS + 5 MUTEGS | Paired (MUTEGS-MUTEGS)
5 MUTS + 5 MUTS | Paired (MUTS-MUTS)
8 MUTS | Solo
8 MUTEGS | Solo

Input: Requirement Specification; Use Case Diagram; Class Diagram; entry questionnaire QA (or QB); exit questionnaire QB (or QA).
Output: modifications to the Use Case Diagram and Class Diagram; answered entry questionnaire QA (or QB); answered exit questionnaire QB (or QA).

Experimental Design: Experiment #2 (Spain)

Subjects | Treatment
64 students (3BScMngmnt, 3BScSys, 5MSc) | Paired (3BScMngmnt-3BScMngmnt, 3BScSys-3BScSys, 5MSc-5MSc)
32 students (3BScMngmnt, 3BScSys, 5MSc) | Solo

Input: Requirement Specification; Use Case Diagram; Class Diagram; entry questionnaire QA (or QB); exit questionnaire QB (or QA).
Output: modifications to the Use Case Diagram and Class Diagram; answered entry questionnaire QA (or QB); answered exit questionnaire QB (or QA).

The Experiment's Process

1. Each subject studied the documentation individually for 30 minutes.
2. Each subject answered an entry questionnaire individually, for about 15 minutes.
3. The pairs and the solo designers performed the maintenance tasks for 2 hours.
4. Each subject answered an exit questionnaire individually.

The Randomisation Tests

Tests between the entry questionnaires of sample α and sample β:

Experiment | Sample α | Sample β | Rank Sum α | Rank Sum β | p-level
Italian | Subjects of MUTS Pairs | Subjects of MUTS Solos | 171.000 | 39.000 | 0.214768
Italian | Subjects of MUTEGS Pairs | Subjects of MUTEGS Solos | 112.000 | 59.000 | 0.130919
Spanish | Solos of 3BScSys | Pairs of 3BScSys | 425.000 | 395.000 | 0.214741
Spanish | Solos of 5MSc | Pairs of 5MSc | 31.500 | 46.500 | 0.229767
Spanish | Solos of 3BScMngmnt | Pairs of 3BScMngmnt | 425.000 | 395.000 | 0.321966

In both experiments, the samples of pairs and those of solos were formed of equivalent subjects.
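The rank sums and p-levels in the randomisation tests are the kind of output produced by a two-sample rank-sum comparison. The following is a minimal sketch of how such figures can be computed, assuming a Mann-Whitney test with a normal approximation and no tie or continuity correction (the slides do not state which variant or tool was used). The function names and the questionnaire scores are made up for illustration and are not the experimental data.

```python
import math


def rank_sums(alpha, beta):
    """Return the rank sums of two samples, using midranks for tied values."""
    pooled = sorted(
        (value, group)
        for group, sample in enumerate((alpha, beta))
        for value in sample
    )
    sums = [0.0, 0.0]
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        midrank = (i + 1 + j) / 2  # average of ranks i+1 .. j
        for k in range(i, j):
            sums[pooled[k][1]] += midrank
        i = j
    return sums[0], sums[1]


def mann_whitney_p(alpha, beta):
    """Two-sided p-value via the normal approximation (assumed variant)."""
    n1, n2 = len(alpha), len(beta)
    r1, _ = rank_sums(alpha, beta)
    u1 = r1 - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u1 - mean_u) / sd_u
    return math.erfc(abs(z) / math.sqrt(2))


# Hypothetical entry-questionnaire scores for a pairs sample and a solos sample.
pairs_entry = [5, 6, 4, 7, 5, 6]
solos_entry = [4, 6, 5, 3, 5, 7]
print(rank_sums(pairs_entry, solos_entry))
print(mann_whitney_p(pairs_entry, solos_entry))
```

A large p-level, as in every row of the randomisation table, means the test finds no evidence that the two samples differed before the treatment.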

The Statistical Tests: Knowledge Diffusion

Experiment | Sample α | Sample β | Rank Sum α | Rank Sum β | p-level
Italian | MUTS Pairs | MUTS Solos | 116.500 | 54.50 | 0.049
Italian | MUTEGS Pairs | MUTEGS Solos | 78.50 | 57.50 | 0.270
Italian | MUTS Pairs | MUTEGS Pairs | 135.00 | 75.00 | 0.023
Spanish | Pairs 5MSc | Solos 5MSc | 51.500 | 26.500 | 0.030912
Spanish | Pairs 3BScSys | Solos 3BScSys | 253.000 | 567.000 | 0.00017
Spanish | Pairs 3BScMngmnt | Solos 3BScMngmnt | 447.000 | 778.00 | 0.00000

Results and interpretation:
- Empirical evidence: pairs outperformed solos; pair design is a candidate means for diffusing knowledge.
- Side effect: the success of pair design in diffusing knowledge may depend on individual skills.

The Statistical Tests: Knowledge Improving

Experiment | Sample α | Sample β | Rank Sum α | Rank Sum β | p-level
Italian | MUTS Pairs | MUTS Solos | 123.500 | 47.500 | 0.0102
Italian | MUTEGS Pairs | MUTEGS Solos | 53.500 | 66.500 | 0.2164
Italian | MUTS Pairs | MUTEGS Pairs | 110.500 | 42.500 | 0.0428
Spanish | Pairs 3BScSys | Solos 3BScSys | 49.500 | 28.500 | 0.086984
Spanish | Pairs 5MSc | Solos 5MSc | 551.000 | 269.000 | 0.000942
Spanish | Pairs 3BScMngmnt | Solos 3BScMngmnt | 51.500 | 26.500 | 0.042337

Results and interpretation:
- Empirical evidence: the knowledge diffusion results are confirmed; pair design is a candidate means for improving knowledge.
- The success of pair design in improving knowledge may depend on individual skills.
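A quick sanity check on rank-sum tables such as the ones above: the two rank sums of a test must add up to N(N+1)/2, where N is the total number of observations ranked. The sketch below is an illustration added here, not part of the original analysis. It recovers N for each row of the knowledge diffusion table, reading the values in the order in which the slide lists them; the function and constant names are hypothetical.

```python
import math

# (sample alpha, sample beta, rank sum alpha, rank sum beta),
# copied from the knowledge diffusion table.
DIFFUSION_ROWS = [
    ("MUTS Pairs", "MUTS Solos", 116.5, 54.5),
    ("MUTEGS Pairs", "MUTEGS Solos", 78.5, 57.5),
    ("MUTS Pairs", "MUTEGS Pairs", 135.0, 75.0),
    ("Pairs 5MSc", "Solos 5MSc", 51.5, 26.5),
    ("Pairs 3BScSys", "Solos 3BScSys", 253.0, 567.0),
    ("Pairs 3BScMngmnt", "Solos 3BScMngmnt", 447.0, 778.0),
]


def implied_sample_size(rank_sum_a, rank_sum_b):
    """Return N with N(N+1)/2 equal to the total rank sum, or None if no integer fits."""
    total = rank_sum_a + rank_sum_b
    n = (math.isqrt(int(8 * total + 1)) - 1) // 2
    return n if n * (n + 1) / 2 == total else None


for alpha, beta, r_a, r_b in DIFFUSION_ROWS:
    print(alpha, "vs", beta, "-> N =", implied_sample_size(r_a, r_b))
```

Every row of the diffusion table passes this check, with implied totals of 18, 16, 20, 12, 40 and 49 observations respectively.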

Qualitative Analysis

Experiment | Sample | Std Dev | Average | Max | Min
Italian | MUTS Pairs | 1.75 | 5.8 | 9 | 4
Italian | MUTEGS Pairs | 1.60 | 3.9 | 7 | 1
Italian | MUTS Solos | 1.03 | 4.25 | 6.00 | 3.00
Italian | MUTEGS Solos | 1.55 | 5.13 | 7.00 | 3.00
Spanish | Pairs 3BScSys | 1.02 | 6.00 | 7.00 | 3.00
Spanish | Solos 3BScSys | 1.26 | 4.44 | 6.00 | 3.00
Spanish | Pairs 5MSc | 0.98 | 6.17 | 7.00 | 5.00
Spanish | Solos 5MSc | 0.82 | 5.33 | 7.00 | 5.00
Spanish | Pairs 3BScMngmnt | 0.73 | 6.30 | 8.00 | 5.00
Spanish | Solos 3BScMngmnt | 0.94 | 4.21 | 5.00 | 1.00

Statistical parameter | MUTS Pairs | MUTS Solos | MUTEGS Pairs | MUTEGS Solos
average | 2.000 | -1.400 | -0.800 | -0.750
max | 5.000 | 2.000 | 1.000 | 1.000
min | -1.000 | -3.000 | -3.000 | -2.000
std dev | 1.915 | 2.074 | 1.643 | 1.500

Statistical parameter | 5MSc Pairs | 5MSc Solos | 3BScSys Pairs | 3BScSys Solos | 3BScMngmnt Pairs | 3BScMngmnt Solos
average | 1.167 | -0.500 | 1.714 | -0.579 | 1.111 | -1.036
max | 3.000 | 3.000 | 4.000 | 3.000 | 3.000 | 2.000
min | -1.000 | -2.000 | -1.000 | -4.000 | -1.000 | -5.000
std dev | 1.722 | 1.871 | 1.736 | 1.865 | 1.278 | 1.856

The Questionnaires

Two questionnaires were used to evaluate the knowledge built.

Test between | Rank Sum α | Rank Sum β | p-level
Questionnaire A (α) and Questionnaire B (β) in the experiment | 540.00 | 406.00 | 0.161
Questionnaire A (α) and Questionnaire B (β) in the replica | 598.00 | 677.00 | 0.2068

The experiment results were independent of the specific questionnaire used.

Pair designing is helpful for:
- diffusing knowledge, when the team is not familiar with the project, in the initial phases;
- improving knowledge, when the team needs a better and deeper understanding of the project, in the advanced phases.

The results and performance of pair designing may depend on the individual skills of the pair's members.

The Third Dimension

[Diagram: the three dimensions of the analysis of Pair Programming: suitable contexts of the practice, specific benefits of the practice, cost/benefit ratio.]

Are distributed processes suitable for pair programming?

How Distribution Affects Pair Programming Benefits

Global Software Development: a more than emerging trend
- 24-hour production cycles, reduced resource costs, and enhanced mobility.
- Global software development processes; virtual teaming.

Pair programming
- Pair Programming increases software quality without significantly increasing development time.

Distribution hinders the fluidity of communication and the comfort of collaboration: what is the impact on working practices that rely on communication and collaboration (C&C)?

Experimentation

Research objects:
- Quality: Pair Programming helps to achieve high code quality, thanks to the concurrent review of code and design.
- Performance: working in pairs speeds up production, thanks to intense collaboration.

Research questions:
- RQ1: Are there significant differences in effort when the members of a pair are distributed, compared with when they are co-located?
- RQ2: Are there significant differences in the quality produced when the members of a pair are distributed, compared with when they are co-located?

Experiment: the subjects were volunteer students from the Universities of Sannio and Naples.

Hypotheses

Null hypotheses:
- H0,RQ1: there is no significant difference in the effort required to implement modifications between distributed pair programming and co-located pair programming: μ_distr_time = μ_co-loc_time.
- H0,RQ2: there is no significant difference in the quality of the maintenance performed: μ_distr_quality = μ_co-loc_quality.

Alternative hypotheses:
- H1,RQ1: there is a significant difference in the effort required to implement modifications between distributed pair programming and co-located pair programming: μ_distr_time ≠ μ_co-loc_time.
- H1,RQ2: there is a significant difference in the quality of the maintenance performed: μ_distr_quality ≠ μ_co-loc_quality.

Experiment's Characterisation (1/2)

Dependent variables:
- Effort spent: measured as the difference between the start time and the end time of the maintenance tasks. Ratio scale.
- Quality of the maintenance performed: a scoring function counting the successful test cases. Ordinal scale.

Training: the subjects were trained with an introductory seminar (4 hrs), lab exercises (2 hrs), a trial run (2 hrs), and an assessment seminar (2 hrs).

Documentation given to the students:
- listings of the programs;
- textual description of the maintenance tasks;
- time sheet to fill in;
- description of the correct execution of the pair programming roles;
- questionnaire to be completed at the end of the experiment.
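The two measures described above (effort as elapsed time, quality as the number of successful test cases) can be sketched as follows. This is only an illustration: the time-sheet format (HH:MM strings), the function names, and the sample values are assumptions and are not taken from the experiment material.

```python
from datetime import datetime


def effort_minutes(start, end, fmt="%H:%M"):
    """Effort as the difference between end time and start time, in minutes."""
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return delta.total_seconds() / 60


def quality_score(test_results):
    """Quality as the number of successful test cases."""
    return sum(1 for passed in test_results if passed)


# Hypothetical time-sheet entry and test outcomes for one pair.
print(effort_minutes("09:15", "11:40"))          # 145.0
print(quality_score([True, True, False, True]))  # 3
```

Because effort is a ratio-scale quantity while the quality score is treated as ordinal, a rank-based test is a natural choice for comparing groups on both measures.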

Experiment's Characterisation (2/2)

Technological platform:

Tool | Function | Purpose | Motivation
VNC | Desktop sharing: it allows the remote control of a PC. | Collaboration | The experimenters had experience of using it in previous projects; open source.
NetMeeting | Text chat. | Communication | Its usage was well known to all the experimental subjects.
JBuilder | IDE for Java programs. | Programming | The subjects had experience of using it in previous projects.

Experimental design:

Group | Round I | Round II
Group A | Co-located, P1 | Distributed, P2
Group B | Distributed, P1 | Co-located, P2

Tests and Results

In each round, a Mann-Whitney test compares Group A with Group B (round I: Group A co-located, Group B distributed; round II: Group A distributed, Group B co-located).

Data | p-level | Description
Effort, round I | 0.564 | Mann-Whitney test on effort data between Group A (co-located) and Group B (distributed) in round I.
Effort, round II | 1.000 | Mann-Whitney test on effort data between Group A (distributed) and Group B (co-located) in round II.
Quality, round I | 0.465 | Mann-Whitney test on quality data between Group A (co-located) and Group B (distributed) in round I.
Quality, round II | 0.011 | Mann-Whitney test on quality data between Group A (distributed) and Group B (co-located) in round II.

Only the round II quality results are statistically significant.
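The experimental design above is a two-round crossover: each group works under both conditions, in opposite order. A minimal sketch of that schedule follows. It assumes that P1 and P2 denote the two programs to be maintained and that pairs are allocated to groups at random; the slides state neither, and the pair identifiers are hypothetical.

```python
import random

# Condition and program for each (group, round), as in the design table.
DESIGN = {
    ("A", "I"): ("co-located", "P1"),
    ("A", "II"): ("distributed", "P2"),
    ("B", "I"): ("distributed", "P1"),
    ("B", "II"): ("co-located", "P2"),
}


def allocate(pairs, seed=0):
    """Randomly split the pairs into two groups, A and B (assumed allocation)."""
    shuffled = list(pairs)
    random.Random(seed).shuffle(shuffled)
    half = len(shuffled) // 2
    return {"A": shuffled[:half], "B": shuffled[half:]}


def schedule(groups):
    """List (round, pair, condition, program) for every pair in every round."""
    return [
        (rnd, pair, *DESIGN[(group, rnd)])
        for rnd in ("I", "II")
        for group in ("A", "B")
        for pair in groups[group]
    ]


groups = allocate(["pair%d" % i for i in range(1, 9)])
for row in schedule(groups):
    print(row)
```

With this layout every pair is observed once co-located and once distributed, so between-group comparisons can be made within each round and within-group comparisons across rounds.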

Dismissal Hypothesis

[Figure: box plots of effort in Run I and Run II, co-located vs distributed pairs (median, 25%-75%, non-outlier range).]

After an initial period of collaboration, the distributed pairs tend to work as solo programmers.

Quality

[Figure: box plots of quality in Run I and Run II, co-located vs distributed pairs (median, 25%-75%, non-outlier range).]

The quality results confirm the dismissal hypothesis.

Replica's Characterisation

The replica aimed at confirming the dismissal hypothesis.

What changed:
- University of Naples student subjects;
- C++ rather than Java;
- more intensive and focused training;
- reduced time for performing the tasks: from 180 min to 90 min.

Replica's Results

Data | p-level | Description
Effort | 0.083 | Mann-Whitney test on effort data between co-located and distributed pairs.
Quality | 0.043 | Mann-Whitney test on quality data between co-located and distributed pairs.

There is empirical evidence that distribution affects quality.

[Figure: box plots of effort and quality in the replica, co-located vs distributed pairs (median, 25%-75%, non-outlier range).]

Experimental Validity (1/2)

Each group is compared with itself across the two rounds by a Wilcoxon test (round I: Group A co-located, Group B distributed; round II: Group A distributed, Group B co-located).

Data | p-level | Description
Effort, Group A | 0.465 | Wilcoxon test on the effort data of Group A between rounds I and II.
Effort, Group B | 0.715 | Wilcoxon test on the effort data of Group B between rounds I and II.
Quality, Group A | 0.345 | Wilcoxon test on the quality data of Group A between rounds I and II.
Quality, Group B | 0.969 | Wilcoxon test on the quality data of Group B between rounds I and II.

There is no empirical evidence of maturation.

Experimental Validity (2/2)

Round I is compared with round II by a Wilcoxon test, in the first experiment and in the replica.

Data | p-level | Description
Effort, first experiment | 0.508 | Wilcoxon test on effort data between round I and round II in the first experiment.
Quality, first experiment | 0.445 | Wilcoxon test on quality data between round I and round II in the first experiment.
Effort, replica | 0.715 | Wilcoxon test on effort data between round I and round II in the replica.
Quality, replica | 0.109 | Wilcoxon test on quality data between round I and round II in the replica.

There is no empirical evidence that mono-operation bias affects the validity of the experiment.
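The validity checks above compare the same subjects across two rounds, which calls for a paired test. The following is a minimal sketch of an exact two-sided Wilcoxon signed-rank test for small samples. It drops zero differences and uses midranks for ties, which is one common convention and not necessarily the one used for the reported p-levels. The effort values are made up, since the raw data are not in the slides.

```python
from itertools import product


def signed_rank_statistic(differences):
    """Return (W+, ranks): the sum of ranks of positive differences and the midranks of |d|."""
    nonzero = [d for d in differences if d != 0]
    order = sorted(range(len(nonzero)), key=lambda i: abs(nonzero[i]))
    ranks = [0.0] * len(nonzero)
    i = 0
    while i < len(order):
        j = i
        while j < len(order) and abs(nonzero[order[j]]) == abs(nonzero[order[i]]):
            j += 1
        for k in range(i, j):
            ranks[order[k]] = (i + 1 + j) / 2  # midrank of tied |d|
        i = j
    w_plus = sum(r for r, d in zip(ranks, nonzero) if d > 0)
    return w_plus, ranks


def wilcoxon_exact_p(round_one, round_two):
    """Exact two-sided p-value by enumerating all sign assignments (small n only)."""
    diffs = [b - a for a, b in zip(round_one, round_two)]
    w_plus, ranks = signed_rank_statistic(diffs)
    if not ranks:
        return 1.0
    centre = sum(ranks) / 2
    observed = abs(w_plus - centre)
    hits = 0
    for signs in product((0, 1), repeat=len(ranks)):
        w = sum(r for r, s in zip(ranks, signs) if s)
        if abs(w - centre) >= observed - 1e-12:
            hits += 1
    return hits / 2 ** len(ranks)


# Hypothetical effort (minutes) of four pairs in round I and round II.
effort_round_one = [120, 135, 150, 110]
effort_round_two = [128, 131, 162, 109]
print(wilcoxon_exact_p(effort_round_one, effort_round_two))
```

With very few pairs per group the exact test can never produce a small p-level (with four pairs the minimum two-sided value is 0.125), so a non-significant result in such a setting is weak evidence of absence rather than proof.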

Qualitative Analysis

Post-experiment assessment:
- questionnaire;
- open discussion.

- Communication: voice support is preferable; there is no need for video.
- Acquaintance: the members of a pair need to be used to working together.
- Anarchic behaviour: distribution emphasises the lack of a proper protocol for working in pairs.

- Distribution seems to affect the quality of pair programming.
- There is no empirical evidence that effort increases when pair programming is distributed.
- Pair dismissal because of poor technology.
