Improving software testing course experience with pair testing pattern. Iyad Alazzam* and Mohammed Akour

244 Int. J. Teaching and Case Studies, Vol. 6, No. 3, 2015

Improving software testing course experience with pair testing pattern

Iyad Alazzam* and Mohammed Akour
Department of Computer Information Systems,
Yarmouk University,
P.O. Box 566, Irbid 21163, Jordan
Email: eyadh@yu.edu.jo
Email: mohammed.akour@yu.edu.jo
*Corresponding author

Abstract: Pair programming is an agile software engineering technique in which two programmers use a single computer to perform a task together. Research on pair programming in education has shown that it improves students' ability and willingness to continue learning and practising. By comparison, little research has evaluated the use of pairing for teaching other software engineering skills (e.g., pair testing, pair design). This paper focuses on assessing and evaluating the use of pair programming techniques in developing and writing test cases in order to test software systems. Pair programming experiments were used to evaluate the ability of students to learn how to create test cases with different perspectives of coverage in mind.

Keywords: pair programming; mutation; testing; code coverage.

Reference to this paper should be made as follows: Alazzam, I. and Akour, M. (2015) 'Improving software testing course experience with pair testing pattern', Int. J. Teaching and Case Studies, Vol. 6, No. 3, pp.244-250.

Biographical notes: Iyad Alazzam is an Assistant Professor in the Department of Computer Information Systems at Yarmouk University in Jordan. He received his PhD in Software Engineering from NDSU, USA, his Master's in Electronic Commerce from LMU, UK, and his BSc in Computer Science and Information Systems from Jordan University of Science and Technology in Jordan. His research interests lie in software engineering and software testing.

Mohammed Akour is an Assistant Professor in the Department of Computer Information Systems at Yarmouk University (YU). He received his Bachelor's (2006) and Master's (2008) degrees in Computer Information Systems from Yarmouk University with honours. He joined YU as a Lecturer in August 2008 after graduating with his Master's degree. In August 2009, he left YU to pursue his PhD in Software Engineering at North Dakota State University (NDSU). He rejoined YU in April 2012 after graduating from NDSU with honours. He serves as a reviewer for several conferences and journals and has participated as a co-chair for several conferences.

Copyright © 2015 Inderscience Enterprises Ltd.

1 Introduction

Pair programming is a programming technique in which two people work on the same programming task together using only one computer screen (Beck, 1999). It is considered one of the main practices in agile development, especially in extreme programming (XP) (Beck, 2000). The main objective of pair programming is to enhance knowledge gain and dissemination, and to help learners work in a group and communicate effectively with each other. The literature also reports goals related to improving production time and quality (Beck, 1999).

In the pair programming structure, the person who actually writes the code or the test cases is called the driver, and the second person is called the navigator or observer. As software developers tend to create their own style of programming, the observer's role is to make sure that standards and quality constraints are followed. The observer is also expected to review the code independently to find possible errors.

The purpose of this paper is to assess and evaluate the use of the pair programming technique in developing and writing test cases in order to test software systems. The remainder of this paper is organised as follows. Section 2 provides background on related work. Section 3 explains the research hypotheses and study design. Section 4 presents and discusses the results of the experiments, and Section 5 concludes the paper.

2 Background

This background section concentrates mainly on source code quality metrics and mutation testing and, to some extent, on pair programming.

Max Goldman applied the pair programming technique to test-driven development by proposing a model of the pair programming structure. In his model, test-driven development pairs a programmer with a tester (i.e., pair programming and testing) (Goldman and Miller, 2010). In this case, the two members of the pair do not perform similar tasks.

Another study, by Sillitti et al. (2012), examined how developer attention is affected by the pair programming practice. The results showed that programmers working in pairs spend more time on creative and useful activities. The authors also found that pair programming reduced the time spent switching among different tools and that, when programmers did switch to a different tool, they had a more specific goal. Moreover, they found that pair programming makes programmers focus much more on each task before moving on to another.

Akour et al. (2013) conducted formal pair programming experiments at Yarmouk University to assess empirically the impact of pair programming on the learning effectiveness, efficiency and satisfaction of students in a software engineering course. Al-Ramahi et al. (2013) addressed the effect of the pair programming practice on the performance of computer science students. They conducted a case study of several programming tasks in a third-level practical programming course using individual and pair programming options. A comparison was performed with other works in the literature in the area of pair programming.

2.1 Empirical experiments of pair programming and quality

2.1.1 Density of coding standard deviation

The first metric used to explain the quality outcomes of pair programming concerns compliance with programming standards. The density of coding standard deviations relates the total number of deviations from the programming standards to the total size of the code produced with each programming style (Hulkko and Abrahamsson, 2005; Elish and Offutt, 2002; Fang, 2001). The density of coding standard deviations per 100 lines of code is calculated through the following equation:

$$\mathrm{DCSTD}_N = \frac{B_N}{P_N} \times 100$$

where $B_N$ is the total number of failures to adhere to the programming standards with programming type $N$, and $P_N$ is the total number of code lines created with programming type $N$.

2.1.2 Comment ratio

The comment ratio is an additional quality metric used in measuring and assessing the quality of source code. It is computed as the proportion of comment lines in the total actual lines of code (Hulkko and Abrahamsson, 2005; Aggarwal et al., 2002). The comment ratio for a program $N$ is calculated through the following equation:

$$\mathrm{CR}_N = 1 - \frac{lc_N}{P_N}$$

where $lc_N$ is the total number of logical lines of code created for program $N$, and $P_N$ is the total number of physical lines of code created for program $N$. The study in Hulkko and Abrahamsson (2005) showed that the comment ratio under pair programming was almost 60% higher than under individual programming.

2.1.3 Relative defect density

The relative defect density is the total number of defects divided by the total number of logical lines of code (Hulkko and Abrahamsson, 2005). This code quality metric is used when the exact number of defects in a program is not known. It is calculated from the relative total effort spent on the programs in which the defects occurred. A low relative defect density means that the produced source code is more reliable and mature (Humphrey, 1995). The relative defect density per 1,000 lines of code is calculated through the following equation:

$$\mathrm{RDD}_N = \frac{\sum_{i} C\%_i}{lc_N} \times 1{,}000$$

where $i$ is the index over the discovered defects, $C\%_i$ is the relative total effort spent on program $N$, in which defect $i$ occurred, and $lc_N$ is the total number of logical lines created for program $N$.

A study estimating the impact of pair programming on relative defect density showed that pair programming reduced the relative defect density by a factor of about six compared with individual programming.

2.2 Mutation testing

Mutation testing is used to evaluate the quality of test cases by examining whether they are able to reveal faults in the system under test (SUT). The idea behind the technique is that if test cases can uncover simple faults, they can also uncover difficult faults (DeMillo et al., 1978; Andrews et al., 2005). Earlier research has shown that mutation testing is a dependable means of assessing the effectiveness of test cases in revealing faults, and that the created mutants represent real errors or problems (Andrews et al., 2006).

Faults are produced by making numerous versions of the system under test, each of which includes one fault. For a system under test S, mutation testing produces numerous versions of S and places one fault in each version. These versions (S1, S2, ..., Sn) are called mutants. The test cases are then executed against the mutants. If a test case produces different results for the original code and the mutated code, the test case is said to kill the mutant. A mutant that is not killed by any test case is called a live mutant. According to Andrews et al. (2006), mutants stay alive either if the test cases are not strong enough to detect the differences between the original program and the mutated version, or if the changes to the program have no effect on its external behaviour.
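The kill/live procedure described above can be illustrated with a minimal sketch. It is written in Python rather than Java for brevity, and everything in it is hypothetical: the toy function `original_max`, the three hand-written mutants and the helper names are assumptions made for illustration, not artefacts of the study. The score is computed as killed mutants divided by total mutants.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical system under test (not taken from the study).
def original_max(a: int, b: int) -> int:
    return a if a > b else b

# Hand-written mutants, each containing exactly one seeded change.
def mutant_swap_branches(a: int, b: int) -> int:
    return b if a > b else a      # returns the smaller value

def mutant_constant(a: int, b: int) -> int:
    return a                      # ignores the comparison

def mutant_relational(a: int, b: int) -> int:
    return a if a >= b else b     # '>' became '>=': behaviour is unchanged

MUTANTS: Dict[str, Callable[[int, int], int]] = {
    "swap_branches": mutant_swap_branches,
    "constant": mutant_constant,
    "relational": mutant_relational,
}

def run_mutation_analysis(
    original: Callable[[int, int], int],
    mutants: Dict[str, Callable[[int, int], int]],
    tests: List[Tuple[int, int]],
) -> Dict[str, bool]:
    """Map each mutant name to True if at least one test input kills it."""
    return {
        name: any(mutant(*t) != original(*t) for t in tests)
        for name, mutant in mutants.items()
    }

def mutation_score(killed: Dict[str, bool]) -> float:
    """Killed mutants divided by the total number of mutants."""
    return sum(killed.values()) / len(killed)
```

With the single input `(2, 1)` only `swap_branches` is killed, giving a score of 1/3. Adding `(1, 2)` also kills `constant` and raises the score to 2/3. The `relational` mutant stays alive for every input because it does not change the external behaviour, which is the second reason for live mutants mentioned above.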
The mutants are produced by applying mutation operators to the system under test. Mutation operators exist at several levels, such as statement, method and class operators. Statement-level operators create mutants by making a syntactic modification to a statement of the program. The mutants created using statement-level operators are used to evaluate unit test cases. Mutation operators at method level and class level are categorised into inter-method, intra-method, intra-class and inter-class respectively (Offutt et al., 1992; Gallagher et al., 2006).

2.3 Cobertura and code coverage

Cobertura (Doliner et al., 2002) is a free Java tool that calculates the percentage of code accessed by tests. It is commonly used in continuous integration to ensure that unit tests are set up and written to exercise the application source code. We used Cobertura to evaluate the coverage of the test cases that were generated by pair programming teams and by individuals. Cobertura reports the test coverage percentage and highlights the pieces of code that are not covered by the test cases. It automatically generates a report of the line and branch coverage achieved by the test cases of the paired and individual groups for each application under test.
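To make the notion of line coverage concrete, the following is a minimal Python sketch. It does not show how Cobertura works internally and does not use Cobertura at all; it only records which lines of one hypothetical toy function (`classify`) are executed by a set of test inputs, using the standard `sys.settrace` hook. The function, the helper names and the hard-coded set of executable line offsets are all assumptions made for illustration.

```python
import sys
from typing import Callable, Iterable, Set, Tuple

# Hypothetical function under test (not from the study).
def classify(n: int) -> str:
    if n < 0:
        return "negative"
    if n == 0:
        return "zero"
    return "positive"

# Offsets of the five executable statements, counted from the `def` line.
# Hard-coded here as an assumption; a real tool derives this automatically.
EXECUTABLE_LINES: Set[int] = {1, 2, 3, 4, 5}

def lines_hit(func: Callable, args: Tuple) -> Set[int]:
    """Run func(*args) and return the line offsets executed inside func."""
    code = func.__code__
    hit: Set[int] = set()

    def tracer(frame, event, arg):
        if event == "line" and frame.f_code is code:
            hit.add(frame.f_lineno - code.co_firstlineno)
        return tracer

    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        func(*args)
    finally:
        sys.settrace(previous)   # restore whatever tracer was active before
    return hit

def line_coverage(func: Callable, tests: Iterable[Tuple],
                  executable: Set[int]) -> float:
    """Percentage of executable lines reached by at least one test input."""
    covered: Set[int] = set()
    for args in tests:
        covered |= lines_hit(func, args)
    return 100.0 * len(covered & executable) / len(executable)
```

In this sketch, a test suite containing only the input `5` reaches three of the five lines, and each additional input that exercises a new return statement raises the percentage. Branch coverage, which Cobertura also reports, is not modelled here.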

3 Research hypotheses and study design

The main purpose of this study is to assess the impact of pair programming on students' capability to build test cases, and whether pair programming increases the quality of the generated test cases. The following hypotheses were evaluated in this research:

H1 Written test cases based on pair programming increase the number of killed mutants.
H2 Written test cases based on pair programming provide better code coverage.

To carry out this study we used data gathered from one course taught at the CIS Department, Yarmouk University (CIS 492, Section 1), in which 38 students were registered in Spring 2013. We divided the students randomly into two groups:

1 group 1, which included 20 students
2 group 2, which included 18 students.

Participation in this study was optional, and three students declined to take part. We therefore had 20 students in group 1 and 15 students in group 2. Students in group 1 were expected to work in pairs on the task, while students in group 2 were expected to work independently. Students were paired based on their grades to make sure that each pair included a student with good grades or GPA.

To ensure that the students were familiar with test cases and with how to write them for systems under test, we gave four lectures on test cases and on how they are created using JUnit in the lab. We asked the two groups to write test cases for two different applications written in Java. We used mutation testing to evaluate the quality of the test cases created by each group by determining the number of killed mutants for each application. We then used Cobertura to evaluate the quality of the test cases based on the coverage criterion, i.e., lines of code and branches.

We used two applications in our experiments, PureMVC (http://puremvc.org/) and Apachecli (http://puremvc.org/). PureMVC is a framework for creating applications based on the classic model, view and controller (MVC) concept. In our study we used the standard framework, which provides a methodology for separating the coding interests according to the MVC concept. Apachecli is a library that provides an API for parsing command line options passed to programs. The library can also print help messages describing the available options for a command line tool.

4 Results and discussions

Table 1 shows the results of the first experiment, including the total number of mutants created for both applications under study. The results show that the students who were paired (PP) killed more mutants than the students who worked individually. The mutation score for both applications was calculated for each group by dividing the total number of killed mutants by the total number of mutants. The paired group's mutation score for the Apachecli application was 47%, which is better than the individual group's score of 31%. For the second application, PureMVC, the paired group obtained a mutation score of 51% while the individual group's score was 30%. Therefore, hypothesis H1 (written test cases based on pair programming increase the number of killed mutants) is accepted in our study.

Table 1 Mutation scores

Application     # of mutants    Killed    Live    Mutation score
PP Apachecli    313             147       166     47%
PP PureMVC      51              26        25      51%
Apachecli       313             96        217     31%
PureMVC         51              15        36      30%

In order to evaluate Hypothesis 2, we utilised Cobertura (Doliner et al., 2002) to automatically calculate the percentage of code accessed by the tests that were generated by the paired and individual groups. Table 2 shows that the paired group's test cases reach a higher percentage of the source code. For ACLI, the paired group's test cases produced higher line and branch coverage than the individual group's test cases. Although the individual group's branch coverage was very close to that achieved by the paired group in the second application, PureMVC (68.23% versus 70.10%), there was still a clear difference between the two groups in line coverage (71.61% versus 82.58%). From the above results, Hypothesis 2 is accepted as well.

Table 2 Line and branch coverage from Cobertura

Application    PP-lines coverage    PP-branches coverage    Lines coverage    Branches coverage
ACLI           88.51%               76.29%                  80.32%            59.45%
PureMVC        82.58%               70.10%                  71.61%            68.23%

5 Conclusions

The pair programming pattern has shown significant effects in increasing the effectiveness of students' skills in a software engineering development course. However, research on the effects of pair programming on the quality of software testing in software engineering and testing classes has been neglected. This paper concentrated on assessing and evaluating the use of the pair programming scheme in developing and writing test cases in order to test software systems, using coverage and killed-mutant criteria. The results showed that test cases based on pair programming increase the number of killed mutants and provide better code coverage.

References

Aggarwal, K.K., Singh, Y. and Chhabra, J.K. (2002) 'An integrated measure of software maintainability', Annual Reliability and Maintainability Symposium.
Akour, M., Al-Radaideh, K., Alazzam, I. and Alsmadi, I.M. (2013) 'Effective pair programming practice: toward improving student learning in software engineering class', International Journal of Teaching and Case Studies, Vol. 4, No. 4, pp.336-334.
Al-Ramahi, M., Alazzam, I. and Alsmadi, I. (2013) 'The impact of using pair programming: a case study', International Journal of Teaching and Case Studies, Vol. 4, No. 4, pp.313-329.
Andrews, J., Briand, L. and Labiche, Y. (2005) 'Is mutation an appropriate tool for testing experiments?', Software Engineering, ICSE, Proceedings, 27th International Conference on, IEEE, pp.402-411.
Andrews, J., Briand, L., Labiche, Y. and Namin, A.S. (2006) 'Using mutation analysis for assessing and comparing testing coverage criteria', Software Engineering, IEEE Transactions on, Vol. 32, No. 8, pp.608-624.
Beck, K. (1999) Extreme Programming Explained: Embrace Change, Addison-Wesley, USA.
Beck, K. (2000) Extreme Programming Explained: Embrace Change, Addison-Wesley, Reading, Massachusetts.
DeMillo, R., Lipton, R. and Sayward, F. (1978) 'Hints on test data selection: help for the practicing programmer', Computer, Vol. 11, No. 4, pp.34-41.
Doliner, M., Lukasik, G. and Thomerson, J. (2002) Cobertura 1.8 [online] http://cobertura.sourceforge.net/.
Elish, M. and Offutt, J. (2002) 'The adherence of open source Java programmers to standard coding practices', The 6th IASTED International Conference on Software Engineering and Applications, USA.
Fang, X. (2001) 'Using a coding standard to improve program quality', 2nd Asia-Pacific Conference on Quality Software, Hong Kong.
Gallagher, L., Offutt, J. and Cincotta, A. (2006) 'Integration testing of object-oriented components using finite state machines: research articles', Softw. Test. Verif. Reliab., Vol. 16, No. 4, pp.215-266.
Goldman, M. and Miller, R.C. (2010) 'Test-driven roles for pair programming', Proceedings of the ICSE Workshop on Cooperative and Human Aspects of Software Engineering.
Hulkko, H. and Abrahamsson, P. (2005) 'A multiple case study on the impact of pair programming on product quality', Proceedings of the 27th International Conference on Software Engineering.
Humphrey, W.S. (1995) A Discipline for Software Engineering, Addison-Wesley, USA.
Offutt, J., Pargas, R., Fichter, S. and Khambekar, P. (1992) 'Mutation testing of software using a MIMD computer', in International Conference on Parallel Processing, Citeseer.
Sillitti, A., Succi, G. and Vlasenko, J. (2012) 'Understanding the impact of pair programming on developers' attention: a case study on a large industrial experimentation', Proceedings of the International Conference on Software Engineering.