To Maria

Acknowledgements

Dear Reader, thank you for your interest in this doctoral dissertation. I hope you will find the work interesting and useful for your needs. Before considering the ideas and results, however, I wish you would take a few moments to read the acknowledgements presented below.

This work was done in the Department of Computer Science and Engineering at Aalto University. The daily work was conducted in the research projects of the Software Process Research Group, funded by TEKES and industry partners. Part of this work was also funded by SoSE (Doctoral Programme on Software and Systems Engineering). These organizations deserve my acknowledgements. They have enabled the doctoral dissertations of many others, too.

Dear professor Casper Lassenius, the head of the Software Process Research Group, you provided me with the necessary funding and supported me with interesting insights and critical comments, which greatly improved this dissertation. You also set me tough goals and schedules that forced me to keep going.

Dear professor Mika V. Mäntylä and postdoctoral researcher Juha Itkonen, I truly admire your in-depth knowledge about software engineering. Your creativity is beyond comparison. Mika, you helped me through the research articles. Juha, you provided valuable instructions and academic guidelines, especially in the latter parts of the dissertation.

I also got help from the researchers of the Software Business and Engineering Institute. Dear SoberIT people, you reviewed my articles and supported me by being yourselves. Thank you, Jari Vanhanen, Kristian Rautiainen, Jarno Vähäniitty, Ville Heikkilä, Maria Paasivaara, and many others.

Nor should the co-authors, who greatly improved the articles, be forgotten. Thank you, Mika V. Mäntylä, Jari Vanhanen, Casper Lassenius, Juha Itkonen, Risto Virtanen, and Juha Viljanen. It has been a pleasure to work with you.

Furthermore, our industrial partners made this work possible and reasonable. They opened the doors to real-world software engineering, which made my observations and research ideas possible. I also want to thank the anonymous software engineering students who participated in my research as subjects. Additionally, I want to thank the experienced software engineering students who participated in the development of ARCA-tool. Thank you Risto Virtanen, Juha Viljanen, Helin Anssi Matti, Hovi Roope, Jaanto Jari, Kekäle Mika, Kere Markus, Koistinen Joona, Laukkanen Eero, Patana Jussi, Rihtniemi Pekka, Saarinen Jerome, Sevenius Toni, Valjus Mikko, and Viitanen Jonne.

I also acknowledge all of my friends and family members who supported me, and my closest ones, during this work. Especially, thank you Lasse Makkonen, Jarkko & Anneli Lehtinen, and Pirkka T. Pekkarinen, whose experience and personal example guided me to finalize this project.

Finally, the greatest support came from my daughter and wife. Dear Iia Lehtinen, your birth is the moment of my life and it encouraged me to finalize this work. Every single breath you have taken has also changed my life. You have truly told me what to do next. Dear Maria Lehtinen, this thesis would not have been made without your love, support, and understanding. Your encouraging words and your never-ending trust in my work were the success factors of this dissertation. This book is dedicated to you. I love you from the bottom of my heart.

Espoo, October 2014
Timo Olli Antero Lehtinen

List of publications

This doctoral dissertation consists of two parts: a summary and the following articles, which are referred to in the text by their numerals (I-V).

I. Development and evaluation of a lightweight root cause analysis method (ARCA method) - Field studies at four software companies. Timo O.A. Lehtinen, Mika V. Mäntylä and Jari Vanhanen. Journal of Information and Software Technology, Volume 53, Issue 10, October 2011, Pages

II. What are problem causes of software projects? Data of root cause analysis at four software companies. Timo O.A. Lehtinen and Mika V. Mäntylä. Proceedings of International Symposium on Empirical Software Engineering and Measurement, 2011, Pages

III. A tool supporting root cause analysis for synchronous retrospectives in distributed software teams. Timo O.A. Lehtinen, Risto Virtanen, Juha O. Viljanen, Mika V. Mäntylä and Casper Lassenius. Journal of Information and Software Technology, Volume 56, Issue 4, April 2014, Pages

IV. Perceived causes of software project failures - An analysis of their relationships. Timo O.A. Lehtinen, Mika V. Mäntylä, Jari Vanhanen, Juha Itkonen and Casper Lassenius. Journal of Information and Software Technology, Volume 56, Issue 6, June 2014, Pages

V. An experimental comparison of using cause-effect diagrams and simple memos in software project retrospectives. Timo O.A. Lehtinen, Mika V. Mäntylä, Juha Itkonen and Jari Vanhanen. Journal of Systems and Software (2014), 26 pages, in revision.

Author's contribution

In articles I-V, the author contributed significantly to the creation of the research ideas. He also contributed significantly to the data collection and analyses, including revealing the main findings for conclusions. Additionally, he wrote the original manuscripts. The co-authors provided comments, improvement ideas, and criticisms for each article. Additionally, they helped to refine the text by changing wording, providing clarifications, adding some references, improving argumentation, and refining the discussion.

Article I: Development and evaluation of a lightweight root cause analysis method (ARCA method) - Field studies at four software companies
The author developed the ARCA method and conducted the literature review. He also collected and analyzed the data from the industrial cases. The co-authors participated in the observations and provided minor improvement ideas for the ARCA method.

Article II: What are problem causes of software projects? Data of root cause analysis at four software companies
The author collected and analysed the research data. The co-author participated in the interpretation of results.

Article III: A tool supporting root cause analysis for synchronous retrospectives in distributed software teams
The author steered the development of the software tool and provided its requirements. He also observed the industrial data collection and wrote down notes for the data analysis. He conducted the data analyses.

Article IV: Perceived causes of software project failures - An analysis of their relationships
The author conducted the data collection and analysis. The results were interpreted together with the co-authors. Additionally, one co-author helped the author to assess inter-rater agreement on the results.

Article V: An experimental comparison of using cause-effect diagrams and simple memos in software project retrospectives
The author conducted the data collection and analysis. The co-authors participated in the interpretation of results.

Terminology and abbreviations

ARCA: The RCA method that was developed in this study.

Bridge cause: A cause which is related to another process area than the one of its effect. A bridge cause explains how two process areas are related to one another.

Cause-effect diagram: A diagram of causes and effects including two types of causal structures. List-based structures: a fishbone diagram, a fault tree diagram, a logic tree, and a causal factor chart. Network-based structures: a directed graph and a matrix diagram.

Causal relationship: A cause-and-effect relationship between two mutually exclusive events, i.e. a cause and its effect.

Causal structure: An assembly of causes and effects which structures their mutual relationships.

Causal model: A complete specification of the causal relationships that govern a given domain.

Cause entity: An entity of causes and effects that are reasonable to process together.

Cause type: Expresses what the cause is. For example, a cause "There is lack of instructions on what I should do" has a type "People".

Cause sub-type: More detailed description of the cause type. For example, "instructions & experiences" is a sub-type of "People".

Characteristic of detected cause (CDC): A combination of the process area and cause type for an individual cause.

Depth level: The number of cause-effect pairs from a cause to the target problem. For example, Depth level=1 indicates causes which directly explain the target problem and Depth level=2 indicates the causes which explain the causes having Depth level=1.

Hub cause: A sub-cause which explains more than one cause. See also NoH.

Method effectiveness (ME): ME indicates the number of detected causes per time unit.

Number of hub causes (NoH): The number of hub causes in a causal structure.

Perception of participants (PP): PP reflects the evaluations of participants.

Process area: An area of work and responsibility which represents one part of the whole software development process. For example, software testing.

Proposed cause: A cause of an event which is proposed for process improvement activities. See also Selected cause.

Size of depth levels, SoDL(x): SoDL(x) is a function that indicates the number of causes in a depth level x. See also Depth level.

Software project failure: A recognizable failure to succeed in the cost, schedule, scope, or quality goals of the project. "Recognizable" means a failure perceived as severe enough to be prevented in the upcoming projects.

Software project retrospective: A post-project activity where a group of people looks back on the software project in order to facilitate learning and improvements based on the experiences gained during the project. Post-project review is a synonym for a retrospective.

Root cause analysis (RCA): A structured investigation of a problem which takes a problem as an input and provides a set of its causes as an output.

RCA facilitator: A person who leads an RCA team.

Root cause: An underlying cause of the target problem that the management has the power to control.

Selected cause: A cause of an event which is selected for software process improvement activities. See also Proposed cause.

Sub-cause: A cause which explains another cause.

Target problem: A problem which is analysed by using RCA.
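Several of the terms above (depth level, SoDL(x), hub cause, NoH, bridge cause, and ME) are defined precisely enough to be computed from a causal structure. The following is a minimal Python sketch of one possible reading, not the procedure used in the thesis. The example causes, the process-area labels, and all function names are hypothetical. Two further assumptions are made: in a network-based structure a cause may reach the target problem along several chains, so the shortest chain is taken as its depth level, and a hub cause is counted only over the causes it explains, not over the target problem.

```python
from collections import deque

TARGET = "Release was late"  # hypothetical target problem

# Hypothetical cause-effect pairs: (cause, the effect it explains).
EDGES = [
    ("Testing started late", TARGET),
    ("Requirements changed often", TARGET),
    ("Builds were unstable", "Testing started late"),
    ("Test cases were missing", "Testing started late"),
    ("No shared task priorities", "Testing started late"),
    ("No shared task priorities", "Requirements changed often"),
]

# Hypothetical process area of the target problem and of each cause.
AREA = {
    TARGET: "Management",
    "Testing started late": "Software testing",
    "Requirements changed often": "Requirements",
    "Builds were unstable": "Implementation",
    "Test cases were missing": "Software testing",
    "No shared task priorities": "Management",
}


def depth_levels(edges, target):
    """Map each cause to its depth level (assumed: shortest chain to the target)."""
    causes_of = {}
    for cause, effect in edges:
        causes_of.setdefault(effect, []).append(cause)
    depth = {target: 0}
    queue = deque([target])
    while queue:  # breadth-first search starting from the target problem
        effect = queue.popleft()
        for cause in causes_of.get(effect, []):
            if cause not in depth:
                depth[cause] = depth[effect] + 1
                queue.append(cause)
    del depth[target]  # the target problem itself is not a cause
    return depth


def sodl(depth, x):
    """SoDL(x): the number of causes in depth level x."""
    return sum(1 for level in depth.values() if level == x)


def hub_causes(edges, target):
    """Sub-causes that explain more than one cause (the target is not counted)."""
    explained = {}
    for cause, effect in edges:
        if effect != target:
            explained.setdefault(cause, set()).add(effect)
    return {cause for cause, effects in explained.items() if len(effects) > 1}


def bridge_causes(edges, area):
    """Causes whose process area differs from that of at least one of their effects."""
    return {cause for cause, effect in edges if area[cause] != area[effect]}


def method_effectiveness(detected_causes, minutes):
    """ME: detected causes per time unit (minutes are an assumed unit)."""
    return detected_causes / minutes


depth = depth_levels(EDGES, TARGET)
print("Depth levels:", depth)
print("SoDL(1) =", sodl(depth, 1), "and SoDL(2) =", sodl(depth, 2))
print("NoH =", len(hub_causes(EDGES, TARGET)))
print("Bridge causes:", sorted(bridge_causes(EDGES, AREA)))
print("ME =", method_effectiveness(len(depth), 60), "causes per minute")
```

In this toy structure, "No shared task priorities" is the only hub cause, and "Test cases were missing" is the only cause that is not a bridge cause, because it lies in the same process area as its effect.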

Table of contents

Part I: Summary
Introduction
Motivation
Study objectives
Structure of the thesis
Related work
The law of causality in software engineering
Definitions of root cause analysis
RCA in software process improvement
RCA of retrospective methods
The environment of use
Gaps in the prior studies of RCA
Work practices of RCA
Perceptions of practitioners
Outcome of RCA
Research approach and methodology
Research questions
Development of the ARCA method and ARCA-tool
Ease of use and cost-efficiency evaluations
Outcome of RCA with software project failures
Research articles
The framework of design science
The environment
The knowledge base
The artefact design
Development of the ARCA method and ARCA-tool
Development of the ARCA method
Development of ARCA-tool
Field study evaluations
Field studies at Cases
Field studies at Cases
Controlled experiment evaluations
Research context
Experiment design
Response variables and research hypothesis
Controlling undesired variation
Data analysis
Case study evaluations
Data collection
Data analysis
The ARCA method
Synthesis of RCA methods from literature
Target problem detection
Root cause detection
Corrective action innovation
Overview of the ARCA method
Step 1: Target problem detection
Step 2: Root cause detection
Step 3: Corrective action innovation
Step 4: Documentation of the results
ARCA-tool
Comparison of RCA software tools
Ease of adoption
Real-time collaboration
Cause-effect diagramming
Corrective action development
Support for voting
Support for knowledge management
Costs
Overview of ARCA-tool
Initializing ARCA-tool
Target problem detection
Root cause detection
Corrective action innovation
The documentation of results
Evaluation results
Evaluation of the ARCA method
Evaluation of the ARCA method ease of use
Evaluation of the ARCA method cost-efficiency
Evaluation of the ARCA method outcome
Evaluation of ARCA-tool
Evaluation of the ease of use of ARCA-tool
Evaluation of the usefulness of ARCA-tool
The cause types, process areas, and their relationships
Process areas
Cause types
Similarities of the causes of failures
Common causal relationships bridging the process areas
Discussion
Lightweight RCA method and software tool
Common steps of RCA methods and their work practices
Software tools for the RCA of retrospectives
Perceived ease of use and cost-efficiency
Ease of use and cost-efficiency of the ARCA method
Improving the ARCA method with ARCA-tool
The outcome of RCA with software project failures
Frequently used process areas and cause types
The role of bridge causes
Feasible targets for process improvement activities
Implications
Evaluation of the research
Construct validity
Internal validity
External validity
Reliability
Conclusions and future work
Conclusions
Future work
References
Part II: Articles

Part I: Summary

1. Introduction

"Everything that exists, and everything that happens, exists or happens as a necessary consequence of a previous state of things."
T. N. Thiele (1931)

The discipline of today's software engineering (SE) originates from the software project problems introduced in the late 1960s (Naur and Randel 1969). Up to 34 percent of today's software projects are either unsuccessful or cancelled (El Emam and Koru 2008). Software project retrospectives have been used to increase the success rate of upcoming software projects. They are post-project activities wherein a group of people looks back on the software project in order to facilitate learning and make improvements based on the experiences gained during the project (Birk, Dingsøyr, and Stålhane 2002).

1.1 Motivation

Root cause analysis (RCA) is a structured investigation of a problem to detect the underlying causes that need to be prevented (Latino and Latino 2006). It is a commonly recommended technique for problem prevention (Latino and Latino 2006; Andersen and Fagerhaug 2006; Ammerman 1998; Cooke 2003; Rooney and Vanden Heuvel 2004). It takes the problem as an input and provides a set of its perceived causes, including the perceived causal relationships, as an output. A causal relationship refers to the relationship between a cause and its effect (Chillarege et al. 1992).

In the SE context, RCA has been introduced as a method for software project retrospectives (Dingsøyr, Moe, and Nytrø 2001). It has been claimed to help in developing effective corrective actions (Rooney and Vanden Heuvel 2004). For example, a 50 % decrease in defect rates (Card 1998), a 53 % saving in costs, and a 24 % increase in productivity (Leszak, Perry, and Stoll 2000) have been reported. However, the work practices of the RCA methods have been introduced on too general a level to be adopted as such. Additionally, the subject matter experts' perceptions of the RCA methods have not been studied systematically. Furthermore, the added value of RCA for software project retrospectives has not been widely studied.

1.2 Study objectives

The main objective of this thesis is to develop and evaluate a lightweight RCA method and software tool for software project retrospectives in order to provide empirical evidence on its feasibility for software project failure prevention in small- and medium-sized (SME) organizations. Most of the prior studies on RCA have been conducted in large organizations (Card 1998; Leszak, Perry, and Stoll 2000; Jalote and Agrawal 2005; Gupta et al. 2008; Grady 1996; Mays 1990), but RCA could be useful in SME organizations too, as also noted by Stålhane et al. (2003). However, the optimal RCA method for SME organizations is likely different from the one for large organizations. Therefore, studying how to use RCA in the retrospectives of SME organizations is reasonable.

The thesis contributes to three research problems, which are introduced thoroughly in Section 2.4. The first research problem is to explain how to conduct RCA in collocated and distributed software project retrospectives. The second research problem is to study whether RCA is perceived as efficient and easy to use in software project retrospectives. The third research problem is to determine whether the outcome of RCA indicates how the causes of software project failures are interconnected.

In addressing these research problems, the thesis makes three scientific contributions. The first contribution is the lightweight RCA method and supporting software tool. These two artefacts contribute to the first research problem as they introduce how RCA can be conducted in collocated and distributed software project retrospectives (see articles I and III).

The second contribution is an empirical evaluation of the lightweight RCA method and software tool (see articles I, III, and V). The empirical evaluation contributes to the second research problem as it introduces how the software engineering practitioners perceived the ease of use, cost-efficiency, and outcome of the RCA method and its software tool. The empirical evaluation is divided into various levels of software project retrospectives. These include team-level, organization-level, and company-level retrospectives, ultimately aiming to reveal the causes of software project failures. Additionally, the evaluation covers the use of collocated and distributed retrospectives.

The third contribution is a detailed, in-depth analysis of the outcome of RCA (see articles II and IV). The analysis is limited to four cases of software project failures. The outcome analysis contributes to the third research problem as it provides empirical evidence on the feasibility of using RCA to explain why a software project failed, where the causes of the failure occurred, and how the causes were related to one another.

The overall research approach in this thesis is design science (Hevner et al. 2004) including empirical evaluation with the mixed-methods approach (Shull, Sjøberg, and Singer 2008) that combines three main research approaches: observation-based industrial field studies (Lethbridge, Elliott Sim, and Singer 2005), case studies (Yin 1994), and controlled experiments (Juristo and Moreno 2003).

1.3 Structure of the thesis

This thesis consists of two parts: the dissertation summary and research articles. The dissertation summary starts with a brief introduction to the related work in Section 2, which includes the theorization and the use of RCA in the SE context. The section ends with a discussion of the three research problems addressed in this thesis. Thereafter, Section 3 presents the research objectives and methods including the use of the design science framework. Section 4 introduces the lightweight RCA method, and Section 5 presents the developed software tool. The results of the empirical evaluation and the in-depth analysis of the RCA method outcome are summarized in Section 6. Section 7 discusses the research questions, implications, and threats to validity. Finally, Section 8 states the conclusions and directions for future work. The second part includes the research articles.

2. Related work

This section starts with the law of causality and discusses its relevance to analysing the causes of SE problems. Thereafter, the concept of RCA and an explanation of how it is used in software project retrospectives are introduced. The section ends with a discussion of gaps in the existing research.

2.1 The law of causality in software engineering

The underlying theory of problem prevention is based on the law of causality, which has been considered by scientists and philosophers from Aristotle (Álvarez 2009) to Hume (1896) and, more recently, Pearl (2000). The law of causality states that the occurrence of problems is the consequence of a previous state of actions (Thiele 1931). Causality refers to the causal relationship between sequential and mutually exclusive events (Granger 1988), i.e. the relationship between a cause and its effect (Chillarege et al. 1992). A causal model refers to a complete specification of the causal relationships that govern a given domain (Galles and Pearl 1997), i.e. it explains what happened, where it happened, and why it happened.

I make three assumptions based on the law of causality. First, the problems of software projects follow the law of causality. This assumption is logical because software development work is based on sequential and mutually exclusive events, a set of linked activities (Wang and King 2000) in which the previous state of actions affects the latter state of actions. Thus, the law of causality also applies to software project problems. Prior studies support this assumption. Cerpa and Verner (2009) reported that causal relationships between the causes of software project failures likely exist. McLeod and MacDonell (2011) reported that the factors of the software project outcome are interconnected through multidimensional relationships. Furthermore, Xiangnan et al. (2010) reported that the causes of software project failures arise from actions that are interconnected through internal and external causes.

Second, the problems of software projects are interconnected over the process areas. This assumption divides the sequential and mutually exclusive events of software development into software process areas, i.e. logical areas of different types of software development work. Prior studies indicate that software development process areas are interconnected (Monteiro et al. 2010). Therefore, it is reasonable to assume that a problem in one process area could also cause problems in other process areas.

Third, the problems of software projects reoccur in future projects if the related causal relationships are not detected and controlled. Prior studies have found many causes common to software project failures, which means that the causes of failures transfer from prior projects to upcoming projects if they are not controlled or eliminated (Card 1998). Correspondingly, controlling the causes of problems has been introduced as valuable (Dingsøyr, Moe, and Nytrø 2001; Card 1998; Leszak, Perry, and Stoll 2000; Jalote and Agrawal 2005; Grady 1996; Kalinowski, Travassos, and Card 2008; Bjørnson, Wang, and Arisholm 2009; Al-Mamory and Zhang 2009; Siekkinen et al. 2008; Traeger, Deras, and Zadok 2008; Stålhane 2004; Bhandari et al. 1993; Jin et al. 2007). Thus, detecting and controlling the causal relationships of common software project problems is also practically useful.

2.2 Definitions of root cause analysis

In the terminology of this thesis, root cause analysis is a systematic process of detecting a target problem, detecting and organizing its causes, and recognizing its root causes. This definition considers the use of RCA as a technique for detecting the causes of a problem. In the prior literature, RCA has been introduced as a method for decreasing the likelihood of the reoccurrence of problems (Rooney and Vanden Heuvel 2004; Card 1998; Leszak, Perry, and Stoll 2000; Card 1993). However, there seems to be a slight disagreement on whether RCA is only a structured investigation of a problem (Latino and Latino 2006; Ammerman 1998; Leszak, Perry, and Stoll 2000; Bjørnson, Wang, and Arisholm 2009) or whether it also includes the development of corrective actions (Andersen and Fagerhaug 2006; Rooney and Vanden Heuvel 2004; Card 1998; Card 1993).

Furthermore, the prior literature has introduced the term root cause, which has been used to indicate a target problem cause that is perceived as important to control and feasible for process improvement activities. Conceptually, a target problem could be affected by numerous root causes. In the terminology of this thesis, a root cause is an underlying cause of the target problem that the management has the power to control. This definition considers the root cause as an internal cause of the company (Xiangnan, Hong, and Weijie 2010). In the prior literature, many authors have defined a root cause as a target problem cause that the management has the power to control (Andersen and Fagerhaug 2006; Ammerman 1998; Rooney and Vanden Heuvel 2004; Livingstone, Jackson, and Priestley 2001). A root cause has also been defined as any underlying cause of the target problem (Rooney and Vanden Heuvel 2004). Additionally, a root cause has been defined as the deepest cause at the end of the causal structure (Andersen and Fagerhaug 2006; Ammerman 1998). However, this last definition contradicts the law of causality (see Section 2.1), as the deepest causes do not exist if everything that exists is caused by some earlier actions.
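The difference between two of the root cause definitions above can be made concrete with a small example. The following Python sketch is illustrative only: the causes, the `CONTROLLABLE` flags, and the function names are hypothetical and do not come from the thesis. It contrasts the definition adopted in this thesis (an underlying cause that the management has the power to control) with the "deepest cause" definition, read here as a cause for which no further sub-cause has been recorded.

```python
TARGET = "Effort estimates were exceeded"  # hypothetical target problem

# Hypothetical cause-effect pairs: (cause, the effect it explains).
EDGES = [
    ("Tasks were underestimated", TARGET),
    ("Customer changed the scope", TARGET),
    ("No estimation guidelines", "Tasks were underestimated"),
]

# Hypothetical judgement: can the management control this cause?
CONTROLLABLE = {
    "Tasks were underestimated": True,
    "Customer changed the scope": False,  # treated as an external cause
    "No estimation guidelines": True,
}


def root_causes_by_control(controllable):
    """Thesis definition: underlying causes that the management can control."""
    return {cause for cause, ok in controllable.items() if ok}


def deepest_recorded_causes(edges):
    """'Deepest cause' definition: causes with no recorded sub-cause."""
    causes = {cause for cause, _ in edges}
    effects = {effect for _, effect in edges}
    return causes - effects


print("Controllable:", sorted(root_causes_by_control(CONTROLLABLE)))
print("Deepest recorded:", sorted(deepest_recorded_causes(EDGES)))
```

The two sets differ in this example. Note also that the second function can only return the deepest causes that were written down: asking "why?" once more would add a new sub-cause and change the result, which mirrors the objection raised above against the "deepest cause" definition.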

2.3 RCA in software process improvement

One goal of software process improvement (SPI) is to prevent software project failures. In order to reach this goal, SPI requires in-depth knowledge about the problems of past software projects (Boh, Slaughter, and Espinosa 2007; Edmondson 1996; Von Zedtwitz 2002). Such in-depth knowledge has been obtained from experienced individuals by using software project retrospectives (Dingsøyr 2005) at various levels, including the levels of teams (Birk, Dingsøyr, and Stålhane 2002; Bjørnson, Wang, and Arisholm 2009), organizations, and companies (Stålhane et al. 2003; Kalinowski, Travassos, and Card 2008). The team-level retrospectives are conducted by software teams and aim to analyse the problems relevant to the project goals of the teams. The organization-level retrospectives are conducted with participants representing the stakeholders of the whole software organization and aim to analyse the problems relevant to the project goals of the organization. The company-level retrospectives are conducted with participants representing the stakeholders of the whole software company and aim to analyse the problems relevant to the company goals.

Figure 1 summarizes the flow of improvements from problematic software projects towards improved ones. The flow starts by recognizing the problems of past projects. It continues by using the team-, organization-, and company-level retrospectives to analyse why the problems occurred, and it ends by controlling those problems in future projects. The figure presents the use of RCA as part of making improvements over the software projects, i.e. to explain why the problems of past projects occurred (Dingsøyr, Moe, and Nytrø 2001).

2.3.1 RCA of retrospective methods

In software project retrospectives, the use of RCA results in the creation of perceived causal models for the target problems (Stålhane et al. 2003).
Software project retrospectives have utilized two RCA methods: 1) defect causal analysis (Bhandari et al. 1993) and 2) post-mortem review (Collier, DeMarco, and Fearey 1996). Both of these methods follow two work phases: 1) the detection of a target problem and 2) the detection of root causes. The RCA methods vary in terms of their aim and the work practices used in the detection of a target problem. In contrast, the methods follow similar work practices in the detection of root causes. These two work phases are discussed below.

[Figure 1 content: Problematic prior software projects -> Team-level retrospectives (RCA) -> Organization-level retrospectives (RCA) -> Company-level retrospectives (RCA) -> Improved future software projects]
Figure 1. The flow of improvements from problematic software projects.

The detection of a target problem is the first phase of RCA methods, and its goal is to define the target problem for the second work phase. In defect causal analysis, the target problems include specific types of software defects. In a post-mortem review, the target problems may include any type of SE problems faced by individuals. Furthermore, the work practices of defect causal analysis utilize formal defect sampling combined with statistical methods including defect classifications and Pareto analysis (Card 1998). In comparison, the work practices of a post-mortem review are less formal and may include project surveys (Collier, DeMarco, and Fearey 1996) and brainstorming with individuals (Bjørnson, Wang, and Arisholm 2009). Furthermore, a post-mortem review also includes the detection of project success factors (Collier, DeMarco, and Fearey 1996), whereas defect causal analysis only detects problems that have occurred.

The detection of root causes is the second phase of RCA methods, and its goal is to explain why the target problems occurred. The RCA methods commonly use a retrospective meeting where a group of people analyse why the target problem occurred (Card 1998; Stålhane et al. 2003). In the meeting, the causes of the target problem are detected by repeatedly asking "why?" for every cause of the target problem (Jalote and Agrawal 2005). Additionally, cause-effect diagrams are commonly used to organize and register the target problem causes based on their perceived causal relationships (Card 1998; Bjørnson, Wang, and Arisholm 2009; Stålhane 2004).

2.3.2 The environment of use

RCA has been used in the project retrospectives of small and large organizations.
However, most prior studies on RCA have been conducted in large organizations (Card 1998; Leszak, Perry, and Stoll 2000; Jalote and Agrawal 2005; Gupta et al. 2008; Grady 1996; Mays 1990), as also noted by Stålhane et al. (2003). In large organizations, the optimal work practices for detecting and defining the target problems seem to be different from those in small organizations. In large organizations, the target problems are detected with problem sampling (Card 1998; Leszak, Perry, and Stoll 2000; Jalote and Agrawal 2005; Gupta et al. 2008; Grady 1996; Mays 1990; Collier, DeMarco, and Fearey 1996). In small organizations, by contrast, the target problems are detected by brainstorming (Dingsøyr, Moe, and Nytrø 2001; Stålhane et al. 2003).

A lightweight RCA method has been defined as an RCA method that can be conducted in a retrospective meeting lasting half a day (Dingsøyr, Moe, and Nytrø 2001). It seems that the use of brainstorming in the target problem detection phase makes the RCA method lightweight, as the detection of target problems can be conducted in the same retrospective meeting as the detection of root causes (Dingsøyr, Moe, and Nytrø 2001). Such an RCA method does not require heavy start-up investments and is adaptable to various target problems. According to the prior literature, lightweight RCA methods are feasible for small companies (Stålhane et al. 2003), whereas large organizations require more effort in order to define the target problem for RCA (Card 1998). In large organizations, the target problem has to be well defined, as otherwise the number of target problem causes is too high (Jalote and Agrawal 2005). According to the prior literature, the target problems of large organizations are often related to software defects. The target problems of small companies, on the other hand, have been related to high-level causes of software project failures, e.g. to estimation problems (Stålhane et al. 2003).

2.4 Gaps in the prior studies of RCA

There are three major gaps in the prior studies, which are considered in this thesis. These gaps are presented in the following sub-sections.

2.4.1 Work practices of RCA

Retrospectives should be lightweight, because otherwise they are neglected (Glass 2002). The concrete work practices of RCA have been studied fairly little in the context of software project retrospectives. There are only a few studies (Stålhane et al. 2003; Bjørnson, Wang, and Arisholm 2009; Dingsøyr 2005) on how to use RCA to detect, organize, and select the root causes of a software project failure. Additionally, the RCA methods presented by many authors are either heavyweight or too generally introduced to be adopted as such, e.g. Card (1998) introduces the mandatory phases of RCA but does not concretize how the phases are conducted. Additionally, most of the prior studies on RCA have been conducted in large organizations. An RCA approach could be useful for SME organizations too, but it has rarely been studied in such contexts (Stålhane et al. 2003). Furthermore, the need to conduct RCA in distributed retrospectives has been introduced (Stålhane et al. 2003). However, there are no prior studies on how to use RCA in such circumstances.
Software tools are used in distributed retrospectives to support real-time collaboration and information exchange across the distributed sites, but the prior tools (Terzakis 2011) do not enable the co-creation of a cause-effect diagram. Thus, conducting RCA in distributed retrospectives becomes difficult. The first research problem follows.

Research problem 1: How can RCA be conducted in collocated and distributed software project retrospectives?

2.4.2 Perceptions of practitioners

RCA has been presented as a feasible approach to software project retrospectives. Thus, it could be useful for software process improvement. However, its perceived ease of use, cost-efficiency, and added value have not been widely studied.

Only a few studies have compared retrospectives that use RCA with retrospectives that do not (Dingsøyr, Moe, and Nytrø 2001; Card 1998; Stålhane et al. 2003; Stålhane 2004). Similarly, the RCA methods have not been widely compared with one another (Stålhane et al. 2003; Bjørnson, Wang, and Arisholm 2009). The effort required to conduct RCA (Card 1998; Grady 1996; Mays 1990) has also been neglected in most of the studies. Furthermore, how participants experience RCA (Birk, Dingsøyr, and Stålhane 2002) has not been widely studied, i.e. do the software developers experience RCA as a useful approach for lightweight retrospectives? Although the prior studies of RCA are promising (Card 1998; Leszak, Perry, and Stoll 2000), they do not indicate whether RCA is useful in retrospectives in which problems other than technical quality deviations are analysed, e.g. Stålhane et al. (2003). The second research problem follows.

Research problem 2: Is RCA perceived as efficient and easy to use in software project retrospectives?

2.4.3 Outcome of RCA

RCA takes a problem as an input and provides the perceived causal relationships as an output. Thus, theoretically, RCA could be a feasible approach to explaining why a software project failed. It could reveal not only what happened and where it happened, but also why it happened, which is a gap in prior studies on software project failures, discussed further in Article IV. In practice, however, RCA has not been widely reported as being feasible for such purposes. There are only a few real-world studies (Stålhane et al. 2003) indicating that RCA reveals any interconnections between the causes of software project failures. Instead, most of the industrial cases of RCA (e.g. Card 1998; Leszak, Perry, and Stoll 2000; Jalote and Agrawal 2005; Gupta et al. 2008; Grady 1996) have studied only the causal relationships between target problems and their individual causes, disregarding the causal relationships among the causes themselves.
Thus, the real-world case studies on the use of RCA to explain the perceived causal relationships of software project failures are scarce. The third research problem follows.

Research problem 3: Does the outcome of RCA indicate how the causes of software project failures are interconnected?

3. Research approach and methodology

The main objective of this thesis is to develop and evaluate a lightweight RCA method (called ARCA) and software tool (called ARCA-tool) for the software project retrospectives of SME organizations. The overall research approach is design science (Hevner et al. 2004). The approach includes artefact development and empirical evaluation with a mixed-methods approach (Shull, Sjøberg, and Singer 2008), which combines three main research approaches: observation-based industrial field studies (Lethbridge, Elliott Sim, and Singer 2005), case studies (Yin 1994), and controlled experiments (Juristo and Moreno 2003).

This section starts by introducing the research questions (RQ) that are aimed at contributing to the research problems (see Section 2.4). Thereafter, Section 3.2 presents the research articles (I-V). Section 3.3 introduces the framework of design science and Sections 3.4 to 3.7 present the use of the framework in the development and evaluation of the ARCA method and ARCA-tool. Figure 2 summarizes the linkages between the research problems, the research questions, and the studies of the thesis.

3.1 Research questions

The development and evaluation of the ARCA method and ARCA-tool answer a total of seven research questions. These are introduced below.

3.1.1 Development of the ARCA method and ARCA-tool

There are two research questions about the first research problem, namely: How can RCA be conducted in collocated and distributed software project retrospectives? The studies of this thesis focused on creating knowledge about the environment of use and the literature of RCA methods (Article I) and RCA software tools (Article III).

Research question 1: What are the common steps of RCA methods, and how are they to be conducted?

The first research question reviews prior RCA methods and synthesizes their commonalities and work practices.
This knowledge is thereafter used to develop the ARCA method and ARCA-tool.


[Figure 2: diagram linking Research Problems 1-3, the research questions RQ1-RQ7, Articles I-V, and the studies of the thesis: the literature reviews and environment analyses behind the development of the ARCA method and ARCA-tool, and the evaluations in Field Study 1 (Cases 1-4), Field Study 2 (Cases 5-6), the multiple case study, and the controlled experiment with student projects. The arrows denote "contributes", "answers", and "supports".]

Figure 2. The summary of the research approach linking the research problems, research questions, and studies of the thesis.

Research question 2: What software tools for RCA are introduced, and how do they support software project retrospectives?

The second research question considers alternative software tools for RCA. The research question reviews prior RCA software tools and compares their main features for conducting RCA in collocated and distributed software project retrospectives.

3.1.2 Ease of use and cost-efficiency evaluations

There are two research questions about the second research problem: Is RCA perceived as efficient and easy to use in software project retrospectives? The studies of this thesis include two industrial field studies introduced in Articles I (Cases 1-4) and III (Cases 5-6). Additionally, a controlled student experiment was conducted (Article V).

Research question 3: Is the ARCA method perceived as efficient and easy to use for analysing software engineering problems in software project retrospectives?

The third research question evaluates the perceptions of practitioners on the ARCA method. The evaluation is limited to usefulness, ease of use, and the ARCA method outcome. The use of the method covers collocated and distributed software project retrospectives.

Research question 4: Is the developed ARCA-tool perceived as useful and easy to use in software project retrospectives applying the developed RCA method?

The fourth research question evaluates the perceptions of practitioners using ARCA-tool. The evaluation is limited to the usefulness and ease of use of the tool in industrial retrospectives.

3.1.3 Outcome of RCA with software project failures

There are three research questions about the third research problem: Does the outcome of RCA indicate how the causes of software project failures are interconnected? The main assumption regarding the research problem is that the outcome of RCA should express what the causes of failures are, where they occur, and how they are interconnected across the process areas. The research data is based on the same cases (Cases 1-4) that were used in Field Study 1 (Article I), as can be seen from Figure 2.

Research question 5: Which process areas and cause types were frequently used in RCA to explain software project failures?

The fifth research question expresses what the perceived causes of failures were and where in the development process they occurred. A taxonomy of the perceived causes was developed, evaluated, and applied in order to answer this research question.

Research question 6: What causal relationships bridge the process areas?

The sixth research question considers how the process areas were interconnected in the cases of software project failures.
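Research question 6 concerns causal relationships in which a cause and its effect belong to different process areas. As an illustration only, the following minimal Python sketch picks such relationships out of a list of cause-effect pairs. The data model (`Cause`, `bridge_links`), the example causes, and the process area names are hypothetical and are not taken from the case data or from the taxonomy of the thesis.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cause:
    """A detected cause, classified into one process area (hypothetical model)."""
    name: str
    process_area: str


def bridge_links(links):
    """Return the (cause, effect) pairs whose process areas differ."""
    return [(cause, effect) for cause, effect in links
            if cause.process_area != effect.process_area]


# Hypothetical example data, not taken from Cases 1-4.
vague_reqs = Cause("Vague requirements", "Requirements engineering")
rework = Cause("Implementation rework", "Implementation")
late_tests = Cause("Testing started late", "Testing")
few_testers = Cause("Too few testers", "Testing")

links = [
    (vague_reqs, rework),       # crosses process areas
    (rework, late_tests),       # crosses process areas
    (few_testers, late_tests),  # stays inside one process area
]

bridges = bridge_links(links)
for cause, effect in bridges:
    print(f"{cause.name} ({cause.process_area}) -> {effect.name} ({effect.process_area})")
```

In this sketch, a relationship is classified purely by comparing the two process area labels, so the result depends entirely on how the causes were classified beforehand.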
A bridge cause refers to a detected cause of failure for which the process area of the effect differs from that of the cause. The thesis includes qualitative analyses of bridge causes.

Research question 7: Do the causes perceived as feasible targets for process improvement differ from the other detected causes, and if so, how?

The seventh research question considers the role of cause types, process areas, and bridge causes for process improvement activities. The thesis analyses the perceptions of practitioners and senior management on the detected causes that are feasible targets for process improvement.

3.2 Research articles

There are a total of five research articles in this thesis (see Figure 2). Articles I and III include the development of the ARCA method and ARCA-tool. Articles I, III, and V include the perceived ease of use and cost-efficiency evaluations. Articles II and IV include the analyses of the ARCA method outcome.

Article I answers RQ1 and RQ3. It reviews and synthesizes prior RCA methods and their work practices. Article I also presents the development of the ARCA method. The method is evaluated in four industrial cases (Cases 1-4) aimed at revealing the causes of software project failures.

Article III answers RQ2-RQ4. It reviews and compares prior RCA software tools. Article III also presents ARCA-tool and describes how the tool supports the ARCA method. The software tool and the ARCA method are further evaluated in two industrial cases (Cases 5-6).

Article V contributes to RQ3. It considers the actual and perceived effect of using a cause-effect diagram in the RCA of software project retrospectives. The article compares the use of the cause-effect diagram of the ARCA method and ARCA-tool with an approach of writing down simple memos during RCA. Such a comparison was important in order to separate the effect of the cause-effect diagram from the structured investigation of ARCA.

Article II answers RQ5. It presents general cause types and process areas, explaining where in the development processes the causes of software project failures occur. The outcome of the ARCA method in Cases 1-4 was used in the analysis.

Article IV answers RQ5-RQ7. It extends Article II by including in-depth analyses of the cause types, process areas, bridge causes, and feasible targets for process improvement in the case companies. Article IV shows that, in a case of software project failure, the ARCA method helps to express what happens, where it happens, and why it happens.

3.3 The framework of design science

The development and evaluation of the ARCA method and ARCA-tool were conducted by using the framework of design science (Hevner et al. 2004; March and Smith 1995).
The framework (see Figure 3) consists of the environment, knowledge base, and artefact design (Hevner et al. 2004). In this thesis, the artefact design includes the development and evaluation of the ARCA method and ARCA-tool.

[Figure 3: diagram with three blocks. Environment: Organizations (need for process improvements due to the problems in software projects), People (need for collocated and distributed knowledge sharing), Technology (need for a real-time software tool). Design: Development (ARCA method, ARCA-tool) and Evaluation (field study, experiment, case study), connected by an assess/refine loop. Knowledge Base: Methods (RCA methods, RCA work practices, RCA software tools), Theories (theory of causality), Gaps (lack of know-how, lack of the perceptions of people, lack of studies on the interconnectivity of SE problems). The environment feeds business needs and the knowledge base feeds applicable knowledge into the design, which in turn yields application in software engineering and additions to the knowledge base.]

Figure 3. Framework of design science (Hevner et al. 2004).

3.3.1 The environment

The environment refers to the context in which the ARCA method and ARCA-tool were planned to be used. The environment was small- and medium-sized international software product companies that needed a lightweight problem-prevention method for their distributed software organizations and individual software teams. In order to understand the environment, the business needs for a problem-prevention method were considered in six software product companies (Cases 1-6). The business needs resulted in the requirements of the ARCA method (see Section 3.4.1) and ARCA-tool (see Section 3.4.2).

The first business need was related to the problems of software projects. The companies had faced major problems in their software development projects due to the complex products (Article IV). The problems included product quality issues and the schedule and effort overruns of the projects. The existing problem-prevention methods of the companies were feasible for detecting problems that had occurred but infeasible for conducting in-depth analyses of the causes of problems (Articles I and III).

The second business need was related to the needs of distributed software development. There were problems in facilitating lightweight software project retrospectives due to the distributed software development teams (Article III). Arranging face-to-face retrospectives with geographically distributed team members required too much effort. Similarly, a lack of real-time software tool support made it difficult to conduct distributed retrospectives.

3.3.2 The knowledge base

The knowledge base was the prior literature on problem prevention theories, methods, work practices, and software tools. It revealed that the literature on RCA was relevant to problem prevention and software project retrospectives. Therefore, the applicable knowledge on RCA methods, work practices, theories, and software tools was gathered.
The knowledge was used to develop the ARCA method and ARCA-tool.

3.3.3 The artefact design

The business needs of the environment combined with the applicable knowledge were used to design the ARCA method and ARCA-tool. The development of these artefacts was important because the applicable knowledge did not include solutions feasible for the environment. For example, defect causal analysis would have required too much effort, and it would have been adaptable to software defects only (Article I). Similarly, a post-mortem review would have been infeasible for geographically distributed organizations (Stålhane et al. 2003), and it would have been adaptable to project experiences only. Additionally, the applicable knowledge revealed gaps in the prior studies (see Section 2.4), which made the development and evaluation of the ARCA method and ARCA-tool also scientifically interesting. Therefore, the development of the ARCA method and ARCA-tool was justified. The companies needed a method, suitable for distributed organizations, for understanding the causes of software project problems, and we wanted to know how to conduct RCA with SME organizations in both collocated and distributed retrospective settings.

3.4 Development of the ARCA method and ARCA-tool

The ARCA method and ARCA-tool are introduced in Sections 4 and 5. This section introduces how the ARCA method and ARCA-tool were developed, which provided knowledge relevant for the research questions RQ1 and RQ2.

3.4.1 Development of the ARCA method

The development of the ARCA method was initiated by defining its requirements. Based on our understanding of the environment (see Section 3.3.1), we started by brainstorming the characteristics of a beneficial RCA method for software companies. We concluded that such a method would help the companies to develop high-quality corrective actions with low effort. This conclusion resulted in the following requirements:

1. The method helps to develop feasible and effective improvements
2. The method requires low effort
3. The method is easy to use
4. The method is adaptable to different kinds of target problems

Thereafter, a literature review was conducted. The review covered RCA methods introduced in industrial engineering contexts. The search was limited to the literature found in Google and Scopus. The following search words were used to find the relevant literature: RCA, root cause analysis, DCA, defect causal analysis, defect analysis, defect prevention, and problem prevention. The review answered the following research questions introduced in Article I:

1. Are there steps common to RCA methods?
2. What are the recommended work practices in the different steps of RCA?

Then, the first version of the ARCA method was created. It was based on the requirements and the findings from the literature review. Analytical argumentation for alternative work practices was used to develop the method.
The first version of the ARCA method was piloted with a student software project (Article I).

3.4.2 Development of ARCA-tool

The development and evaluation of the ARCA method revealed the need for using a software tool during RCA. For example, collaborative cause-and-effect diagramming and idea development were found to be important in distributed retrospectives. Unfortunately, the existing RCA software tools were infeasible for the environment of use (Article III). Therefore, an RCA software tool, named ARCA-tool, was developed.

ARCA-tool was developed in two consecutive projects in the Aalto University software capstone project course. During the projects, the author of this thesis acted as the customer and provided the tool requirements. The software tool was designed to be used in the synchronous retrospective meetings of small software project teams including a maximum of ten team members. Additionally, the tool was required to support collocated and distributed software project retrospectives. The tool was also required to be simple and easy to use. The main requirements included the following (Article III):

1. Supports real-time collaboration among distributed team members
2. Enables co-creation of a cause-effect diagram
3. Enables developing ideas for the causes of problems
4. Enables voting for the most severe causes and best ideas
5. Enables capturing and refining the outcome of retrospectives
6. Protects the anonymity of team members
7. Is simple and easy to use

3.5 Field study evaluations

Field studies are commonly used to improve and understand real-life work practices and tools (Lethbridge, Elliott Sim, and Singer 2005). Therefore, they were useful for evaluating the ARCA method and ARCA-tool. The observation-based industrial field studies with software product companies were used to evaluate the perceptions of practitioners using the ARCA method and ARCA-tool. Six industrial cases (Cases 1-6) were conducted, and they were used to answer the research questions RQ3 and RQ4 (see Figure 2). Table 1 summarizes these cases.

The field studies were positioned to cover incremental and agile software development approaches. Additionally, they were positioned to cover distributed software development settings.
Furthermore, the field studies were positioned to cover retrospectives at various levels of analysis, including the levels of company, organization, and team (see Section 2.3).

Regarding data collection, the ARCA method was evaluated by the participating people, i.e. the employees were interviewed and asked to provide feedback with questionnaires. The participants compared the ARCA method with the existing practices of the companies. The cases were video recorded and observed.

Table 1. The summary of the field study cases.

                   Case 1                Case 2                Case 3                Case 4                Case 5                Case 6
Organization       SME                   SME                   SME                   SME                   SME                   SME
Evaluation domain  Product company       Product company       Product company       Product company       Product organization  Software team
SW process         Incremental           Incremental           Incremental           -                     Scrum                 Scrum
Retrospective      1 x ARCA              1 x ARCA              1 x ARCA              1 x ARCA              3 x ARCA              1 x ARCA
Software tools     Diagram + monitor     Diagram + monitor     Diagram + monitor     Diagram + monitor     ARCA-tool + monitor   ARCA-tool only
Participants       Various stakeholders  Mostly developers     Various stakeholders  Various stakeholders  Various stakeholders  Mostly developers
Participants (N)   9                     9                     7                     6                     11                    5
Focus              Defects               Defects               Installation          Lead-time             Requirements          Team level
RCA experiences    No RCA                5-whys                No RCA                No RCA                Post-mortem reviews   No RCA

The cases varied. First, in Case 5, the participants had previously used the ARCA method and ARCA-tool. In Cases 1, 2, 3, 4, and 6, by contrast, the ARCA method and ARCA-tool had not been used previously. The existing practices included discussions about problems, followed by the development of corrective actions. Neither RCA nor cause-effect diagrams had been widely used previously. Case 2 had tried the 5-whys approach (Article I).

Second, in Cases 1-4, the ARCA method was focused on problems at the company level, i.e. high-level problems that caused software project failures. In Cases 5-6, by contrast, the ARCA method was applied to more focused problems. These included a problem of weak requirement specifications in a software organization (Case 5) and work-practice problems of an individual software team (Case 6).

Third, Cases 1-4 were conducted at software organizations with more traditional, incremental software development processes, whereas Cases 5-6 were conducted at software organizations with modern agile software development processes.

Fourth, in Cases 1-5, the ARCA method was conducted in a collocated setting. In Case 6, the ARCA method was conducted in a distributed setting, which filled the gap left by evaluations limited to collocated settings only.

Fifth, only Cases 5 and 6 evaluated ARCA-tool. The tool was developed after Cases 1-4.
Therefore, they were not used to evaluate the tool. Instead, Cases 1-4 helped to consider the requirements of the tool.

3.5.1 Field studies at Cases 1-4

Cases 1-4 (Article I) were conducted in four medium-sized software organizations. The rationale for the selection of the case sites was that together they allowed us to evaluate the ARCA method in different software engineering contexts where RCA had not been used previously. Thus, the evaluation provides rich information about the improvement over the existing practices.

In the data collection, the data sources and the data collection methods were triangulated in order to increase the reliability of the results (Yin 1994; Runeson and Höst 2008; Jick 1979). We used interviews (Yin 1994), questionnaires (Foddy 1994), measurements, and observations (Yin 1994) to evaluate the perceived usefulness and ease of use of the ARCA method. A total of five key representatives of the companies were interviewed, and 30 participants answered the questionnaires.

Interviews were conducted with key representatives. Key representatives were company managers involved in steering their RCA case who had the power to make process changes in their companies. The interviews were held before and after a company case in order to analyse how the key representatives experienced the ARCA method. The interview questions were tested with researchers before the company cases.

The questionnaires were used after the two main steps of the ARCA method, namely root cause detection and corrective action innovation, in order to analyse how the case participants experienced the ARCA method and its output. Closed- and open-ended questions were included in the questionnaires, as recommended by Foddy (1994). The interval between the questionnaire items was equal. The scale in each item was a symmetric seven-point scale (1 = very low, 4 = neutral, 7 = very high). The questionnaires were tested by researchers and students before using them.

The effort used and the output of the ARCA method were measured in each case. An accurate record of the man-hours used in each step of the ARCA method was kept. Additionally, we registered the number of detected and processed causes of target problems in each case. We also registered the number of developed corrective actions and the evaluations of participants regarding the perceived feasibility and impact of each corrective action.

Observations were conducted by two researchers. One steered the ARCA method together with the key representatives, whereas the other observed the actions of the ARCA method. Both researchers wrote notes during the case.
The researchers held a feedback session after the ARCA method steps of root cause detection and corrective action innovation. The observations were used to consolidate the results from the interviews and questionnaires.

The data analysis was conducted in two phases. First, after each company case, we considered the collected research data to conclude the strengths and weaknesses of the ARCA method. Second, after all company cases were conducted, we evaluated the ARCA method as a whole by combining all empirical evidence from the company cases.

3.5.2 Field studies at Cases 5-6

Cases 5-6 (Article III) were conducted in two organizations: a medium-sized and a small-sized software development organization. The rationale for the selection of these two case sites was that together they enabled us to evaluate the ARCA method and ARCA-tool in collocated and distributed agile software engineering contexts. Unlike in Cases 1-4, the company personnel steered the use of the ARCA method.

The ARCA method was used in a limited form in Cases 5-6. The work phases of preliminary cause collection and corrective action workshop (see Section 4.2) were excluded from the cases. Therefore, the evaluation results regarding the ARCA method are correspondingly limited. For the convenience of the reader, the ARCA method is called the limited ARCA method when referring to the use of the ARCA method without these work phases.

Similarly to the first field studies, the data sources and research methods were triangulated. We used interviews (Yin 1994), questionnaires (Foddy 1994), and observations (Yin 1994) to evaluate the perceived usefulness and ease of use of the limited ARCA method and ARCA-tool. A total of 16 case participants filled in the questionnaires and a total of eight participants were interviewed. The scale of the questionnaire items was a symmetric five-point scale, from 1 = very minor to 5 = very major in Case 5 and from 1 = very low to 5 = very high in Case 6.

During the data analysis, the interviews and questionnaires were summarized in order to conclude whether the perceptions of participants were similar between the cases. Both cases were first analysed separately, because the questionnaires and interviews varied slightly between the cases. This was due to differences in the company contexts. Case 5 had used the limited ARCA method and ARCA-tool previously, while Case 6 had not. After the interviews were conducted, we transcribed and coded them accordingly. After the analysis of both cases, we summarized the results from both cases in order to compare their similarities and differences.

3.6 Controlled experiment evaluations

Controlled experiments are commonly used to compare alternative work practices and tools, e.g. Bjørnson et al. (2009). In this thesis, a controlled experiment was used to extend the field studies by focusing on the cause-effect diagram of the ARCA method and ARCA-tool. The experiment with 11 student software project teams (61 participants) was conducted to evaluate the impact of using the cause-effect diagram with the limited ARCA method.
The data collection considered the perceptions of participants and the outcome of the limited ARCA method (see Section 3.5.2). The experiment results provide additional evidence for the research question RQ3 (see Figure 2) Research context The experiment was conducted with software project teams of a capstone project course in Aalto University. In the course, students develop real-world software for real-world customers in teams. Each software project lasts five months. The challenges encountered by the project teams are close to the challenges encountered in industrial software development. Each team includes seven to nine student members divided into roles: three managers and four to six software developers. Additionally, each team follows an iterative process framework, which is defined by the course. The framework divides the projects into three time-boxed iterations, each lasting six to seven weeks. The experiment was conducted in the retrospectives of eleven project teams out of fourteen during the academic year The participation in the 21

experiment was voluntary for the project teams. Table 2 summarizes the retrospectives of the teams, grouped by the technique used to organize the causes of problems during RCA. The table presents the main focus of the retrospectives. Most of the teams focused on similar target problems in both retrospectives. The number of participants and the language used remained similar over the retrospectives.

Table 2. The summary of the retrospectives.

| Team | # | L | Target problem (Ltd. ARCA method, cause-effect diagram) | p | c | c/p | # | L | Target problem (Control group, list-of-causes) | p | c | c/p |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 1 | F | Co-operation, management | | | | | F | Co-operation, management | | | |
| 2 | | F | Scope, quality | | | | | F | Quality, scope | | | |
| 3 | | E | Scope, development | | | | | E | Co-operation, management | | | |
| 4 | | F | Scope, quality | | | | | F | Quality, scope | | | |
| 5 | | F | Co-operation, customer | | | | | F | Quality, customer | | | |
| 6 | | F | Tasks, motivation | | | | | F | Motivation, skills | | | |
| 7 | | F | Scope, task monitoring | | | | | F | Task monitoring, scope | | | |
| 8 | | E | Process, skills | | | | | E | Process, skills | | | |
| 9 | | F | Management, co-operation | | | | | F | Co-operation, management | | | |
| 10 | | E | Requirements, risk management | | | | | E | Requirements, skills | | | |
| 11 | | F | Co-operation, management | | | | | F | Co-operation, management | | | |
| Mean | | | | | | | | | | | | |

#=indicates whether the technique was used in the first (1) or second (2) retrospective, L=used language (F=Finnish, E=English), p=the number of participants, c=the number of detected causes, c/p=the average number of detected causes per participant

The use of students as study subjects has been discussed in the SE literature (Svahnberg, Aurum, and Wohlin 2008; Berander 2004; Carver et al. 2003; Runeson 2003; Höst, Regnell, and Wohlin 2000). The student subjects of the controlled experiment were graduate-level students who were experienced in software engineering and motivated to reach their project goals. Thus, they were feasible subjects for revealing the trend of improvements (Berander 2004; Runeson 2003). Additionally, the student projects were close to real software projects. Thus, the challenges faced by the students were also industrially relevant, as we concluded in Vanhanen et al. (2012).

Experiment design

The author of this thesis controlled the methods and settings of each retrospective. As required by the course framework, each team conducted retrospectives at the end of the second and third iterations. Thus, the experiment design was limited to two experimental units for each team, 22 experimental units in total. The retrospective method and the effort used were fixed for each unit. The experiment was conducted using a single-factor paired design with one blocking variable (Juristo and Moreno 2003). The examined factor was the technique used to visualize the causes of problems. The factor had two alternatives: the cause-effect diagram of the ARCA method (see Figure 4) and a list-of-causes (see Figure 5). The main difference between the alternatives is that, in the cause-effect diagram, arrows are drawn between the causes of the problem. In the list-of-causes, there are no arrows between the causes; the causal structure is visualized by using bulleted lists. Furthermore, when one cause has many effects, the cause-effect diagram allows multiple arrows to be drawn from that cause to the related effects. With the list-of-causes, such a cause needs to be duplicated under each effect.

Both alternatives of the examined factor were used in each team, but in different retrospectives. The project phase created a blocking variable that could not be fully eliminated. The experiment design was balanced by 1) randomizing the starting order of the alternatives for each team and 2) forcing half of the teams to start with the cause-effect diagram and the rest with the list-of-causes technique. Additionally, paired analysis between the alternatives inside each team was used to compare the differences, which mitigated differences between teams. Table 3 summarizes the balanced design, including the distribution of teams in the alternatives and the related project phase when used.

Table 3. Distribution of alternatives (A=Cause-effect diagram, B=List-of-causes) into 22 units (Article V).

| Phase | T1 | T2 | T3 | T4 | T5 | T6 | T7 | T8 | T9 | T10 | T11 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| I2 | A | A | B | A | A | A | B | B | B | A | B |
| I3 | B | B | A | B | B | B | A | A | A | B | A |

Figure 4. The cause-effect diagram used in the A alternative (Article V).
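The balancing described above can be illustrated with a short sketch. It is not the procedure used in the thesis; it only assumes that the teams are shuffled and that the odd team (11 teams cannot be split evenly) starts with alternative A, which matches the six A-starts visible in Table 3. The function name and the seed parameter are hypothetical.

```python
import random


def assign_starting_order(teams, seed=None):
    """Randomly split teams so that about half start with A and the rest with B.

    Assumption: with an odd number of teams, the extra team starts with A.
    """
    rng = random.Random(seed)
    shuffled = list(teams)
    rng.shuffle(shuffled)
    n_start_a = (len(shuffled) + 1) // 2
    design = {}
    for index, team in enumerate(shuffled):
        first, second = ("A", "B") if index < n_start_a else ("B", "A")
        # I2 and I3 are the project phases (iterations) named in Table 3.
        design[team] = {"I2": first, "I3": second}
    return design


if __name__ == "__main__":
    teams = [f"T{i}" for i in range(1, 12)]
    for team, phases in sorted(assign_starting_order(teams, seed=1).items()):
        print(team, phases)
```

Every team receives both alternatives exactly once, so the paired comparison inside each team remains possible regardless of the random order.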

The Problem
- Cause 1
  o Cause 2
    Cause 4, Cause 5, Cause 6, Cause 7, Cause 8, Cause 9
  o Cause 3
    Cause 10, Cause 16
- Cause 11
  o Cause 12
  o Cause 13
    Cause 8, Cause 15, Cause 16, Cause 17, Cause 18
  o Cause 14
    Cause 19

Figure 5. The list-of-causes used in the B alternative (Article V).

Response variables and research hypothesis

Five response variables were compared over the alternatives of the examined factor: Method Effectiveness (ME), Size of Depth Levels (SoDL), Number of Hub Causes (NoH), Characteristics of Detected Causes (CDC), and Perceptions of Participants (PP).

The efficiency of the retrospective method has been measured with the number of detected causes (Bjørnson, Wang, and Arisholm 2009). The response variable ME indicates the number of unique problem causes detected. According to our hypothesis, using the cause-effect diagram results in a higher ME than using the list-of-causes.

Causal Structure describes how the causes of the problem are linked to each other and to the problem. Regarding Causal Structure, we recognized the response variables SoDL (Bjørnson, Wang, and Arisholm 2009) and NoH (Bjørnson, Wang, and Arisholm 2009). The response variable SoDL indicates the number of causes at different Depth levels, where the Depth level of a cause is defined as the number of cause-effect pairs from the cause to the target problem. A function SoDL(x) was created to measure SoDL. The function returns the number of causes registered at Depth level x. We hypothesized that with both alternatives, the return value of SoDL(x) increases over the Depth levels, but the return values of SoDL(x) are larger with the cause-effect diagram. The response variable NoH indicates the number of hub causes, where a hub cause is defined as a cause that explains more than one cause. NoH was measured by calculating the number of effects stemming from each cause. We hypothesized that NoH is higher with the cause-effect diagram.
The cause-effect diagram of the ARCA method is a directed graph, whereas the list-of-causes is a tree. A tree-structured cause-effect diagram has been compared with a directed graph-structured cause-effect diagram (Bjørnson, Wang, and Arisholm 2009). The prior study indicates that the directed graph-structured cause-effect diagram results in higher NoH values (Bjørnson, Wang, and Arisholm 2009).
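The two Causal Structure variables can be sketched in code. This is a minimal illustration, not the measurement script of the study. It assumes that a diagram is stored as (cause, effect) pairs, that the Depth level of a cause reachable along several paths is the length of the shortest one, and that every direct effect counts when deciding whether a cause is a hub. All identifiers are hypothetical.

```python
from collections import Counter, deque


def depth_levels(edges, problem):
    """Map each cause to its Depth level.

    Assumption: if several paths lead from a cause to the problem,
    the shortest one (fewest cause-effect pairs) defines the level.
    """
    causes_of = {}
    for cause, effect in edges:
        causes_of.setdefault(effect, []).append(cause)
    depth = {problem: 0}
    queue = deque([problem])
    while queue:  # breadth-first search backwards from the target problem
        node = queue.popleft()
        for cause in causes_of.get(node, []):
            if cause not in depth:
                depth[cause] = depth[node] + 1
                queue.append(cause)
    del depth[problem]
    return depth


def sodl(edges, problem):
    """Return a Counter so that sodl(...)[x] is the number of causes at Depth level x."""
    return Counter(depth_levels(edges, problem).values())


def number_of_hubs(edges):
    """Count causes that have more than one direct effect."""
    effects_of = {}
    for cause, effect in edges:
        effects_of.setdefault(cause, set()).add(effect)
    return sum(1 for effects in effects_of.values() if len(effects) > 1)


# Directed graph: C3 points to both C1 and C2, so it is a hub cause.
graph_edges = [("C1", "P"), ("C2", "P"), ("C3", "C1"), ("C3", "C2"), ("C4", "C3")]

# The same content as a tree: C3 has to be duplicated under each effect.
tree_edges = [("C1", "P"), ("C2", "P"), ("C3a", "C1"), ("C3b", "C2"), ("C4", "C3a")]

if __name__ == "__main__":
    print(sodl(graph_edges, "P"), number_of_hubs(graph_edges))
    print(sodl(tree_edges, "P"), number_of_hubs(tree_edges))
```

The tree variant shows why a list-of-causes cannot produce hub causes: duplication replaces the second arrow, so every node keeps exactly one effect.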

Third, we also assumed that the technique used to visualize the causes of problems did not artificially steer the discussions of the retrospectives toward different types of problems or causes. Instead, it was possible that the project domain, including changing situations, could affect the causes of problems detected in the retrospectives. CDC was used to measure the differences in the discussion contents of the retrospectives. We hypothesized that there is no difference in CDC over the alternatives. CDC was measured for each retrospective by using a classification system, which characterizes the types and process areas of failure causes (Article IV). During the data analysis, the distributions of causes in cause classes over the alternatives were compared by using linear correlations.

Fourth, we measured PP, which indicates how the participants perceived the alternatives. The prior literature has commonly recommended using the cause-effect diagram in RCA. Thus, we assumed that the participants prefer using it in retrospectives. In order to measure PP, a questionnaire (see Article V) was used after each retrospective. The answers of participants who were not involved in both retrospectives (10 of 61 participants) were excluded. Additionally, after both retrospectives were conducted, another questionnaire combined with a group interview was used to compare the alternatives.

Controlling undesired variation

Learning effects and team-specific contextual factors likely affected the outcome of the retrospectives. We were not able to eliminate the blocking variable related to the project phase. Therefore, it was important to ensure that the contextual factors were similar in each experimental unit. A total of six context variables were controlled. These included the high-level goal of the retrospectives, the number and roles of participants, the language used, the physical context, and the retrospective facilitator.
Additionally, we identified and measured three confounding variables, since we had no control over how the course's project teams were organized or over their customers' topics. The confounding variables included the specific target problem of the retrospectives (see Table 2), team members' motivation, and team spirit.

Data analysis

We used the outcome of the retrospectives in statistical analyses on ME, SoDL, NoH, and CDC. In order to analyse PP, we combined statistical methods with qualitative methods.

ME was analysed with the paired-samples two-tailed t-test with alpha level [...]. The tests were conducted for the total number of detected causes and for the average number of detected causes per participant. SoDL and NoH were also compared by using the paired-samples two-tailed t-test with alpha level [...]. Over the retrospectives of each team, we analysed whether the cause-effect diagram results systematically in larger SoDL(x) and NoH values than the corresponding list-of-causes technique.
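The two statistics named in this data analysis, the paired-samples t statistic and the linear (Pearson) correlation used for CDC, can be sketched with the Python standard library. This is an illustration under stated assumptions, not the analysis code of the study: the function names are hypothetical, the sample values are invented, and the sketch stops at the test statistic because the standard library has no t-distribution for the two-tailed p-value.

```python
import math
from statistics import mean, stdev


def paired_t_statistic(a, b):
    """Return (t, degrees_of_freedom) for paired samples a and b."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equally long samples with at least two pairs")
    diffs = [x - y for x, y in zip(a, b)]
    sd = stdev(diffs)  # sample standard deviation of the pairwise differences
    if sd == 0:
        raise ValueError("all pairwise differences are equal; t is undefined")
    t = mean(diffs) / (sd / math.sqrt(len(diffs)))
    return t, len(diffs) - 1


def pearson_r(x, y):
    """Pearson's correlation between two equally long vectors of counts."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equally long vectors with at least two items")
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return cov / (sx * sy)


if __name__ == "__main__":
    # Invented numbers of detected causes per team (not data from the thesis).
    diagram = [30, 28, 35, 40]
    listing = [25, 27, 30, 33]
    print(paired_t_statistic(diagram, listing))

    # Invented counts of causes per characteristic in two retrospectives.
    print(pearson_r([4, 0, 7, 2], [5, 1, 6, 2]))
```

Pairing the observations team by team is what lets the design absorb differences between teams: only the within-team differences enter the statistic.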

CDC was analysed in order to show that the alternatives did not significantly affect the discussion contents of the retrospectives. We started the analysis by classifying the detected causes into type and process area categories (see Article IV). For each cause, the process area and cause type classifications were combined, which resulted in a characteristic of the cause (there were a total of 84 possible characteristics). After the characteristics were determined for each cause, Pearson's correlation between the numbers of causes with the same characteristic was calculated over the retrospectives of the limited ARCA method and the control group. The correlation was calculated between the retrospectives of each team and between the retrospectives of all teams combined. The closer the correlation is to 1, the more similar the discussion contents of the retrospectives using the different alternatives were.

The analyses of PP were based on questionnaires and group interviews. Questionnaire 1 was used after each retrospective to evaluate the work practices of the retrospective method used (Wilcoxon Signed Rank Test, alpha=0.05). Questionnaire 2 was used after both retrospectives were conducted, in order to compare the retrospectives. The group interview was conducted after Questionnaire 2. It was used to understand the perceptions of participants.

3.7 Case study evaluations

Case studies are commonly used to understand real-life phenomena and events (Yin 1994). In this thesis, a multiple case study approach was used to evaluate the outcome of the ARCA method in industrial settings, aiming to explain the relationships between the causes of software project failures (Articles II and IV). The results are used to answer the research questions RQ5-RQ7 (see Figure 2). The data analysis covered the perceived causes of software project failures and their perceived causal relationships in Cases 1-4.
The selection of these case sites was reasonable, as together they allowed us to analyse the commonalities of the perceived causes of four different software project failures.

Data collection

The ARCA method was used as the data collection method in each case. Its detailed description can be found in Section 4.2. Therefore, this section only introduces the main phases and contextual settings of the ARCA method relevant to Cases 1-4.

Each case started with a focus group with senior managers who had the power to make process changes in their companies. The aim was to determine a high-level target problem that had caused project failures systemically. Measurable evidence was used in the focus group to demonstrate the occurrence of the target problem. Additionally, the senior managers selected company experts, nine people on average, who should participate in the detection of the causes of the target problem. The company experts included people from various process areas covering sales & requirements, management work, software development, software testing, and release & deployment.

Following the focus group, the researchers arranged two work phases: a preliminary cause collection and a causal analysis workshop. In the preliminary cause collection, the company experts were asked to provide at least five causes explaining why the target problem occurred. The preliminary cause collection was confidential for the experts, and it was conducted by an exchange between the author of this thesis and each individual expert. Based on the preliminary cause collection, the researchers and senior managers created a cause-effect diagram. The managers analysed the diagram carefully and selected cause entities that should be analysed in the causal analysis workshop (see Section 4.2 for further details about the cause entities).

The causal analysis workshop was a time-boxed meeting of 120 minutes, which was conducted with the named company experts. During the meeting, new causes were detected under each selected cause entity. The meeting resulted in a finalized cause-effect diagram. It was used to explain why the software project failure occurred. Thereafter, the experts were asked to propose causes that they perceived as important to be further processed in the process improvement activities. Then the senior managers considered the diagram and made the final selection of the causes that were processed in the process improvement activities.

Considering the validity of the collected research data, we should note that it is based on the perceptions of people. The correctness and accuracy of the detected causes were evaluated in each case by the author of this thesis. Triangulation of the data sources and the data collection methods (Yin 1994; Runeson and Höst 2008; Jick 1979) increases the reliability of the detected causes. Before the preliminary cause collection was conducted, interviews were held with the senior managers to detect the causes of failures which they perceived to be important.
We assumed that the causes they underlined in the interviews would also be recognized by the experts in the causal analysis workshop. The detected causes from these two groups were compared, and it was found that in each case, the experts detected and extended most of the causes underlined by the senior managers. This comparison is documented in detail in Article IV as a part of the case study results. Additionally, interviews and questionnaires were used to evaluate the outcome of the ARCA method, which included the evaluation of the correctness and accuracy of the detected causes. Regarding these results, the experts and senior managers perceived that the detected causes were correct and accurate. This validation is presented later as a part of the field study results (see Section 6.1).

Data analysis

The data analysis included three phases. It started by analysing the types of causes and the related process areas expressing where the causes occurred (Articles II and IV). Thereafter, it continued by analysing how the causes were interconnected (Article IV). Finally, the feasibility of causes for process improvement was studied (Article IV).

The data analysis was initialized by developing a detailed classification system. Thereafter, the system was applied to the ARCA method outcome. The development of the classification system was iterative. First, a literature review was conducted. The literature review covered problem cause classification dimensions used in the software engineering context. The dimensions of process areas and cause types were concluded to be important. The dimension of process area expresses where in the development processes the cause occurs (Grady 1996; Dye and van der Schaaf 2002; Jacobs et al. 2005; Nakashima et al. 1999), and the dimension of cause type describes what the cause is, e.g. an issue in the product (Grady 1996; Dye and van der Schaaf 2002; Nakashima et al. 1999) or in the people (Leszak, Perry, and Stoll 2000; Stålhane 2004; Dye and van der Schaaf 2002; Jacobs et al. 2005). Following the literature review, preliminary categories for the dimensions of process areas and cause types were created. Thereafter, the preliminary categories were combined using an approach similar to grounded theory, as suggested in Salinger et al. (2007). The author of this thesis classified a sample of causes from each case and simultaneously refined the preliminary categories to correspond better to the causes of our cases. After the classification dimensions were finalized, they were applied to all detected causes, and their distributions were used to describe what the problem causes of software projects were and where they occurred, as introduced in Article II.

The analysis was continued by extending the work of Article II to individual cases combined with cross-case analysis, introduced in Article IV. During the analysis, the classification system was also slightly improved. For example, based on the results of inter-rater agreement (see Section 7.5.4), two process areas were combined: the process areas of Development Work and Change Management were combined under the process area of Implementation. Similarly, some sub-categories were renamed, e.g. the sub-category Customers was renamed Customers & Users. Additionally, some cause statements were excluded from the analysis, as the more detailed analysis showed that they were not real causes of failures, but coarse-grained, speculative statements given in the discussions of the causal analysis workshop. For example, one such statement was: "There is a study from the States which concluded that software quality should be the most important goal for companies." In total, 18 of 648 statements were excluded.

The continued analyses also covered an analysis of the interconnectedness between the causes of project failures (Article IV). A new term, bridge cause, was coined, which refers to a cause that links process areas together. The bridge causes were analysed qualitatively. The analysis was initialized by selecting the perceived causal relationships for which the cause and effect were classified in different process areas. Thereafter, the selected pairs of causes and effects were grouped according to their process areas. For each group, the perceived causal relationships were explored by considering the original cause-effect diagrams. The explored parts of the cause-effect diagrams were summarized and concretized in order to conclude how the causes and effects were interconnected over the related process areas.
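The selection and grouping of bridge-cause relationships described above can be sketched as follows. This is an illustration only, assuming that the diagram is stored as (cause, effect) pairs and that every statement already has a process area label. The function name and the example statements are hypothetical and are not taken from the case data.

```python
from collections import defaultdict


def bridge_relationships(edges, process_area):
    """Group the causal relationships whose cause and effect lie in
    different process areas, keyed by (cause area, effect area)."""
    groups = defaultdict(list)
    for cause, effect in edges:
        source, target = process_area[cause], process_area[effect]
        if source != target:
            groups[(source, target)].append((cause, effect))
    return dict(groups)


# Hypothetical statements and classifications, for illustration only.
process_area = {
    "vague requirements": "Sales & Requirements",
    "rework": "Implementation",
    "late testing": "Software Testing",
    "regressions": "Software Testing",
}
edges = [
    ("vague requirements", "rework"),
    ("rework", "late testing"),
    ("late testing", "regressions"),  # same process area: not a bridge
]

if __name__ == "__main__":
    for areas, pairs in bridge_relationships(edges, process_area).items():
        print(areas, pairs)
```

Each resulting group corresponds to one pair of process areas, which is the unit that was then explored qualitatively against the original diagrams.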

Finally, the causes that were perceived as feasible targets for process improvement were analysed (Article IV). During the classification of the causes into the process area and cause type dimensions, the author of this thesis marked whether the cause was proposed and/or selected as a target for process improvement activities. The causal analysis workshop revealed detected causes, whereas the causes the company experts proposed after the workshop are called proposed causes. The causes that were selected for process improvement activities by the senior managers are called selected causes.

The perceived feasibility for process improvement was divided into three importance categories. The selected causes represent the highest-importance category, because such causes reflect the decision makers' perspective. The second-highest importance category is related to the proposed causes, because they reflect the company experts' perspective. The third importance category consists of the detected causes that were neither proposed nor selected for process improvement activities. The causes in these three importance categories were then compared quantitatively. First, the distributions for process areas and cause types were compared. Second, the share of bridge causes was compared with the share of other detected causes.
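The three importance categories and the bridge-cause comparison can be expressed as a small sketch. It assumes that each cause carries three boolean marks and that a cause which was both proposed and selected is counted only in the higher category. That tie-breaking rule, the field names, and the sample records are assumptions made for illustration.

```python
from collections import Counter


def importance_category(proposed, selected):
    """Return the importance category of a cause.

    Assumption: a cause that was both proposed and selected is
    counted in the highest category only.
    """
    if selected:
        return "selected"
    if proposed:
        return "proposed"
    return "detected"


def bridge_share_by_category(causes):
    """Return the share of bridge causes within each importance category."""
    totals, bridges = Counter(), Counter()
    for cause in causes:
        category = importance_category(cause["proposed"], cause["selected"])
        totals[category] += 1
        bridges[category] += int(cause["is_bridge"])
    return {category: bridges[category] / totals[category] for category in totals}


# Hypothetical records, not data from the cases.
causes = [
    {"proposed": True, "selected": True, "is_bridge": True},
    {"proposed": False, "selected": True, "is_bridge": False},
    {"proposed": True, "selected": False, "is_bridge": False},
    {"proposed": False, "selected": False, "is_bridge": True},
]

if __name__ == "__main__":
    print(bridge_share_by_category(causes))
```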

4. The ARCA method

This section starts by presenting the results from the literature review, including the high-level synthesis of RCA methods and their common steps with work practices. Thereafter, the ARCA method is introduced. These results are presented in Article I, and they are used to answer the first research question (see Figure 2).

4.1 Synthesis of RCA methods from literature

Table 4 summarizes the prior RCA methods and compares them with the ARCA method. Three steps are common to the RCA methods introduced in the literature: target problem detection, root cause detection, and corrective action innovation. These steps and their alternative work practices are discussed below.

Table 4. Summary of RCA methods and their work practices.

| Method | Target problem detection: work practices | Root cause detection: work practices | Corrective action innovation: work practices |
|---|---|---|---|
| Rooney and Vanden Heuvel (2004) | Interviewing and inspections | Sequence diagram and decision diagram | - |
| Ammerman (1998) | Paper-and-pencil, walk-through, and flowcharting | Sequence diagrams, interviewing, event and causal factor charts, lists, and worksheets | Interviewing |
| Latino and Latino (2006) | Problem sampling, flowcharting, sequence diagrams, interviewing, and Pareto analysis | Flow chart, logic tree, and meetings | Writing individually and meetings with brainstorming |
| Card (1998) | Problem sampling, classification schemes, Pareto analysis, and meetings | A fishbone diagram, cause categories, and meetings | Meetings |
| Dingsøyr et al. (2001) | Brainstorming, brainwriting, Post-it notes, and grouping of experiences | Selection of the main issues, brainstorming, discussions, a fishbone diagram, and drawing causes on a whiteboard | - |
| ARCA method (Article I) | A focus group meeting and brainstorming | Anonymous inquiry, a directed graph, brainwriting, and brainstorming in a meeting | Inquiry, brainwriting combined with sceptical and optimistic perspectives, and brainstorming in a meeting |

Target problem detection

RCA methods start with the detection of a target problem. This initial step is usually conducted through problem sampling (Latino and Latino 2006; Andersen and Fagerhaug 2006; Card 1998; Leszak, Perry, and Stoll 2000; Jalote and Agrawal 2005; Grady 1996; Kalinowski, Travassos, and Card 2008; Burnstein 2003), flowcharting (Latino and Latino 2006; Andersen and Fagerhaug 2006; Ammerman 1998), interviewing (Latino and Latino 2006; Rooney and Vanden Heuvel 2004; Rooney and Vanden Heuvel 2003), or brainstorming (Latino and Latino 2006; Andersen and Fagerhaug 2006; Bjørnson, Wang, and Arisholm 2009). Usually, there is a meeting where the target problem is finally decided upon (Card 1998; Burnstein 2003).

Brainstorming in a focus group meeting was included in the work practices of the ARCA method (see Table 4). In the context of software project retrospectives, brainstorming is probably the most cost-efficient approach to detecting the target problems. It has been presented as an excellent approach to rapidly identify what is important to people (Lethbridge, Elliott Sim, and Singer 2005). It has also been presented as a part of lightweight RCA methods (Dingsøyr, Moe, and Nytrø 2001). Additionally, it can be easily conducted in collocated and distributed settings.

Problem sampling, flowcharting, and interviewing were excluded from the ARCA method. Problem sampling is appealing (see Article I), but it can be used only with problems that have been reported (Card 1998; Kalinowski, Travassos, and Card 2008; Burnstein 2003; Gursimran and Jeffrey 2009). Many problems in software projects that are important to control are not reported, e.g. requirements faults (Gursimran and Jeffrey 2009). Furthermore, flowcharting (Ammerman 1998) might be a useful work practice for target problem detection, but in the context of software engineering, problems are often intangible.
Therefore, drawing a flowchart for the entire event, in order to explain how the target problem evolves, might be difficult. Interviewing (Latino and Latino 2006; Rooney and Vanden Heuvel 2004) solves the problems of problem sampling and flowcharting. On the other hand, it is a labour-intensive task to meet numerous people, and thereafter register, transcribe, and interpret their answers.

Root cause detection

Root cause detection is the second step of the RCA methods. The outcome of this step is a documented in-depth analysis of the underlying causes of the target problem. Usually, there is a team of people who investigate the target problem causes together (e.g. Latino and Latino 2006; Card 1998; Bjørnson, Wang, and Arisholm 2009). The work practices include interviewing (Ammerman 1998), questionnaires (Andersen and Fagerhaug 2006; Burr and Owen 1996), brainstorming, and brainwriting (Latino and Latino 2006; Andersen and Fagerhaug 2006; Burr and Owen 1996). These techniques help to address the target problem causes that many people value highly, which is important. However, as a weakness, none of these approaches fully protects the anonymity of people. Therefore, it could happen that root cause detection is perceived as witch hunting (Latino and Latino 2006).

Furthermore, the detection of the target problem causes usually includes the creation of a cause-effect diagram (see Section 2.3.1). Various diagramming techniques have been introduced, and they can be divided into two subcategories: list-based and network-based structures. List-based structures include a fishbone diagram (Andersen and Fagerhaug 2006; Bjørnson, Wang, and Arisholm 2009; Stålhane 2004; Burnstein 2003; Stevenson 2005), a fault-tree diagram (Andersen and Fagerhaug 2006), a logic tree (Latino and Latino 2006), and a causal-factor chart (Rooney and Vanden Heuvel 2004). Network-based structures include a directed graph (Bjørnson, Wang, and Arisholm 2009) and a matrix diagram (Andersen and Fagerhaug 2006). Furthermore, simple cause lists and worksheets can also be used to organize the target problem causes (Ammerman 1998).

Brainwriting followed by brainstorming in a meeting was included in the work practices of the ARCA method (see Table 4). Brainwriting provides an efficient way to make good use of all participants simultaneously. Brainstorming, in turn, helps to refine the findings of individuals into more concrete conclusions about the root causes. The prior literature (Kavadias and Sommer 2009) indicates that brainstorming attains better solutions when it is used with cross-functional problems, and brainwriting is better when it is used with complex problems. The problems of software projects are both complex and cross-functional (see Article I). Therefore, using these techniques together is reasonable. Furthermore, the use of the directed graph was also included in the work practices of the ARCA method. The directed graph solves the problem of duplicating cause statements (see Article I).
Additionally, the use of the directed graph has been claimed to be an effective technique for software project retrospectives (Bjørnson, Wang, and Arisholm 2009).

Interviewing (Ammerman 1998) and questionnaires (Andersen and Fagerhaug 2006; Burr and Owen 1996) were excluded from the ARCA method. Interviewing would have required more effort than holding a meeting, and the use of questionnaires would have steered the thinking of retrospective participants toward some premade topics, potentially biasing the results. Furthermore, these techniques have not been recommended in the prior RCA methods of lightweight software project retrospectives.

Corrective action innovation

Corrective action innovation is the final step of the RCA methods. The outcome includes corrective actions that are developed for the selected target problem causes. The selection of causes should emphasize the level of controllability. The prior literature included very little practical guidance on how to develop corrective actions. Corrective actions are usually developed in a meeting with a group of people (Andersen and Fagerhaug 2006; Card 1998; Leszak, Perry, and Stoll 2000; Jalote and Agrawal 2005; Grady 1996). Additionally, the use of brainstorming and brainwriting (Andersen and Fagerhaug 2006) is recommended. Interviewing has also been introduced as an approach to develop corrective actions (Ammerman 1998). However, considering the differences between holding a meeting and conducting separate interviews, the meeting probably increases the commitment of participants more than the separate interviews.

Furthermore, problem-prevention frameworks (Andersen and Fagerhaug 2006) have been developed to view the solution space of problems from various perspectives. The frameworks include Systematic Inventive Thinking, the Theory of Inventive Problem Prevention, and the Six Thinking Hats (Andersen and Fagerhaug 2006). However, the frameworks are rather difficult to use, and more creative techniques should be used instead (Andersen and Fagerhaug 2006). Similarly to the step of root cause detection (see Section 4.1.2), the use of brainwriting combined with brainstorming was concluded to be the most suitable work practice for the ARCA method. We also found it important to take into account the potential positive and negative effects of the developed corrective actions (Andersen and Fagerhaug 2006).

4.2 Overview of the ARCA method

Figure 6 summarizes the ARCA method (see Article I). The method follows the common steps of prior RCA methods, and its work practices are based on analytical argumentation about the prior methods, discussed in Section 4.1. These steps and their work practices are summarized in the following sub-sections.

Step 1: Target problem detection

The outcome of the first step of the ARCA method is a target problem and a list of named experts who are invited to an in-depth analysis of the target problem. This step includes a focus group meeting lasting approximately 60 minutes. In the meeting, the following issues should be brainstormed, justified, and documented: what is the target problem and why exactly is this problem important to prevent?
Figure 6. The overview of the ARCA method. Diagram labels: Target problem detection (Collect evidence; Focus group meeting; Define the target problem; Invite experts); Root cause detection (Preliminary cause collection: anonymous inquiry; Causal analysis workshop: brainwrite, brainstorm, and draw a cause-effect diagram); Corrective action innovation (Root cause selection: inquiry; Corrective action workshop: brainwrite, brainstorm, and consider positive and negative effects); Document the results.

The experts who should analyse the target problem are also considered and selected in the meeting (four to ten experts). When selecting the experts, it is important to consider all relevant stakeholders of the target problem. For example, in the case of software project failures, these may include sales personnel, product managers, project managers, software developers, software testers, and software quality assurance staff.

Step 2: Root cause detection

This is the second step of the ARCA method. After this step, the most important target problem causes are detected and evaluated. Anonymous and public approaches are both important in the detection of target problem causes. This step consists of two work phases: preliminary cause collection and causal analysis workshop.

In preliminary cause collection, the facilitator of the ARCA method sends out an inquiry to the selected experts in order to collect the target problem causes. The inquiry should be confidential in order to build trust in the knowledge sharing between the experts and the facilitator. The inquiry forces the experts to consider the target problem in advance. The inquiry asks the experts to list at least five causes of the target problem. Thereafter, the target problem causes are organized into a cause-effect diagram by the facilitator, as presented in Figure 7. Using a software tool is recommended here.

The causal analysis workshop is a meeting in which the target problem causes are analysed in depth. The meeting is prepared by the facilitator. A cause entity is defined as a cause and its sub-causes, which together form an entity that is reasonable to process together (Article I). By using the cause-effect diagram created in preliminary cause collection, the facilitator selects the cause entities to be processed in the meeting. It should be noted that the cause entities could overlap. Processing a cause entity containing approximately ten causes takes about 40 minutes.
In the causal analysis workshop, the selected cause entities are extended by detecting new causes. The facilitator starts the meeting by presenting the target problem and its preliminary causes, including the selected cause entities, to the experts. The meeting continues by collecting new causes for each selected cause entity one at a time. Each cause either deepens or widens a cause entity. The causes are collected in the following three phases:

1. The experts write down (brainwriting) causes on paper for five minutes (the cause-effect diagram is projected onto the wall)
2. Each expert introduces the causes and explains where they should be registered in the cause-effect diagram
3. The experts briefly discuss the target problem causes and try to brainstorm more causes and causal relationships

After all cause entities have been processed, the experts analyse the cause-effect diagram as a whole. The facilitator leads the experts to identify essential target problem causes and to discuss their level of controllability and impact on the target problem.
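The notion of a cause entity can be made concrete with a short sketch. It is not part of the ARCA method or of ARCA-tool. It assumes that the cause-effect diagram is stored as (cause, effect) pairs and that "its sub-causes" means all direct and indirect sub-causes. The function name and the example diagram are hypothetical.

```python
def cause_entity(edges, root):
    """Return the root cause together with all of its sub-causes.

    Assumption: sub-causes are collected transitively, so a sub-cause
    of a sub-cause also belongs to the entity.
    """
    causes_of = {}
    for cause, effect in edges:
        causes_of.setdefault(effect, set()).add(cause)
    entity, stack = {root}, [root]
    while stack:  # depth-first walk from the root towards deeper causes
        node = stack.pop()
        for cause in causes_of.get(node, ()):
            if cause not in entity:
                entity.add(cause)
                stack.append(cause)
    return entity


# Hypothetical diagram: P is the target problem, D explains both A and B.
edges = [("A", "P"), ("B", "P"), ("C", "A"), ("D", "A"), ("D", "B"), ("E", "D")]

if __name__ == "__main__":
    entity_a = cause_entity(edges, "A")
    entity_b = cause_entity(edges, "B")
    print(sorted(entity_a), sorted(entity_b), sorted(entity_a & entity_b))
```

Because the diagram is a directed graph rather than a tree, the two entities in the example share the causes D and E, which is the overlap between cause entities that the method description allows.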

Figure 7. The cause-effect diagram of the ARCA method (Article I)

Step 3: Corrective action innovation

This is the third step of the ARCA method. The outcome of this step is a set of corrective actions addressing the most important target problem causes. The step includes two work phases: root cause selection and corrective action workshop.

Root cause selection is the first work phase of corrective action innovation. It aims to focus the development of corrective actions on the most feasible targets. The facilitator selects the target problem causes which are processed later in the corrective action workshop. First, the cause-effect diagram is sent to the experts. They are asked to propose target problem causes for which corrective actions should be developed. Additionally, they are asked to evaluate, for each proposed cause, the level of impact on the target problem and the level of difficulty of developing corrective actions. Thereafter, the facilitator combines his or her own judgment with an analysis of the experts' proposals in order to select four to six target problem causes for which the corrective actions will be developed. Each selected cause, including its sub-causes, is documented on an individual sheet of paper.

Corrective action workshop is a meeting wherein corrective actions are developed, evaluated, and analysed. The meeting is initialized by the facilitator, who selects the participants to join the meeting. Ideally, the number of participants equals the number of selected causes (four to six). Furthermore, the participants should be a group of experts who are as competent as possible at solving the selected causes. In a corrective action workshop lasting approximately 120 minutes, the selected causes are rotated through the participants. Each participant contributes, in turns lasting ten to fifteen minutes, to one selected cause. The participants develop corrective actions by writing them down on paper. Additionally, they supplement and comment on the corrective actions introduced by other participants. After the corrective actions are developed, they are evaluated. Two attributes are used in the evaluation (scale 1-5): 1) impact on the target problem and 2) feasibility to implement. For each selected cause, the last participant evaluating the corrective actions calculates the sum of evaluations for each corrective action. Thereafter, the participants brainstorm improvements to the corrective actions that have the highest values for impact and feasibility.

Step 4: Documentation of the results

The documentation of the results is the final step of the ARCA method. Such a step is not included in every prior RCA method, but it is included in those introduced by Card (1998), Dingsøyr et al. (2001), Latino and Latino (2006), and Ammerman (1998). The final report should cover the target problem definition, including the related background information. This is important in order to communicate the aim of the analysis, including its limitations. The report should also contain the main parts of the cause-effect diagram finalized in the causal analysis workshop. Additionally, the report presents the corrective actions and their evaluations.

The final report is an important source of information, as it can be used to justify why some specific process changes are needed and what corrective actions are relevant to consider. Additionally, the report could improve future analyses, where it could expedite preliminary cause collection and help to select the cause entities for the causal analysis workshop. The report could also help to consider the impact and feasibility of corrective actions. Finally, if shared among the company employees, the report could improve organizational learning.
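The evaluation arithmetic of the corrective action workshop can be illustrated with a short sketch. It is not part of the ARCA method description: the class and function names are hypothetical, the ratings are invented for illustration, and adding the impact and feasibility ratings into one total is only one possible reading of "the sum of evaluations".

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class CorrectiveAction:
    """One corrective action with the 1-5 ratings given by the participants."""
    description: str
    impact: List[int] = field(default_factory=list)       # impact on the target problem
    feasibility: List[int] = field(default_factory=list)  # feasibility to implement

    def rate(self, impact: int, feasibility: int) -> None:
        """Record one participant's evaluation on the 1-5 scale."""
        for value in (impact, feasibility):
            if not 1 <= value <= 5:
                raise ValueError("ratings use the scale 1-5")
        self.impact.append(impact)
        self.feasibility.append(feasibility)

    def total(self) -> int:
        """Sum of all evaluations (assumption: both attributes are added together)."""
        return sum(self.impact) + sum(self.feasibility)


def top_actions(actions):
    """Return every action that shares the highest total, so ties are kept."""
    best = max(action.total() for action in actions)
    return [action for action in actions if action.total() == best]


# Hypothetical ratings from two participants for two corrective actions.
reviews = CorrectiveAction("Review the test plan together with developers")
reviews.rate(impact=4, feasibility=5)
reviews.rate(impact=5, feasibility=4)

training = CorrectiveAction("Arrange testing training")
training.rate(impact=3, feasibility=4)
training.rate(impact=2, feasibility=5)

for action in top_actions([reviews, training]):
    print(action.description, action.total())
```

The actions returned by `top_actions` correspond to the ones the participants would refine further in the final brainstorming round.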

5. ARCA-tool

Software tools are commonly used in RCA to improve the in-depth analysis of problems. However, there are no prior studies on how they support conducting RCA in collocated and distributed software project retrospectives. This section starts with the results of a systematic literature review comparing prior RCA software tools. Thereafter, ARCA-tool, which addresses the main weaknesses of the prior tools, is presented. These results are introduced in Article III and they are used to answer the second research question (see Figure 2).

5.1 Comparison of RCA software tools

A systematic literature review of 35 prior RCA software tools was conducted in order to evaluate their feasibility for software project retrospectives. The evaluation considered the seven aspects important for conducting a computer-facilitated RCA in synchronous software project retrospective meetings (Article III). These aspects are introduced in the following sub-sections. To conclude, the main weaknesses of the prior tools include: 1) lack of real-time collaboration, 2) lack of network-structured cause-effect diagrams, and 3) lack of features for voting on the RCA method outcome.

Ease of adoption

The first aspect for comparison is the ease of adoption. Software teams rarely have time for retrospectives (Glass 2002), and therefore this is an important aspect. Web browser-based software tools outperform native client software in the ease of adoption. They do not require client installation, and they can be used from various physical locations with different computers having different operating systems and hardware. Only four existing tools are web browser-based.

Real-time collaboration

The second aspect is the support for real-time collaboration. Global software engineering is an increasing trend in today's software business (Herbsleb and Moitra 2001), but because the team members are distributed, it creates a major challenge for retrospectives.
The team members cannot meet face-to-face. Thus, the RCA software tool has to support real-time collaboration over distributed sites in order to make it possible for the participants to contribute to the analysis as it takes place. This requires that the outcome of the tool stays in sync between the distributed sites. There are only six tools that fully support real-time collaboration.

Cause-effect diagramming

The third aspect is the support for cause-effect diagramming. The core component of RCA is the analysis of the underlying causal structures of the target problem. In retrospectives, such an analysis is usually conducted by using a cause-effect diagram (Bjørnson, Wang, and Arisholm 2009; Dingsøyr 2005). The majority of the existing tools enable the creation of a cause-effect diagram. However, most of them support only tree-structured diagrams, whereas only three existing tools support the creation of network-structured diagrams.

Corrective action development

The fourth aspect is the support for developing corrective actions for the causes of problems. The software tool should enable developing the corrective actions and linking them to the related target problem causes. It seems that the majority of the existing tools fulfil this aspect.

Support for voting

The fifth aspect concerns team commitment through voting. In order to focus the steps of root cause detection and corrective action innovation on the findings that the experts value the most, the software tool should support voting. This way the experts can focus their attention on the causes perceived as the most important. Likewise, they can collaboratively decide which corrective actions should be implemented. This aspect is supported in only one of the existing tools.

Support for knowledge management

The sixth aspect is the support for knowledge management. It has been claimed that retrospectives can be used to leverage knowledge from individuals to organizations (Dingsøyr 2005). Additionally, it has been claimed that an organizational learning system includes a global knowledge base that combines the knowledge (Lee, Courtney, and O'Keefe 1992). Thus, the RCA software tool should include a knowledge base and enable combining the findings of many retrospectives. This aspect is supported by the majority of the existing tools.

Costs

The seventh aspect considers the costs of the tools. Only three of the existing tools are free to use, whereas most of the tools are subject to a fee. Thus, there are only a few open-source alternatives available.
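The difference between tree-structured and network-structured diagrams, raised under the third aspect, can be made concrete with a small sketch. It assumes, as one possible reading, that a diagram is a set of directed cause-to-effect links and that a tree lets each cause explain only one effect, whereas a network lets a cause explain several. The function names and the example causes are hypothetical and are not taken from any of the reviewed tools.

```python
from collections import defaultdict


def effects_by_cause(links):
    """Group the effects that each cause is linked to."""
    grouped = defaultdict(set)
    for cause, effect in links:
        grouped[cause].add(effect)
    return dict(grouped)


def is_tree_structured(links):
    """True when no cause explains more than one effect (assumed tree criterion)."""
    return all(len(effects) <= 1 for effects in effects_by_cause(links).values())


# Hypothetical diagram: every cause points to exactly one effect.
tree_links = [
    ("Unclear requirements", "Defects found late"),
    ("Lack of testing time", "Defects found late"),
]

# Adding a second effect for one cause turns the diagram into a network.
network_links = tree_links + [("Unclear requirements", "Rework in implementation")]

print(is_tree_structured(tree_links))
print(is_tree_structured(network_links))
```

Under this reading, a tool limited to tree-structured diagrams would force the shared cause to be duplicated under each effect, which hides the fact that one cause contributes to several problems.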

5.2 Overview of ARCA-tool

ARCA-tool is browser-based software that uses a client-server architecture with push-and-pull technology. It addresses the main weaknesses of prior RCA tools for software project retrospectives. The software supports distributed real-time collaboration, including features for 1) collaborative cause-effect diagramming and 2) the development of corrective actions embedded in the causes of the target problem. The tool also supports knowledge management and organizational learning by enabling capturing, analysing, summarizing, and managing the outcome of one or many retrospectives.

ARCA-tool was designed to be used in the retrospectives of software projects with the ARCA method. Additionally, the tool was required to fulfil the seven aspects important for conducting RCA in software project retrospectives, introduced in Section 5.1. This section presents how to use ARCA-tool with the ARCA method.

Initializing ARCA-tool

In order to conduct the ARCA method, the facilitator initializes ARCA-tool by creating an RCA case, which is thereafter shared with the participants of the steps of target problem detection, root cause detection, and corrective action innovation. The participants join the case from their own computers through a TCP network connection. The process support for the different steps of the ARCA method is introduced in the following sub-sections.

Figure 8 summarizes the key features of ARCA-tool embedded in a radial menu, which is activated when a user selects a cause in the diagram. The key features include:

- Thumb-up: vote for this cause
- Pencil: edit this cause
- Trashcan: delete this cause
- Light bulb: create a corrective action
- Arrow left: link this cause to another existing cause
- + sign: create a cause that is linked to this cause
- Ticket: classify this cause

More details of the tool can be found in Article III.

Figure 8. Screen view of ARCA-tool (Article III).

Target problem detection

ARCA-tool supports the step of target problem detection (see Section 4.2.1), which includes a focus group meeting. ARCA-tool can be used in the focus group to register the target problem and the motivation to prevent it. This documentation can be included in the cause-effect diagram of the case. The diagram can be shared with all relevant stakeholders representing the experts invited to the case. The tool sends the invitations automatically to the defined experts. Alternatively, the invitations can be shared manually by sending the URL of the case.

Root cause detection

ARCA-tool supports the step of root cause detection, which includes the work phases of preliminary cause collection and causal analysis workshop (see Section 4.2.2). In the phase of preliminary cause collection, the tool allows the defined experts to contribute to the cause-effect diagram of the case before the causal analysis workshop meeting. Thus, the facilitator does not need to send a confidential email to the experts and thereafter organize the causes given in the replies, as the causes are already organized into the diagram by the experts. Additionally, the tool protects the anonymity of the experts.

Furthermore, ARCA-tool supports conducting the causal analysis workshop with its features for distributed and collocated knowledge sharing. First, the detected causes can be written down in a cause-effect diagram by the facilitator, acting as a scribe. The diagram can be simultaneously projected on the wall in order to visualize the analysis outcome to the participants. Second, each participant can also contribute directly to the cause-effect diagram from their own computer. The detected causes are then immediately visible to the other participants, who can thus contribute to the findings of others in real time. The workload of the facilitator also decreases since there is no need for a scribe during the meeting. Third, if combined with an online audio bridge, ARCA-tool enables conducting the ARCA method in a distributed manner. The geographically distributed experts can register, introduce, and discuss the target problem causes in the same way as in collocated settings.

Corrective action innovation

ARCA-tool enables the conducting of corrective action innovation (see Section 4.2.3), which includes the work phases of root cause selection and corrective action workshop. In ARCA-tool, the participants can vote for the causes they perceive as important by liking the causes (see Points in Figure 8). The number of likes for each cause is limited to +/-1 for the experts while remaining unlimited for the facilitator. This feature makes it possible for the facilitator to ask the experts to propose the causes they perceive as important to take forward into the process improvement activities. Likewise, the facilitator can use the voting feature to emphasize the causes that are selected for process improvement activities.
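The voting limit described above can be sketched as follows. This is only an illustration and not the ARCA-tool implementation: the class name, method names, and user names are hypothetical, and the sketch assumes that "+/-1" means an expert's net vote on a single cause must stay between -1 and +1, while the facilitator's votes are not capped.

```python
from collections import defaultdict


class VoteBoard:
    """Tracks likes per cause with a per-expert limit of +/-1 (assumed reading)."""

    def __init__(self, facilitator):
        self.facilitator = facilitator
        # votes[cause][user] holds the net number of likes given by that user.
        self.votes = defaultdict(lambda: defaultdict(int))

    def vote(self, user, cause, delta):
        """Apply a like (+1) or unlike (-1); return False if the limit rejects it."""
        if delta not in (1, -1):
            raise ValueError("delta must be +1 or -1")
        new_value = self.votes[cause][user] + delta
        if user != self.facilitator and abs(new_value) > 1:
            return False  # an expert may not exceed +/-1 on one cause
        self.votes[cause][user] = new_value
        return True

    def points(self, cause):
        """Total points shown for a cause: the sum of all users' net votes."""
        return sum(self.votes[cause].values())


board = VoteBoard(facilitator="facilitator")
cause = "Requirements are insufficient"

first = board.vote("expert_a", cause, +1)   # accepted
second = board.vote("expert_a", cause, +1)  # rejected, limit reached
for _ in range(3):
    board.vote("facilitator", cause, +1)    # the facilitator is not capped

print(first, second, board.points(cause))
```

Leaving the facilitator uncapped mirrors the text: the facilitator can give a selected cause enough points to make it stand out from the causes proposed by individual experts.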

According to the ARCA method description, the corrective actions are developed by writing them down on paper and rotating them through the participants (Article I). In ARCA-tool, the corrective actions are developed by embedding them in the related causes. The tool does not include features for allocating a selected cause to one expert only. Instead, the tool makes it possible to contribute to any of the causes registered in the cause-effect diagram. The participants cannot modify or comment on the corrective actions developed by others, but they can register a new corrective action refining the existing ones. Finally, the participants can vote for corrective actions by using the liking feature of the tool. However, there are no specific forms for feasibility and impact evaluations.

The documentation of results

The ARCA method ends with the documentation of results (see Section 4.2.4), which is also supported by ARCA-tool. While the facilitator combines the gained knowledge from the case, the tool makes it possible to save the outcome of the RCA case as a *.CSV file, which includes the causes, corrective actions, and their related votes.

Additionally, ARCA-tool enables conducting further analyses of the retrospective outcome, including the analysis of the cause types, process areas, and causal relationships of problems. These features promote consideration of what the problem causes are, where they occur, and why they occur (Article IV). Each cause can be classified in the type and process area dimensions. The users can use the default dimensions, introduced in Article IV, or develop their own dimensions that are more feasible for their context of use. Thereafter, the tool provides statistics about the distributions of causes, regarding their status in the RCA case (detected causes, proposed causes, causes with elimination ideas), in both of these dimensions simultaneously as a table view, or separately as a pie chart view. Furthermore, the tool can draw a graph summarizing the relationships over the process areas. The tool can also be used to view the internal and external causes for a process area. These analyses can be included in the final report of the ARCA method.

ARCA-tool supports organizational learning and knowledge management by providing features for monitoring the outcome of many RCA cases. As a limitation, the users can analyse only the outcome of the cases they have participated in. The tool can be used to manage, monitor, and analyse the outcome of an individual RCA case as well as the combination of many cases. The status of the detected causes (detected, elimination, won't fix, fixed) and corrective actions (idea, will be implemented, implemented, rejected) can be managed. Furthermore, the users can filter the outcome they are interested in monitoring (causes, corrective actions, and specific RCA cases).
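The table and pie chart views described above amount to counting classified causes in one or two dimensions. The following sketch shows that counting under stated assumptions: the record layout and function names are hypothetical, the type and process area labels follow the dimensions reported in Section 6.3, and the assignment of each example cause to a process area is invented for illustration.

```python
from collections import Counter


def cross_tabulate(causes):
    """Count the causes for every (cause type, process area) pair (table view)."""
    return Counter((c["type"], c["process_area"]) for c in causes)


def distribution(causes, dimension):
    """Share of causes per value of one dimension (pie chart view)."""
    counts = Counter(c[dimension] for c in causes)
    total = sum(counts.values())
    return {value: count / total for value, count in counts.items()}


# Hypothetical classified causes; the process area assignments are invented.
causes = [
    {"text": "Requirements are insufficient",
     "type": "Tasks", "process_area": "Sales & Requirements"},
    {"text": "The priority of defect fixing is too low",
     "type": "Tasks", "process_area": "Management"},
    {"text": "New issues are not registered",
     "type": "People", "process_area": "Software Testing"},
    {"text": "The process for software testing is missing",
     "type": "Methods", "process_area": "Software Testing"},
]

table = cross_tabulate(causes)
shares = distribution(causes, "type")

for (cause_type, area), count in sorted(table.items()):
    print(f"{cause_type:8} {area:22} {count}")
print(shares)
```

The same two functions could be applied to the rows of an exported CSV file, although the column names of that file are not specified here.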

6. Evaluation results

This section presents the evaluation results from the field studies and the student experiment. These are summarized in Table 5 and introduced in detail in Sections 6.1 and 6.2. They answer the research questions RQ3 and RQ4. Furthermore, Section 6.3 presents the results of the multiple case study, which answers the research questions RQ5-RQ

6.1 Evaluation of the ARCA method

In comparison to the existing practices of the industrial cases, the ARCA method was perceived as efficient. Likewise, the method was perceived as easy to use. The detected causes were also perceived as accurate and they helped to develop high-quality corrective actions.

The ARCA method was evaluated from different perspectives. Cases 1-4 (N=30) evaluated all steps and work practices of the method. Case 5 (N=11), Case 6 (N=5), and the student experiment (N=51) evaluated only the step of root cause detection (ARCA ltd.). Cases 5-6 also evaluated the support of ARCA-tool for the ARCA method. Furthermore, the student experiment included the comparison of the number of detected causes between the ARCA method and the control group.

Table 5. Summary of evaluation results regarding the ARCA method and ARCA-tool.

Evaluation                          Field Studies (N=46)                            Experiment (N=51)
Case                                Case 1  Case 2  Case 3  Case 4  Case 5  Case 6  ARCA       Control
Method                              ARCA    ARCA    ARCA    ARCA    ARCA*   ARCA*   ARCA*      ARCA*
Root cause detection step
  Ease of use                       Mod     Mod     High    Mod     High    High    High       High
  Efficiency                        High    High    High    Mod     High    High    Increase#  Decrease#
  Accuracy                          High    High    High    High    High    High    -          -
Corrective action innovation step
  Ease of use                       High    High    High    High
  Efficiency                        High    High    High    High
ARCA-tool
  Usefulness                                                        High    High    -          -
  Ease of use                                                       High    High    -          -

Interpretation scale (based on questionnaires): Low (avg. < 4), Mod (4 < avg. < 5), High (avg. > 5)
* The limited ARCA method (ARCA ltd.)
# Based on the number of detected causes
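The interpretation scale under Table 5 can be written as a small mapping from a questionnaire average to a label. The sketch below is an illustration only. The scale leaves the exact values 4 and 5 undefined, so the code assumes that an average of exactly 4 counts as Mod and an average of exactly 5 counts as High; the function name is hypothetical.

```python
def interpret(avg):
    """Map a questionnaire average on the 1-7 scale to Low, Mod, or High.

    Assumption: the boundary values 4.0 and 5.0, which the scale does not
    define, are assigned to Mod and High respectively.
    """
    if not 1 <= avg <= 7:
        raise ValueError("averages are expected on the scale 1-7")
    if avg < 4:
        return "Low"
    if avg < 5:
        return "Mod"
    return "High"


for value in (3.2, 4.6, 5.6):
    print(value, interpret(value))
```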

Evaluation of the ARCA method ease of use

Table 6 summarizes the results from the questionnaires regarding the ease of use of the ARCA method. The interviews corroborate these results. We can see from the table that the ARCA method was perceived as generally easy to use. On the other hand, comparison between the cases reveals that the perceived ease of use was higher in Cases 5-6 and the student experiment than in Cases 1-4.

In Cases 1-4, the participants evaluated in the questionnaires that the ease of use of the corrective action innovation step is high (avg. > 5) and that of the root cause detection step is moderate (4 < avg. < 5). Likewise, in the case interviews (see Article I), the ARCA method was generally experienced as easy to use. This was a common opinion among the key representatives of each case. On the other hand, organizing the detected causes was noted to be a challenging task. It was also said that the assistance of the researchers made the ARCA method unnaturally easy to use.

In Cases 5-6, the participants evaluated in the questionnaires that the ease of use of the root cause detection step is high. They evaluated the easiness of collecting causes with high values and the easiness of detecting root causes with moderate to high values (see Table 6). Likewise, the interviews indicated that the participants perceived the method as simple and intuitive (see Article III). It was also noted that the perceived difficulty of the analysis depends on the number of detected causes. Furthermore, in Case 6, the participants evaluated that the ease of use of the ARCA method is an improvement over their existing practices (see Article III).

In the student experiment, similar results regarding the ease of use were found.
Regarding the results from questionnaires, the students perceived that the ease of use of collecting the target problem causes is high (see Table 6), but they also perceived that the difficulty of detecting the problem causes is high (see Article V). Furthermore, the students compared the cause-effect diagram of the ARCA method (see Section 3.6.2) with the list-of-causes technique (control group). Considering the results from the comparison, most of the students evaluated that the cause-effect diagram of the ARCA method is a good technique to organize the causes (Median=6). By contrast, they evaluated that the list-of-causes technique is only somewhat good (Median=5). The difference between these two techniques is statistically significant (p=0.001).

Table 6. Evaluation results regarding the ease of use of the ARCA method.
Ease of use Field Studies (N=46) Experiment (N=51)
Case Case 1 Case 2 Case 3 Case 4 Case 5* Case 6* ARCA Control
Root cause detection step
Cause collection avg. std.
Cause detection avg. std High^ 5.9 Mod Mod High
Corrective action innovation step
Dev. method avg. std. High High High Mod High High^ High Mod High High 5.6
Scale: 1=very low, 4=moderate, 7=very high, * normalized scale 1-7 (the original scale was 1-5), ^ combined results from three teams

Evaluation of the ARCA method cost-efficiency

Table 7 summarizes the results from the questionnaires regarding the perceived cost-efficiency of the ARCA method. We can see from the table that the ARCA method was perceived as cost-efficient in each case (avg. of usefulness & efficiency are both >4). The interviews corroborate these results.

In Cases 1-4, the results from questionnaires indicate that the steps of root cause detection and corrective action innovation are both useful and efficient. Likewise, the results from interviews indicate that the ARCA method was perceived as cost-efficient (see Article I). The interviews revealed that the key representatives felt that their companies should adopt the ARCA method. They also perceived that the case results were beneficial in relation to the effort used. Additionally, they were not able to name any other method that could reach equally advantageous results with lower costs.

In Cases 5-6, the limited ARCA method was perceived as cost-efficient. In Case 5, the limited ARCA method had been used previously, which indicates that the method was already found to be feasible and applicable to software project retrospectives. In Case 6, wherein the limited ARCA method was compared with the existing practices, the participants evaluated in the questionnaires that the cost-efficiency of the method is high.
Furthermore, regarding the results from interviews, the participants from both cases experienced that the structured approach of the limited ARCA method is one of its advantages. They also experienced that the method helps to detect the causes of problems.

In the student experiment, the participants evaluated that the limited ARCA method is useful. They also evaluated that the cost-efficiency of the method is high. Additionally, the statistical analyses on the method outcome indicate that the limited ARCA method slightly increased the method effectiveness (p=0.065, Cohen's d=0.57). It also increased the number of hub causes (p=0.010, Cohen's d=1.42). Furthermore, the group interviews with students revealed concepts supporting the use of the cause-effect diagram in the ARCA method. The students perceived that, in contrast to the control group, the cause-effect diagram of the limited ARCA method helped to outline how the causes are related to one another. Likewise, the visual structure of the cause-effect diagram was perceived as feasible for RCA. The cause-effect diagram was also perceived as a visually easier technique for navigating the detected causes. The students claimed that the cause-effect diagram helped to focus on and process the detected causes systematically. The only argument that supported the control group was the high readability of the list-of-causes technique.
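For readers unfamiliar with the effect size reported above, the following sketch computes Cohen's d with the pooled standard deviation, which is the common textbook definition. It is an assumption that this variant matches the one used in the experiment (see Article V for the actual analysis), and the cause counts below are invented solely to show the calculation.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(group_a, group_b):
    """Cohen's d: difference in means divided by the pooled standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_variance = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_variance)


# Hypothetical numbers of detected causes per team (not the experiment data).
diagram_teams = [2, 4, 6]
list_teams = [1, 3, 5]

print(cohens_d(diagram_teams, list_teams))
```

By the usual rule of thumb, values around 0.5 are read as a medium effect and values above 0.8 as a large effect, which is how the two reported effect sizes differ.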

Table 7. Evaluation results regarding the cost-efficiency of the ARCA method.
Cost-efficiency Field Studies (N=46) Experiment (N=51)
Case Case 1 Case 2 Case 3 Case 4 Case 5 Case 6* ARCA Control
Root cause detection step
Usefulness avg. std. Efficiency avg. std. Effectiveness# avg. std. High High High High High High High Mod High High High Increase
Corrective action innovation step
Usefulness avg. std. Efficiency avg. std. Mod High High High High High High High High High Decrease
Scale: 1=very low, 4=moderate, 7=very high, * normalized scale 1-7 (the original scale was 1-5), # Effectiveness indicates the number of detected causes

Evaluation of the ARCA method outcome

The outcome of the ARCA method includes the causes of the target problem and the related corrective actions. Table 8 summarizes the results from the questionnaires regarding the perceived correctness of the detected causes and the perceived impact and feasibility of the developed corrective actions. The results from the interviews corroborate these results.

Regarding the correctness of the detected causes, the participants of Cases 1-4 perceived that the detected target problem causes were correct (see Table 8). Additionally, the participants evaluated that feasible corrective actions with a high impact on the target problem were developed. Likewise, the interviews with the key representatives indicate that significant root causes were detected with respect to the target problems (see Article I). Most of the key representatives also believed that, if implemented, the corrective actions would have a high impact on the prevention of the target problem. As an exception, it was claimed in Case 2 that the corrective actions do not prevent the target problem, but they assist the company in making improvements to their processes. Furthermore, in Cases 5-6, the participants evaluated the correctness and impact of the detected causes with high values.
Likewise, regarding the results from interviews, they perceived that correct target problem causes were detected (see Article III).

Figure 9 presents a scatter chart of the developed corrective actions in Cases 1-4. A high-quality corrective action is highly feasible and equally effective (Article I). In Cases 1-4, each corrective action of each case was evaluated by the case participants. The evaluations were conducted by using a symmetric ordinal scale from one to five (1=low, 2, 3, 4, 5=high). We can see from Figure 9 that the share of high-impact (avg. >= 3) corrective actions was larger than the share of low-impact (avg. < 3) corrective actions in each case. By contrast, the share of high-feasibility (avg. >= 3) corrective actions was larger than the share of low-feasibility (avg. < 3) corrective actions in Cases 1 and 4 only. It is probably easier to develop high-impact corrective actions than to make them feasible. Despite the difficulty of developing high-quality corrective actions (avg. impact & feasibility are both >= 3), such corrective actions were developed in each case, as can be seen from the figure.

Table 8. Evaluation results regarding the outcome of the ARCA method.
Outcome evaluation Field Studies (N=46)
Case Case 1 Case 2 Case 3 Case 4 Case 5* Case 6*
Root cause detection
Correctness of causes avg. std. Impact of causes avg. std.
Corrective action innovation
Impact of ideas avg. std. Feasibility of ideas avg. std. High High High High High^ High^ 5.5 High High High Mod High High High High High High 5.3
Scale: 1=very low, 4=moderate, 7=very high, * normalized scale 1-7 (the original scale was 1-5), ^ combined results from three teams 1.2

Figure 9. Scatter chart of the perceived impact and feasibility of individual corrective actions (small marks) and their averages in the cases (large marks).
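The thresholds used to read Figure 9 can be expressed as a short classification sketch. The threshold of 3 and the definition of a high-quality corrective action (both averages at least 3) come from the text above; the function name, the dictionary keys, and the example ratings are hypothetical.

```python
from statistics import mean

THRESHOLD = 3  # averages >= 3 on the 1-5 scale count as "high" in the text


def classify(impact_ratings, feasibility_ratings):
    """Classify one corrective action from its participants' 1-5 ratings."""
    avg_impact = mean(impact_ratings)
    avg_feasibility = mean(feasibility_ratings)
    high_impact = avg_impact >= THRESHOLD
    high_feasibility = avg_feasibility >= THRESHOLD
    return {
        "avg_impact": avg_impact,
        "avg_feasibility": avg_feasibility,
        "high_impact": high_impact,
        "high_feasibility": high_feasibility,
        # High quality requires both averages to reach the threshold.
        "high_quality": high_impact and high_feasibility,
    }


# Hypothetical ratings for two corrective actions.
impactful_but_hard = classify([5, 4, 4], [2, 2, 3])
balanced = classify([4, 3, 4], [3, 4, 3])

print(impactful_but_hard["high_quality"], balanced["high_quality"])
```

The first example shows the pattern noted in the text: an action can reach the impact threshold while falling short on feasibility.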

6.2 Evaluation of ARCA-tool

This section presents the empirical results regarding the evaluations of ARCA-tool. The tool was evaluated in Cases 5-6 only (Article III). Table 9 summarizes the results from the questionnaires regarding the perceived ease of use and usefulness of ARCA-tool. Our results indicate that ARCA-tool increases the cost-efficiency of the limited ARCA method. Additionally, the tool is perceived as essential when the ARCA method is conducted in geographically distributed settings.

Evaluation of the ease of use of ARCA-tool

ARCA-tool was perceived as easy to use and learn in Cases 5-6. In Case 5, the participants evaluated the ease of use and learnability of the tool with very high values (see Table 9). In Case 6, the participants evaluated the ease of use and learnability of the tool with high values. The values given in Case 6, however, were lower than in Case 5. Case 5 had used the tool previously, whereas the tool was new to the participants of Case 6; this difference between the cases could explain the differences in the evaluations.

The results from the interviews corroborate the results from questionnaires (see Article III). In Case 5, the participants experienced that the tool makes it easier to visualize the outcome of the root cause detection step, i.e. the causes of the target problem (the term retrospective was used in Article III to refer to this step). In Case 6, the tool was characterized as intuitive and relatively easy to use. Additionally, the interviews in Case 6 indicate that the participants considered it positive that only the necessary features are included in the tool. It was also claimed in the interviews of Case 6 that the perceived difficulty of using ARCA-tool in the step of root cause detection correlates with the number of detected causes. The interview results from Case 6 also indicate that the tool could be improved. One participant claimed that when the causes of the target problem are organized, the visualization of cause groups would be important, and this is currently difficult with the tool.

Table 9. Evaluation results regarding ARCA-tool.

ARCA-tool         Field studies (N=16)
Case              Case 5*                                      Case 6*
                  Team 1 (N=3)   Team 2 (N=5)   Team 3 (N=3)   Team 4 (N=5)
Ease of use       Very high      High           Very high      High
Learnability      High           High           Very high      High
Cost-efficiency                                                High (avg. 5.6, std. 1.4)
Usefulness        High           High           High           High

Scale: 1=very low, 4=moderate, 7=very high, * normalized scale 1-7 (the original scale was 1-5)

Evaluation of the usefulness of ARCA-tool

In both cases, the results from the questionnaires indicated that the tool helped to detect the causes of problems. In Case 5, the participants evaluated that the efficiency of the root cause detection step would be lower without the tool (see Article III). Likewise, the participants of Case 6 evaluated that, in comparison to their previous practices, the cost-efficiency of the tool is high (see Table 9). Additionally, the participants of both cases evaluated that the assistance of the tool for cause detection is significant (see Article III).

Regarding the results from the interviews, it seems to be a common opinion that the tool improves the limited ARCA method (see Article III). Additionally, the results indicate that the tool is essential in geographically distributed settings. Furthermore, the interviews of Case 5 indicate that in face-to-face settings, the tool can be substituted with a whiteboard and Post-it notes, an approach introduced by Stålhane et al. (2003). However, the efficiency of analysis would then decrease (see Article III).

6.3 The cause types, process areas, and their relationships

This section summarizes the case study results regarding the outcome of the ARCA method in Cases 1-4 (Articles II and IV). The results indicate that in a case of software project failure, the outcome of the ARCA method helps to explain what happened, where, and why.

Process areas

The ARCA method outcome included causes from five process areas. These included Management, Sales & Requirements, Implementation, Software Testing, and Release & Deployment.
A total of 97.8% to 100% of the detected causes were related to the process areas in each case. The process areas of the detected causes are somewhat similar to the ones found in software engineering process literature, e.g. RUP (Jacobson, Booch, and Rumbaugh 1998) and the waterfall model (Royce 1970). This means that most of the detected causes pointed to commonly accepted development process areas wherein they occurred.

Cause types

Table 10 summarizes the cause types and their sub-types. We can see from the table that the ARCA method outcome included causes of four types. These included People, Tasks, Methods, and Environment. The cause types are similar to the ones introduced in the literature on the causes of software engineering problems (see Article IV). Furthermore, the detected causes were also expressed in more detail than these coarse-grained cause types. The sub-categorization of the detected causes revealed a total of fourteen different sub-types, with three to four sub-types for each cause type. The sub-types are also in line with the findings of prior studies (McLeod and MacDonell 2011).

Table 10. Summary of cause types and their sub-types.

Type         Sub-type                   Examples
People       Instructions & Experience  Lack of instructions when and how to verify.
             Values & Responsibilities  People do not care if the number of bugs increases.
             Cooperation                Miscommunication between the developers and testers.
             Company Policies           New issues are not registered.
Tasks        Task Output                Requirements are insufficient.
             Task Difficulty            It is difficult to create a comprehensive specification.
             Task Priority              The priority of defect fixing is too low.
Methods      Work Practices             Implementation is done directly in the test environment.
             Process                    The process for software testing is missing.
             Monitoring                 An opaque view of the quality during the development work.
Environment  Existing Product           The structure of the product has decayed during the past.
             Resources & Schedules      Lack of time to report defects specific enough.
             Tools                      The version control system does not support customization.
             Customers & Users          Importance for the customers is not well defined.

Similarities of the causes of failures

The causes of project failures were different in terms of process areas, but similar in terms of cause types. The distributions of causes in process areas were case dependent, which means that regarding the process areas of the detected causes, the failures were different. By contrast, regarding the cause types, the cases were similar. In each case, the cause types were fairly evenly distributed into People (avg. 29%, std. 6%), Tasks (avg. 26%, std. 4%), Methods (avg. 22%, std. 3%), and Environment (avg. 22%, std. 5%).
All of these cause types were also frequent in all process areas. The cases were also similar in terms of seven sub-types, covering 81% of all detected causes on average (std. 2%). These sub-types included Instructions & Experience (avg. 16%, std. 4%), Values & Responsibilities (avg. 8%, std. 6%), Work Practices (avg. 16%, std. 4%), Task Output (avg. 16%, std. 2%), Task Difficulty (avg. 7%, std. 3%), Existing Product (avg. 7%, std. 5%), and Resources & Schedules (avg. 9%, std. 4%). Considering the bridge causes (see Section 3.7.2), the commonality between the cases was that 1) the bridge causes were frequent among the detected causes (avg. 50%) and 2) the company experts (avg. 56%) and key representatives (avg. 68%) perceived them as feasible targets for process improvement activities. This means that the company people perceived it as important to control the causes of software project failures that are related to possible causal relationships over the process areas.

Common causal relationships bridging the process areas

Similar causes were related to similar causal relationships. Figure 10 summarizes the common causes of project failures and their related causal relationships bridging the process areas together. The term common refers to a cause that occurred in at least three of our four cases. Such a definition is in line with prior studies (e.g. Cerpa and Verner 2009; Verner, Sampson, and Cerpa 2008). The common causal relationships bridging the process areas included Weak Task Backlog, Lack of Cooperation, and Lack of Software Testing Resources. Weak Task Backlog bridged the Sales & Requirements, Management, Implementation, and Software Testing process areas (this process area was only relevant in two cases). Lack of Cooperation bridged the Sales & Requirements, Implementation, and Software Testing process areas. Lack of Software Testing Resources bridged the Management and Software Testing process areas. Furthermore, these three common causal relationships were also interconnected to one another.

The common causal relationships alone did not fully explain any of the cases. When the case-specific results were compared, it was found that each failure was also caused by different, case-specific causes that were interconnected to one another differently. Additionally, the common causes were not proposed or selected in every case, and causes other than the common ones were proposed in every case. This means that the software project failures could not have been explained by using the common causal relationships alone. The common causal relationships could only improve the knowledge related to the possible causes of software project failures.

Figure 10.
Common causes and bridging causal relationships found in at least three out of four cases (Article IV). Bolded text/line indicates the selected causes; normal text/line indicates the sub-causes of the selected causes; dashed line/grey text indicates that the cause was neither a selected cause nor a sub-cause; lines with arrows and text between the process areas indicate the direction of causal relationships interconnecting the process areas.
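The definition of a common cause used above (a cause that occurred in at least three of the four cases) can be illustrated with a minimal Python sketch. The function name `common_causes`, the case labels, and the per-case cause sets are hypothetical and chosen for illustration only; they are not the study data.

```python
from collections import Counter


def common_causes(cases, min_cases=3):
    """Return the causes that occur in at least `min_cases` of the cases.

    `cases` maps a case label to the collection of causes detected in it.
    """
    counts = Counter()
    for causes in cases.values():
        counts.update(set(causes))  # count each cause at most once per case
    return {cause for cause, n in counts.items() if n >= min_cases}


# Hypothetical cause sets (illustrative only, not the data of Cases 1-4).
example_cases = {
    "Case 1": {"Weak Task Backlog", "Lack of Cooperation", "Case-specific cause A"},
    "Case 2": {"Weak Task Backlog", "Lack of Cooperation", "Case-specific cause B"},
    "Case 3": {"Weak Task Backlog", "Lack of Software Testing Resources"},
    "Case 4": {"Lack of Cooperation", "Lack of Software Testing Resources"},
}

common = common_causes(example_cases)
```

In this hypothetical data, Lack of Software Testing Resources appears in only two cases and is therefore not counted as common, whereas the other two named causes each appear in three cases.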

7. Discussion

This section answers the research questions and discusses their main implications. Additionally, the section presents the main threats to validity.

7.1 Lightweight RCA method and software tool

A literature review of the prior RCA methods and a systematic investigation of the environment of use were used to create the ARCA method and ARCA-tool. These results contribute to the first research problem: How can RCA be conducted in collocated and distributed software project retrospectives? Two research questions were stated for this research problem, and they are answered next.

Common steps of RCA methods and their work practices

Section 4.1 summarizes the synthesis of the common steps and work practices of prior RCA methods. Section 4.2 presents the ARCA method. These results answer the first research question discussed below.

RQ1: What are the common steps of RCA methods, and how are they to be conducted?

The concrete work practices of RCA have been studied fairly little in the context of software project retrospectives. Therefore, synthesizing the steps of prior RCA methods and their work practices (see Section 4.1) was an important contribution to the prior studies (see Section 2.3.1). Likewise, the ARCA method (see Section 4.2) is an important contribution to the prior literature, as it concretizes how to conduct lightweight RCA over the common steps of prior methods, including the steps of target problem detection, root cause detection, and corrective action innovation. Additionally, the ARCA method makes a good starting point for the industrial evaluation, as it provides a measurable RCA construction that increases the comparability of the evaluation results over the different cases.

The main difference between the prior RCA methods is their different work practices in the step of target problem detection.
Problem sampling has been commonly used in large organizations, but it requires too much effort to be feasible for SME organizations. In contrast, brainstorming in a meeting has been introduced as a feasible approach for SME organizations, but it has not been recommended for large organizations. Regardless of the organization size, the work practices used should reveal actual problems instead of subjective opinions (Bjarnason and Regnell 2012). In SME organizations, it could be more cost-efficient to detect an actual target problem by using brainstorming in a meeting than by using problem sampling. Therefore, such an approach was included in the ARCA method. The situation might be the opposite in large organizations. The need for problem sampling, including defect sampling and project surveys, could increase with the number of employees and the organizational complexity.

Furthermore, the prior RCA methods are mostly similar in the step of root cause detection, where the causes of the target problem are analysed in depth. The ARCA method follows the prior methods by using brainstorming and brainwriting in a causal analysis meeting in order to create a cause-effect diagram of the target problem causes (Card 1998; Bjørnson, Wang, and Arisholm 2009). However, the ARCA method differs from the prior methods by protecting the anonymity of participants.

Most of the prior RCA methods also include the step of corrective action innovation, which develops action proposals for the most controllable and important root causes. The action proposals are usually developed in a meeting with a group of people (Andersen and Fagerhaug 2006; Card 1998; Leszak, Perry, and Stoll 2000; Jalote and Agrawal 2005; Grady 1996). Brainstorming and brainwriting (Andersen and Fagerhaug 2006) have been presented as useful work practices, which were also included in the ARCA method.

Considering the RCA work practices, it becomes reasonable to claim that the outcome of RCA is only a reflection of expert knowledge instead of true reality. The RCA methods are highly dependent on data investigation techniques, including brainstorming, brainwriting, and interviewing. These techniques are often used at every step of RCA methods.
Therefore, it is possible that the use of RCA results in wrong conclusions and inaccurate corrective actions.

Software tools for the RCA of retrospectives

Section 5.1 summarizes the comparison results of the prior RCA software tools and Section 5.2 introduces ARCA-tool. Together, these results contribute to the second research question discussed below.

RQ2: What software tools for RCA are introduced, and how do they support software project retrospectives?

Software tools could help to improve the RCA of software project retrospectives. We found that the use of Post-it notes and a whiteboard during RCA (Stålhane et al. 2003; Bjørnson, Wang, and Arisholm 2009) should be substituted with a monitor and a software tool (see articles I and III). We also found that conducting RCA in distributed software project retrospectives is challenging due to the lack of real-time collaboration tool support (see Article III). Therefore, developing ARCA-tool was reasonable.

There are at least seven important aspects that should be considered while evaluating a software tool for RCA (see Section 5.1). These aspects can be divided into 1) technical features and 2) features for RCA. Considering the technical features, the RCA tool should support real-time collaboration. Additionally, the tool should be easy to adopt. Considering the RCA features, the software tool should support cause-effect diagramming, corrective action development, voting, and knowledge management.

A total of 35 software tools for RCA were found by using a systematic literature review. Regarding the comparison results, it seems that the prior RCA tools support software project retrospectives inadequately. The software tools for RCA are mostly proprietary native client software developed for an individual analyst who investigates target problems by using interviewing techniques.

7.2 Perceived ease of use and cost-efficiency

Field studies combined with the student experiment were used to evaluate the ease of use and cost-efficiency of the ARCA method and ARCA-tool. These results contribute to the second research problem: Is RCA perceived as efficient and easy to use in software project retrospectives? Two research questions were stated for this research problem, and they are answered next.

Ease of use and cost-efficiency of the ARCA method

The ARCA method was evaluated with target problems at different company levels, starting from the company-level problems of software project failures and ending with the team-level problems of individual software development teams. Additionally, the method was evaluated in collocated and distributed retrospectives. The evaluation results regarding the ARCA method are summarized in Section 6.1, and they contribute to the third research question discussed below.

RQ3: Is the ARCA method perceived as efficient and easy to use for analysing software engineering problems in software project retrospectives?

Regarding the evaluation results, the ARCA method was perceived as cost-efficient and easy to use. This was the case in collocated and distributed software project retrospectives, which covered analyses ranging from the top-level target problems (Article I) to the team-level target problems (articles III and V).
In each case, the effort used was perceived as suitable in terms of the output of the method. The detected causes were also experienced as correct with respect to the target problems. Additionally, high-quality corrective actions were developed in each case where the step of corrective action innovation was conducted (Cases 1-4). Furthermore, in each case, the method was perceived as useful.

These evaluation results indicate that RCA is an important part of software project retrospectives, as also indicated in the prior studies on post-mortem reviews (Stålhane et al. 2003; Bjørnson, Wang, and Arisholm 2009; Dingsøyr 2005; Collier, DeMarco, and Fearey 1996) and defect causal analysis (Card 1998; Leszak, Perry, and Stoll 2000; Jalote and Agrawal 2005; Gupta et al. 2008; Grady 1996; Kalinowski, Travassos, and Card 2008; Jacobs et al. 2005; Nakashima et al. 1999). In Cases 1-4 and 6, the existing retrospective practices did not include RCA, but they did include the detection of problems and the development of corrective actions. In the existing practices, the structured investigation of the underlying causes of problems, i.e. RCA, was substituted with informal discussions about problems (articles I and III). Similarly, cause-effect diagrams were not used to register the findings of software project retrospectives. When the ARCA method was used in the cases, the participants perceived it as cost-efficient and feasible for their needs.

There are at least three hypotheses as to why the ARCA method improved the existing practices. First, the structured investigation approach of the ARCA method decreased the level of informality by providing process structure (Dennis et al. 1997). Second, the use of brainwriting in order to detect the causes of problems (Andersen and Fagerhaug 2006) decreased the problem of dominating team members who speak over the others (Article III). Third, the visual power of the cause-effect diagram decreased memory bias (Von Zedtwitz 2002) by helping the participants to remember all relevant findings and outline them as a whole (Eden 2004).

Considering the three hypotheses listed above, the results indicate that the structured investigation approach is the key to a successful software project retrospective, which can be additionally improved by using brainwriting and the cause-effect diagram. In the student experiment, the perceived usefulness of the ARCA method was not significantly dependent on the use of the cause-effect diagram. However, the detailed comparison of the method outcome and the analysis of the perceptions of participants revealed that the cause-effect diagram was a better technique for visualizing the outcome of RCA than the list of causes (see Article V). Similarly, Cases 5-6 indicated that ARCA-tool, which uses the cause-effect diagram of the ARCA method, improves retrospectives (see Article III).
However, the cases also revealed that the structured approach of RCA, including the systematic focus on target problem causes, is the most important component of the analysis. Furthermore, the results from Case 2 indicated that brainwriting combined with brainstorming is a better approach for developing action proposals than using brainstorming only.

Improving the ARCA method with ARCA-tool

ARCA-tool was evaluated with collocated (Case 5) and distributed (Case 6) software project retrospectives in order to study its designed support for the ARCA method (see Section 5.2). The evaluation results regarding ARCA-tool are introduced in Section 6.2, and they contribute to the fourth research question discussed below.

RQ4: Is the developed ARCA-tool perceived as useful and easy to use in software project retrospectives applying the developed RCA method?

ARCA-tool was perceived as useful, and it improves the limited ARCA method (see Section 3.5.2) in collocated and distributed software project retrospectives. The participants of Cases 5-6 perceived that ARCA-tool increases the efficiency of RCA. They also experienced that the tool is essential in distributed retrospectives. Furthermore, ARCA-tool was perceived as easy to use.

There are at least five hypotheses as to why ARCA-tool was perceived as useful. First, the tool enables conducting the ARCA method (see Section 5.2), which was concluded to be the key to successful software project retrospectives (see Article III). Second, the tool improves the in-depth analysis by enabling real-time visualization of and simultaneous editing access to the retrospective outcome. Third, the RCA facilitator does not need to act as a scribe (see Section 4.2.2). Fourth, the tool enables conducting the ARCA method in a distributed setting. Fifth, the tool protects the anonymity of retrospective participants.

Considering the above hypotheses, I believe that the most important success factors of the tool are that it enables conducting the ARCA method and that it provides real-time visualization of and simultaneous editing access to the retrospective outcome. These success factors enable conducting RCA in distributed software project retrospectives, a research problem introduced by Stålhane et al. (2003). Additionally, possible post-retrospective analyses (see articles II and IV) become easier since the retrospective findings are already electronically registered (see Section 4.2.4). Furthermore, the efficiency of analysis increases since the participants register their findings directly in the electronic cause-effect diagram (see Figure 8) instead of writing often illegible Post-it notes and pasting them on a whiteboard to represent the cause-effect diagram of the target problem (Stålhane et al. 2003; Bjørnson, Wang, and Arisholm 2009).
7.3 The outcome of RCA with software project failures

A multiple case study about the ARCA method outcome was conducted in four cases of software project failures in order to evaluate whether the outcome of the ARCA method helps to express what happens, where the failures occur, and why the failures occur. The case study results contribute to the third research problem: Does the outcome of RCA indicate how the causes of software project failures are interconnected? Three research questions were stated for this research problem, and they are answered next.

Frequently used process areas and cause types

The outcome of the ARCA method was analysed in Cases 1-4 in order to explain what caused the software project failures in each case and where in the software development processes the causes occurred (articles II and IV). The evaluation results introduce the process areas detected from the ARCA method outcome, expressing where the causes of failures occurred. They also introduce the types of causes, expressing what the individual causes of failures were, and consider the similarities between the cases. Together these results contribute to the fifth research question discussed below.

RQ5: Which process areas and cause types were frequently used in RCA to explain software project failures?

Regarding the outcome of the ARCA method, the software project failures were commonly influenced by problems in the management, sales & requirements, implementation, and software testing work. These process areas are similar to the ones found in software engineering process literature. Additionally, the prior studies of software project failures have emphasized these process areas (see Article IV). Furthermore, the causes of failures were commonly related to the cause types of People, Tasks, Methods, and Environment. The causes were also commonly related to the sub-types of Instructions & Experience, Values & Responsibilities, Work Practices, Task Output, Task Difficulty, Existing Product, Cooperation, and Resources & Schedules. Comparison of these cause types to the results of prior studies (see Article IV) indicates that these findings are also in line with others (McLeod and MacDonell 2011).

Due to the high similarity between the case study results and prior studies, I conclude that using the ARCA method with software project failures in SME organizations helps to express where the causes of software project failures occur and what they are. In the ARCA method, the perceived causes of failures that are registered in the cause-effect diagram are expressed with rich information about their types and related process areas. Thus, analysing the outcome of the ARCA method could help to conclude what happened and where.

The role of bridge causes

The interconnections between the causes of software project failures were studied in order to evaluate whether the outcome of the ARCA method helps to express how the causes of software project failures affect one another. The evaluation results present the common causal relationships bridging the process areas.
Furthermore, Article IV includes the in-depth analysis of the perceived causal relationships between the process areas and individual causes of software project failures in each case. Together these results contribute to the sixth research question discussed below.

RQ6: What causal relationships bridge the process areas?

The term bridge cause refers to a cause-effect relationship for which the process area of the effect is different from that of its cause. Regarding the ARCA method outcome, a high number of perceived causes of software project failures in implementation, software testing, and release & deployment were bridged to the output of management work and sales & requirements. This finding consolidates the prior studies by indicating that software project failures are caused by insufficient management work and sales & requirements (see Article IV). Furthermore, the ARCA method outcome indicated that solving the problems in the management work and sales & requirements requires improvements in the implementation work and software testing too. This finding was derived logically (see Article IV). The causal relationships between the process areas were multidirectional, including three common mechanisms bridging the process areas together. These mechanisms included Lack of Cooperation, Weak Task Backlog, and Lack of Software Testing Resources (see Figure 10).

Furthermore, the ARCA method outcome expressed the perceived causal relationships of individual problems local to process areas (see Article IV), and these causal relationships were also interconnected to the bridge causes. Thus, the outcome of the ARCA method helped to explain not only the bridge causes but the whole network of causes and effects, starting from the separate problems of software development process areas and ending with a perceived causal model of software project failures in each case.

Due to the logical relationships between the detected causes and their process areas (see Article IV), I conclude that the ARCA method helps to express how the perceived causes of software project failures are related to one another. If a software project failure is considered a problem that follows the law of causality (see Section 2.1), as indicated by Cerpa and Verner (2009), controlling the individual problems of software projects becomes important during the project. This requires knowledge about the relationships between the individual problems, i.e., the interconnections between the causes of failures. Therefore, RCA is an important part of software project retrospectives (see Figure 1). It could help to explain why the individual problems of software projects occur. Additionally, it could help to explain how these individual problems together form the software project failure.

Feasible targets for process improvement activities

The perceptions of practitioners and senior management on the causes for process improvement activities were studied in order to consider the importance of detecting perceived causal relationships between the causes of software project failures in software project retrospectives.
The prior studies on software project failures have claimed that it is important to analyse how the causes of failures are related; however, these claims have not been evaluated in practice (see Article IV). The evaluation results present the perceived feasibility of the bridge causes for process improvement activities, and Article IV extends the results to cover an analysis of the related process areas and cause types. These results contribute to the seventh research question discussed below.

RQ7: Do the causes perceived as feasible targets for process improvement differ from the other detected causes, and if so, how?

The case study results indicate that the causes of software project failures that are perceived as feasible targets for process improvement activities are often related to the perceived causal relationships interconnecting the process areas. This means that in software project retrospectives, revealing the interconnections between the individual problems of software projects is not only theoretically reasonable (see Section 2.1), but also practically important. It leads to an understanding of the problems between software development process areas, which are important to consider in the process improvement activities (see Article IV). These results consolidate the evaluation results on the high perceived usefulness of the ARCA method for software project retrospectives.
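The definition of a bridge cause given under RQ6 (a cause-effect relationship whose effect lies in a different process area than its cause) can be illustrated with a minimal Python sketch. The `Cause` class, the function names, and the process areas assigned to the example causes are assumptions made for illustration; they do not reproduce the data model of ARCA-tool or the classifications of Article IV.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cause:
    """A node of a cause-effect diagram, classified by process area."""
    text: str
    process_area: str


def bridge_relationships(edges):
    """Return the (cause, effect) pairs whose process areas differ."""
    return [(c, e) for c, e in edges if c.process_area != e.process_area]


def bridge_cause_share(edges):
    """Share of distinct causes with at least one bridging relationship."""
    causes = {c for c, _ in edges}
    bridging = {c for c, _ in bridge_relationships(edges)}
    return len(bridging) / len(causes) if causes else 0.0


# Hypothetical diagram fragment; the process area assignments are illustrative.
backlog = Cause("Weak task backlog", "Management")
direct_impl = Cause("Implementation is done directly in the test environment",
                    "Implementation")
no_instructions = Cause("Lack of instructions when and how to verify",
                        "Software Testing")
late_reports = Cause("Lack of time to report defects specific enough",
                     "Software Testing")

edges = [
    (backlog, direct_impl),           # Management -> Implementation (bridge)
    (direct_impl, late_reports),      # Implementation -> Software Testing (bridge)
    (no_instructions, late_reports),  # local to Software Testing
]

bridges = bridge_relationships(edges)
share = bridge_cause_share(edges)
```

In this hypothetical fragment, two of the three relationships cross a process area boundary, so two of the three distinct causes count as bridge causes. Storing the process area on each node keeps the check a simple comparison per edge.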

7.4 Implications

This thesis introduced how to use RCA in software project retrospectives and how the participants perceived its ease of use and cost-efficiency in SME organizations.

It seems that software project retrospectives should use RCA. All of the evaluation results indicate that RCA is a useful part of retrospectives. These findings consolidate the prior studies that present RCA as a part of collocated retrospectives (Stålhane et al. 2003; Bjørnson, Wang, and Arisholm 2009). Additionally, our results extend the prior studies by showing that RCA is also a good approach for distributed retrospectives following the agile methods (Schwaber and Sutherland 2011).

Furthermore, the results of this thesis indicate that the focus of process improvement effort should be on the perceived causal relationships of problems. The theory of causality (see Section 2.1) consolidates this assumption. Additionally, the claim is consolidated by Card (1998), who reported that using RCA in two software organizations led to a total decrease of 50% in defect rates. Card's study includes significant evidence of the effect of corrective actions developed by using RCA. Together, these studies consolidate the high applicability of RCA for software process improvement activities.

Finally, there are many software engineering problems that could be considered with RCA, but they are not reported, e.g. requirement faults (Gursimran and Jeffrey 2009). Therefore, alternative work practices for the step of target problem detection should be considered. Most of the prior methods have used problem sampling, which is infeasible for unreported problems. Our results indicate that in SME organizations, problem sampling could be substituted with a focus group meeting, which makes the RCA method lightweight and adaptable to different target problems. Such an approach could be feasible in large organizations too.
However, future work is needed to consolidate this assumption.

7.5 Evaluation of the research

This section discusses the main threats to the study results. The discussion is divided into four perspectives of validity (Runeson and Höst 2008): construct validity, internal validity, external validity, and reliability. Detailed discussion about the threats to validity can be found in the publications.

Construct validity

Construct validity reflects the validity of the research methods used to collect the research data and draw the conclusions regarding the research questions (Runeson and Höst 2008). The research methods used in this thesis follow the methods recommended in the framework of design science (Hevner et al. 2004), including literature reviews, field studies, multiple case studies, and controlled experiments.

The literature review about the prior RCA methods (RQ1) was structured; however, it was conducted semi-systematically (see Article I). We used predefined search words and two alternative search engines (Scopus and Google). Unfortunately, we did not keep an accurate record of the literature that we excluded. Therefore, evaluating the coverage of the review is difficult. Additionally, the list of search words was created based on our initial understanding of the relevant key words of RCA. These included RCA, root cause analysis, DCA, defect causal analysis, defect analysis, defect prevention, and problem prevention. This list could have been extended with search terms including retrospective, postmortem, and post-project review. These search terms were used during the latter parts of this research work to search for additional background literature (articles III and V). The papers found did not extend the set of prior RCA methods any further, which indicates that the coverage of the literature review was sufficient to make the synthesis of prior RCA methods and to develop the ARCA method.

The literature review about the prior RCA software tools (RQ2) was structured and systematic. However, the review was limited by the large number of hits. Similarly, the review was limited by the available information. Furthermore, the review was limited by the search term: root cause analysis software.

The field study methods used to evaluate the ARCA method (RQ3) and ARCA-tool (RQ4) in the industrial cases and the controlled student experiment create a threat to construct validity regarding the reliability of human input. The weakness is that the evaluation results are mostly dependent on the perceptions of participants. The strength of the study, in contrast, is that the ARCA method and ARCA-tool were evaluated from various perspectives, including the individual work practices and the retrospective outcome.
Additionally, the evaluation was replicated in many different retrospective contexts and in different companies. Furthermore, the evaluation was replicated in a student experiment.

The threats to the construct validity regarding the multiple case study (RQ5-RQ7) are related to the ARCA method outcome. The outcome of RCA has been questioned because of its high dependency on human factors (Ayad 2010). Regarding the case evaluations including interviews (see articles I and III), questionnaires (see articles I and III), and the method outcome (see Article IV), this risk is not highly significant. Furthermore, the case study results are also affected by the risks of using the grounded theory approach for analysing the ARCA method outcome regarding the process areas, cause types, interconnectedness, and feasibility for process improvement. It is possible that the interpretations about the case domains do not reflect reality. Thus, it is possible that the classification system does not reflect reality either. This risk is decreased by the fact that we had cooperated for months with the case companies. Additionally, we conducted interviews about the case domains before using the ARCA method (see articles I and IV). Therefore, our knowledge about the case domains was likely sufficient for using the grounded theory approach.

Internal validity

Internal validity considers the validity of the causal relationships between the studied factors and their measured effects (Runeson and Höst 2008). The studied factors of this thesis include the ARCA method and ARCA-tool. The measured effects include the improvement over the existing practices in terms of the perceived efficiency and ease of use.

There is a threat to internal validity regarding the improvement of the ARCA method and ARCA-tool over the existing practices. It is possible that the researcher involvement and varying social context biased the evaluation results. It is also possible that the target problems caused bias in the study results. The potential bias caused by the researcher involvement is considered in the discussion of reliability, and thus it is not discussed any further here.

It is possible that the evaluation results regarding the existing practices were slightly biased by internal variations in the social contexts. RCA has been characterized as a witch-hunting tool (Latino and Latino 2006). Thus, the social context might affect the detected problems and thus decrease, or increase, the coverage of the retrospectives.

Furthermore, there is a threat to internal validity regarding the potential differences in the target problems analysed with the ARCA method and the existing practices. It is possible that the target problems analysed with the ARCA method were perceived as more important than the target problems analysed with the existing practices. Regarding this risk, Cases 1-4 had tried to solve their target problems previously (Article I). Thus, they were able to provide a methodological comparison without significant bias from different target problems. Similarly, the participants of Cases 5-6 (Article III) were experienced in conducting retrospectives continuously and with different types of target problems.
Thus, they were able to provide a methodological comparison without significant bias from monotonic target problems.

External validity

External validity concerns the generalizability of the results (Runeson and Höst 2008). The results regarding RCA and the RCA software tools are limited to the work practices of the ARCA method. Additionally, the total number of cases was only six. Furthermore, only two cases evaluated the use of ARCA-tool.

The ARCA method is based on prior RCA methods, which increases the external validity. However, many different work practices of the prior methods were not included in the evaluation. Similarly, we did not compare the ARCA method with the prior RCA methods. Instead, the ARCA method was compared with the existing practices of the case companies. Therefore, the external validity regarding the prior RCA methods remains somewhat low.

On the other hand, the external validity regarding the existing practices is high. The evaluation covered different case contexts. Similarly, the ARCA method was evaluated with target problems at different company levels. The evaluation also covered both collocated and distributed retrospective contexts. These variations over the evaluation domains increased the external validity of the conclusions about the ARCA method. The results over the cases were remarkably similar.

Regarding the conclusions about the ARCA method outcome with software project failures, including the general cause types, process areas, and the feasibility of the bridge causes for process improvement activities, the external validity is high. Cases 1-4 were used to analyse the outcome of the ARCA method in the cases of software project failures. The cases varied in terms of software project failures, case participants, and case companies. In contrast, the complexity of software project failures, the use of the ARCA method, and the roles of case participants remained similar. Thus, the cases considered the causes of software project failures and their relationships from various perspectives, which collectively increase the external validity. Similarly, the similarities between the cases made the cases more comparable.

Future work should include studies with varying case contexts, including projects of different sizes, cultural contexts, geographical distributions, software development methods, and software project failures. Currently, our results are generalizable to SME organizations of software product companies operating in western cultures.

Reliability

The threats to reliability are related to potential researcher bias in the study results (Runeson and Höst 2008). There was a social tie between the researchers and the evaluation contexts. Additionally, the analyses regarding the interviews and the qualitative analysis of bridge causes are researcher dependent.

The researchers steered the use of the ARCA method in Cases 1-4 and the student experiment. This creates a threat to the reliability. It is possible that the researchers and subjects influenced one another (Sandelowski 1986). Thus, the contribution of the researchers could bias the evaluation results.
This threat was controlled in Cases 5-6 (Article III), where the limited ARCA method was steered solely by the company personnel. The evaluation results from Cases 1-4 are very similar to those from Cases 5-6. Thus, the potential risk of researcher bias is most likely related only to the work practices of preliminary cause collection (see Section 4.2.2), which were not included in the limited ARCA method.

Furthermore, the qualitative analyses of the interviews (Articles I, III, and V) and bridge causes (Article IV) create threats to reliability. Qualitative data analyses have been claimed to be difficult to replicate (Mays and Pope 1995). Replicating the qualitative analyses of this thesis might not be difficult, because the methods used in the data collection and analyses are clearly reported in the related articles. Instead, the potential researcher bias is related to the interpretation of the qualitative data. Regarding the interview results, triangulation (Jick 1979) of the data sources, data collection methods, and data analysis methods increases the reliability of the results. Our conclusions were based on the analyses of individual parts of the research data and on the analysis of all research data combined. Such an approach has been called the hermeneutic circle (Klein and Myers 1999), the key principle of interpretive field study research.

Finally, regarding the qualitative analyses of the bridge causes, the classification system, including the analysis of inter-rater agreement, increases the reliability of the study results. The kappa value of 0.65 indicated good agreement between the researchers over the process area dimensions (Article IV). Thus, it is likely that the perceived causes of software project failures selected for qualitative analysis covered the bridge causes in each case. Furthermore, the bridge causes interconnecting two process areas included a relatively low number of causes. Therefore, summarizing how the process areas were interconnected was not a difficult task.
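The inter-rater agreement mentioned above is reported as a kappa value. As an illustration only, the following minimal Python sketch computes Cohen's kappa for two raters. It assumes a two-rater Cohen's kappa, which this section does not state explicitly, and the function name `cohens_kappa` and the process area labels are hypothetical, not data from the thesis.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Return Cohen's kappa for two equally long sequences of labels.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed share of
    agreements and p_e is the agreement expected by chance.
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must classify the same non-empty set of items")

    n = len(rater_a)
    # Observed agreement: share of items given the same label by both raters.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Chance agreement: product of the raters' marginal label frequencies.
    freq_a = Counter(rater_a)
    freq_b = Counter(rater_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)

    if expected == 1.0:
        # Both raters used one identical label throughout; agreement is trivial.
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical classifications of ten causes into process areas (not thesis data).
researcher_1 = ["Management"] * 4 + ["Testing"] * 3 + ["Requirements"] * 3
researcher_2 = ["Management", "Management", "Management", "Testing", "Testing",
                "Testing", "Requirements", "Requirements", "Requirements", "Management"]

print(round(cohens_kappa(researcher_1, researcher_2), 2))
```

The statistic corrects raw agreement for the agreement expected by chance, which is why it is a common choice when several researchers classify qualitative data independently.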

8. Conclusions and future work

This thesis made four contributions. First, a lightweight RCA method and an RCA software tool were developed. Additionally, a large number of RCA methods and their work practices were introduced and discussed. Second, the use of RCA as a part of software project retrospectives was evaluated thoroughly in six industrial cases and one student experiment. Third, the use of computer facilitation during RCA was evaluated. Fourth, the outcome of RCA in the cases of software project failures was analysed.

The evaluations of RCA are limited to the work practices of the developed ARCA method, a synthesis of prior RCA methods (Article I). The use of computer facilitation is limited to the developed ARCA-tool (Article III). Furthermore, the analysis of the outcome of RCA is limited to the outcome of the ARCA method in SME organizations trying to explain why software projects have failed.

8.1 Conclusions

This is one of the first studies in the software engineering context that has systematically evaluated the perceptions of subject matter experts using RCA in their software project retrospectives. Such a systematic evaluation has not been reported before. The results indicate that RCA is an important part of software project retrospectives. It increases the efficiency of retrospectives, it is somewhat easy to use, and it reveals feasible targets for process improvement. The evaluation covers the use of RCA in collocated and distributed software project retrospectives. Additionally, it covers the use of RCA with different target problems and at various levels of retrospectives, including the levels of company, organization, and team.

This is also the first study in the software engineering context that has evaluated the use of RCA software tools in distributed retrospectives. The results indicate that computer facilitation is essential for the RCA of distributed retrospectives.
Similarly, RCA software increases the efficiency of collocated retrospectives. The main features of a software tool for RCA include collaborative cause-effect diagramming, corrective action development, and voting on the most important RCA outcomes.

Furthermore, detailed knowledge about the actual outcome of RCA was created in this study. In the case of software project failures, the outcome of RCA helps to express hypotheses on what happened, where it happened, and why it happened. This methodological capability increases the feasibility of using RCA as a data collection method in software project retrospectives and in studies of software project failures.

8.2 Future work

In the future, comparative studies of the existing RCA methods and software tools should be conducted, as in, e.g., Bjørnson et al. (2009). Conducting an in-depth analysis of software engineering problems during retrospectives is important but also challenging. Retrospectives should be lightweight or they are not used (Glass 2002). Therefore, simplifying the visualization of the underlying target problem causes should be a part of future work. The high complexity and cross-functionality of software engineering problems make it difficult to detect and analyse their causes. It should also be studied how to simplify the causal analysis without losing important knowledge about the solution space of target problems. A software tool could improve the causal analysis. However, the current tools require improvements.

Finally, replication studies on the use of RCA with software project failures should be conducted. Questionnaires and interviews about the causes of failures are an important part of such studies. However, the central role of the bridge causes should be taken into account better. Bridge causes could be detected with RCA, but that requires further validation with different target problems and cases. Software engineering research has not yet filled this gap.

References

Al-Mamory, Safaa O., and Hongli Zhang. Intrusion detection alarms reduction using root cause analysis and clustering. Computer Communications 32 (2) (February):
Álvarez, M. P. The four causes of behavior: Aristotle and Skinner. International Journal of Psychology and Psychological Therapy 9 (1):
Ammerman, Max. The root cause analysis handbook: A simplified approach to identifying, correcting, and reporting workplace errors. First Edition ed. 444 Park Avenue South, Suite 604, New York, NY 1016, USA: Productivity Press.
Andersen, Björn, and Tom Fagerhaug, eds. Root cause analysis: Simplified tools and techniques. Second Edition ed. United States, Milwaukee 53203: Tony A. William American Society for Quality, Quality Press.
Ayad, Amine. Critical thinking and business process improvement. Journal of Management Development 29 (6):
Berander, Patrik. Using students as subjects in requirements prioritization. Paper presented at Empirical Software Engineering, ISESE'04.
Bhandari, Inderpal, Michael Halliday, Eric Tarver, David Brown, Jarir Chaar, and Ram Chillarege. A case study of software process improvement during development. IEEE Transactions on Software Engineering 19 (12) (December):
Birk, Andreas, Torgeir Dingsøyr, and Tor Stålhane. Postmortem: Never leave a project without it. IEEE Software 19 (3):
Bjarnason, Elizabeth, and Björn Regnell. Evidence-based timelines for agile project retrospectives - A method proposal. Agile processes in software engineering and extreme programming: , Springer.
Bjørnson, Finn O., Alf I. Wang, and Erik Arisholm. Improving the effectiveness of root cause analysis in post mortem analysis: A controlled experiment. Information and Software Technology 51 (1) (January):
Boh, Wai F., Sandra A. Slaughter, and Alberto J. Espinosa. Learning from experience in software development: A multilevel analysis. Management Science 53 (8):
Burnstein, Ilene. Practical software testing. New York: Springer Science+Business Media.

Burr, Adrian, and Mal Owen, eds. Statistical methods for software quality: Using metrics for process improvement. First Edition ed. ITP A division of International Thomson Publishing Inc.
Card, David N. Learning from our mistakes with defect causal analysis. IEEE Software 15 (1):
Card, David N. Defect-causal analysis drives down error rates. Quality Time 10 (4) (July):
Carver, Jeffrey, Letizia Jaccheri, Sandro Morasca, and Forrest Shull. Issues in using students in empirical studies in software engineering education. Paper presented at Ninth International Software Metrics Symposium,
Cerpa, Narciso, and June M. Verner. Why did your project fail? Communications of the ACM 52 (12):
Chillarege, Ram, Inderpal S. Bhandari, Jarir K. Chaar, Michael J. Halliday, Diane S. Moebus, Bonnie K. Ray, and Man-Yuen Wong. Orthogonal defect classification - A concept for in-process measurements. IEEE Transactions on Software Engineering 18 (11) (November):
Collier, Bonnie, Tom DeMarco, and Peter Fearey. A defined process for project post mortem review. IEEE Software 13 (4):
Cooke, David L. Learning from incidents. Paper presented at Proceedings of the 21st International Conference of the System Dynamics Society, New York, NY, USA.
Dennis, Alan R., Craig K. Tyran, Douglas R. Vogel, and Jay F. Nunamaker Jr. Group support systems for strategic planning. Journal of Management Information Systems 14 (1):
Dingsøyr, Torgeir. Postmortem reviews: Purpose and approaches in software engineering. Information and Software Technology 47 (5):
Dingsøyr, Torgeir, Nils B. Moe, and Øystein Nytrø. Augmenting experience reports with lightweight postmortem reviews. Paper presented at PROFES '01 Proceedings of the Third International Conference on Product Focused Software Process Improvement.
Dye, J., and T. van der Schaaf. PRISMA as a quality tool for promoting customer satisfaction in the telecommunications industry. Reliability Engineering & System Safety 75 (3):
Eden, Colin. Analyzing cognitive maps to help structure issues or problems. European Journal of Operational Research (3):

Edmondson, Amy C. Learning from mistakes is easier said than done: Group and organizational influences on the detection and correction of human error. The Journal of Applied Behavioral Science 32 (1):
El Emam, Khaled, and A. Gunes Koru. A replicated survey of IT software project failures. IEEE Software 25 (5):
Foddy, William, ed. Constructing questions for interviews and questionnaires. Hong Kong by Colorcraft: Cambridge University Press.
Galles, David, and Judea Pearl. Axioms of causal relevance. Artificial Intelligence 97 (1-2):
Glass, R. L. Project retrospectives, and why they never happen. IEEE Software 19 (5) (October):
Grady, Robert B. Software failure analysis for high-return process improvement decisions. Hewlett-Packard Journal 47 (4) (August):
Granger, Clive WJ. Some recent development in a concept of causality. Journal of Econometrics 39 (1):
Gupta, Anita, Jingyue Li, Reidar Conradi, Harald Rönneberg, and Einar Landre. A case study comparing defect profiles of a reused framework and of applications reusing it. Empirical Software Engineering 14 (2) (20 August):
Gursimran, S. W., and C. C. Jeffrey. A systematic literature review to identify and classify software requirement errors. Information and Software Technology 51 (7) (July):
Herbsleb, James D., and Deependra Moitra. Global software development. IEEE Software 18 (2):
Hevner, Alan R., Salvatore T. March, Jinsoo Park, and Sudha Ram. Design science in information systems research. MIS Quarterly 28 (1):
Höst, Martin, Björn Regnell, and Claes Wohlin. Using students as subjects - a comparative study of students and professionals in lead-time impact assessment. Empirical Software Engineering 5 (3):
Hume, David. A treatise of human nature [1739]. Reprinted from the Original Edition in three volumes and edited, with an analytical index, by L.A. Selby-Bigge ed. Oxford: Clarendon Press.
Jacobs, Jef, Jan Van Moll, Paul Krause, Rob Kusters, Jos Trienekens, and Aarnout Brombacher. Exploring defect causes in products developed by virtual teams. Information and Software Technology (47):

Jacobson, I., G. Booch, and J. Rumbaugh. The unified software development process. Addison-Wesley.
Jalote, Pankaj, and Naresh Agrawal. Using defect analysis feedback for improving quality and productivity in iterative software development. Paper presented at Proceedings of the Information Science and Communications Technology (ICICT 2005).
Jick, Todd D. Mixing qualitative and quantitative methods: Triangulation in action. Administrative Science Quarterly 24 (4):
Jin, Zhao X., John Hajdukiewicz, Geoffrey Ho, Donny Chan, and Yong-Ming Kow. Using root cause data analysis for requirements and knowledge elicitation. Paper presented at International Conference on Engineering Psychology and Cognitive Ergonomics (HCII 2007), Berlin, Germany.
Juristo, Natalia, and Ana M. Moreno. Basics of software engineering experimentation. London: IBT Global.
Kalinowski, Marcos, Guilherme H. Travassos, and David N. Card. Towards a defect prevention based process improvement approach. Paper presented at Proceedings of the 34th EUROMICRO Conference on Software Engineering and Advanced Applications, Parma, Italy.
Kavadias, Stylianos, and Svenja C. Sommer. The effects of problem structure and team diversity on brainstorming effectiveness. Management Science 55 (12) (December):
Klein, Heinz K., and Michael D. Myers. A set of principles for conducting and evaluating interpretive field studies in information systems. MIS Quarterly:
Latino, Robert J., and Kenneth C. Latino, eds. Root cause analysis: Improving performance for bottom-line results. Third Edition ed. Broken Sound Parkway NW, Suite 300 Boca Raton, FL : CRC Press.
Lee, S., J. F. Courtney, and R. M. O'Keefe. A system for organizational learning using cognitive maps. Omega, the International Journal of Management Science 20 (1):
Leszak, Marek, Dewayne E. Perry, and Dieter Stoll. A case study in root cause defect analysis. Paper presented at Proceedings of the 2000 International Conference on Software Engineering.
Lethbridge, Timothy C., Susan Elliott Sim, and Janice Singer. Studying software engineers: Data collection techniques for software field studies. Empirical Software Engineering 10 (3):

Livingstone, A. D., G. Jackson, and K. Priestley. Root causes analysis: Literature review. Health & Safety Executive, Contract Research Report 325:
March, Salvatore T., and Gerald F. Smith. Design and natural science research on information technology. Decision Support Systems (15):
Mays, Nicholas, and Catherine Pope. Rigour and qualitative research. BMJ 311 (8):
Mays, Robert G. Applications of defect prevention in software development. IEEE Journal on Selected Areas in Communications 8 (2) (February):
McLeod, Laurie, and Stephen G. MacDonell. Factors that affect software systems development project outcomes: A survey of research. ACM Computing Surveys 43 (24):
Monteiro, Paula, Ricardo J. Machado, Rick Kazman, and Cristina Henriques. Dependency analysis between CMMI process areas. Paper presented at PROFES, LNCS
Nakashima, T., M. Oyama, H. Hisada, and N. Ishii. Analysis of software bug causes and its prevention. Information and Software Technology (41):
Naur, P., and B. Randell. Software engineering: A report on a conference sponsored by the NATO science committee. Nato.
Pearl, Judea, ed. Causality: Models, reasoning, and inference. United States of America: Cambridge University Press.
Rooney, James J., and Lee N. Vanden Heuvel. Collecting data for root cause analysis. Quality Progress 36 (11) (November): 104.
Rooney, James J., and Lee N. Vanden Heuvel. Root cause analysis for beginners. Quality Progress 37 (7) (August):
Royce, Winston. Managing the development of large software systems. Paper presented at Proceedings of IEEE WESCON 26 (August).
Runeson, Per. Using students as experiment subjects - an analysis on graduate and freshmen student data. Paper presented at Proceedings of the 7th International Conference on Empirical Assessment in Software Engineering. Keele University, UK.
Runeson, Per, and Martin Höst. Guidelines for conducting and reporting case study research in software engineering. Empirical Software Engineering (14) (19 December):

Salinger, Stephan, Laura Plonka, and Lutz Prechelt. A coding scheme development methodology using grounded theory for qualitative analysis of pair programming. Paper presented at 19th Annual Psychology of Programming Workshop, Joensuu.
Sandelowski, M. The problem of rigor in qualitative research. ANS 8 (3):
Schwaber, Ken, and Jeff Sutherland. Scrum guide. Scrum Alliance.
Shull, Forrest, Dag I. K. Sjøberg, and Janice Singer. Guide to advanced empirical software engineering. Springer-Verlag London Limited.
Siekkinen, Matti, Guillaume Urvoy-Keller, Ernst W. Biersack, and Denis Collange. A root cause analysis toolkit for TCP. Computer Networks (52):
Stålhane, Tor. Root cause analysis and gap analysis - A tale of two methods. Paper presented at EuroSPI 2004, Trondheim, Norway.
Stålhane, Tor, Torgeir Dingsøyr, Geir Hanssen, and Nils Moe. Post mortem - an assessment of two approaches. Empirical Methods and Studies in Software Engineering:
Stevenson, William J., ed. Operations management. 8th ed. New York: McGraw-Hill/Irwin.
Svahnberg, Mikael, Aybüke Aurum, and Claes Wohlin. Using students as subjects - an empirical evaluation. Paper presented at Proceedings of the Second ACM-IEEE international symposium on Empirical software engineering and measurement.
Terzakis, John. Virtual retrospectives for geographically dispersed software teams. IEEE Software 28 (3):
Thiele, T. N. The law of causality. The Annals of Mathematical Statistics 2 (2):
Traeger, Avishay, Ivan Deras, and Erez Zadok. DARC: Dynamic analysis of root causes of latency distributions. Paper presented at SIGMETRICS '08, Annapolis, Maryland, USA.
Vanhanen, Jari, Timo O. A. Lehtinen, and Casper Lassenius. Teaching real-world software engineering through a capstone project course with industrial customers. Paper presented at 1st International Workshop on Software Engineering Education Based on Real-World Experiences, EduRex 2012, Zurich.
Verner, June, Jennifer Sampson, and Narciso Cerpa. What factors lead to software project failure. Paper presented at Proceedings of Research Challenges in Information Science (RCIS 2008).

Von Zedtwitz, Maximilian. Organizational learning through post project reviews in R&D. R&D Management 32 (3):
Wang, Yingxu, and Graham King. Software engineering processes: Principles and applications. CRC Press LLC.
Xiangnan, L., L. Hong, and Y. Weijie. Analysis failure factors for small & medium software projects based on PLS method. Paper presented at The 2nd IEEE International Conference on Information Management and Engineering (ICIME 2010).
Yin, Robert K., ed. Case study research: Design and methods. 2nd Edition ed. United States of America: SAGE Publications.


Part II: Articles

I. Development and evaluation of a lightweight root cause analysis method (ARCA method) - Field studies at four software companies. Timo O.A. Lehtinen, Mika V. Mäntylä and Jari Vanhanen. Journal of Information and Software Technology, Volume 53, Issue 10, October 2011, Pages
II. What are problem causes of software projects? Data of root cause analysis at four software companies. Timo O.A. Lehtinen and Mika V. Mäntylä. Proceedings of International Symposium on Empirical Software Engineering and Measurement, 2011, Pages
III. A tool supporting root cause analysis for synchronous retrospectives in distributed software teams. Timo O.A. Lehtinen, Risto Virtanen, Juha O. Viljanen, Mika V. Mäntylä and Casper Lassenius. Journal of Information and Software Technology, Volume 56, Issue 4, April 2014, Pages
IV. Perceived causes of software project failures - An analysis of their relationships. Timo O.A. Lehtinen, Mika V. Mäntylä, Jari Vanhanen, Juha Itkonen and Casper Lassenius. Journal of Information and Software Technology, Volume 56, Issue 6, June 2014, Pages
V. An experimental comparison of using cause-effect diagrams and simple memos in software project retrospectives. Timo O.A. Lehtinen, Mika V. Mäntylä, Juha Itkonen and Jari Vanhanen. Journal of Systems and Software (2014), 26 pages, in revision.


Article I

Development and evaluation of a lightweight root cause analysis method (ARCA method) - Field studies at four software companies

Timo O.A. Lehtinen, Mika V. Mäntylä and Jari Vanhanen

Journal of Information and Software Technology, Volume 53, Issue 10, October 2011, Pages

Elsevier B.V. Reprinted with permission.

Information and Software Technology 53 (2011)

Development and evaluation of a lightweight root cause analysis method (ARCA method) - Field studies at four software companies

Timo O.A. Lehtinen, Mika V. Mäntylä, Jari Vanhanen
Department of Computer Science and Engineering, School of Science, Aalto University, P.O. BOX 19210, FI Aalto, Finland

Article history: Received 17 December 2010; received in revised form 3 May 2011; accepted 15 May 2011; available online 20 May 2011

Keywords: Root cause analysis; Problem prevention; Software process improvement; Industrial field study; Design science research; Cause-effect diagram

Abstract

Context: The key for effective problem prevention is detecting the causes of a problem that has occurred. Root cause analysis (RCA) is a structured investigation of the problem to identify which underlying causes need to be fixed. The RCA method consists of three steps: target problem detection, root cause detection, and corrective action innovation. Its results can help with process improvement.

Objective: This paper presents a lightweight RCA method, named the ARCA method, and its empirical evaluation. In the ARCA method, the target problem detection is based on a focus group meeting. This is in contrast to prior RCA methods, where the target problem detection is based on problem sampling, requiring heavy startup investments.

Method: The ARCA method was created with the framework of design science. We evaluated it through field studies at four medium-sized software companies using interviews and query forms to collect feedback from the case attendees. A total of five key representatives of the companies were interviewed, and 30 case participants answered the query forms. The output of the ARCA method was also evaluated by the case attendees, i.e., a total of 757 target problem causes and 124 related corrective actions.
Results: The case attendees considered the ARCA method useful and easy to use, which indicates that it is beneficial for process improvement and problem prevention. In each case, target problem root causes were processed and corrective actions were developed. The effort of applying the method was 89 man-hours, on average.

Conclusion: The ARCA method required an acceptable level of effort and resulted in numerous high-quality corrective actions. In contrast to the current company practices, the method is an efficient way to detect new process improvement opportunities and develop new process improvement ideas. Additionally, it is easy to use.

© 2011 Elsevier B.V. All rights reserved.

E-mail addresses: timo.o.lehtinen@aalto.fi (T.O.A. Lehtinen), mika.mantyla@aalto.fi (M.V. Mäntylä), jari.vanhanen@aalto.fi (J. Vanhanen).

1. Introduction

Analyzing problem causes is considered in various software process improvement models, e.g., CMMI, ISO/IEC 12207, and Six Sigma [1]. The key for effective problem prevention is to know why the problem occurs [2]. We believe this is mainly because the reoccurrence of the problem can be prevented only through the elimination of its causes.

Root cause analysis (RCA) is a structured investigation of a problem to identify which underlying causes need to be fixed [3]. It can help with process improvement and problem prevention in various contexts [1,4-12] and across all software organizations, including product development, hardware design, product engineering, and manufacturing [6].

Most of the reported industrial cases in software engineering root cause analysis [5,8,13-15] have aimed to lower defect rates by preventing the causes of the most typical types of defects. The results are promising: a 50% decrease in defect rates [15], and a 53% savings in costs and a 24% increase in productivity [13] have been reported.
However, a high number of particular types of software defects is not the only target problem that should be analyzed; e.g., negative project experiences [4], delayed product releases, and challenging product installations are all industrially relevant and severe problems but have been explored only sparsely using RCA.

There are many RCA methods [1-6,8,11,13,15-21], but no studies have included extensive analyses of the participants' feedback on the RCA method, and only a few studies have discussed the effort required to apply the method. Grady [8] indicates that 7 h of team work is the minimum cost of executing a non-recurring RCA method, whereas Mayes [6] indicates that the costs of the RCA method in large organizations consist of 8-10 action team members using 10% of their time for action team duties and 4-7 developers participating in kickoff and causal analysis meetings, each lasting 2 h. Card [15] indicates that the costs of the RCA method range from 0.5% to 1.5% of the software budget, which additionally requires a startup investment to fund the supportive infrastructure, i.e., defect classification scheme definitions, procedure setup, establishment of data collection mechanisms, and personnel training.

Unfortunately, the prior RCA studies are too general to assess and compare the required startup and execution efforts in concrete man-hours. Additionally, most of the industrial RCA studies [5,6,8,13-15] were conducted at large software companies operating with mature development processes and products. The optimal RCA method for small- to medium-sized companies operating closer to a style of agile software development is likely to be different from the RCA methods presented in the prior studies.

This paper presents a lightweight RCA method and its empirical evaluation. For our research purposes, we developed an RCA method, referred to as the ARCA method, and evaluated it. This was done using a framework similar to that of design science [22,23], presented in Fig. 1. The environment of the research was the context of applying root cause analysis methods in software engineering. The business need was to develop a lightweight RCA method feasible for problem prevention at medium-sized software companies. The knowledge base of the method design was established by a literature review of root cause analysis (see Sections 2 and 3). The assessment of the design (see Sections 5 and 6) was performed through industrial field studies (see Section 4) by piloting the ARCA method on target problems of four medium-sized software product companies.

The research goal is as follows: To develop a lightweight RCA method for medium-sized software companies and evaluate it in industrial cases.
The ARCA method consists of four steps, i.e., target problem detection, root cause detection, corrective action innovation, and documentation of the results. Unlike the prior RCA methods applied in the software industry [5,6,8,13-15], the ARCA method does not require problem reports, e.g., software defect reports, in the target problem detection step. Instead, our method utilizes a focus group meeting to detect the target problem. This difference makes the ARCA method lightweight. It does not require heavy startup investment and, simultaneously, it is highly adaptable to various target problems.

We collected feedback from the case attendees to evaluate the ease of use and usefulness of the ARCA method. Additionally, we measured the required effort and the output of the method, i.e., 757 target problem causes and their 124 related corrective actions. Even though implementing and monitoring the corrective actions is an important part of problem prevention programs [16], we excluded it from this research. It would have been practically impossible to separate the effects of the ARCA method from the company-specific context factors.

Evaluation of the method was conducted by answering the following research questions:

RQ1: Is the ARCA method efficient? Efficiency refers to the interrelationship between the advantages of the method output and the required effort. The output of the ARCA method is a set of corrective actions for the related root causes. Quality of corrective actions refers to their feasibility for and impact on the target problem.

RQ2: Is the ARCA method easy to use? Ease of use refers to the ease of conducting the steps of the ARCA method: target problem detection, root cause detection, corrective action innovation, and documentation of the results.

The rest of the paper is structured as follows. Section 2 discusses the theoretical background of root cause analysis. Section 3 introduces the ARCA method and its development.
Section 4 presents the field study methodology used in the empirical part of this study. Section 5 shows the results of the field studies, and Section 6 answers the research questions and discusses the most interesting findings and threats to their validity. Section 7 states the conclusions and proposes future work on the topic. 2. Theoretical background In this section, we introduce the framework of RCA methods. We first present definitions of root cause analysis in Section 2.1 and characterize what we mean by the word target problem in Section 2.2. Thereafter, in Section 2.3, we summarize the common steps of RCA methods and their related work practices Definitions of RCA Usually, the idea behind RCA is to decrease the likelihood of a problem s reoccurrence [2,13,15,18], but, depending on the utilization context, RCA targets vary. For example, RCA is used to detect the causes of negative and positive project experiences [4] and to distill textual raw data, which is useful for requirement collection and knowledge elicitation [20]. There is no unique and commonly accepted definition for RCA [3,16] or for a root cause. Several authors introduce RCA as a cause Environment Relevance IS Research Rigor Knowledge Base People - Roles - Capabilities - Characteristics Organizations - Strategies - Structure & Culture - Processes Technology - Infrastructure - Applications - Communications Architecture - Development Capabilities Business Needs Develop / Build - Theories - Artifacts Assess Justify / Evaluate - Analytical - Case Study - Experimental - Field Study - Simulation Refine Applicable Knowledge Foundations - Theories - Frameworks - Instruments - Constructs - Models - Methods - Installations Methodologies - Data Analysis Techniques - Formalism - Measures - Validation Criteria Application in the Appropriate Environment Additions to the Knowledge Base Fig. 1. Framework of design science in information systems research [22].

99 T.O.A. Lehtinen et al. / Information and Software Technology 53 (2011)

detection method only [2–4,13,17], whereas some authors present RCA as a problem prevention method that includes causal analysis and the development of corrective actions [11,15,16,18]. Some authors define a root cause as the deepest cause at the end of the causal structure [16,17], whereas others define it as any underlying cause of a target problem [2]. However, most of the authors recognize a root cause as a cause that management has the power to fix [2,16,17,21]. Logically, the target problem may have numerous root causes. In our terminology, RCA is a process of detecting a target problem, collecting and organizing its causes, and recognizing its root causes. For our purposes, the RCA method for problem prevention means a method that includes RCA and the development of corrective actions. We define a root cause as any underlying cause of the target problem that the management has the ability to fix.

Target problem characterization

The target problem of RCA is a state of difficulty resulting in unwanted situations or events [16]. Additionally, it should be systematic and create severe consequences [1,5,8,11,13,15]. We believe that there is a wide variety of systematic target problems in software engineering, e.g., a defect, a high number of software defects, an overrun project budget, a late product release, a challenging product installation, lack of software testing, etc. Logically, the events and problems of a software company are interconnected with a cause and effect relationship. We describe this relationship as the interconnectivity of a problem. It is claimed that every problem has a solution space that can be characterized in its complexity and cross-functionality [24]. The solution space of a problem corresponds to the number of solutions that the problem may permit [24].
The complexity of a problem corresponds to the effort required in its solution space to solve it, and cross-functionality represents the diversity of expertise required by the problem to attain solutions within its solution space [24]. We believe that the interconnectivity of the target problem correlates with the complexity and cross-functionality of its solution space. An example of a target problem with a complex and cross-functional solution space is a late product release. The target problem may be caused by various difficult causes, e.g., overly optimistic schedule estimations, a large number of software defects, misunderstood requirements, or other unknown factors. A software defect alone may not be considered a severe situation. However, it may be caused by some systematic working methods in the development work; e.g., the features require modifications to a database, but developers omit them in 30% of the cases due to their busy schedules. As time goes on, the working methods might result in a large number of software defects in the project, causing delays to the already busy development schedule and, in the worst case, causing a delay in the product release. The busy development schedule, in turn, may be caused by various causes, not only because of the high number of the defects but also because of the overly optimistic project plans driven by misunderstood functional requirements established by the company sales personnel. We see that solving the target problem requires managing all of its interconnections. 
Unfortunately, some of the interconnections may not be controlled or prevented.

Common steps of RCA methods and related work practices

We found three steps that are common to the RCA methods introduced in the literature: (1) target problem detection [1–5,8,11,13,15–21], which defines the target problem of the RCA method, (2) root cause detection [1–5,8,11,13,15–21], which detects and organizes the causes of the target problem, and (3) corrective action innovation [1–3,5,8,11,13,15–19], which develops corrective actions for the most important root causes. Alternative work practices have been presented for each of the above steps. These are presented in Sections .

Target problem detection

A target problem for RCA is detected through problem sampling [1,3,5,8,13,15,16,19], interviewing [2,3,25], brainstorming [3,16], and flowcharting [3,16,17]. Usually, there is a meeting where the target problem is finally decided upon [15,19]. The usual case of software engineering root cause analysis applies the Pareto principle to software defects to detect the target problem [1,5,8,11,13,15,18]. The idea is to sample and classify software defects and, thereafter, to select the class containing the highest number of defects as the target problem. Another approach to detect the target problem is using qualitative methods, such as interviewing [3,25] software development managers to name the main problems of the development work. The negative side of this approach is that the target problem might not be as focused as it would be if the Pareto principle were applied to software defects. The positive side is that this is a quick and easy way to select an important and severe problem for analysis. Thus, the large workload of defect classification and deeper analyses can be avoided.
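The defect-based selection can be illustrated with a minimal sketch. The defect classes and the sample are invented for illustration and do not come from any of the cited studies; the function name is hypothetical. The only logic taken from the text is to classify a sample of defects and to pick the class with the highest count as the target problem.

```python
from collections import Counter

def select_target_class(defect_classes):
    """Return the most frequent defect class and its share of the sample."""
    if not defect_classes:
        raise ValueError("the defect sample is empty")
    counts = Counter(defect_classes)
    top_class, top_count = counts.most_common(1)[0]
    return top_class, top_count / len(defect_classes)

# Hypothetical sample: one class label per sampled defect report.
sample = ["interface", "logic", "interface", "data", "interface", "logic"]
target, share = select_target_class(sample)
print(target, share)
```

The qualitative alternative, interviewing managers about the main problems, skips this sampling and counting work altogether.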
For some organizations, there is a great motivation to use this approach, as there is little possibility of separate resources for the RCA investigation, while there are more possibilities in a large company [6,8]. The mad schedule rush of software companies forces them to progress in new projects rather than focusing on analyzing the defects of yesterday's projects [26]. However, in a large development effort, this approach could cause too many target problem causes to be detected, so the magnitude of work to analyze all the relevant causes would stay high [5,8,13].

Root cause detection

In root cause detection, there are different ways to collect and organize the target problem causes [25]. The causes are usually collected from various stakeholders [3,15,19,25] using interviewing [17], questionnaire [16,27], brainstorming, and brainwriting methods [3,16,27]. The questionnaires and interviews are more anonymous approaches, in contrast to the brainstorming and brainwriting approaches, which are performed publicly. The target problem causes are usually organized into a cause-effect diagram based on their cause and effect relationships using a fishbone diagram [4,11,16,19,28], a fault tree diagram [16], a causal map [4], a matrix diagram [16], a scatter chart [16], a logic tree [3], or a causal factor chart [2]. It has been shown that lists, worksheets, and charts may also be used to organize the causes [17]. The root causes are finally detected by analyzing the collected target problem causes by focusing on the causes that will be prevented [2,15].

Corrective action innovation

Corrective actions are usually developed in a meeting [5,8,13,15,16], where brainstorming and brainwriting are the recommended work practices [16]. Brainstorming has three major obstacles that brainwriting can tackle: (1) people cannot speak simultaneously, (2) there is a fear of negative evaluation from other group members, and (3) individual contributions are not identifiable [4,16,24].
Additionally, it has been claimed that brainwriting is a feasible practice to address complex problems, whereas in cross-functional problems (see Section 2.2), brainstorming attains better solutions [24]. These practices can also be mixed with problem elimination techniques, such as Systematic Inventive Thinking, the Theory of Inventive Problem Prevention, or the Six Thinking Hats [16]. However, these techniques are rather complex, and more creative approaches should be used [16].

Description of the ARCA method

In this section, we present the ARCA method. First, in Section 3.1, we introduce how the method was developed. Thereafter, in Section 3.2, we present the work phases and practices of the ARCA method and compare these to the most notable prior RCA methods by following the common steps of the RCA methods introduced in Section .

Development of the ARCA method

We started the design of the ARCA method by setting down its requirements. We believe that a beneficial, lightweight RCA method would help software companies to develop high-quality corrective actions with low effort. We think that this goal can be satisfied by fulfilling the following requirements:

1. Helps to develop corrective actions that are feasible and effective for solving the target problem.
2. Requires low effort.
3. Is easy to use.
4. Is adaptable for different kinds of target problems.

Thereafter, we performed a literature review. The literature included RCA methods used in the software industry and also in other contexts. The literature was collected using predefined search words ("RCA", "root cause analysis", "DCA", "Defect Causal Analysis", "defect analysis", "defect prevention", and "problem prevention") in Google and Scopus. The review was driven by the following questions:

1. Are there steps common to all RCA methods?
2. What are the recommended work practices in the different steps of RCA?

We designed an initial version of the ARCA method based on its requirements and the literature review. During the method design, we performed an analytical argumentation on various alternatives introduced in prior works. The initial ARCA method was piloted with a student software project. This was very important because it made it possible to refine the method before the industrial field studies were conducted.
For example, we realized that a monitor and a software tool should be used to visualize and register the problem causes because of the high number of them. Using Post-it notes was unfeasible for this purpose.

The ARCA method

In this section, we first introduce the most notable prior RCA methods. Thereafter, in Sections , we describe in detail the work phases and practices of the ARCA method and argue the design by comparing it to these prior RCA methods. We discuss these methods because they were presented in enough detail in the related publications and, like the ARCA method, they follow the common steps of the RCA methods, as summarized in Table 1.

Rooney and Vanden Heuvel [2] present an RCA method consisting of four work phases: data collection, causal factor charting, root cause identification, and recommendation generation. The method starts with the data collection, where the team gathers a target problem's related data. In the causal factor charting, the team organizes and analyzes the results of the data collection. Causal factor charting provides a way to structure the data based on its cause and effect relationships to a sequence diagram, which helps investigators to recognize causal factors that are seen as the most likely potential causes of the target problem. Thereafter, in the root cause identification, the investigators analyze the causal factors using a decision diagram, which is an up-front collection of potential problem causes helping to answer questions about why a particular cause exists. Finally, in the recommendation generation, the

Table 1. Summary of the ARCA and prior RCA methods and their work phases.
RCA method Target problem detection step Root cause detection step Corrective action innovation step Work phase Work practices Work phase Work practices Work phase Work practices Rooney and Vanden Heuvel [2] Ammerman [17] Latino and Latino [3] Data collection Problem definition and data collection Task analysis Change analysis Control barrier analysis Opportunity analysis Interviewing, inspections Causal factor charting Root cause identification Event and causal factor charting Paper-and-pencil, Root cause walk-through determination Flow charts Flow charts Sequence diagrams, interviewing, Pareto analysis Data analysis Card [15] Defect sampling Sampling, meetings Determining principal cause Defect Classification scheme, classification meetings ARCA method Identifying systematic errors Target problem detection Pareto analysis, meetings A focus group meeting Preliminary cause collection Causal analysis workshop Sequence diagram Decision diagram Sequence diagrams Interviewing, event and causal factor charts, lists, and worksheets Flow chart, logic tree, meetings A fishbone diagram, cause categories, meetings Anonymous inquiry, a directed graph Brainwriting and brainstorming in a meeting, a directed graph Recommendation generation Corrective action development Recommendation development Development of action proposals Root cause selection Corrective action workshop Interviewing Writing individually, meetings Meetings inquiry Brainwriting combined with skeptical and optimistic perspectives in a meeting

investigators develop corrective actions for the most important causal factors.

Ammerman [17] introduces an RCA method (PIC) consisting of eight work phases: problem definition and data collection, task analysis, change analysis, control barrier analysis, event and causal factor charting, root cause determination, corrective action development, and reporting conclusions. The method starts with defining a target problem, which is followed by collecting the problem-related data. The task analysis helps the team to understand where the pitfalls are within the target problem that is under evaluation. The goal is to find out what was assumed to have happened, not exactly what happened. Instead, the change analysis helps to understand what actually happened and what was expected to happen. The activity that was successfully performed is compared to an activity that was unsuccessfully performed. The focus of the control barrier analysis is to discover where physical or administrative barriers are needed to prevent the target problem. In the event and causal factor charting, a flow chart that graphically displays an entire event resulting in the target problem is created. The work phase of the root cause determination aims to detect the root causes of the target problem. The team should detect the root causes in a systematic way and utilize visual tools such as lists, worksheets, and charts. The goal of the corrective action development is to identify, develop, and evaluate corrective actions required to prevent the target problem's recurrence or significantly reduce its likelihood. Finally, the team documents all the intermediate results and recommended corrective actions.

Latino and Latino [3] present an RCA method (PROACT) consisting of four work phases: opportunity analysis, data analysis, developing recommendations, and reporting conclusions.
In opportunity analysis, failures are sampled and classified. Then, Pareto analysis is used to detect the most likely potential target problems for RCA. Thereafter, in the data analysis, cause and effect relationships are detected and structured using a logic tree, which is a combination of a logic diagram and a fault tree. The goal is to detect the root causes of the target problem by listing and structuring hypothetic causes and either proving or disproving them with hard data. In corrective action development, the team first decides on an acceptance criterion for recommendations. Thereafter, the team develops recommendations to address the target problem root causes. Finally, the team documents all the findings, including the failures, root causes, and recommendations.

Card [15] presents Defect Causal Analysis (DCA), an RCA method consisting of six work phases: defect sampling, defect classification, identifying systematic errors, principal cause determination, developing action proposals, and reporting conclusions. In defect sampling, software defects are sampled to explore those that occur most frequently and have the most negative impact on the quality of the software. Thereafter, in the defect classification, investigators identify clusters of software defects by classifying the sample. Then, they use Pareto analysis to identify systematic defects. In principal cause determination, the root causes of the systematic defects are detected. If the root cause is not obvious from the defect statement, it should be drawn out using a fishbone diagram. In the development of action proposals, the corrective actions are developed for the determined root causes to either detect systematic defects earlier or prevent them. Finally, all the results, including the root causes and corrective actions, are recorded.

Step 1: Target problem detection

This is the first step of the ARCA method. After this step, the target problem will have been defined.
Rooney and Vanden Heuvel [2] and Latino and Latino [3] indicate that interviewing is a feasible practice in detecting the target problem. However, we emphasize a focus group meeting because it is an excellent approach to identify rapidly what is important to the people [29]. We also believe that it requires less effort than interviewing and is easy to conduct. Flow charting is also shown to be a useful method in problem detection [17]. However, the intangibility of software engineering problems makes it difficult to create flow charts to describe how they evolve. We believe that problem sampling [3,15,18] is unfeasible for many target problems. It sounds like a great idea to analyze and eliminate the causes of the most usual type of problems to lower the likelihood of their reoccurrence. On the other hand, problem sampling requires effort and information that is not easily available in practice [30]. For example, our collaboration with industrial partners suggests that information such as the defect type or defect module is sporadically reported by the company's personnel [31], thus making the defect data too unreliable for RCA. Additionally, according to [32], it is labor-intensive and probably not worthwhile to link the defects to their causes in large development efforts, as it may not lead to ideas that can be used to improve the software engineering mechanisms. Moreover, the problem sampling can be done only for the problems that are reported [1,15,19,33], and, in many cases, defect databases do not contain problems such as requirements faults [33].

In the ARCA method, the first step starts with a focus group meeting where the target problem is defined and the causal analysis workshop participants, who are to collect the target problem causes and to evaluate root causes, are selected (4–10 participants). The RCA facilitator holds this meeting with company staff, e.g., the managers who are responsible for product quality.
In the meeting, the following issues should be justified and documented: what is the target problem and why exactly is this problem important to prevent? When selecting the causal analysis participants, it is important to include target problem experts that represent different stakeholders around the target problem. These may include project managers, developers, testers, software quality assurance staff, product managers, and process improvement group members.

Step 2: Root cause detection

This is the second step of the ARCA method. After this step, the most important root causes will have been detected and evaluated. We see that both anonymous and public approaches are important in root cause detection. Anonymity encourages the participants to address causes that they believe are dangerous to say aloud, whereas publicity helps to address causes that many participants value highly. The other RCA methods do not emphasize this. Ammerman [17] emphasizes interviewing only, whereas Latino and Latino [3] and Card [15] emphasize meetings. Unlike the prior RCA methods, we recommend using a directed graph [4] to structure the causes based on their cause-and-effect relationships (see Fig. 2). As the directed graph represents a network of causes, each cause needs to be placed only once in the cause-effect diagram. The cause-effect diagrams of the prior RCA methods result in the problem of duplicating the same cause multiple times if the cause simultaneously explains more than one cause. Card [15] recommends using a fishbone diagram, which he claims to be a simple technique. However, using the fishbone diagram does not solve the duplicating problem. The problem also occurs when a logic tree is used, which is recommended by Latino and Latino [3]. Rooney and Vanden Heuvel [2] recommend using a sequence diagram followed by a decision diagram.
Unfortunately, the sequence diagram also includes the duplicating problem and the decision diagram includes the challenge of detecting the correct problem causes in advance, as the target problems vary. Additionally, we believe that using two diagrams is more difficult than using one. Ammerman [17] indicates that structuring the target problem causes should be done with visual tools such as lists, worksheets, and charts. However, it is likely that too many target

problem causes will be detected to be visualized using these tools [5]. Additionally, the duplicating problem occurs with lists, worksheets, and charts.

Fig. 2. The cause-effect diagram of the ARCA method.

In the ARCA method, the second step consists of two work phases: preliminary cause collection and a causal analysis workshop. In preliminary cause collection, the RCA facilitator sends out an inquiry to the case participants and collects the target problem causes. The inquiry asks the participants to list at least five causes of the target problem. Since the listed causes probably complement one another, they are organized into a cause-effect diagram by the RCA facilitator, as presented in Fig. 2. Using a software tool is recommended here.

The second work phase is the causal analysis workshop, which is prepared by the RCA facilitator. A cause entity (see the colored causes in Fig. 2) includes a cause and its sub-causes, which together form an entity that is reasonable to process together. By analyzing the cause-effect diagram, the RCA facilitator selects the most important cause entities to be processed in the workshop. It is possible that the entities will overlap since the causes explain one another. Processing a cause entity containing about 10 causes can be done adequately in about 40 min. We recommend this as a suitable size for a cause entity.

The causal analysis workshop is a meeting where new target problem causes are collected and analyzed. The workshop has a recommended minimum duration of 40 min per cause entity. At the beginning of the workshop, the RCA facilitator presents the target problem, the preliminary causes, and the selected cause entities. Thereafter, new causes are collected for each selected cause entity. The cause entities are processed one at a time. Each cause can either deepen or widen a cause entity.
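As a rough illustration of the directed graph and of cause entities, the sketch below stores each cause once and lets it point to every cause it explains. The class, the function names, and the sample causes are our own assumptions for illustration; they are not the software tool used in the study. The only figures taken from the text are the 40 min minimum per cause entity.

```python
from collections import defaultdict

class CauseGraph:
    """Directed cause-effect graph; an edge cause -> effect means 'cause explains effect'."""

    def __init__(self):
        self.causes_of = defaultdict(set)  # effect -> set of its direct causes

    def add(self, cause, effect):
        self.causes_of[effect].add(cause)

    def entity(self, root):
        """Return the root cause together with all its direct and indirect sub-causes."""
        seen, stack = set(), [root]
        while stack:
            node = stack.pop()
            if node not in seen:
                seen.add(node)
                stack.extend(self.causes_of.get(node, ()))
        return seen

def minimum_workshop_minutes(n_entities, minutes_per_entity=40):
    """Recommended minimum workshop duration for the selected cause entities."""
    return n_entities * minutes_per_entity

# Hypothetical causes of a late product release.
g = CauseGraph()
g.add("many defects", "late release")
g.add("optimistic schedule", "late release")
g.add("busy schedule", "many defects")
g.add("misunderstood requirements", "many defects")
g.add("misunderstood requirements", "optimistic schedule")

defects = g.entity("many defects")
schedule = g.entity("optimistic schedule")
print(defects & schedule)           # the shared cause is stored once, yet belongs to both entities
print(minimum_workshop_minutes(2))  # 80
```

In a tree-shaped diagram, the shared cause would have to be drawn under both parents; here the overlap between the two entities shows up as a set intersection instead.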
Collecting the causes into a cause entity is done in three parts:

1. The participants write new causes down on paper for 5 min (the cause-effect diagram should be projected onto the wall).
2. Each participant presents the causes he or she has written and explains where they should be placed in the cause-effect diagram.
3. The participants briefly discuss the cause entity's causes, trying to brainstorm more causes and to recognize whether a cause has a relationship to other causes.

After all the selected cause entities have been processed, the related cause-effect diagram is analyzed as a whole. The RCA facilitator asks the participants to point out essential causes and to discuss them. The controllable causes, i.e., the root causes, are identified. The other causes are set aside and are not processed any further.

Step 3: Corrective action innovation

This is the third step of the ARCA method. After this step, the corrective actions for the most important root causes will have been developed. In the prior RCA methods, there is very little practical guidance on how to develop corrective actions. Holding a meeting where the corrective actions are developed is presented by Latino and Latino [3] and Card [15], whereas Ammerman presents interviewing techniques to be used [17]. We believe that holding the meeting helps to develop commitment to the corrective actions among the participants more than the interviewing techniques. In the corrective action innovation, we chiefly emphasize brainwriting because it provides an efficient way to use all of the participants simultaneously. However, as we believe that there are also advantages in brainstorming (see Section 2.3), we recommend it to refine the findings into the best corrective actions. Latino emphasizes brainstorming in the corrective action innovation but stresses also that it is important to write down the corrective actions [3].
Ammerman indicates that it is important to develop multiple corrective actions and to evaluate and select them to have alternatives [17]. We found that the commonality between the elimination techniques presented in the literature [16] is that a corrective action is analyzed from different perspectives, especially from optimistic and skeptical perspectives. Therefore, we adopted the idea of different perspectives to the ARCA method by creating a paper template for a corrective action that forces the participants

to brainwrite the corrective actions from both perspectives (see Appendix E).

In the ARCA method, the third step consists of two work phases: the root cause selection and the corrective action workshop. The first work phase includes the selection of the root causes. To focus the available resources as efficiently as possible, the RCA facilitator has to carefully select the root causes for which corrective actions are to be developed. First, the finalized cause-effect diagram is sent to the participants of the causal analysis workshop. The participants are asked to propose root causes for which corrective actions should be developed and evaluate them using the following criteria: the level of impact on the target problem and the level of difficulty of developing corrective actions. Then, the RCA facilitator selects 4–6 root causes to be processed using his judgment and analysis of the root causes proposed by the participants. Finally, the RCA facilitator documents each of the selected root causes, including its sub-causes, into a cause-effect diagram, each on a separate sheet of paper.

The second work phase of the step is the corrective action workshop, which is a meeting wherein the corrective actions of the selected root causes are developed, evaluated, and analyzed. The workshop has a recommended duration of 2 h. First, the RCA facilitator selects 4–6 participants to join the workshop. They have to be an aggregate of experts who are as competent as possible at solving the selected root causes. In the workshop, each participant works, in turn, for min with one root cause. They develop corrective actions by writing them down on paper (see Appendix E) and rotating them through the participants. The root causes are rotated until every participant has treated all the root causes.
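The rotation described above, and the evaluation round that the text explains next, can be sketched as follows. This is one possible reading rather than the authors' procedure in executable form: the round-robin assignment, the function names, and the sample ratings are our assumptions. In particular, we read the "sum of evaluations" as summing each attribute over the evaluators before multiplying impact by feasibility.

```python
def rotation_schedule(participants, root_causes):
    """Round-robin plan: in round r, participant i works on root cause (i + r) mod n."""
    n = len(root_causes)
    return [
        {p: root_causes[(i + r) % n] for i, p in enumerate(participants)}
        for r in range(n)
    ]

def best_action(ratings):
    """Pick the action with the highest (summed impact) * (summed feasibility).

    ratings maps an action to a list of (impact, feasibility) pairs,
    one pair per evaluator, each value on the 1-5 scale.
    """
    def score(pairs):
        impact = sum(i for i, _ in pairs)
        feasibility = sum(f for _, f in pairs)
        return impact * feasibility
    return max(ratings, key=lambda action: score(ratings[action]))

# Hypothetical workshop with four participants and four root causes.
people = ["A", "B", "C", "D"]
causes = ["RC1", "RC2", "RC3", "RC4"]
plan = rotation_schedule(people, causes)

# Hypothetical ratings for the corrective actions of one root cause.
ratings = {
    "add review checklist": [(4, 5), (3, 4)],  # 7 * 9 = 63
    "hire more testers": [(5, 2), (5, 1)],     # 10 * 3 = 30
}
print(best_action(ratings))
```

With as many rounds as there are root causes, every participant meets every root cause exactly once, which matches the stopping rule in the text.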
A participant can also supplement corrective actions developed by other participants by adjusting, expanding, and commenting on them. The corrective actions are evaluated to find the best corrective actions. The evaluation is conducted similarly to their development: the root causes, including their corrective actions, are rotated through the participants. Each participant evaluates corrective actions of a root cause by giving two attributes to each (scale of 1–5): impact on the target problem and feasibility. The last participant evaluating the corrective actions of a root cause calculates the sum of evaluations of each corrective action. Then, he presents the corrective action that has the highest value of the multiplication of the impact and feasibility. This is done for each processed root cause. The participants are asked to discuss the corrective action and to refine it. The presenter writes down the comments and improvement suggestions concerning the action he presented.

Step 4: Documentation of the results

During this final step of the ARCA method, the results are compiled into a final report, which includes at least the target problem definition, the cause-effect diagram, and all of the corrective actions, including their evaluations. This step is also mentioned by Card [15], Latino and Latino [3], and Ammerman [17]. The best corrective actions should be implemented to make the actual changes in the way of working. Because gaining currency for a corrective action can be challenging, the final report can be used to justify the changes required to prevent the target problem. Additionally, the final report can be a valuable source of cause information in future RCA cases.

4. Field study methodology

This section introduces the field study methodology [29] used in the empirical part of this study. Section 4.1 presents how the data collection and analysis was conducted in the field study settings.
We introduce the data collection methods used, including their focus and how the collected data was analyzed. In Section 4.2, we introduce the industrial cases wherein the field studies were conducted.

Data collection and analysis

Triangulation of the data sources and the data collection methods increases the reliability of the results [34,35]. We used interviews [34], query forms [36], measurements, and observations [34] to collect empirical evidence from the industrial cases to evaluate the feasibility of the ARCA method. Table 2 summarizes the data collection methods and their focus in the analyses of this study. Sections introduce these instruments in detail and discuss how they were used. The data analysis was conducted in two phases. After each case, we analyzed the collected data to help understand the strengths and weaknesses of the ARCA method used in the current case. After all the cases were conducted, we evaluated the method as a whole by combining all empirical evidence from the industrial cases and comparing the results among the interviews, query forms, observations, and measurements.

Interviews

The key representatives were company people involved in steering the cases and had the power to make changes in their companies. Interviews were held with them before and after a case to analyze how they experienced the ARCA method in general. The researchers tested the interview questions with colleagues before the cases. Interview 1 (see Appendix A) was a group interview with 2–4 company key representatives. Its goal was to give an overview of the case context (see Section 4.2). Interview 2 (see Appendix B)

Table 2. Targets of the data collection instruments.
Target Presented in Interview 1 Query form 1 Query form 2 Interview 2 Measurement Observation Case context Case information x Section 4.2 Current practices x Case participants x x Case target problem x x RQ1 Number of detected causes (Table 5) x Cause correctness x x Tables 6 and 7) Importance of the processed causes x x (Tables 6 and 7) Number of processed causes (Table 5) x Number of corrective actions (Table 5) x Feasibility of the corrective actions (Fig. 3 and Tables 6 and 7) x x x Impact of the corrective actions (Fig. 3,Tables 6 and 7) x x x Effort used (Table 4) x x Feasibility of the method (Tables 6 and 7) x x x x RQ2 Easiness of the method (Tables 6 and 7) x x x x

was conducted with the key representatives responsible for steering the case. Its goal was to evaluate the practices and output of the ARCA method. Before interview 2 was conducted, the final report of the ARCA method (see Section 3.2.4) was first examined. As Yin recommends [34], a similar protocol was used in each interview and the duration was no longer than 60 min. Each interview was recorded and transcribed by the first author. Thereafter, the answers were cleaned up and entered into an Excel sheet according to the following coding themes: case context, method usefulness, method easiness, and output quality. Finally, the particular theme was analyzed between the cases by comparing how the answers varied.

Query forms

The query forms were used after the causal analysis and corrective action workshops to analyze how the case participants experienced the ARCA method and its output. The query forms included closed and open-ended questions, as recommended by [36]. The researchers tested and reviewed the query forms with colleagues before using them. Additionally, they were tested with students who piloted the ARCA method before the industrial cases. We asked the names of the case participants in the query forms because we wanted to analyze how the answers of particular participants varied between the workshops. Unfortunately, it is possible that this slightly skewed the results, as the participants knew that the researchers might at least note their names. However, we stressed that the answers are confidential and emphasized the importance of giving feedback as straightforwardly as possible. Query form 1 (see Appendix C) was designed to help in analyzing how the case participants experienced the case target problem and the work practices of the root cause detection step (see Section 3.2.2).
Query form 2 (see Appendix D) was designed to help in analyzing how the case participants experienced the importance of the processed root causes and the work practices of the corrective action innovation step (see Section 3.2.3). We also analyzed whether the output of the root cause detection and corrective action innovation steps was correct according to the case participants. Similarly, we analyzed the feasibility and impact of the corrective actions.

The data from the query forms was entered into an Excel spreadsheet to make it possible to analyze just one case or all the cases simultaneously. All the answers from each participant were divided into separate cells according to the coding themes presented in Section. For every quantitative question in the query forms, we calculated the averages and standard deviations of the answers for each case separately and for all the cases simultaneously.

Measurements

We measured the effort used and the output of the ARCA method. We kept an accurate record of how many man-hours were used in the different activities of the method and how many causes were detected and processed during the cases. We also kept an accurate record of how many corrective actions were developed and how the case participants evaluated the feasibility and impact of each corrective action with respect to the target problem.

The effort used was measured straightforwardly in most of the activities of the ARCA method, as we were able to video-record them. However, the effort used in two work phases relied on the reports of the case participants. Each case participant reported independently how much effort they used in the preliminary cause collection (see Section 3.2.2) and in proposing the root causes for which the corrective actions should be developed (see Section 3.2.3).
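The per-case and all-cases summary of the quantitative query form answers described above can be sketched as follows. This is only an illustration: the answer values are hypothetical, and the use of the sample standard deviation is an assumption, since the text does not say which variant was calculated.

```python
from statistics import mean, stdev

# Hypothetical answers (scale 1-7) to one quantitative question, keyed by case.
# These values are illustrative only and do not come from the study.
answers_by_case = {
    "Case 1": [5, 6, 4, 7, 5],
    "Case 2": [6, 6, 5],
}


def summarize(values):
    """Return (average, sample standard deviation), rounded to one decimal."""
    # Assumption: sample standard deviation (n - 1 in the denominator).
    return round(mean(values), 1), round(stdev(values), 1)


# Summary for each case separately.
per_case = {case: summarize(values) for case, values in answers_by_case.items()}

# Summary for all the cases simultaneously: pool every answer first.
pooled = [value for values in answers_by_case.values() for value in values]
all_cases = summarize(pooled)

print(per_case)
print(all_cases)
```

Pooling the answers before averaging weights each respondent equally, whereas averaging the per-case averages would weight each case equally; the sketch assumes the former.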
The required effort for the ARCA method was entered into an Excel sheet to analyze how many man-hours were actually used in the different steps of the method and how many people contributed to them.

The number of detected and processed target problem causes during the cases was measured straightforwardly. We divided the causes according to the steps of the ARCA method. Similarly, we were able to measure the number of corrective actions. During the data analysis, the number of target problem causes and the number of corrective actions were entered into an Excel sheet to compare the cases.

A paper template (see Appendix E) was used to develop the corrective actions and to evaluate their quality, as presented in Section. The paper template included an evaluation form that was used by the case participants to evaluate the feasibility and impact of each corrective action. During the data analysis, we compared the cases by analyzing the corrective actions based on these evaluations. The evaluation form was not anonymous, as the case participants were able to see what the others answered. Thus, it is possible that the evaluations were biased.

Observations

Two researchers participated in each case. One steered the case together with the key representatives, whereas the other focused only on observing the actions during the video-recorded workshops. Both researchers wrote notes during the workshops. After each workshop, the researchers held a feedback session together. The observation data was used to confirm the results of the interviews and query forms on the feasibility and easiness of the work phases of the ARCA method.

Industrial cases

The field studies were conducted at four medium-sized software companies located in Finland. Based on interview 1, the following subsections introduce these case companies and the related cases in detail. The target problem of the ARCA method was chosen by the key representatives of the case company, who also selected the case participants.
To avoid highly dissimilar cases, the key representatives were requested to choose generally similar target problems, i.e., a complex software engineering problem that causes delays in software projects.

Table 3 summarizes the company cases with the data important for using the ARCA method. In the table, the qualitative data is based on interview 1, whereas the quantitative data is based on query form 1. The table summarizes how the key representatives characterized the target problems, including the effort the company had previously expended trying to solve them, and how the case participants evaluated them. The impact evaluation of the target problem is a combination of the query form questions regarding the "impact of the target problem for the quality of the product", "adverse effect of the target problem to my daily work", "impact of the target problem for the end users of the product", "impact of the target problem for customer relationships", and "internal impact of the target problem for the company".

The similarities of the cases made them more comparable, whereas the dissimilarities helped to corroborate the field study results across different case contexts. In each case, the target problem was experienced as highly complex and difficult to prevent. Similarly, in each case, the impact of the target problem was experienced as relatively high. In contrast, the target problem itself and the effort the company had employed to try to prevent it varied between the cases and between the opinions of the case attendees. Additionally, the company size and the current company practices, including the available resources for software process improvement, varied. There were also differences in the roles of the case participants.

Table 3. Summary of the case contexts.

| | Case 1 | Case 2 | Case 3 | Case 4 |
|---|---|---|---|---|
| Case company | Software company with 100 employees | Software company with 450 employees | Software company with 100 employees | Software company with 110 employees |
| Target problem | Fixing and verifying defects delays project schedules | Blocker type defects are detected in the product after release | New product installation and updating are challenging tasks | Issues lead time is sometimes intolerably long |
| Roles of the case participants | Project managers, quality managers, developers, sales personnel, N = 9 | Mostly developers, N = 9 | Project managers, testers, developers, N = 7 | Project managers, testers, developers, sales personnel, N = 6 |
| Target problem characteristics | Extremely costly and complex | Not very costly, but very complex | High impact on customer relationships and complex | Extremely costly and complex |
| Difficulty of preventing the target problem (a) | Average = 5.3, standard deviation = 1.1 | Average = 5.6, standard deviation = 0.8 | Average = 5.4, standard deviation = 1.3 | Average = 5.5, standard deviation = 1.0 |
| Earlier effort surrounding the target problem (a) | "We have continuously tried to solve this". Average = 3.0, standard deviation = 1.0 | "During recent months, we have reacted to this". Average = 4.3, standard deviation = 1.3 | "We haven't managed this much". Average = 3.4, standard deviation = 0.5 | "We have discussed how to improve communication". Average = 3.0, standard deviation = 0.6 |
| Impact of the target problem (a) | Average = 5.8, standard deviation = 1.1 | Average = 5.0, standard deviation = 1.3 | Average = 5.6, standard deviation = 0.9 | Average = 5.9, standard deviation = 0.9 |

(a) Scale: 1 = very low; 2, 3, 4 = neutral; 5, 6, 7 = very high.

Case 1

The first case was conducted at Company 1, a medium-sized international software product company with approximately 100 employees. The average size of the project organization is about seven people.
The main product is a large and complex software system, released twice a year, consisting of a major and a minor release. The key representatives assumed that the company uses approximately 0.9% of its annual budget for software process improvement, which is managed by a quality assurance (QA) team consisting of three people. The QA team holds meetings in which different kinds of problems are selected and processed based on their criticality. The problems are initially detected by interviewing different stakeholders, such as project managers and product owners. The company's earlier experience with RCA was fairly limited.

The target problem of the case was that the product releases are delayed due to a high number of software defects detected at the end of the development projects. The company has continuously tried to prevent the problem during recent years. The key representatives' common opinion was that the problem is extremely complex and costly for the company. They claimed that the main problem causes are that the size of technical blocks in the software is too large and that employees' attitudes are not conducive enough to developing high-quality software at once. Additionally, they assumed that increasing discipline among the developers and releasing the software in shorter cycles would help in eliminating the target problem.

Case 2

The second case was conducted at Company 2, a medium-sized international software product company with approximately 450 employees. The company releases new software versions regularly and its products can be characterized as complex and model-based software. The key representatives assumed that approximately 1% of the annual budget of the company is used on software process improvement, which is divided into different levels of the company.
While managers are asked to use 5–10 min daily to think about how the software process could be improved, the developers and requirements engineers are involved in process improvement meetings on a regular basis. Additionally, all detected defects are prioritized on a daily basis by a group of people. The company had used RCA earlier by applying a "five times why" practice in process improvement meetings.

The target problem of the case was that blocker-type defects are detected after the product releases, which increases the costs of redevelopment. The company has recently reacted to this problem by setting a clear goal to lower the number of defects detected by the customers. The key representatives characterized the target problem as very complex and including many different causes. The main causes for the target problem were believed to be that new code is built on the old, low-quality code, that too many different methods are used in the development work, and that the lack of different hardware set-ups decreases the coverage of the software testing. They said that the problem could be best eliminated by refactoring the old code. They also believed that the problem is not very severe because the customers are currently highly satisfied.

Case 3

The third case was conducted at Company 3, a medium-sized international software product company with approximately 100 employees. The main product can be characterized as a highly configurable software service. The product is delivered to the customers through installation projects that occasionally include the development of new features. New software versions are released regularly. The key representatives assumed that the company uses approximately 3–5% of its annual budget on software process improvement, which is managed by a quality manager, assisted by a quality management system. The project teams hold weekly meetings in which positive and negative project experiences are discussed.
The company's earlier experience with RCA was fairly limited.

The target problem of the case was that the installation projects are too challenging to be performed efficiently. It often follows that re-engineering has to be done because of unexpected defects caused by the complex software configurations and new development work during the projects. The company has not expended much effort to manage the target problem earlier. However, the key representatives stressed that the target problem has a significant impact on their customer relationships and that it is very complex to prevent. They said that the main cause of the target problem is that the employees have too many different ways in which to perform a product installation. Additionally, the number of different stakeholders is too high with respect to the quality of communication between them. They also indicated that the target problem could be minimized by creating checklists and simplifying the installation process.

Case 4

The fourth case was conducted at Company 4, a medium-sized international software product company with approximately 110 employees. The main product can be characterized as a highly complex software system. The product is delivered to customers through complex integration projects where the product is configured into the software systems of the customers. The key representatives assumed that the company uses approximately 3–5% of its annual budget on software process improvement. The company's management team is responsible for writing process guidelines and for improving the software development process in general. Coding and testing teams are required to improve their daily work through regular process improvement meetings. The teams work together regularly. The company's earlier experience with RCA was fairly limited.

The target problem of the case was that the lead time of an issue is occasionally intolerably long, resulting in delays in projects. The company has not expended much effort to manage the target problem earlier. However, they have tried to improve communication between the stakeholders of the company. The key representatives rated the target problem as highly significant because it has a severe financial impact. It follows that the projects are not finalized on time.
They said that the main causes of the target problem are lack of communication between the stakeholders and the way the company is dividing resources between the issues. Usually, an issue with fairly low priority does not get enough resources. They concluded that preventing the target problem is not an easy task. This would require increasing face-to-face meetings, increasing the number of inspections, and allocating skilled project managers to be responsible for the issues.

5. Results

In this section, we present the empirical results of the field studies. Section 5.1 presents the effort used in the cases. Section 5.2 presents the output of the ARCA method, and Section 5.3 presents the feedback collected from the key representatives and case participants.

5.1. Effort used

Table 4 presents the effort used and the number of case participants throughout the different steps of the ARCA method. In total, man-hours were required to conduct the cases. The required hours were mostly dependent on the number of case participants because both workshop sessions were time-boxed. The effort used increases with each additional case participant. Roughly a quarter of the total effort was used in RCA facilitator-specific activities, whereas the rest was used in activities that included the case participants (see Table 4). An average of 10 h were used in step 1 (problem detection), 37 h were used in step 2 (root cause detection), 25 h were used in step 3 (corrective action innovation), and 12 h were used in step 4 (documentation of the results).

5.2. Output of the method

Table 5 presents the results of the method in the cases. The target problem causes were detected by the preliminary cause collection ( causes) and by the causal analysis workshop ( causes). The effort used was not fixed in the preliminary cause collection, whereas it was fixed in the causal analysis workshop. It seems that the number of detected causes in the preliminary cause collection was dependent on the effort used.
The correlation between the effort used and the number of problem causes in the preliminary cause collection is positive. It is also larger than the correlation between the number of case participants and the number of problem causes in the preliminary cause collection. Our results indicate that a decreasing number of case participants detected an increasing number of causes in the causal analysis workshop. The duration of the workshop was fixed and the correlation between the effort used and the number of problem causes in the causal analysis workshop is negative.

A total of 2–6 root causes were selected in the cases. Together with their sub-root causes, the selected root causes formed a set of root causes that was processed in the corrective action workshop. In each case, root causes were processed and corrective actions were developed. The processed root causes covered 10–45% of the total number of the detected target problem causes in each case (average = 25%).

In case 2, the corrective action innovation step differed from those in the other cases. The corrective actions were developed by brainstorming each corrective action until a mutual understanding was found between the case participants. Thereafter, the next corrective action was developed, and so on. All the other cases followed the brainwriting method, as presented in Section. This modification in case 2 (choosing to brainstorm instead of brainwrite) was done because we wanted to test whether brainstorming or brainwriting would better fit our needs. By comparing the number of corrective actions between the cases, case 2 was determined to be less effective than the other cases (see Table 5). Additionally, the quality of the corrective actions was lowest in case 2 (see Fig. 3), as their feasibility was relatively low.
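The sign of the correlations discussed above can be checked with a short calculation. The sketch below assumes a Pearson correlation coefficient, which the text does not name explicitly, and uses hypothetical per-case figures, because the values of Tables 4 and 5 are not reproduced here. With only four cases, such a coefficient is descriptive rather than a basis for statistical inference.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long number sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return covariance / (spread_x * spread_y)


# Hypothetical per-case values (one entry per case), for illustration only.
effort_hours = [4.0, 2.5, 6.0, 3.0]    # effort in the preliminary cause collection
participants = [9, 9, 7, 6]            # number of case participants
causes_detected = [60, 35, 90, 50]     # causes from the preliminary cause collection

r_effort = pearson(effort_hours, causes_detected)
r_participants = pearson(participants, causes_detected)

print(f"effort vs. causes:       r = {r_effort:.2f}")
print(f"participants vs. causes: r = {r_participants:.2f}")
```

A positive `r_effort` that exceeds `r_participants` would correspond to the pattern reported for the preliminary cause collection.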
Based on our observations, the brainstorming method was less effective than the brainwriting method because the people were not able to speak simultaneously. However, case 2 also varied from the other cases with respect to the homogeneity of the case participants (see Table 3). Thus, perhaps some important viewpoint was missing in case 2 when developing the corrective actions, resulting in more unfeasible results. We do not know whether this was caused only by the brainstorming method or by the method and the other case settings simultaneously.

Table 4. Effort used in the cases (h = hours) and the number of case participants (n) ( = RCA facilitator only). Columns: Case 1, Case 2, Case 3, Case 4, Avg., Std, each with h and n. Rows: Step 1: Problem definition meetings (startup). Step 2: Preliminary cause collection (inquiry); Organizing the cause-effect diagram ( ); Causal analysis workshop; Smartening up the cause-effect diagram ( ). Step 3: Root cause selection; Corrective action workshop. Step 4: Final report ( ). Total (h). [Cell values are missing from the source.]

Table 5. Results of the method. Columns: Case 1, Case 2, Case 3, Case 4, Avg., Std. Rows: Step 2: Target problem causes from the preliminary cause collection; Target problem causes from the causal analysis workshop. Step 3: The number of selected root causes; The number of processed root causes, including sub-root causes; The number of corrective actions. [Cell values are missing from the source.]

Fig. 3. Corrective actions of the cases (scales: 1 = low; 2, 3, 4, 5 = high). Panels: Case 1: 38 corrective actions; Case 2: 13 corrective actions; Case 3: 33 corrective actions; Case 4: 40 corrective actions.

A high-quality corrective action is both highly feasible and highly effective. Fig. 3 presents the impact and feasibility of the corrective actions per case as a scatter chart. In each case, every case participant evaluated the impact and feasibility of each corrective action to detect the highest-quality corrective actions, as presented in Section. The evaluations were done using a numerical scale consisting of integers between one and five. We calculated the averages of the evaluations for each corrective action. The corrective action that had the highest product of the average impact and the average feasibility was interpreted as the highest-quality corrective action. It is interesting that the proportion of the high-impact (avg. ≥ 3) corrective actions was larger than the proportion of the low-impact (avg. < 3) corrective actions in each case. In contrast, the proportion of the high-feasibility (avg.
≥ 3) corrective actions was larger than the proportion of the low-feasibility (avg. < 3) corrective actions only in cases 1 and 4. It seems to be easier to develop high-impact corrective actions than to make them feasible.

5.3. Feedback of the case attendees

This section presents the feedback of the case participants and key representatives. Table 6 summarizes the data from the query forms after the causal analysis and corrective action workshops. There, the steps of the root cause detection and the corrective action innovation are presented from three different perspectives. The first perspective is the easiness of the method. The second perspective is the usefulness of the method. The third perspective emphasizes the quality of the outputs of the ARCA method, including the comparison of the method to the current process improvement practices of the case companies.

The results of the easiness and usefulness of the root cause detection step are combinations of multiple questions of the query forms (see Appendices C and D). The easiness of the root cause detection step is a combination of the factors "easiness of organizing causes" and "easiness of detecting root causes". The usefulness of the root cause detection is a combination of the factors "usefulness of the cause collection" and "usefulness of the method of root cause detection". The other results were rated with one question.

Table 6. Feedback of the case participants (N = the number of respondents, Avg. = average, Std = standard deviation, scale: 1 = very low; 2, 3, 4 = neutral; 5, 6, 7 = very high). Columns: Case 1, Case 2, Case 3, Case 4, All cases, each with N, Avg., Std. Rows for root cause detection: Easiness; Usefulness; Correctness of detected causes; Openness in communication; Efficiency comparison to company practices. Rows for corrective action innovation: Easiness; Usefulness; Impact of the CAs; Feasibility of the CAs; Importance of processed causes for target problem; Importance of processed causes for product quality; Openness in communication; Efficiency comparison to company practices. [Cell values are missing from the source.]

The case participants experienced the corrective action innovation step as highly easy to use (avg. = 5.9), whereas the step of the root cause detection was experienced as only slightly easy to use (avg. = 4.7). The participants experienced that both of these steps are useful. They also experienced that correct target problem causes were detected and that fairly feasible corrective actions that have a high impact on the target problem were developed. The communication in both steps was experienced as highly explicit. The impact of the corrective actions was evaluated to be generally higher than their feasibility. The case participants experienced that the processed root causes were important for both product quality and the target problem. Unfortunately, the evaluation of the importance of the processed root causes for the target problem was done only in cases 3 and 4. The case participants experienced the root cause detection step as a more effective method to detect new process improvement opportunities than their current process improvement practices (avg. = 5.2). Similarly, the corrective action innovation step was experienced as a more effective method to develop process improvement ideas (avg. = 6.1).

Table 7 summarizes the answers of the key representatives when they were interviewed after the cases. Our goal was to evaluate how they experienced the easiness and usefulness of the ARCA method and to include the effort used with respect to the output of the method under the evaluation. In general, it seems that the method was experienced as easy to use. On the other hand, organizing the causes was noted to be challenging (person 3b) and the assistance of the researchers made the method unnaturally easy to use (person 4). The key representatives' unanimous opinion was that their companies should adopt the method and that the results were beneficial relative to the effort used. Additionally, they were not able to name any other method that could reach equally advantageous results with lower costs than our RCA method.
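The quality ranking of the corrective actions described in Section 5.2 (average impact multiplied by average feasibility, with an average of at least 3 counted as high) can be sketched as follows. The corrective action names and ratings are hypothetical and serve only to illustrate the calculation.

```python
from statistics import mean

# Hypothetical evaluations: each corrective action maps to one
# (impact, feasibility) pair per case participant, on a 1-5 integer scale.
evaluations = {
    "Add installation checklist": [(4, 5), (5, 4), (4, 4)],
    "Refactor legacy module": [(5, 2), (4, 2), (5, 3)],
    "Weekly status email": [(2, 5), (2, 4), (3, 5)],
}

HIGH_THRESHOLD = 3  # an average of at least 3 counts as "high"


def score(ratings):
    """Return (average impact, average feasibility, quality product)."""
    avg_impact = mean(impact for impact, _ in ratings)
    avg_feasibility = mean(feasibility for _, feasibility in ratings)
    return avg_impact, avg_feasibility, avg_impact * avg_feasibility


scores = {action: score(ratings) for action, ratings in evaluations.items()}

# The highest-quality corrective action has the largest product.
best = max(scores, key=lambda action: scores[action][2])

high_impact = [a for a, (imp, _, _) in scores.items() if imp >= HIGH_THRESHOLD]
high_feasibility = [a for a, (_, fea, _) in scores.items() if fea >= HIGH_THRESHOLD]

print("Highest quality:", best)
print("High impact:", high_impact)
print("High feasibility:", high_feasibility)
```

Multiplying the two averages rewards actions that score well on both dimensions, so an action with a high impact but a low feasibility does not outrank a balanced one.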
They experienced that significant root causes were detected with respect to the target problem, and most of them stressed that, if implemented, the developed corrective actions would have a high impact in preventing the target problems. As an exception, it was noted that the corrective actions do not prevent the target problem, but they do help the company to improve their processes (person 2).

6. Discussion

In this section, we answer the research questions and discuss our findings and possible threats to the validity of this study. In Section 6.1, we discuss the easiness and efficiency of the ARCA method in contrast to the current software process improvement practices of the case companies. In Section 6.2, we discuss the results of prior RCA studies and the feasibility of the ARCA method in contrast to the prior RCA methods introduced in Section 3.2. In Section 6.3, we discuss the validity of the conclusions based on the empirical results of this study.

6.1. Answering our research questions

One of our goals was to evaluate the ARCA method by answering the following research questions: "Is the ARCA method efficient?" and "Is the ARCA method easy to use?" Here, we answer these questions by discussing how the case attendees evaluated the usefulness and easiness of the ARCA method and the quality of its output.

Our results indicate that the effort required to use the ARCA method in similar case contexts is suitable. In Section 5.1, we showed that a total of man-hours were required to conduct the cases with 7–11 case attendees. The key representatives experienced that the effort used was suitable in terms of the output of the method, as presented in Table 7. Furthermore, the case participants experienced the method as useful (see Table 6). Additionally, they experienced that the method is a more efficient practice to detect new process improvement opportunities and to develop process improvement ideas than their current company practices (see Table 6).
Likewise, the key representatives were unable to name any method as efficient as the ARCA method (see Table 7). This evaluation logically covered the current process improvement practices of the case companies.

Hundreds of target problem causes were detected in the cases (see Table 5). The case participants experienced that the detected causes were correct (see Table 6) and the key representatives experienced that significant root causes were detected with respect to the target problems (see Table 7). These indicate that genuine and accurate target problem causes were detected. Our observations during the causal analysis workshops support this conclusion. In addition, the case participants experienced that the communication was highly explicit in the steps of the ARCA method (see Table 6).

Many high-quality corrective actions were developed for the processed root causes (see Fig. 3). The processed root causes were experienced as highly important for the target problem and product quality (see Table 6), covering an average of 25% of the detected causes (see Table 5). The case participants experienced that feasible corrective actions that have a high impact on the target problem were developed (see Fig. 3, Table 6), and the key representatives stressed that the impact of the corrective actions in preventing the target problem would be high (see Table 7). These results indicate that important target problem causes were processed and high-quality corrective actions were developed to prevent them.

Table 7. Interviews of the key representatives (coding themes: E = method easiness, U = method usefulness, Q = output quality).

| Question | Case 1, Person 1 | Case 2, Person 2 | Case 3, Person 3a | Case 3, Person 3b | Case 4, Person 4 |
|---|---|---|---|---|---|
| How easy and learnable is the method? | Easy to use and internalize (E) | Easy in contrast to required effort and the output of the method (E, U, Q) | Easy to use and learn (E) | It is fairly easy to use and learn. Organizing the causes was challenging (E) | It was easy with the assistance of the researchers (E) |
| Were the detected root causes significant with respect to the target problem? | Most of the causes were significant (Q) | As a general rule, yes. We have already reacted in one of the causes (Q) | Yes, they were. They matched well with my conception (Q) | Yes they were. I already knew some of those (Q) | Yes they were. The causes were mainly issues that lead the problem (Q) |
| Do the corrective actions prevent the target problem? | Yes, I think they do because they have a major impact on the processed root causes (Q) | No, I think that the corrective actions don't prevent the problem, but they do help us to improve our processes (Q) | Yes they do. We wouldn't even need to implement them all (Q) | I think that the corrective actions won't remove the problem completely, but they do have a major impact on the problem's sub-fields (Q) | Yes, the impact would be enormous (Q) |
| Would it have been possible to get the same results at lower costs using some other method? | No. We wouldn't be able to get this many relevant corrective actions (U, Q) | The method didn't require much effort. However, there should be only one workshop session and I would drop the inquiry (U) | I don't believe that. I don't know any such method (U) | I think that better practice would mean smaller group size and more talented experts in the second workshop (U) | Maybe some other brainwriting method, where ideas are developed in literal form, could work as well (U) |
| Should your company adopt the method? | Yes, we should. This works (U) | Maybe, because this is an easy method with much potential. Additionally, the costs are low (E, U) | I think we should adopt this method (U) | I would gladly try this method again. Formal prioritization was nice (U) | We should use this method, or at least a very similar one (U) |

The results in Section 5.3 showed that, in general, the key representatives and the case participants also experienced the ARCA method as easy to use (see Tables 6 and 7). However, in each case, it was challenging to get a clear overview of the cause-effect diagram due to its enormous size. Therefore, it was also challenging to detect all the different effects to which a given cause was related. It was not a surprise that case participants evaluated the easiness of the root cause detection step with lower scores (avg. = 4.7) than the corrective action innovation step (avg. = 5.9). Perhaps organizing hundreds of target problem causes is more challenging than listing dozens of new ideas. It is likely that the case participants were also more familiar with the corrective action development practices, whereas analyzing the target problem causes systematically was something new for them.

In contrast to the prior process improvement practices of the case companies, we believe that the ARCA method is an efficient method to detect new software process improvement opportunities and to develop process improvement ideas. Our results additionally indicate that the ARCA method is relatively easy to use and learn.

6.2. Comparison to prior works

Some of our results follow the prior RCA studies.
Grady [8] indicates that 7 h of team work is the minimum cost of conducting a non-recurring RCA method, whereas Mayes [6] indicates that the required effort to conduct an RCA method consists of 4–7 developers participating in a kickoff and a causal analysis meeting, each lasting 2 h, and 8–10 action team members using 10% of their time for action team duties.

Considering the target problem causes and the impact of the corrective actions, Card [15] discusses an RCA case where a total of 100 target problem causes were detected. There, the cause collection was conducted in a meeting to a certain extent similar to the causal analysis workshop of the ARCA method (see Section 3.2.2), which resulted in an average of 110 target problem causes (see Table 5). Card [15] also presents quantitative evidence on the impact of the corrective actions developed through an RCA method in two software organizations. He claims that, when the DCA method (see Table 1) was used to prevent software defects, the impact of the corrective actions was enormous, resulting in a 50% decrease in the defect rates [15]. This indicates that focusing the software process improvement effort on the target problem causes probably decreases the likelihood of the target problem reoccurring and, thus, slightly supports the evaluations of the case attendees on the impact of the corrective actions developed in our cases.

We noted that organizing the target problem causes is challenging. Other studies have faced similar problems. Usually, too many target problem causes are detected [5] and, overall, the causal analysis mechanism is qualitative and labor-intensive [8]. We believe it is important to use a cause-effect diagram that makes organizing the target problem causes as easy as possible. Using a directed graph is currently a good candidate for this [4].

The prior RCA methods would have been less feasible in the cases of our field studies than the ARCA method.
The DCA [15] and Proact RCA [3] methods are not as adaptable for various target problems as the ARCA method because they require accurate and reliable problem reports available for problem sampling, including a separate problem classification scheme for each target problem type [15]. Additionally, these RCA methods require heavy startup investments in problem classification scheme definitions, procedure setup, establishment of data collection mechanisms, and personnel training [15]. Our industrial partners would not have accepted such startup investments. The required startup effort of the ARCA method is relatively low, as it includes only the personnel learning. The RCA method presented by Rooney and Vanden Heuvel [2] would have required that potential problem causes be collected before the method could even be conducted. Detecting the correct target problem causes in advance is obviously a challenge, as the target problems vary. Additionally, there would have been a problem of detecting too many problem causes [5], making the method highly difficult to use.

We believe it is important to utilize both anonymous and public work practices when preventing cross-functional and complex target problems (see Section 2.2). This is not supported in any of the prior RCA methods. The PIC method [17] relies only on interviewing techniques, whereas the DCA [15] and Proact RCA [3] methods emphasize only meetings. In the prior RCA methods, there is also a problem of duplicating the same cause multiple times in the cause-effect diagram. A fishbone diagram [15], a logic tree [3], a list [17], a worksheet [17], or a chart [17] does not support references between the target problem causes, whereas the directed graph of the ARCA method (see Fig. 2) supports them.
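The difference between a tree-like diagram and a directed graph can be illustrated with a small sketch. The code below is not the ARCA tooling; it is a minimal illustration, under the assumption that each cause is stored once as a node and may point to several effects. The class name `CauseEffectGraph`, its methods, and the example cause texts are hypothetical.

```python
class CauseEffectGraph:
    """Directed graph: an edge cause -> effect means 'cause contributes to effect'."""

    def __init__(self):
        # Maps every node to the set of its direct effects.
        self._effects = {}

    def add_link(self, cause, effect):
        # Each cause is stored once, even if it contributes to many effects.
        self._effects.setdefault(cause, set()).add(effect)
        self._effects.setdefault(effect, set())

    def node_count(self):
        return len(self._effects)

    def all_effects(self, cause):
        """Return every effect reachable from `cause` (iterative depth-first search)."""
        seen = set()
        stack = list(self._effects.get(cause, ()))
        while stack:
            node = stack.pop()
            if node not in seen:
                seen.add(node)
                stack.extend(self._effects.get(node, ()))
        return seen


# Hypothetical causes of a hypothetical target problem.
graph = CauseEffectGraph()
graph.add_link("unclear requirements", "rework")
graph.add_link("unclear requirements", "insufficient testing")
graph.add_link("rework", "late product release")
graph.add_link("insufficient testing", "late product release")

print(graph.node_count())
print(sorted(graph.all_effects("unclear requirements")))
```

In a tree or fishbone layout, "unclear requirements" would have to be written under both "rework" and "insufficient testing"; here it is a single node with two outgoing edges, and the traversal lists all the effects it is related to.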
Finally, in the prior RCA methods, there is very little practical guidance on how to develop corrective actions.

Evaluation of the research

This section discusses the validity of our empirical results using a validation scheme presented by Runeson and Höst [35]. We will present the construct validity in Section 6.3.1, the external validity in Section 6.3.2, and the reliability of the study after that. It should be mentioned that there is a fourth aspect of validity, called internal validity. However, even though it represents an important aspect, it is of concern only when the causal relations of the measured factors are examined [35]. Thus, this aspect is excluded here.

Construct validity

Construct validity reflects the extent to which the studied operational measures really represent what is investigated according to the research questions [35]. In this study, these are the measurements, query forms, and interviews that were carried out to evaluate the ARCA method. We consider a corrective action to be of high quality when it has a high impact on the target problem and is, at the same time, highly feasible. There is a threat to the construct validity regarding the evaluations of the quality of the corrective actions developed in the cases. The analyses were based on experiential evaluation of the case attendees only, not on monitoring the target problems afterward. Therefore, we do not know how many of the corrective actions were actually implemented, nor whether they had an impact in deterring the recurrence of the target problems. Generally, it should be noted that this sort of validity problem is common, as it is practically impossible to separate the effects of the RCA method from the company-specific context factors. As the analyses of the impact and feasibility of the corrective actions are on uncertain ground, so is the conclusion on the suitability of the effort used.
It is challenging to estimate whether the effort used was suitable or excessive, as there was no real evidence on either the costs required to implement the corrective actions or on their impact. As the only source was the opinions of the case attendees, the analyses of the usefulness of the ARCA method

(see Table 6) and suitability of the effort used with respect to the output of the method (see Table 7) are unreliable. We compared the efficiency of the ARCA method to the current process improvement practices of the case companies, as the case attendees were asked to evaluate that. Unfortunately, we did not make such a comparison for the easiness of the ARCA method and, thus, the related conclusions are based solely on the personal experiences of the case attendees. Additionally, it is highly possible that some of the case participants were not experienced enough with the current process improvement practices of the case companies and, thus, their answers skewed our results. Fortunately, the key representatives were competent to perform such an evaluation, which increases the validity of the results. In addition, they performed this evaluation not only in the query forms but also in the interviews.

External validity

External validity is concerned with whether it is possible to generalize the findings of the study and to what extent they can be generalized [35]. In this study, this means asking whether our results are also valid in other case contexts. All of the cases varied and, thus, considered the ARCA method from different perspectives. Though the cases were conducted at four different companies, all with different case attendees and target problems, and though the interviews slightly differed between the cases, the results collectively confirmed the suitability of the ARCA method for medium-sized software companies where prior experiences with RCA are relatively insignificant. We believe that the results of this study can be generalized for similar case contexts. The lack of comparison to prior RCA methods creates a severe threat to external validity.
So far, we cannot conclude whether or not the ARCA method is truly efficient and easy to use in contrast to the prior RCA methods, as we were not able to compare its results extensively to those methods. The main cause for this is that no such prior results are publicly available and our field studies did not cover a case where the case attendees are highly experienced with prior RCA methods. We did not pilot those methods, either. We argued analytically for the selections we made during the ARCA method design and, when possible, we presented similar results from the prior works. However, our conclusions based on these are likely incomplete and inaccurate.

Reliability

Reliability is concerned with the extent to which data and analysis are dependent on a specific researcher [35]. Considering the reliability of our results, the fact that the researchers steered the cases with the key representatives (see Section 4.1.4) was both a strength and a weakness. The strength was that it made the cases more comparable, as almost everything was done similarly in the cases. On the other hand, the weakness was that the collected research data was partially bounded by the researchers' contributions. If the company people had tried to apply the method based only on the written instructions (see Sections ), the evaluations of the effectiveness and easiness of the method could have been entirely different. We believe that the experience of the RCA facilitator has a great impact on the ARCA method output. Additionally, as we were a third party from the case attendees' point of view, it is possible that they were more or less willing to contribute in the cases than if the cases had been steered only by the company personnel. It is also possible that the high motivation and personal characteristics of the researchers spread to the participants, which had an impact on their motivation and open communication in the cases.
As the total number of the case attendees was only 30, the conclusions based on their feedback are on unreliable ground. Thus, our results should not be used to seek significant correlations between the work phases of the ARCA method and the feedback of the case attendees. Additionally, the small number of interviewees and cases likely skewed the interpretation of the results.

7. Conclusions and future work

It is argued that the key to effective problem prevention is to know why a problem occurs [2]. Unfortunately, in software engineering, there is very little practical knowledge on how the problem causes can be detected and prevented and what that requires. Our goal was to develop a lightweight RCA method and evaluate it through industrial field studies to introduce how problem causes can be detected and how the related corrective actions can be developed, as well as how much effort the RCA method requires and how the case attendees experience it.

This paper makes three contributions. First, we developed and introduced a lightweight RCA method, named the ARCA method. The ARCA method consists of four steps, i.e., target problem detection, root cause detection, corrective action innovation, and documentation of the results. Unlike the prior RCA methods applied in the software industry [5,8,13–15], the ARCA method does not require heavy startup investments and problem reports to detect its target problem. Instead, our method utilizes a focus group meeting to detect the target problem, making the method simultaneously highly adaptable for various target problems. Second, we applied the ARCA method at medium-sized software product companies. This differs from the prior RCA studies that have investigated the use of RCA methods in large-company contexts [5,8,13–15] or student experiments [4].
In small and medium-sized software companies, the RCA method needs to be lightweight, as there is little possibility for separate resources for the RCA investigation, while there are more possibilities in a large company [6,8]. We also see that applying RCA to real industrial problems rather than the toy problems that are often used in student experiments consolidates the ARCA method in its true context. Third, we provided empirical results of the usefulness, easiness, and output quality of the ARCA method, including the effort used in the cases. In prior works, such data is often missing. For example, in [15], the costs of the RCA method are reported only as a percentage of the yearly development budget instead of more concrete man-hours, as we did. Furthermore, the general satisfaction of the case attendees is not reported in any of the prior studies. We did that using interviews and query forms. In contrast to the current process improvement practices of the case companies, the ARCA method was experienced as efficient. The effort of applying the method (89 man-hours, on average) was concluded to be suitable considering the value of the results. We showed in Sections 5.2 and 5.3 that the developed corrective actions were evaluated as fairly feasible and effective, having a high impact on the target problems. The case participants experienced that the steps of the root cause detection and corrective action innovation are both useful, and the key representatives experienced that it would not have been possible to get the same results with lower costs using any other method they knew. The method was generally experienced as easy to use. However, as an exception, organizing the detected causes was experienced as challenging due to the high number of detected causes. We collected 757 target problem causes and 124 related corrective actions using RCA in the cases of this study. Analyzing the similarities between the target problem causes is part of our future work. 
The similarities between the developed corrective actions should be analyzed, as well. These would better help us to understand how the software companies try to prevent their problems and what types of related root causes exist.

Finally, to increase the validity of the study, the ARCA method needs to be used in different types of contexts, e.g. in software companies with extensive experiences with prior RCA methods. This also means that software companies should adopt and apply the ARCA method repeatedly.

Appendix A. Questions asked in interview 1 (group interview)

Part 1
1. How many employees work in your company?
2. How is problem prevention organized in your company?
3. How much effort does your company expend on software process improvement (SPI)?
4. What are the stakeholders attending to SPI in your company?
5. How does your company try to avoid quality deviations?
6. How are quality deviations detected in your company?
7. Are quality deviations other than software defects recorded?
8. How does your company react to quality deviations?
9. Are the causes of the quality deviations detected?
10. If so, how is it conducted and how many people are included in the analysis?
11. And if so, what stakeholders are present (developer, testers, designers, sales)?

Part 2
1. How much effort do you think your company has used to prevent the target problem previously? How is it done?
2. In an economic sense, how significant is the target problem for your company?
3. How complex is the target problem and how would you characterize it?

Appendix B. Questions asked in interview 2

Part 3
1. Would it be easier to detect the same causes just by listing them generally?
2. Were the detected root causes significant compared to the target problem?
3. Were major deficiencies detected or were they more minor problems?

Part 4
1. Would it have been possible to develop similar process improvement ideas without root cause detection just by innovating generally in how you could improve your activities?
2. Would it have been possible to get the same results at a lower cost using some other practice?
3. In general, do the corrective actions prevent the problem?
4. Are the corrective actions feasible?
5. What is the impact of the corrective actions for other problems in your company?

Part 5
1. How easy and learnable is the RCA method?
2. Compared to the effort used, how would you characterize the feasibility of the RCA method?
3. Should your company adopt the RCA method?
4. What are the most relevant challenges in the RCA method that make it unfeasible for your company?

Appendix C. Questions asked on feedback form 1

1. The target problem
Answer the questions by giving a value [1 = very low; 2, 3, 4 = neutral; 5, 6, 7 = very high] that corresponds to the question best.
   Impact of the target problem for the quality of the product
   Adverse effect of the target problem on my daily work
   Difficulty of preventing the target problem
   Effort the company used to try to prevent the target problem earlier
   Impact of the target problem for the end users of the product
   Impact of the target problem on customer relationships
   Internal impact of the target problem for the company
   My experience of the technical causes of the target problem
   My knowledge of the impact of the target problem for the end users of the product
2. The quality of the causes and root causes
Answer the questions by giving a value [1 = very bad; 2, 3, 4 = neutral; 5, 6, 7 = very good] that corresponds to the question best.
   Usefulness of the cause collection
   Usefulness of the method of root cause detection
   Easiness of detecting the root causes
   Ability of the method to detect new process improvement opportunities in contrast to the current state of the practices of your company
   Correctness of the detected causes
   Correctness of the detected root causes
   Easiness of solving the detected root causes
   Openness of the communication in this first workshop session
3. Your duty in your company:
4. Select the roles that best describe your responsibility in the company:
   I am a manager
   I am a developer
   I am a tester
   I am a salesman
   I am a trader
   Something else:
5. How would you improve the RCA method?

Appendix D. Questions asked on feedback form 2

1. How much time did you use to propose and evaluate the root causes to be processed before this workshop session:
2. Were the processed root causes the most important with respect to the target problem? (Select one of the following)
   Absolutely YES / More than YES / Yes / Neutral / No / More than NO / Absolutely NO
3. Were the processed root causes the most important with respect to the quality of the product? (Select one of the following)
   Absolutely YES / More than YES / Yes / Neutral / No / More than NO / Absolutely NO
4. Were the processed root causes easy to eliminate? (Select one of the following)
   Absolutely YES / More than YES / Yes / Neutral / No / More than NO / Absolutely NO
5. The method used to develop the corrective actions
Answer the questions by giving a value [1 = very bad; 2, 3, 4 = neutral; 5, 6, 7 = very good] that corresponds to the question best.
   Easiness of the corrective action development method
   Feasibility of the corrective action innovation method
   Ability of the method to develop process improvement ideas in contrast to the current state of the practices of your company
   Impact of the corrective actions on the target problem
   Feasibility of the corrective actions
   If implemented, the impact of the corrective actions for your company, in general
   Openness of the communication in this second workshop session
6. How would you improve the RCA method?

Appendix E. Template for the corrective actions

References

[1] M. Kalinowski, G.H. Travassos, D.N. Card, Towards a defect prevention based process improvement approach, in: Proceedings of the 34th EUROMICRO Conference on Software Engineering and Advanced Applications, Parma, Italy, 2008, pp
[2] J.J. Rooney, L.N. Vanden Heuvel, Root cause analysis for beginners, Quality Progress 37 (7) (2004)
[3] R.J. Latino, K.C. Latino (Eds.), Root Cause Analysis: Improving Performance for Bottom-Line Results, Broken Sound Parkway NW, Suite 300, Boca Raton, CRC Press, FL,
[4] F.O. Björnson, A.I. Wang, E. Arisholm, Improving the effectiveness of root cause analysis in post mortem analysis: a controlled experiment, Information and Software Technology 51 (1) (2009)
[5] P. Jalote, N. Agrawal, Using defect analysis feedback for improving quality and productivity in iterative software development, in: Proceedings of the Information Science and Communications Technology (ICICT 2005), 2005, pp
[6] R.G. Mays, Applications of defect prevention in software development, IEEE Journal on Selected Areas in Communications 8 (1990)
[7] S.O. Al-Mamory, H. Zhang, Intrusion detection alarms reduction using root cause analysis and clustering, Computer Communications 32 (2) (2009)
[8] R.B. Grady, Software failure analysis for high-return process improvement decisions, Hewlett-Packard Journal 47 (4) (1996)
[9] M. Siekkinen, G. Urvoy-Keller, E.W. Biersack, D. Collange, A root cause analysis toolkit for TCP, Computer Networks (2008)
[10] A. Traeger, I. Deras, E. Zadok, DARC: Dynamic Analysis of Root Causes of Latency Distributions, SIGMETRICS '08, Annapolis, Maryland, USA, 2008, pp
[11] T. Stålhane, Root Cause Analysis and Gap Analysis – A Tale of Two Methods, EuroSPI 2004, Trondheim, Norway, 2004, pp
[12] I. Bhandari, M. Halliday, E. Tarver, D. Brown, J. Chaar, R. Chillarege, A case study of software process improvement during development, IEEE Transactions on Software Engineering 19 (12) (1993)
[13] M. Leszak, D.E. Perry, D. Stoll, A case study in root cause defect analysis, in: Proceedings of the 2000 International Conference on Software Engineering, 2000, pp
[14] A. Gupta, J. Li, R. Conradi, H. Rönneberg, E. Landre, A case study comparing defect profiles of a reused framework and of applications reusing it, Empirical Software Engineering 14 (2) (2008)
[15] D.N. Card, Learning from our mistakes with defect causal analysis, IEEE Software 15 (1) (1998)
[16] B. Andersen, T. Fagerhaug (Eds.), Root Cause Analysis: Simplified Tools and Techniques. United States, Milwaukee 53203: Tony A. William American Society for Quality, Quality Press,
[17] M. Ammerman, The Root Cause Analysis Handbook: A Simplified Approach to Identifying, Correcting, and Reporting Workplace Errors. 444 Park Avenue South, Suite 604, Productivity Press, New York, NY 1016, USA,
[18] D.N. Card, Defect-causal analysis drives down error rates, Quality Time 10 (4) (1993)
[19] I. Burnstein, Practical Software Testing, Springer Science + Business Media, New York,
[20] Z.X. Jin, J. Hajdukiewicz, G. Ho, D. Chan, Y. Kow, Using root cause data analysis for requirements and knowledge elicitation, in: International Conference on Engineering Psychology and Cognitive Ergonomics (HCII 2007), Berlin, Germany, 2007, pp
[21] A.D. Livingstone, G. Jackson, K. Priestley, Root Causes Analysis: Literature Review, Health & Safety Executive, Contract Research Report 325, 2001, pp
[22] A.R. Hevner, S.T. March, J. Park, S. Ram, Design science in information systems research, MIS Quarterly 28 (1) (2004)
[23] S.T. March, G.F. Smith, Design and natural science research on information technology, Decision Support Systems (15) (1995)
[24] S. Kavadias, S.C. Sommer, The effects of problem structure and team diversity on brainstorming effectiveness, Management Science 55 (2009)
[25] J.J. Rooney, L.N. Vanden Heuvel, Collecting data for root cause analysis, Quality Progress 36 (11) (2003) 104.
[26] R.L. Glass, Project retrospectives, and why they never happen, IEEE Software 19 (2002)
[27] A. Burr, M. Owen (Eds.), Statistical Methods for Software Quality: Using Metrics for Process Improvement, ITP A Division of International Thomson Publishing Inc,
[28] W.J. Stevenson (Ed.), Operations Management, McGraw-Hill/Irwin, New York,
[29] T.C. Lethbridge, S. Elliott Sim, J. Singer, Studying software engineers: data collection techniques for software field studies, Empirical Software Engineering 10 (2005)
[30] S. Wagner, Defect classification and defect types revisited, in: Proceedings of the 2008 Workshop on Defects in Large Software Systems (DEFECTS '08), Seattle, Washington, USA, 2008, pp
[31] M.V. Mäntylä, J. Itkonen, J. Iivonen, Who tested my software? Testing as an organizationally cross-cutting activity, Software Quality Journal, submitted for publication.
[32] R. Chillarege, I.S. Bhandari, J.K. Chaar, M.J. Halliday, D.S. Moebus, B.K. Ray, M. Wong, Orthogonal defect classification – a concept for in-process measurements, IEEE Transactions on Software Engineering 18 (11) (1992)
[33] S.W. Gursimran, C.C. Jeffrey, A systematic literature review to identify and classify software requirement errors, Information and Software Technology 51 (7) (2009)
[34] R.K. Yin (Ed.), Case Study Research: Design and Methods, Sage Publications, United States of America,
[35] P. Runeson, M. Höst, Guidelines for conducting and reporting case study research in software engineering, Empirical Software Engineering (14) (2008)
[36] W. Foddy (Ed.), Constructing Questions for Interviews and Questionnaires, Cambridge University Press, Hong Kong by Colorcraft, 1994.


Article II

What are problem causes of software projects? Data of root cause analysis at four software companies

Timo O.A. Lehtinen and Mika V. Mäntylä

Proceedings of International Symposium on Empirical Software Engineering and Measurement, 2011, Pages IEEE. Reprinted with permission.



2011 International Symposium on Empirical Software Engineering and Measurement

What Are Problem Causes of Software Projects? Data of Root Cause Analysis at Four Software Companies

Timo O.A. Lehtinen, Department of computer science and engineering, Aalto University School of Science, Espoo, Finland
Mika V. Mäntylä, Department of computer science and engineering, Aalto University School of Science, Espoo, Finland

Abstract: Root cause analysis (RCA) is a structured investigation of a problem to detect the causes that need to be prevented. We applied ARCA, an RCA method, to target problems of four medium-sized software companies and collected 648 causes of software engineering problems. Thereafter, we applied grounded theory to the causes to study their types and related process areas. We detected 14 types of causes in 6 process areas. Our results indicate that development work and software testing are the most common process areas, whereas lack of instructions and experiences, insufficient work practices, low quality task output, task difficulty, and challenging existing product are the most common types of the causes. As the types of causes are evenly distributed between the cases, we hypothesize that the distributions could be generalizable. Finally, we found that only 2.5% of the causes are related to software development tools that are widely investigated in software engineering research.

Key words: Root Cause Analysis, Problem Prevention, Software Process Improvement, Grounded Theory

I. INTRODUCTION

The discipline of software engineering was born in 1968 due to problems in software projects [1]. The key to effective problem prevention is to know why the problem occurs [2]. Problems and challenges of software engineering have been reported; e.g., Demir [3] indicates that scope management, requirements management, estimation, and communication are usual areas of challenges. Unfortunately, the causes of these challenges have not been comprehensively presented.
Knowing the problems is of little value compared with understanding what causes them. Root cause analysis (RCA) is a structured investigation of a problem to detect the causes that need to be prevented [4]. RCA takes the problem as an input and provides a set of problem causes as an output. It states what the problem causes are and, in addition, where they occur. This helps with software process improvement in various contexts and across all software organizations, including product development, hardware design, product engineering, and manufacturing [5].

In some prior RCA studies, the causes of defects have been presented. Card [6] indicates that the defect causes are related to the methods, people, input, and tools, but his classification is quite coarse-grained and lacks the software process dimension completely. Grady [7] states that the top eight causes of defects are specifications, user interface, error checking, hardware interface, software interface, logic, data handling, and standards. Grady's classification, on the other hand, sees causes from a technical perspective but does not go beyond that, e.g., "this line of code has the error" vs. "why does this line of code have an error whose symptoms are visible in the released product". In this work, we want to understand problem causes from a wider perspective than the purely technical one that Grady provides. Furthermore, we want to provide more details than Card provides, and we want to map the problem causes to the process dimension. A final difference to prior work is that a high number of particular types of software defects is not the only target problem that should be analyzed; e.g., negative project experiences, delayed product releases, and challenging product installations are all industrially relevant and severe problems but have only been scarcely explored using RCA [4].
In our previous paper [4], we presented the development and evaluation of ARCA, an RCA method, in terms of effort, usefulness and ease of use. The ARCA method consists of four steps, i.e., target problem detection, root cause detection, corrective action innovation, and documentation of the results [4]. In this paper, we introduce a detailed classification system for the detected causes of the ARCA method that we developed by using the grounded theory approach introduced in [8]. Our classification system is based on a literature review and causes of four industrial RCA investigations focusing on complex software engineering problems. The classification system is thereafter used to show what types of causes were detected and where in the development processes they occurred. We discuss the similarities and dissimilarities of the causes and show what types of causes were common between the cases and what were not. This paper makes two important contributions as it introduces the output of the ARCA method and

simultaneously creates hypotheses on the challenges of software engineering for future research.

The research goal is as follows: study the causes of complex software engineering problems by applying grounded theory to the target problem causes detected in four medium-sized software companies. The research aimed to answer the following questions:

RQ1: What types of causes are related to the target problems of the cases? In the context of this study, the type describes what the causes are, e.g., wrong working methods, lack of instructions, and challenging existing product.

RQ2: To which process areas can the causes of the cases be mapped? Every cause occurs somewhere. In the context of this study, this means the development processes wherein the causes occurred. The causes are not isolated; instead, they are divided between the software processes, e.g., the causes of a late product release do not occur in the development work only but also in the requirements engineering and software testing.

II. METHODOLOGY

The cause data was collected in industrial field studies [9] by using the ARCA method [4] in four medium-sized software product companies (100 to 450 employees) located in Finland. The target problem of the ARCA method was defined in a focus group meeting by the key representatives of the case company, who also selected the case participants. The common high-level goal of the companies was to understand why their software projects are delayed and how to avoid that. As recommended in the ARCA method, the target problem causes were detected through an anonymous inquiry followed by a causal analysis workshop, which is a meeting where the case participants write down the target problem causes and present them to others. The causes were organized into a cause-effect diagram [4]. The cause data was analyzed by creating two classification schemes to classify the causes. The development of the schemes was done in iterations.
We started with a literature review on software engineering root cause analysis to conclude what kind of cause classification schemes have been previously introduced [6, 7, 10]. Thereafter, we created a preliminary classification scheme for both the types and related process areas of causes. Third, we combined the preliminary classification schemes with the grounded theory approach [8] and classified samples of the causes of the cases. During this step, we modified the preliminary classification schemes to create finalized classification schemes that actually correspond to the causes of the cases. Finally, we applied the finalized classification schemes to all causes of the cases.

III. RESULTS

A. Classification Schemes

This section introduces the classification schemes that were developed in this study. The first scheme describes what types of causes were detected and the second scheme describes what software engineering process areas were affected. There are three important terms used in this study. The process area describes where the cause occurs, e.g. requirements engineering or software testing. The type describes what the cause actually is, e.g., lack of instructions and experiences or lack of monitoring. The class means a set of similar types of causes, i.e. people, tasks, methods, and environment. It describes on a general level what types of causes were detected in the cases and makes it possible for us to compare our results to the results of the prior studies of RCA [6, 10].

Table I introduces the classification scheme used to describe the process areas of the causes. These process areas are similar to the ones found in software engineering process literature. The list of process areas was created based on our initial understanding of common software process steps, and it was refined by the data analysis.
If we compare the process areas to commonly recognized software processes such as RUP [11] or the waterfall model [12], we can see several similar steps such as requirements engineering, testing, change management, and product release and deployment. However, there are also some differences. First, we have merged software implementation and software design under a process area called development work. It would not have been feasible to separate whether technical problems of the product were due to poor design or implementation, because our cause data did not support such a division. Another difference is that we have a process area called management that gathers causes such as insufficient resource allocation, bad estimates, poor prioritization decisions, and bad organizational culture. Such issues undoubtedly are causes for problems in software projects, but they cannot be placed under other process areas. Thus, the management process area is needed to enable a descriptive and honest presentation of the causes. The final difference to commonly recognized process areas is the Unknown process area, which includes causes that cannot be classified into any other process area, e.g., laziness.

TABLE I. THE PROCESS AREAS OF CAUSES

Process area | Description
Requirements engineering, Re | Causes are focused on the requirements engineering and input from customers.
Management, Ma | Causes are focused on the company support and the way the project stakeholders are managed and allocated to tasks.
Development Work, Dw | Causes are focused on implementation of features and its output.
Software Testing, St | Causes are focused on software testing and its output.
Change Management, Cm | Causes are focused on implementation of change requests.
Product Release and Deployment, Pd | Causes are focused on installing and releasing the product.
Unknown, Un | Causes that cannot be focused on any specific process area.

TABLE II presents the classification scheme used to describe the types of causes, which are additionally organized under four classes: people, tasks, methods, and environment. In prior works, Jalote [10] and Card [6] present a similar coarse-grained classification of cause classes. Unfortunately, these classification schemes are too general as they do not go below the class level. We wanted to extend these classification schemes to provide more details of the problem causes. Thus, we added the type level. For the type level there was no prior work; thus, it is completely based on our analysis of the problem causes.

TABLE II. THE CAUSE CLASSES AND RELATED TYPES OF CAUSES

Class / Type | Description
People, P | This class includes the people-related causes.
  Instructions and Experiences | This type includes causes of missing documentation and lack of experience. The needed documentation is missing or inaccurate, and the lack of experience complicates the work.
  Values and Responsibilities | This type includes causes of bad attitude and lack of taking individual responsibility. The people do not care about important things and they look out for number one.
  Co-operation | This type includes causes of inactive, inaccurate, and missing communication between the stakeholders. The people do not communicate actively or share knowledge of their own will.
  Policies | This type includes causes of not following the company policies.
Tasks, T | This class includes the task-related causes.
  Task Priority | This type includes causes of task priority. The priority is missing, wrong, or too low.
  Task Output | This type includes causes of low-quality task output. In our terminology, the task is a general term which corresponds to the tasks of all stakeholders, e.g., the managers may do inadequate resource allocation whereas the developers may write bad code.
  Task Difficulty | This type includes causes of challenging tasks. The task requires too much effort or time, or it is too difficult.
Methods, M | This class includes the methodological causes.
  Work Practices | This type includes causes of lack of current working methods. The method is missing or inadequate.
  Process | This type includes causes that are focused on the current operations model. The model is unclear, vague, too heavy, or inadequate.
  Monitoring | This type includes causes of lack of monitoring. The management does not know the project status because the progress is not monitored.
Environment, E | This class includes the environment-related causes.
  Existing Product | This type includes causes of the existing product, which is too complex and whose old low-quality code creates challenges.
  Resources and Schedules | This type includes causes of wrong resources and schedules.
  Tools | This type includes causes of missing or insufficient tools.
  Customers | This type includes causes of customer requests and users' expectations and needs.

The people class includes types of causes that correspond to human aspects. The tasks class includes types of causes that were closely related to implemented tasks. The methods class includes types of causes that correspond to wrong working methods. The environment class describes the types of causes that are related to external settings of the work. The detailed types of causes including their descriptions are placed under each class as can be seen in TABLE II.

B. Cause Distributions

TABLE III summarizes the types of target problem causes and shows how they divide into the software processes. We also report the totals of causes for each process area and type. Next to the totals is the standard deviation between cases. This is reported to help in analyzing the external validity of the cause distribution. High standard deviation indicates that the cause distribution is highly affected by the case context.
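The totals and between-case deviations described above can be illustrated with a minimal sketch. It assumes that each classified cause is a record of case, process area, and cause type, that the total share is pooled over all causes, and that the deviation is the population standard deviation of the per-case shares; the paper does not state which estimator was used, and the records below are invented for illustration only.

```python
from collections import Counter
from statistics import pstdev

# Hypothetical classified causes: (case, process area, cause type).
# The labels follow TABLE I and TABLE II; the records are invented.
CAUSES = [
    ("case1", "St", "Work Practices"),
    ("case1", "St", "Work Practices"),
    ("case1", "Dw", "Existing Product"),
    ("case1", "Re", "Instructions and Experiences"),
    ("case2", "St", "Work Practices"),
    ("case2", "Dw", "Existing Product"),
    ("case2", "Dw", "Existing Product"),
    ("case2", "Dw", "Existing Product"),
]


def shares(records, column):
    """Percentage share of each label found in the given column."""
    counts = Counter(record[column] for record in records)
    total = sum(counts.values())
    return {label: 100.0 * n / total for label, n in counts.items()}


def deviation_between_cases(records, column):
    """Standard deviation (in % units) of each label's share across the cases."""
    cases = sorted({record[0] for record in records})
    per_case = [shares([r for r in records if r[0] == case], column) for case in cases]
    labels = {record[column] for record in records}
    # A label that never occurs in a case has a share of 0 % in that case.
    return {label: pstdev([s.get(label, 0.0) for s in per_case]) for label in labels}


type_totals = shares(CAUSES, 2)                 # shares per cause type
type_std = deviation_between_cases(CAUSES, 2)   # deviation per cause type
area_totals = shares(CAUSES, 1)                 # shares per process area
```

A deviation that is large relative to the total share (as for "Existing Product" in this invented data) is the pattern the text interprets as dependence on the case context.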
Low standard deviation between cases suggests that the distribution could be generalizable, but with only four cases it is only possible to draw initial hypotheses. It should be noted that the standard deviation should always be contrasted with the total share.

The lack of instructions and experiences included the highest number of causes (18.1%), which was mainly divided into requirements engineering, development work, software testing, and product release and deployment. The wrong work practices included the second highest number of causes (15.7%), which was mainly divided into software testing, development work, management, and product release and deployment. Looking at the deviations, we can see that the shares of Instructions and Experiences could be generalizable (deviation of 3.4 units from the total share of 18.1%), but that the shares of Existing Product do not seem generalizable (deviation of 7.8 units from the total share of 8.5%).

From the process perspective, software testing (23.1%) and development work (22.6%) included the highest number of causes. The causes of software testing divided mainly into wrong work practices, lack of instructions and experiences, insufficient task output, task difficulty, and wrong resources and schedules. The causes of development work were mainly similar to those in software testing, but the existing product was referred to more often (2.5%) whereas the insufficient task output was referred to less often (0.9%).

The deviations between cases are higher in the process areas than in the types of causes. Of the recognized process areas, only the development work process area has a low standard deviation (7.9) in comparison to the total share of causes (22.6%). Thus, we can hypothesize that the shares of causes per process area are more dependent on the case context than the types of causes, which seem more general.

TABLE III. PERCENTAGES OF THE TYPE OF CAUSES IN SOFTWARE PROCESS AREAS (A TOTAL OF 648 CAUSES)
Columns (process area): Re, Ma, Dw, St, Cm, Pd, Un, Total, Std*
Rows (cause type, class): Inst. and Exp., P; Work Pr., M; Task Output, T; Task Difficulty, T; Existing Pr., E; Res. and Sch., P; Val. and Resp., P; Process, M; Policies, P; Co-operation, P; Customers, E; Tools, E; Task Priority, T; Monitoring, M; Total; Std*
Std* = deviation of % units between the cases

C. Limitations

As the total number of cases was only four, the results need to be validated by further studies. However, in contrast to prior studies [6, 7, 10], our results are based on more than one case and are thus more externally valid than the results of those studies. The effect of the case context, both the company context and the chosen RCA focus, is likely to be high. The deviation between the cases varied between process areas and types. The classification scheme was jointly developed and partly based on the existing literature. The classification of the causes was done only by the first author, which increases the possibility of researcher bias. We plan to address this in our future work on this topic.

IV. CONCLUSIONS

In this paper we have created a two-dimensional classification of software problem causes based on four industrial RCA field studies resulting in 648 causes. The first dimension of the classification is based on common software engineering process areas. The second dimension describes the type of causes, and it extends prior works of software engineering root cause analysis [6, 7, 10] by giving more detailed types under the general classes of people, tasks, methods, and environment. Our classification is useful for understanding problem causes as it highlights both the process areas where improvements should be made and the types of improvements that need to be made, e.g., whether we have a problem with tools or with work practices.

We have also presented a distribution of causes with our two-dimensional classification system. In it, we found that instructions and experiences was the most common cause type, followed by insufficient work practices.
It is interesting to note that tools were mentioned in only 2.5% of the causes, although a great deal of software engineering research is focused on building new tools. In the software process dimension, the process areas with the most causes were development work and software testing. However, the deviation between the cases was higher in the process area dimension. Therefore, we believe that case context and focus have a larger effect on the process area of the causes than on the types of causes.

V. REFERENCES

[1] P. Naur and B. Randell, Software engineering: A report on a conference sponsored by the NATO science committee, NATO,
[2] J. J. Rooney and L. N. Vanden Heuvel, Root cause analysis for beginners, Quality Progress 37 (7) (2004)
[3] K. A. Demir, A survey on challenges of software project management, Proceedings of the 2009 International Conference on Software Engineering Research Practice, 2009, pp
[4] T. O. A. Lehtinen, M. V. Mäntylä and J. Vanhanen, Development and evaluation of a lightweight root cause analysis method (ARCA method) - field studies at four software companies, Information and Software Technology 53 (10) (2011)
[5] R. G. Mays, Applications of Defect Prevention in Software Development, IEEE Journal on Selected Areas in Communications 8 (1990)
[6] D. N. Card, Learning from our mistakes with defect causal analysis, IEEE Software 15 (1) (1998)
[7] R. B. Grady, Software failure analysis for high-return process improvement decisions, Hewlett-Packard Journal 47 (4) (1996)
[8] S. Salinger, L. Plonka and L. Prechelt, A coding scheme development methodology using grounded theory for qualitative analysis of pair programming, 19th Annual Psychology of Programming Workshop, Joensuu, 2007, pp
[9] T. C. Lethbridge, S. Elliott Sim and J. Singer, Studying software engineers: Data collection techniques for software field studies, Empirical Software Engineering 10 (3) (2005)
[10] P. Jalote and N. Agrawal, Using defect analysis feedback for improving quality and productivity in iterative software development, Proceedings of the Information Science and Communications Technology (ICICT 2005), 2005, pp
[11] I. Jacobson, G. Booch and J. Rumbaugh, The Unified Software Development Process. Addison-Wesley,
[12] W. W. Royce, Managing the development of large software systems: Concepts and techniques, Proceedings of Wescon, 1970, pp

Article III

A tool supporting root cause analysis for synchronous retrospectives in distributed software teams

Timo O.A. Lehtinen, Risto Virtanen, Juha O. Viljanen, Mika V. Mäntylä and Casper Lassenius

Journal of Information and Software Technology, Volume 56, Issue 4, April 2014, Pages Elsevier B.V. Reprinted with permission.


Information and Software Technology 56 (2014)

A tool supporting root cause analysis for synchronous retrospectives in distributed software teams

Timo O.A. Lehtinen, Risto Virtanen, Juha O. Viljanen, Mika V. Mäntylä, Casper Lassenius
Department of Computer Science and Engineering, Aalto University School of Science, P.O. Box 19210, FI Aalto, Finland

Article history: Received 7 June 2013; received in revised form 8 January 2014; accepted 9 January 2014; available online 17 January 2014

Keywords: ARCA-tool; Root cause analysis; Distributed retrospective; Global software engineering

Abstract

Context: Root cause analysis (RCA) is a useful practice for software project retrospectives, and is typically carried out in synchronous collocated face-to-face meetings. Conducting RCA with distributed teams is challenging, as face-to-face meetings are infeasible. Lack of adequate real-time tool support exacerbates this problem. Furthermore, there are no empirical studies on using RCA in synchronous retrospectives of geographically distributed teams.

Objective: This paper presents a real-time cloud-based software tool (ARCA-tool) we developed to support RCA in distributed teams and its initial empirical evaluation. The feasibility of using RCA with distributed teams is also evaluated.

Method: We compared our tool with 35 existing RCA software tools. We conducted field studies of four distributed agile software teams at two international software product companies. The teams conducted RCA collaboratively in synchronous retrospective meetings by using the tool we developed. We collected the data using observations, interviews and questionnaires.

Results: Comparison revealed that none of the existing 35 tools matched all the features of our ARCA-tool. The team members found ARCA-tool to be an essential part of their distributed retrospectives.
They considered the software efficient and very easy to learn and use. Additionally, the team members perceived RCA to be a vital part of the retrospectives. In contrast to the prior retrospective practices of the teams, the introduced RCA method was evaluated as efficient and easy to use.

Conclusion: RCA is a useful practice in synchronous distributed retrospectives. However, it requires software tool support for enabling a real-time view and co-creation of a cause-effect diagram. ARCA-tool supports synchronous RCA, and includes support for logging problems and causes, problem prioritization, cause-effect diagramming, and logging of process improvement proposals. It enables conducting RCA in distributed retrospectives.

© 2014 Elsevier B.V. All rights reserved.

Corresponding author. E-mail addresses: timo.o.lehtinen@aalto.fi (T.O.A. Lehtinen), risto.virtanen@aalto.fi (R. Virtanen), juha.o.viljanen@aalto.fi (J.O. Viljanen), mika.mantyla@aalto.fi (M.V. Mäntylä), casper.lassenius@aalto.fi (C. Lassenius).

1. Introduction

Retrospectives, also known as post-mortems, are activities where the team members share experiences about problems and their causes [1], analyzing a recently ended project and/or iteration. Root cause analysis (RCA) is a structured investigation of a problem to detect which underlying causes need to be solved [2], and a useful practice for retrospectives [3–5]. Retrospectives are typically conducted in face-to-face meetings, in which the team members first identify problems that occurred. Subsequently, they conduct lightweight RCA by collaboratively creating a cause-effect diagram visualizing the causes of problems [5].

Global software engineering, employing geographically distributed teams, has become a standard way of operating in today's business [6]. This way of working creates new challenges related to geographical, temporal, cultural and organizational distance [7].
The use of distributed teams also creates a major challenge for conducting team retrospectives [8].

In previous work, we developed a lightweight focus group based RCA method, ARCA, and evaluated it in four industrial field studies using collocated teams [9]. Even though the method was well liked, the companies pointed out the need to conduct RCA with their distributed teams. Literature on distributed retrospectives identifies a similar need and discusses the use of a combination of e-mails, spreadsheets and an online audio bridge to help facilitate the retrospectives [8]. However, relying on such tools in focus group based synchronous RCA is not feasible, as organizing and interpreting a high number of causes using e-mails and spreadsheets would be highly difficult. Instead, a real-time online environment supporting cause-effect diagrams [9] should be used in distributed retrospectives.

There are many proprietary software tools for RCA. However, we have not succeeded in finding a web-based tool that fulfills the needs of conducting lightweight RCA in synchronous distributed software project retrospectives. First, the tool should make it possible for RCA participants to co-create a cause-effect diagram [5,9], which stays in sync between the sites. Second, the tool should allow the development of process improvement ideas for the causes and maintain links between the improvement ideas and the detected causes [10–14]. Third, the tool should make it possible to vote on the most severe causes and best improvement ideas [9]. Fourth, the tool should also make it possible to capture and refine the findings of several retrospectives, in order to support organizational learning and knowledge management [3].

To the authors' best knowledge (see footnote 2), the most frequently lacking feature of current software tools for RCA is the syncing mechanism needed for simultaneous co-creation of cause-effect diagrams, see Table 1. There are tools for simultaneous graph drawing, e.g., Google Docs drawings [15], but these tools lack features to support RCA, e.g., automatically capturing and refining the findings of retrospectives.

Furthermore, to our knowledge, there are no empirical studies on the feasibility of using RCA in synchronous distributed retrospectives. While there is ample evidence for the benefits of RCA to detect the causes of problems and make improvements in various contexts [9–13,16–21], the existing studies have been conducted in a face-to-face context.

Thus, in order to contribute to the existing studies, we developed an online tool for supporting synchronous RCA in distributed software project retrospectives called ARCA-tool. It provides features for distributed RCA, idea development, and capturing the lessons learned in many retrospectives.
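The requirements listed above (causes, improvement ideas linked to causes, and voting on both) can be pictured with a small data model. This is only an illustrative sketch: the class names, fields, and the `top_voted` helper are hypothetical and are not taken from ARCA-tool itself.

```python
from dataclasses import dataclass, field


@dataclass
class Idea:
    """A process improvement idea (hypothetical model, not ARCA-tool's own)."""
    text: str
    votes: int = 0


@dataclass
class Cause:
    """A detected cause with the improvement ideas linked to it."""
    text: str
    votes: int = 0
    ideas: list = field(default_factory=list)


def top_voted(items, n=1):
    """Return the n items with the most votes (ties keep their input order)."""
    return sorted(items, key=lambda item: item.votes, reverse=True)[:n]


# Invented example data for one retrospective.
unclear = Cause("Unclear requirements", votes=5)
unclear.ideas.append(Idea("Review requirements with the customer", votes=3))
unclear.ideas.append(Idea("Add acceptance criteria to each story", votes=4))
late_tests = Cause("Testing started late", votes=2)

most_severe = top_voted([late_tests, unclear])[0]
best_idea = top_voted(most_severe.ideas)[0]
```

Keeping the ideas inside the cause they address preserves the link between an improvement idea and its cause, which is the second requirement in the list.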
The goals of this paper are to present ARCA-tool including its technology and main features, and to provide an empirical evaluation of the tool and synchronous RCA in the context of industrial software development with agile teams. In order to evaluate the usefulness of RCA and ARCA-tool, we used interviews, questionnaires, and observations in the retrospectives of geographically distributed industrial software teams that followed the Scrum methodology [22]. Our research questions were:

RQ1: Is ARCA-tool perceived as useful in the distributed retrospectives of agile software teams?
RQ2: Is ARCA-tool perceived as easy to use in the distributed retrospectives of agile software teams?
RQ3: Is RCA perceived as a good approach to use in the distributed retrospectives of agile software teams?

While the first two questions are related directly to ARCA-tool, we also evaluate the RCA method, since the evaluators might have difficulty separating the effect of the tool and the context in which it was applied, i.e., the synchronous retrospective method used and the company context. Naturally, ARCA-tool can be used without the retrospective with the RCA method and vice versa.

The rest of the paper is structured in the following way. Section 2 covers the related work and identifies a gap in research, which is then filled by introducing ARCA-tool in Section 3. Section 4 explains the field study method used to evaluate the tool in real industrial contexts and the results of this evaluation are given in Section 5. Finally, Section 6 contains the discussion and Section 7 provides conclusions and directions for further work.

Footnote 2: Investigation of proprietary RCA tools is difficult, as freely available information on the tools is limited.

2. Related work

In this section, we introduce the concept of software project retrospectives and present problems related to conducting RCA with distributed software teams.
We also compare the RCA software tools that we have found.

2.1. Software project retrospectives

The key for effective problem prevention is controlling the causes of problems [23]. It is claimed that problems cannot be solved without solving their causes [9]. Retrospectives are one means to help identify and prevent the reoccurrence of problems that have occurred in prior projects [8,24–26]. In retrospectives, the team members share their experiences about problems and their causes [4,5,24].

Retrospectives enable learning at the individual, team, and organizational level. At the individual level, learning is based on shared experiences [27]. Thus, at the team level, learning is related to the shared experiences among the team members [27]. Furthermore, learning at the organizational level requires knowledge management, i.e., the shared experiences are captured and refined, and thereafter distributed to the teams [3]. Therefore, the output of retrospectives must be captured and refined.

A software project retrospective can be viewed as a step-by-step process [5,28]. In the first step, problems related to the past project, iteration, or milestone are identified. Thereafter, the participants collaboratively identify the causes of the problems by using RCA. In RCA, the causes of problems are identified by repeatedly asking "why" for every cause [9]. The causes are visualized by using a cause-effect diagram, e.g., a fishbone diagram [5,14,19] or a directed graph [5,9]. The diagram represents the cause-and-effect relationships between the causes of problems. It aims to assist the participants in detecting the underlying causes of the problems. After the cause-effect diagram is finalized, the participants detect the root causes, defined as the underlying and controllable causes of the problem [9]. Process improvement ideas are then developed for the selected root causes.
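The directed-graph form of the cause-effect diagram described above can be sketched as follows. The class and method names are hypothetical and the example causes are invented; the sketch only finds the underlying causes, i.e., causes that have no recorded cause of their own, whereas judging which of them are controllable (and hence root causes in the sense of [9]) remains a decision of the participants.

```python
class CauseEffectGraph:
    """Directed cause-effect graph: each effect maps to its direct causes."""

    def __init__(self):
        self._causes_of = {}

    def add_cause(self, effect, cause):
        """Record the answer to 'why did <effect> happen?'."""
        self._causes_of.setdefault(effect, set()).add(cause)
        self._causes_of.setdefault(cause, set())

    def underlying_causes(self, problem):
        """Causes reachable from the problem that have no further cause recorded."""
        seen, stack, leaves = set(), [problem], set()
        while stack:
            node = stack.pop()
            if node in seen:  # a cause may be shared by several effects
                continue
            seen.add(node)
            direct = self._causes_of.get(node, set())
            if not direct and node != problem:
                leaves.add(node)
            stack.extend(direct)
        return leaves


graph = CauseEffectGraph()
graph.add_cause("Late release", "Testing started late")
graph.add_cause("Late release", "Unclear requirements")
# In a graph, unlike a tree, one cause can explain several effects.
graph.add_cause("Testing started late", "Unclear requirements")
graph.add_cause("Unclear requirements", "Customer rarely available")

candidates = graph.underlying_causes("Late release")
```

In this invented example the only candidate is "Customer rarely available", because every other node still has an answer to the question "why".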
While the traditional use of retrospectives has been fraught with problems [25], modern agile development processes, such as Scrum [22], have made the practice common in modern organizations. Scrum and other agile development processes do not, as such, require the use of RCA as part of their retrospectives. However, RCA can well be used in Scrum retrospectives as a practice that adds structure and provides additional value to the teams.

2.2. Root cause analysis and distributed retrospectives

The issue of distributed team members has been considered the greatest challenge that organizations face while conducting retrospectives [8]. Retrospectives should be lightweight [28], but under the influence of budget constraints and time pressure, they are rarely conducted [25]. When the project members are geographically dispersed, arranging face-to-face retrospectives requires too much effort. Conducting face-to-face retrospectives in such settings is often cumbersome. Distributed retrospectives are introduced as substitutes for face-to-face retrospectives [8]. Such retrospectives are typically conducted with the aid of an audio or video bridge [8]. Logically, in distributed software projects, conducting distributed retrospectives requires less effort than conducting them face-to-face due to decreased traveling time.

Conducting RCA in distributed retrospectives is difficult as it requires tools that are not yet mature enough. It has been claimed that a combination of e-mails, spreadsheets, and an audio bridge is enough to support distributed retrospectives [8]. However, in software projects, conducting RCA with spreadsheets is difficult [9]. This is because of the high number of detected causes [9,11–13]. For example, in our previous work, four software product companies conducted two-hour RCA workshops (similar to retrospectives), and causes of software project problems were found in each workshop [9]. The causes were spread over various process areas [29] and had complex cause-and-effect relationships to one another.

Table 1. Comparison of RCA software (for more details see Appendix A).
Columns: Software; Technical features (a): Client: browser/native, Real-time collaboration; RCA features (a): Cause-effect diagram, Idea development, Voting, Knowledge management; Costs.
ARCA-tool: Browser, Yes, Graph, Yes, Yes, Yes, Free (MIT)
Google Docs drawings: Browser, Yes, Graph, Yes, Free to use
TapRooT Enterprise ed.: Both, (Yes), Tree, Yes, Yes, Fee
REASON: Both, Yes, Yes, Fee
XFRACAS: Browser, (Yes), Yes, Yes, Fee
RCAT Software: ?, ?, Tree, ?, ?, Yes, ?
PathMaker: Native, Yes, Tree, Yes, Yes, Fee
Cause link: Native, Tree, Yes, Yes, Fee
Solve: ?, ?, Tree, ?, ?, ?, ?
SIM®: Native, (Yes), Tree, Yes, ?, Yes, Fee
PROACT: Native, Yes, Tree, Yes, ?, Yes, Fee
Catalyst: Native, Free (GPL)
Blackbox: Native, ?, Tree, (Yes), ?, Yes, Fee
Investigator 3: Native, ?, Tree, Yes, ?, Yes, Fee
Track: Native, Yes, Fee
Corrective Action: Browser, Yes, Yes, Yes, Fee
RealityCharting: Both, Yes, Tree, Yes, Yes, Yes, Fee
ABS Cons. Root Cause Map: ?, Yes, ?, Yes, ?, Yes, Fee
RCA Software 5.1: Native, Tree, Fee
ThinkReliability Excel Template: Native, Tree, Yes, Free
Enablon IMS: ?, ?, Tree, Yes, ?, Yes, Fee
Smartdraw: Native, Tree, Fee
Set-Based Thinking: Native, Graph, Yes, ?, Yes, Fee
PHRED: (Browser), (Yes), Tree, Yes, ?, Yes, Fee
BowTieXP: Native, Tree, (–), ?, Yes, Fee
FMEA Software: Native, (–), Tree, Yes, Yes, Fee
Systems2win: Native, Graph, Fee
ireliability: Browser, Tree, Yes, (–), Fee
FMECA Software: Native, Tree, (–), (–), (Yes), Fee
Rapid Problem Isolation: Native, Tree, (Yes), Fee
Lassale [33]: Native, Tree, (–), ?
CA Spectrum: ?, ?, ?, ?, ?, ?, Fee
RootCause: ?, (–), Yes, ?, (Yes), Fee
Speechminer: ?, ?, ?, ?, ?, ?, Fee
Root Cause Analyst: (Native), ?, (Tree), ?, ?, (–), Fee
RCA GUI [35]: Native, Tree, Yes, Yes, ?
(a) – = this feature is not available in the software tool, Yes = this feature is available in the software tool, (–) = it is likely that this feature is not available in the software tool, but we were not able to verify that, (Yes) = it is likely that this feature is available in the software tool, but we were not able to verify that, ? = we were not able to find any evidence on the occurrence of this feature, Fee = the software is subject to a fee, free (license) = the software is free, free to use = using the software is free.

Several tools for distributed software development exist [30–32]. The tool types that are the most similar to ARCA-tool are collaborative modeling tools [30] that allow collaborative and distributed software modeling. However, the main goal of those tools is software design modeling, while our tool is focused on RCA cause-effect diagram modeling. Additionally, knowledge management tools [30,31] include knowledge sharing features, which ARCA-tool also provides. Furthermore, our tool reduces but does not replace the need for other communication tools, e.g., a chat, as the cause-effect diagram is constantly updated for all participants, which helps group awareness. Our tool also has similarities with virtual whiteboards, such as Google Docs drawings [15], but our tool has more specific features for cause-effect diagramming and the development of process improvement ideas. Additionally, none of the virtual whiteboards offers support for capturing and refining the shared experiences from the retrospectives of many teams.

Based on the literature, it seems that it would be possible to combine the existing collaborative tools to perform the same tasks as with our ARCA-tool. However, this would require switching between tools and require cumbersome copy-pasting
(from the original cause-effect diagrams to some separate list of process improvement targets and ideas) between different tools.

2.3. Comparison of root cause analysis software tools

Software tools that support RCA in synchronous distributed retrospectives are rare. We searched for RCA software tools using Google, Sourceforge, Google Scholar, and Scopus. We found a total of 35 tools and compared their features with ARCA-tool (see Section 3).

We searched for existing root cause analysis software in Google using two search strings: "root cause analysis software" and "root cause analysis software free". The first search string resulted in 404,000 estimated hits. Thus, it appears the topic is of high interest. For both search strings, we included all software tools that we found on the search result pages until there was a search result page which did not extend the found tools any further (10 hits + ads of the search result page). The number of search result pages was eight for the first and two for the second search

string. With this limitation, our search resulted in 24 unique tools for RCA. We applied this limitation in order to complete our search within a reasonable time.

Searching for the tools from Google also revealed two additional websites that summarize software tools for RCA. We also included these tools in the evaluation. The websites revealed 17 different software tools for RCA. However, 8 tools were already found in Google. Thus, we were left with 33 unique tools for RCA.

We also searched the sourceforge.com database with the search string "root cause analysis" and found one open source alternative that claimed to support root cause analysis (DecisionTreeExpert). Unfortunately, there was no guidance on how to use the tool and we were unable to see how the tool could be used for RCA, and thus excluded it from the comparison.

Furthermore, we searched academic works from Google Scholar and Scopus with the search string "root cause analysis software". Google Scholar resulted in 58 articles and Scopus resulted in 37 articles. For each article, we read its heading, abstract, and keywords, and skimmed the content of the paper. If the article indicated that a tool for RCA is introduced, we selected the article for further evaluation. Six articles were selected from Google Scholar and three articles were selected from Scopus. The selected articles were thereafter read. One article from Google Scholar [33] and two articles from Scopus [34,35] introduced a software tool for RCA. Two of these articles introduced a tool that we had already found (Lassale and REASON) from non-academic databases.

Finally, we decided to make a comparison to Google Docs drawings, an online collaborative graph drawing tool. Thus, we had 35 existing software tools that we compared with ARCA-tool. We made our comparison based on the material freely available to us.
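The tally of tools reported above can be checked with simple arithmetic. The figures are taken from the text; the variable names are only illustrative.

```python
# Figures reported in the search description above.
google_tools = 24        # unique tools from the two Google search strings
summary_site_tools = 17  # tools listed on the two summary websites
already_in_google = 8    # summary-site tools that the Google search had found

non_academic_tools = google_tools + summary_site_tools - already_in_google

articles_with_tool = 3   # one from Google Scholar, two from Scopus
already_found = 2        # Lassale and REASON were found earlier
new_academic_tools = articles_with_tool - already_found

google_docs_drawings = 1  # added as an online collaborative drawing tool

# DecisionTreeExpert from Sourceforge was excluded, so it adds nothing.
total_compared = non_academic_tools + new_academic_tools + google_docs_drawings
```

The intermediate sum matches the 33 unique tools stated in the text, and the final sum matches the 35 existing tools compared with ARCA-tool.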
The sources of information included demonstration videos, free trial versions, marketing material and other available documentation, as the majority of the tools were proprietary.

The features that we compared cover seven aspects important for conducting synchronous distributed software project retrospectives. We introduce these aspects below and present analytical arguments for them based on our experience in conducting industrial RCA sessions [9], prior literature on software project retrospectives [4,5], and organizational learning systems [36]. The comparison is summarized in Table 1, while further details of the comparison are in Appendix A.

First, we argue that web browser based software outperforms native client software in the ease of adoption. The software teams rarely have time to conduct retrospectives [25] and therefore the ease of adoption is an important aspect. Native client software requires installation whereas web browser based software can be immediately used. Furthermore, people can use web browser based software with computers having a different operating system and hardware, including tablets and smart phones. This is the case unless the web browser based software requires plugins that only work on certain systems, e.g., the Flash plugin. Web browser based software can also be used from home computers that might not have the native client software pre-installed or might lack the required licenses. Thus, web browser based clients make organizing retrospectives more lightweight and hassle free. Four of the existing tools are used with a web browser, see Table 1.

Second, in order to conduct distributed synchronous retrospectives similarly to collocated retrospectives [4,5], the RCA software tool needs to support real-time collaboration among all participants. This means that the RCA software outcome stays in sync between the different sites. Additionally, all team members should be able to contribute to the analysis as it takes place. Therefore, all clients need to have synchronous editing access to the analysis results.
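The real-time collaboration requirement above can be pictured with a small in-process simulation, in which a shared diagram pushes every edit to all registered client views so that no client has to reload. This is a sketch under stated assumptions: the class names are hypothetical, callbacks stand in for network connections, and nothing here describes how ARCA-tool or any of the compared tools is actually implemented.

```python
class SharedDiagram:
    """In-process stand-in for a server that pushes diagram edits to clients."""

    def __init__(self):
        self._edges = []
        self._subscribers = []

    def subscribe(self, callback):
        """Register a client; replay earlier edits so a late joiner catches up."""
        for edge in self._edges:
            callback(edge)
        self._subscribers.append(callback)

    def add_edge(self, cause, effect):
        """Store an edit made at any site and push it to every client."""
        edge = (cause, effect)
        self._edges.append(edge)
        for notify in self._subscribers:
            notify(edge)


class ClientView:
    """Local copy of the diagram held at one site."""

    def __init__(self, name):
        self.name = name
        self.edges = []

    def on_edge(self, edge):
        self.edges.append(edge)


diagram = SharedDiagram()
site_a = ClientView("site A")
diagram.subscribe(site_a.on_edge)
diagram.add_edge("Unclear requirements", "Late release")

site_b = ClientView("site B")  # joins after the first edit
diagram.subscribe(site_b.on_edge)
diagram.add_edge("Testing started late", "Late release")
```

After these calls both views hold the same two edges, which is the "in sync between the different sites" property the text asks for.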
Therefore, all clients need to have synchronous editing access to the analysis results. We see that push-and-pull technology is needed to implement such requirements as it removes the need for clients to constantly reload their view. Only six of the existing tools fully support real-time collaboration.

Third, co-creation of a cause-effect diagram is at the core of RCA in retrospectives, as introduced in [4,5,9]. Using the cause-effect diagram helps the team members to understand and explain a complex problem in terms of its causes, sub-causes, and causal relationships. The majority of the RCA software tools enable creating the cause-effect diagram. Considering the structure of the cause-effect diagram, only three of the existing tools support drawing a graph, while the majority of the tools support tree-based cause-effect diagrams. A graph structure has been claimed to be more efficient for software project retrospectives than a tree structure [5].

Fourth, RCA aims to develop process improvement ideas for the causes of problems [9]. Thus, the RCA software tool should make it possible to develop improvement ideas and link them to the identified causes of problems. Such features are supported by the majority of the tools.

Fifth, it is important that the team members can vote on the most severe causes and the best improvement ideas [9], especially if a high number of causes and improvement ideas are detected [9]. The team members can then focus on the causes perceived as the most severe. Similarly, they can decide collaboratively which improvement ideas should be implemented. Voting is supported in only one of the existing tools.

Sixth, the RCA software tool should support knowledge management, which is about creating a learning organization [4]. Dingsøyr presents retrospectives as a method for leveraging knowledge from the individual level to the organizational level [4]. Lee et al.
[36] present that an organizational learning system should include a global knowledge base that combines cognitive maps (cause-effect diagrams of experiences) created by individuals. Thus, the software tool should include a knowledge base that enables combining the lessons learned from many retrospectives and teams over the years. The majority of the tools support knowledge management and allow accessing past RCA session results.

Seventh, we analyzed the costs of existing tools. One of the tools is under an open source license, two are otherwise free to use, whereas the majority of the tools are subject to a fee.

To summarize, in contrast to the existing RCA software tools, only ARCA-tool covers all of the seven aspects discussed above. However, our analysis was limited as described at the beginning of this section, and the evaluation of many tools was challenging due to proprietary licenses and limited access to many commercial tools. Thus, it is possible that software tools with features similar to those of ARCA-tool exist. In any case, the results of our field study can be used as evidence for the usefulness of any tool that implements these features. Furthermore, to our knowledge, this comparison of 35 prior RCA software tools is the largest one made.

3. ARCA-tool

This section provides an overview of ARCA-tool. We discuss how the tool supports distributed retrospectives and the features it includes.

3.1. Overview of ARCA-tool

ARCA-tool is designed to be used when conducting RCA in retrospectives. The tool is open-source (MIT license) and was developed in two subsequent projects on the Aalto University software capstone project course by 15 software engineering students. During the projects, the primary author of this paper acted as the customer and provided the tool requirements.

ARCA-tool supports the identification of problems and their causes by providing features particularly suitable for the creation of cause-effect diagrams in software project retrospectives. Among many useful features, the team members can develop process improvement ideas embedded in the detected causes and problems. The tool supports conducting distributed retrospectives, and makes it possible to capture and summarize the findings of a set of retrospectives.

ARCA-tool uses a client-server architecture with push-and-pull technology, i.e., the server and clients transmit and receive messages from one another. The core of the tool is a cloud server. The cloud server ensures that all clients (web browsers) are up-to-date in real time. This is important during distributed retrospectives as the contribution of team members is immediately visible to the other team members.

3.2. Key features of ARCA-tool

In ARCA-tool, a retrospective facilitator creates a retrospective and shares it with the team members. The team members can join the retrospective from their own computers through a TCP network connection. Thus, the retrospectives do not need to be conducted face-to-face. Additionally, ARCA-tool allows the team members to contribute to the retrospective before and after the retrospective meeting. This is occasionally important as finding a common time is especially difficult in geographically dispersed projects [8]. However, such an approach does not make it possible to ask other team members for clarifications about the detected problems; one can only see what the others have found. Likewise, the team members cannot contribute to findings that have not yet been detected. On the other hand, the team members can provide input for others or try to contribute to their findings.
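The push-and-pull synchronization described in the overview can be illustrated with a minimal in-memory sketch. It is not ARCA-tool's implementation: the names `SyncServer`, `Client`, `join`, `submit` and `receive` are hypothetical, and direct method calls stand in for the network messages that a real client-server deployment would exchange.

```python
from dataclasses import dataclass, field


@dataclass
class Client:
    """Hypothetical browser client holding a local copy of the diagram changes."""
    name: str
    view: list = field(default_factory=list)

    def receive(self, change: str) -> None:
        # Push: the server delivers the change, so no manual reload is needed.
        self.view.append(change)


@dataclass
class SyncServer:
    """Hypothetical central server that keeps every connected client in sync."""
    clients: list = field(default_factory=list)
    log: list = field(default_factory=list)

    def join(self, client: Client) -> None:
        # Pull: a client that joins late first fetches everything recorded so far.
        client.view = list(self.log)
        self.clients.append(client)

    def submit(self, change: str) -> None:
        # A client sends a change; the server records it and pushes it to all clients.
        self.log.append(change)
        for client in self.clients:
            client.receive(change)


server = SyncServer()
alice, bob = Client("alice"), Client("bob")
server.join(alice)
server.submit("cause: unclear requirements")
server.join(bob)  # bob joins late and pulls the history
server.submit("cause: lack of commitment")
print(alice.view == bob.view)
```

In this sketch both clients end up with the same ordered list of changes, which is the property the text relies on: every contribution becomes visible to all participants without any of them reloading the view.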
The team members start the retrospective by listing problems that occurred during the unit of analysis, which typically is an iteration [5]. Thereafter, they select the problems to be analyzed by using RCA [5]. The selection can be done through voting, which is supported by the tool, or by managerial decision (see Points in Fig. 1). In order to support RCA, a cause-effect diagram is provided. ARCA-tool uses a directed graph structure to model the cause-and-effect relationships (Fig. 1). Such a structure, a cause-effect diagram, has been found to be suitable for software project retrospectives [5,9]. The team members can enter the problems, the causes of problems, and the related cause-and-effect relationships into the cause-effect diagram. The tool protects the anonymity of team members.

After the causes are entered, the team members can develop process improvement ideas related to the causes. In ARCA-tool, the team members develop their process improvement ideas for each cause separately. This increases the accuracy of the process improvement ideas as they become cause-specific corrective actions. Additionally, the ideas are visually embedded in the causes. ARCA-tool colors the causes that have corrective actions yellow (see the cause Lack of commitment in Fig. 1). Embedding is important as it keeps the cause-effect diagram clean and simple. For the evaluation of the process improvement ideas, the tool offers a separate view for browsing all or selected improvement suggestions as one list (see Fig. 2).

All key features of ARCA-tool are embedded in a radial menu (see Fig. 1). The radial menu is activated when a team member selects a cause. Simultaneously, all causes that are directly connected with the cause are emphasized (see the edges connected with the cause Project members do not meet enough in Fig. 1). The key features are, starting from the one o'clock position and proceeding in counterclockwise order:

Thumb up = Vote for this cause.
Pencil = Edit this cause.
Trashcan = Delete this cause.
Light bulb = Create process improvement idea.
Arrow left = Link this cause to another existing cause.
+ sign = Create a cause that is linked to this cause.
Ticket = Classify this cause.

3.3. Additional features of ARCA-tool

Voting is occasionally used in retrospectives to focus the attention of the team members on specific problems or causes. Voting is also used to indicate the process improvement ideas the team members value the most [9]. In ARCA-tool, the team members can like or dislike the causes and process improvement ideas (see the Points and the thumb-up icon in the radial menu in Fig. 1). The number of likes and dislikes is limited to ±1 for the team members while being unlimited for the retrospective facilitator. This way the causes and developed process improvement ideas can be voted on by the team members and emphasized by the facilitator.

Classification of the causes of problems has been used to improve learning and to draw conclusions from detailed and high-volume observations made during RCA, e.g. [10,12,13]. In ARCA-tool, the classification can be done while the causes are being entered in the cause-effect diagram or afterwards. The tool provides two dimensions for classifying the causes. The pre-existing classification dimensions are the process areas and the types of causes [29]. The process areas express in which parts of the software process the causes occur, whereas the types of causes explain what the causes are. In ARCA-tool, the team members can develop a retrospective-specific classification or utilize the classifications used in their prior retrospectives. The tool also provides statistics about the classifications made during the retrospectives. For example, the team members can view the distributions of the detected causes (see Fig. 3). They can also view the distributions of liked causes, and of causes that include process improvement ideas.
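The data model implied by these features (a directed graph of causes, improvement ideas embedded in causes, votes, and a classification by process area) can be sketched as follows. This is an illustrative sketch only, not ARCA-tool's code: the class and method names are hypothetical, the process-area labels and the improvement idea are invented, the link between the two example causes is made up for the demonstration, and the ±1 limit is interpreted here as one like or dislike per team member per cause. The facilitator's unlimited votes are not modeled.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Cause:
    text: str
    process_area: str = "unclassified"          # one classification dimension
    ideas: list = field(default_factory=list)   # improvement ideas embedded in the cause
    votes: dict = field(default_factory=dict)   # voter name -> +1 (like) or -1 (dislike)


class CauseEffectDiagram:
    """Hypothetical directed-graph model of a cause-effect diagram."""

    def __init__(self):
        self.causes = {}    # cause id -> Cause
        self.edges = set()  # (cause id, effect id) pairs

    def add_cause(self, cid, text, explains=None, process_area="unclassified"):
        self.causes[cid] = Cause(text, process_area)
        if explains is not None:
            self.link(cid, explains)

    def link(self, cause_id, effect_id):
        # A graph, not a tree: one cause may explain several effects.
        self.edges.add((cause_id, effect_id))

    def vote(self, cid, voter, value):
        if value not in (1, -1):
            raise ValueError("a team member's vote must be +1 or -1")
        # Voting again overwrites the earlier vote, so each member counts at most once.
        self.causes[cid].votes[voter] = value

    def points(self, cid):
        return sum(self.causes[cid].votes.values())

    def area_distribution(self):
        return Counter(c.process_area for c in self.causes.values())


diagram = CauseEffectDiagram()
diagram.add_cause("p1", "Target problem")
diagram.add_cause("c1", "Project members do not meet enough",
                  explains="p1", process_area="management")
diagram.add_cause("c2", "Lack of commitment",
                  explains="p1", process_area="management")
diagram.link("c2", "c1")  # c2 now explains both p1 and c1
diagram.causes["c2"].ideas.append("Agree on a weekly meeting slot")
diagram.vote("c2", "ann", 1)
diagram.vote("c2", "ann", 1)  # a repeated like from the same member does not add up
diagram.vote("c2", "ben", 1)
print(diagram.points("c2"))
print(diagram.area_distribution())
```

Storing edges as a set of pairs rather than a parent pointer per cause is what distinguishes the graph structure from a tree: the cause `c2` above has two outgoing edges. The per-area counts returned by `area_distribution` correspond to the kind of distribution the text says the tool shows as a pie chart.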
The team members can also view the cause-and-effect relationships between the process areas.

In order to support organizational learning, ARCA-tool provides features for monitoring the output of retrospectives, i.e., the causes and process improvement ideas. The tool enables the analysis of an individual retrospective as well as the combination of many retrospectives. This can be highly useful while capturing and refining the lessons learned from many retrospectives. The team members can view the output of all retrospectives they have participated in. The status of the detected causes (detected, elimination, won't fix, fixed) and of the developed process improvement ideas (idea, will be implemented, implemented, rejected) can also be managed.

Additionally, the tool provides information about the classified causes. For example, senior managers would like to know what process areas are most often related to the problems analyzed in the retrospectives, see Fig. 3. They would also like to know what types of causes are usual in those process areas. In ARCA-tool, the cause-and-effect relationships between the classifications can be automatically visualized for the selected retrospectives. Additionally, the tool provides detailed statistics about the distributions of cause types in process areas. Furthermore, the team members can download a file which includes the detected causes and process improvement ideas from the monitored retrospectives. Thus, the team members can use ARCA-tool to analyze the detailed issues processed in the prior retrospectives and communicate the lessons learned to others.

Fig. 1. Screen view of ARCA-tool.

Fig. 2. Monitoring view of ARCA-tool showing the causes of Fig. 1 and their improvement ideas.

4. Field study methodology

For the empirical evaluation of ARCA-tool, we used a field study method [37] that allowed us to study the adoption and use of the tool in a real industrial setting. We observed and video recorded four retrospectives conducted by four teams in two companies. After the retrospectives, all participants completed a questionnaire, and selected case participants were interviewed. Thus, we present a rich data set from four industrial software teams, but in contrast to a controlled experiment, we cannot present meaningful statistical comparisons, due to the low number of subjects and the lack of a control group providing an independent baseline against which we could compare our measures.

This section presents the research method and context in more detail. The case companies are introduced in Section 4.1 and the retrospective method including the usage of ARCA-tool in Section 4.2. The data collection and analysis methods are shown in Sections 4.3 and 4.4.

4.1. Case companies

The empirical part of this study was conducted in two software product companies, as summarized in Table 2. The rationale for the selection of these two case sites was that together they formed an interesting research setting allowing us to evaluate an industrially relevant retrospective method and software tool in collocated and distributed retrospectives. The similarities between the cases made them more comparable whereas the dissimilarities allowed us to evaluate the retrospective method and software tool in different case domains.

The retrospectives of both cases followed a similar retrospective method and each retrospective was computer facilitated by ARCA-tool. The cases were also similar considering the number of retrospective participants and the effort used in the retrospectives. The roles of the case participants were also somewhat similar. Additionally, both cases were conducted in distributed agile software development organizations.

Two important differences between the cases were present.
First, in Case 1, the retrospective method used was the teams' current method, whereas in Case 2 it was new. Similarly, in Case 1, ARCA-tool had been used in retrospectives previously, whereas in Case 2 it was introduced for the first time. Second, the participants of Case 1 were experienced with collocated retrospectives, which they used in this study, too. In contrast, the participants of Case 2 were experienced with distributed retrospectives and followed that approach. Therefore, we characterize the retrospectives of Case 1 as collocated and the retrospective of Case 2 as distributed. The cases also differed in company size and in the specific target problems analyzed in the retrospectives.


Fig. 3. Pie chart view of ARCA-tool presenting the distributions of classified causes shown in Fig. 1.

Table 2 Summary of the company cases.

Item | Case 1 | Case 2
Case company | Software company with >800 employees | Software company with >100 employees
SW development organization | Agile with >30 employees | Agile with >70 employees
Case participants | Product owners, scrum masters, architects, and developers. N = 3 + 5 + 3 = 11 | Scrum master, architects and developers. N = 5
Evaluation perspective | Evaluation of the current method and tool | Evaluation of a new method and tool
Retrospective(s) | 3 Collocated | 1 Distributed
Distribution | All persons in a meeting room in Finland | 1 person in Romania + 2 in the office in Finland + 2 at home
Effort used | 1 h meeting h retrospective (3 teams) | 1 h meeting + 1 h retrospective (1 team)
Target problem(s) | Expectations of product owners do not meet the output of scrum teams | (1) Lack of pair programming (2) Lack of merging the code (3) Lack of collaboration
Causes found | (23) + (20) + (39) = 82 | ( ) =

4.1.1. Case 1

Case 1 was conducted in a large-sized international software product company with over 800 employees. The products are highly complex software systems integrated into customized hardware provided by the company partners and to third party software modules. There are around 30 employees working for the core product of the company. The rest of the employees work in localization, integration, customer services and sales. Our study context, the software development organization of the core product, is divided into two development teams, which are geographically distributed over several European countries. The organization follows agile software development practices, based on the Scrum methodology [22]. The development work is divided into sprints each lasting two weeks. In order to facilitate continuous improvement, the Scrum teams conduct 60 min face-to-face retrospectives regularly.
These are conducted at the same time as the sprint demonstration and the planning of the upcoming sprint. The teams have found using RCA and ARCA-tool in the retrospectives to be useful. The retrospectives are conducted with the following procedure. The team members start by listing positive and negative experiences in ARCA-tool. Then they conduct RCA for some of the negative experiences, selected by voting. During RCA, the team members first enter underlying causes into ARCA-tool. Then they discuss the findings and try to detect deeper level causes. Corrective actions are developed either during or after the retrospectives for the selected root causes. The problem of the current practice has been that the team members have been forced to travel to the same physical location regularly, a challenge for many team members. In order to reduce the need for travelling, distributed retrospectives have been considered as a substitute.

We conducted our field study in the context of three teams: two distributed development teams and one product owner team. Each development team includes approximately five software developers (software developers and architects) and one scrum master (team leader). The work of the teams is overseen by several product owners (business and product managers). The product owner team had three members, all product owners, representing the needs of customers in different countries. Each product owner is responsible for steering the customer needs to both development teams. The knowledge sharing between the development teams and product owners occurs mainly in the sprint planning sessions. It is assumed that all needed information about the customer needs is communicated during the sessions. However, the developers can ask the product owners to clarify the customer needs during the sprints.

We were invited to observe the retrospectives of these three teams. The goal of the retrospectives was to analyze why the expectations of the product owners did not meet the output of the development work. The goal was defined by two software development managers before the retrospectives (see Section 4.2).

4.1.2. Case 2

Case 2 was conducted in a medium-sized international software product company with over 100 employees. The company products are large and complex software systems released four times a year. The software development organization includes approximately 70 people. The development work is divided into seven teams, each including about ten people. The team members are geographically distributed over several countries in Europe and Asia.
Like Case 1, the organization follows agile software development practices based upon the Scrum methodology [22], and the teams conduct 60 min distributed retrospectives regularly. Unlike Case 1, the duration of sprints varies between two and four weeks. Additionally, the retrospectives are conducted in a distributed fashion using an online audio and video bridge. In the retrospectives, problems that have occurred are discussed, and process improvement ideas are developed. The teams do not use RCA in their retrospectives. Instead, the team members discuss positive and negative experiences and try to figure out how to make improvements in their development work activities. The retrospectives are occasionally summarized on the company's intranet pages. The problems of the current practice include informal discussions that become unfocused and dominating team members who speak over the others. Thus, the team members have considered alternative practices, which may be more feasible for their needs.

Our field study was conducted in the context of one distributed software team including the development roles of scrum master, software developers, and architects. We observed a distributed retrospective meeting, where the team members used ARCA-tool and the retrospective method which we introduced to them. Three problems were analyzed in the retrospective. The problems were identified in a separate meeting, which was conducted by the team members before the retrospective (see Section 4.2). The first problem was lack of pair programming, which the team members thought was not used enough. The second problem was merging the code between different work branches. The merge status was unclear; additionally, merging was not done often enough. The third problem was lack of collaboration with other teams in the company.

4.2. Retrospective method used in the cases

Each of the retrospectives across both cases was initiated by a separate meeting, where a high-level target problem for each retrospective was defined. The separate meeting lasted approximately 1 h. The meeting was conducted by the company representatives who wanted to give a specific goal for the retrospective. In Case 1, the representatives included a product owner and a scrum master. In Case 2, the representatives included the scrum master and a few software developers of the team. In the meeting, the representatives discussed problems that had occurred in the development work. Based on the discussion, the representatives decided on the goal of the retrospective, i.e., to explain one (Case 1) or several (Case 2) high-level problems (see Table 2).

Thereafter, the retrospective was arranged. The retrospective lasted approximately 1 h, and it was facilitated by a company representative. At the beginning of each retrospective, the facilitator briefly introduced the specific goal of the retrospective to the participants. In Case 2, the facilitator also briefly introduced the retrospective method and ARCA-tool. The retrospective method used is summarized in Fig. 4. ARCA-tool was used by all participants in every step of the retrospective. Each retrospective resulted in a cause-effect diagram emphasizing the most important root causes.

Fig. 4. The retrospective method used in the study.

In Case 1, three retrospectives were conducted for a single problem. The first two retrospectives were conducted with each development team having participants from all different roles of the development team including the scrum master, developers, and architects. The third one was conducted with the product owners. The facilitator in Case 1 was the scrum master of one development team. The facilitator steered the retrospectives and led the implementation. The retrospectives were conducted face-to-face at the same physical location.

Each retrospective was conducted by using the following procedure. First, the participants were given 5 min to enter problems related to the target problem in ARCA-tool. At this stage, all participants used their own computers. During the next 15 min, each participant explained the problems entered into the tool. The other participants simultaneously commented on and discussed the findings. Thereafter, the participants were given 5 min to enter underlying causes that explained the detected problems. This was also done simultaneously in ARCA-tool by all participants, working on their own computers. Then, during the next 15 min, each participant explained the underlying causes entered into the tool. The other participants commented on and discussed the findings. They also entered additional causes discovered during the discussion. Furthermore, they used the tool to note if some cause explained other causes. At the end of the retrospective, the participants held a summarizing discussion about the problems and causes entered into the tool. They also voted on the most controllable causes by using the liking feature of the tool.

In Case 2, one retrospective was conducted and it was facilitated by the scrum master of the team, who steered the retrospective and led the implementation. The retrospective was conducted as distributed with geographically dispersed participants. The participants included all roles of the development team (a scrum master,

software developers and architects) and they used ARCA-tool to document and share their findings about problems and related causes, working on their own computers in their own locations in two European countries. Google+ was used as an audio and video bridge. Thus, the participants were able to talk to and see each other. The retrospective followed the same outline as the one in Case 1, see Fig. 4.

4.3. Data collection

The feedback was collected from the case participants using interviews and questionnaires, see Appendices B and C. Additionally, we used observations combined with video recording. The interviews were conducted by the second (Case 1) and third (Case 2) author. The primary author observed the interviews. He wrote notes and ensured that the questionnaires were filled in by the case participants. Additionally, the retrospectives were video recorded. Thus, we were able to check if something was missing during the data analysis.

A total of 16 case participants filled in the questionnaires. In addition, we interviewed eight participants. Our aim was to collect feedback about the introduced retrospective method and about the usefulness and ease of use of ARCA-tool. In Case 1, one participant from each retrospective was interviewed. In Case 2, we interviewed all retrospective participants. Interviews at Case 1 were conducted face-to-face, whereas the interviews at Case 2 were conducted as distributed by using online chat for three participants and face-to-face for two participants. The chat was used because it was easier for the geographically dispersed interviewees. Furthermore, in the questionnaires, the participants of Case 1 mostly evaluated their current retrospective method, as the introduced retrospective method was very similar to it.
In contrast, in Case 2, the participants compared the introduced retrospective method with their current methods, which differed from the introduced one. The scale in the questionnaires was a symmetric 5-point Likert scale.

4.4. Data analysis

Both cases were analyzed separately as the questions asked in the questionnaires and interviews varied slightly between the cases. This was due to differences in the company context. Case 1 had used RCA and ARCA-tool previously while Case 2 had not. We transcribed and coded the interviews accordingly. We calculated the means, standard deviations, and medians of the questionnaires. Finally, we summarized the interviews and questionnaires in order to conclude whether the findings were similar between the cases.

We are aware of the controversy of presenting means from a Likert scale. If the interval between the Likert scale items cannot be presumed equal, calculating means with standard deviations is inappropriate, as stated by Jamieson [38]. In our study, the interval between the Likert scale items can be presumed equal as the scale was symmetric and only the extreme values had a textual representation. In Case 1, the scale was: 1 = very minor, 2, 3, 4, 5 = very major, and in Case 2, the scale was: 1 = very low, 2, 3, 4, 5 = very high. Furthermore, the mean contains more information than the median in small samples such as ours. For example, three responses with values 5, 5, and 1 give a median of 5 but a mean of about 3.67. The latter is closer to the truth because the opinions were highly polarized and the median would only represent the opinion of the middle respondent.

5. Results

In this section, we present the field study results. Feedback on ARCA-tool (see Section 3) is presented in Section 5.1 and the feedback on the retrospective method including the RCA method (see Section 4.2) is summarized in Section 5.2. Furthermore, Table 3 summarizes the feedback from the questionnaires, and Tables 4 and 5 summarize the results from the interviews.
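As a quick numerical check of the mean-versus-median example given in the data analysis discussion above, the three responses 5, 5, and 1 can be summarized with Python's standard `statistics` module. The sample standard deviation is included only for illustration; whether the study reports the sample or the population form is not stated in the text, so that choice is an assumption.

```python
from statistics import mean, median, stdev

# Three polarized Likert responses from the example in the text.
responses = [5, 5, 1]

central_median = median(responses)        # middle respondent only
central_mean = round(mean(responses), 2)  # reflects all three responses
spread = round(stdev(responses), 2)       # sample standard deviation (assumed form)

print(central_median, central_mean, spread)
```

The median stays at the top of the scale while the mean is pulled down by the single low response, which is the loss of information the argument refers to.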
The tables separate the results by research question. While RQ1 and RQ2 aim to evaluate ARCA-tool, RQ3 evaluates the retrospective method.

5.1. ARCA-tool

To summarize, our results indicate that ARCA-tool increases the cost-efficiency of retrospectives and that it is perceived as essential in distributed retrospectives. Additionally, the tool is perceived as easy to use and learn. Therefore, we believe that the tool supports the process of the retrospective method (see Section 4.2) and helps the participants to conduct the tasks of retrospectives.

Regarding usefulness, the participants from both cases evaluated in the questionnaires (see Table 3) that the tool helped to detect the causes of problems. The participants in Case 1 also evaluated that the retrospective would be less efficient and more difficult without the tool. Similarly, in Case 2, the participants evaluated that the cost efficiency of the retrospective increased with ARCA-tool. Furthermore, the interview results from both cases (see Tables 4 and 5) indicate that the tool is essential in distributed retrospectives. Our results from Case 1 also indicate that when the retrospective is conducted face-to-face, the tool can be substituted with a whiteboard and Post-it notes, but in that case the analysis is not as efficient as it is with the tool.

According to the interviews at Case 2, ARCA-tool should also be improved. It was said that the tool needs slight improvements in how the detected causes are organized. Some participants perceived that the tool currently does not sufficiently support the visualization of cause groups. Perhaps it would be useful to organize similar causes into the same set of causes so that they are clearly represented in the cause-effect diagram.

Regarding ease of use, the participants from both cases evaluated in the questionnaires (see Table 3) that the tool is easy to use and learn.
This indicates that ARCA-tool supports the retrospective process, as it makes it easier for the participants to conduct the tasks of retrospectives, i.e., to detect and analyze the causes of target problems (see Table 2). In Case 1, the ease of use and the learnability of the tool were both rated very high. In Case 2, the values were also high, but lower than in Case 1. We assume that this was because the tool was new to the participants of Case 2. The interviews also indicate that the tool is easy to use (see Tables 4 and 5). The interviews at Case 1 indicate that the tool makes it easier to visualize the detected causes. Similarly, the participants in Case 2 claimed that the user experience is intuitive and the tool is relatively easy to use. Furthermore, according to the results from Case 2, there is no feature overload, but all essential features are included in the tool. It was also noted that the difficulty of analysis correlates with the number of causes of problems. The number of detected causes in the retrospectives was around (see Table 2).

5.2. The retrospective method

Considering the results from the interviews, using RCA in retrospectives was perceived as useful in both cases. This was due to the structured approach that the retrospective method followed and the in-depth analysis, which improved collaboration. In Case 1, the participants said that the structured approach of the RCA method helped to detect the causes of problems. In Case 2, the participants said that the structured approach of the RCA method resulted in a deeper understanding of the causes of problems, which improves their current practices.

Considering the questionnaires, the participants from both cases evaluated the easiness to collect causes and detect root causes as high (see Table 3). Furthermore, they evaluated that the correctness and impact of the detected causes were high. The participants of Case 2 perceived that the RCA method improved collaboration. Additionally, they said that the RCA method is easy to use and learn. They explained that the method is based on an intuitive and simple idea. The results from questionnaires are in line with these results. The openness in communication was evaluated with high values in both cases (see Table 3). Additionally, the participants evaluated their personal contribution with high values.

T.O.A. Lehtinen et al. / Information and Software Technology 56 (2014)

Table 3
Summary of questionnaires.

Columns: Case 1 (Collocated): ScrumT1 (N = 3), ScrumT2 (N = 5), Product Owners (N = 3); Case 2 (Distributed): ScrumT3 (N = 5); All teams (a). Each team column reports x̄, σ, x̃; the All teams column reports N, x̄, σ, x̃.

RQ1 Usefulness of ARCA-tool
  Retrospective efficiency without the tool
  Tool's cost efficiency compared with previous practices
  Assistance of the tool for cause detection
  Ability to detect the causes without the tool
  Retrospective ease of use without the tool
RQ2 Ease-of-use of ARCA-tool
  Easiness to collect causes
  Easiness to detect root causes
  Ease of use of the tool
  Learnability of the tool
RQ3 Retrospective method
  Personal contribution
  RCA cost efficiency compared with prior practices
  RCA ease of use compared with prior practices
  Correctness of detected causes
  Impact of the detected causes
  Openness in communication

(a) N = the number of respondents, x̄ = mean, σ = standard deviation, x̃ = median. Scale: 1 = very minor/low; 2, 3, 4, 5 = very major/high.

Table 4
Summary of interviews in Case 1.

Question: Would we have found the same problems and causes without the tool? (RQ1-2)
Summary: Similar causes could have been detected also by using a whiteboard, as an example. However, ARCA-tool improves the efficiency of the analysis. In geographically distributed teams, ARCA-tool is essential.
Quotes from the interviews:
- We could have detected the same causes by using a whiteboard, however, ARCA-tool made the analysis easier. (person 1)
- The required effort by using the whiteboard would be higher (person 1)
- ARCA-tool improves the visualization of the detected causes. (person 2)
- ARCA-tool spares time when documenting the results. (person 2)
- ARCA-tool is essential when some participants are geographically dispersed. (person 1)

Question: Did this retrospective method help us to find the causes of the problems? (RQ2-3)
Summary: The key to finding the causes was the RCA method. ARCA-tool helped to visualize the causes of the problem. However, the tool itself was not perceived as the key to success.
Quotes from the interviews:
- The RCA method helped to found these causes. ARCA-tool itself is not the key to success, but the structured approach of the RCA method is. (person 1)
- The tool made it easy to see the big picture related to the problem causes. Each team member was additionally able to see what the other participants have detected. (person 2)

Question: Do you think that we found the most critical problems? (RQ3)
Summary: The most critical causes of the target problem were found.
Quotes from the interviews:
- We did find the most important root causes (person 1)
- We did find the most critical problems (person 2)
- I think that we found most of the causes. (person 3)

6. Discussion

In this section, we answer the research questions and discuss our findings and possible threats to the validity of this study.

6.1. Answering the research questions

RQ1: Is ARCA-tool perceived as useful in the distributed retrospectives of agile software teams?

In Case 1, ARCA-tool had already been found to be useful in collocated retrospectives. The tool was new to the participants of Case 2, but they were experienced in conducting distributed retrospectives. In order to answer this research question, we use the results from Case 2 and compare them to Case 1. Regarding ARCA-tool, we claim the following: The tool enables the team members to contribute to the retrospective simultaneously.
This improves the communication, as the team members can write simultaneously, whereas speaking simultaneously is not possible. This also reduces the risk that the participants forget some important comments if they are not written down. The cause-effect diagram structure provided by ARCA-tool improves the way the findings are visualized. This encourages the team members to consider the findings in-depth, as proposed in Case 2 (see Table 5).

Table 5
Summary of interviews in Case 2.

Question: In contrast to the company practices used to detect the causes of problems, do you consider ARCA-tool as useful? (RQ1)
Summary: ARCA-tool improves the company practices. The tool improves the analysis of the causes of problems and their relationships. Additionally, the tool is perceived as enjoyable.
Quotes from the interviews:
- It works! (person 4)
- The tool improves understanding about the causal relationships between the problems, which I consider as useful. (person 5)
- I found it very useful and fun to do. It certainly is a better practice than having an online video meeting like we had in the past. (person 6)
- The tool challenges the participants to consider the causes of problems deeper. (person 7)

Question: Do you consider ARCA-tool as cost efficient when compared with RCA which is conducted by using post-it notes or Google Docs drawings? (RQ1)
Summary: ARCA-tool works well with distributed teams. This is because of the online automation and features supporting organizing the causes easily. In contrast to Google Docs drawings, the tool should support the grouping of causes.
Quotes from the interviews:
- In our case, the post-it notes do not work at all. This is because of the distributed team members. (person 4)
- In Google Docs drawings, a lot of time is spent to organize the causes and their relationships (person 7)
- Grouping the detected causes with ARCA-tool is currently difficult. (person 8)

Question: Do you consider ARCA-tool as easy to use when compared with RCA which is conducted by using post-it notes or Google Docs drawings? (RQ2)
Summary: ARCA-tool is learnable and intuitive. There is no feature overload either. The layout automation improves usability. On the other hand, when the number of causes increases, the difficulty of the analysis increases.
Quotes from the interviews:
- The tool is relatively easy to use and much more flexible than RCA which is conducted by using the post-it notes. (person 4)
- The user experience was intuitive. (person 5)
- Layout automation is good. (person 7)
- Outlining a high number of causes is somewhat difficult. (person 8)

Question: In contrast to our process improvement practices, do you consider the RCA method as easy to use? (RQ3)
Summary: The RCA method fits the retrospectives well. It is learnable, simple, intuitive, and formal.
Quotes from the interviews:
- After little practice it definitely helps us to improve the efficiency of the work. (person 4)
- The RCA method is not difficult to use. (person 7)
- It is based on intuitive and simple idea (person 5)
- Yes, because the RCA method is structural and straight forward (person 8)

Question: In contrast to our process improvement practices, do you consider the RCA method as cost efficient? (RQ3)
Summary: The RCA method improves current practices by providing deeper analysis with its structural approach. It also improves the collaboration and conceptualization related to the causes of problems.
Quotes from the interviews:
- The RCA method works. (person 4)
- I think that the visualization of the causes is important. (person 4)
- The method improved the discussions and helped to consider the problem more deeply. (person 5)

Our results support these claims. In both cases, the tool was evaluated as efficient (see Table 3), but in the distributed retrospective of Case 2 (see Table 5), the tool was characterized as essential. Similar comments about distributed retrospectives were also given in the interviews with the participants of Case 1.

In Case 1, ARCA-tool was used previously and the participants said that they would like to use the tool in their upcoming retrospectives too. Obviously, the tool was found to be useful in face-to-face retrospectives. A comparison of the results from Case 2 to Case 1 indicates that ARCA-tool is also useful in distributed retrospectives. In the distributed retrospective of Case 2, the tool was perceived as useful when it was compared with the current practices (see Table 3). Thus, regarding Case 2, ARCA-tool improves distributed retrospectives where only audio and video bridges are used (see Section 4.1.2).
Furthermore, in the collocated retrospectives of Case 1, the participants noted that the tool made it possible to see what the other participants had found (see Table 4). It was also perceived that the visualization of the causes helped to outline the detected causes. In the distributed retrospective of Case 2, the participants perceived that the visualization of the detected causes is important and that the tool helped to organize them (see Table 5). It was also claimed that the tool improved the analysis of the causes of problems and their relationships (see Table 5), which is probably one of the main advantages of RCA.

To summarize, it seems that ARCA-tool is perceived as useful in synchronous distributed retrospectives of small agile software teams. We probably still need to continue its development by making slight improvements to it (see Section 5.1). However, the tool improves the contribution of participants and challenges them to consider the findings in-depth.

RQ2: Is ARCA-tool perceived as easy to use in the distributed retrospectives of agile software teams?

Considering the ease of use, ARCA-tool was designed to be used in distributed retrospectives [8]. Additionally, we required that it enables conducting RCA [9]. Our aim was not to develop software supporting all kinds of modeling needs, e.g., making complex software models [30]. Instead, we wanted to make a lightweight tool that is simple and easy to use in a small group of individuals, i.e., fewer than ten participants using the tool collaboratively in a synchronous retrospective, as introduced in [9].

ARCA-tool was perceived as easy to use in both cases. The number of participants was between three and five. In the collocated retrospectives of Case 1, the participants perceived that the tool made the analysis easier (see Table 4). In the distributed retrospective of Case 2, the participants perceived that the tool is learnable and intuitive (see Table 5).
They also appreciated that only the necessary features are included in the tool. Additionally, it was noted in Case 2 that the way the tool automates the cause-and-effect structure improves its usability.

In the questionnaires, the participants from both cases also evaluated the ease of use and learnability of the tool as high (see Table 3). It seems that the participants of Case 1 evaluated the ease of use and learnability with higher values than the participants of Case 2. It is possible that this was because the tool was new to the participants of Case 2, whereas the participants of Case 1 were already familiar with it. It is also possible that the perceived ease of use decreases in distributed retrospectives. The participants are geographically dispersed, and therefore, asking for assistance from others becomes more difficult. However, we did not observe such problems in the distributed retrospective of Case 2.

To summarize, it seems that using ARCA-tool in distributed retrospectives does not make a major difference to its ease of use compared with collocated retrospectives. The participants learn to use the tool with a short introduction, and during the distributed retrospective they perceive that it is easy to use.

RQ3: Is RCA perceived as a good approach to use in the distributed retrospectives of agile software teams?

Both cases resulted in a similar finding. RCA was perceived as a good approach for retrospectives. This conclusion is well in line with prior studies. Problem prevention requires controlling the causes that create the problem. RCA makes it possible to detect the causes of the problem systematically and in-depth. In retrospectives [5,28], also in distributed settings, RCA helps the team members to consider the causes of their problems. This is important in order to make improvements in the team.

Case 1 has used RCA previously, which indicates that the case organization has already found such an approach useful in collocated retrospectives. In contrast, the RCA approach was new to the participants of Case 2, but they were experienced with distributed retrospectives. In order to answer this research question, we use the results from Case 2 and compare them with Case 1.

According to the interviews at Case 1, the key to finding the causes was the RCA method (see Table 4). The participants of Case 1 also perceived that the most critical causes of the target problem were found. Thus, the outcome of RCA was perceived as accurate and useful in the collocated retrospectives of Case 1. Similarly, it was proposed in Case 2 that the structural approach of RCA improves their current practices by providing in-depth analysis. It was also noted that the RCA method improves the collaboration and conceptualization of the causes of problems. The participants of Case 2 also evaluated in the questionnaires that the detected causes were correct and their impact was high (see Table 3).
Additionally, the participants of Case 2 evaluated that, in contrast to their current practices, the RCA approach is cost-efficient and easy to use (see Table 3).

The core of RCA is the cause-effect diagram. Retrospectives that rely on discussion alone suffer from the difficulty of remembering all relevant findings and outlining them as a whole. The level of detail and the coverage of the discussions depend on human memory. Retrospectives using RCA do not suffer from this memory problem, as the cause-effect diagram keeps the attention on relevant causes and simultaneously helps the team members to remember the findings as they are recorded in the diagram. In synchronous distributed retrospectives, this means that the cause-effect diagram has to be simultaneously reachable by all distributed team members. Otherwise, conducting collaborative RCA would likely be difficult.

In the distributed retrospective of Case 2, the RCA method was characterized as learnable, simple, intuitive, and straightforward (see Table 5). The results from the questionnaires are in line with the results from the interviews. The participants evaluated that the retrospective method helped to detect the causes of the target problems (see Table 3).

The retrospectives of Case 1 were collocated and the retrospective of Case 2 was distributed. In both cases, ARCA-tool made the cause-effect diagram reachable for all participants. RCA worked well in the collocated retrospective of Case 1 and in the distributed retrospective of Case 2. There were no major differences in the participants' evaluations between the cases either. The empirical results from Case 2 are very similar to the results from Case 1. Thus, to summarize, we conclude that RCA worked well in the synchronous distributed retrospective of Case 2.
However, it required the tool for collaborative cause-effect diagramming.

6.2. Comparison to prior studies

According to the Scrum methodology [22], retrospectives are valuable and they should be conducted at the end of iterations. Our results are in line with this claim, as both of our cases have used retrospectives accordingly and found them useful. Furthermore, the prior studies [5,28] introduce RCA as a part of retrospectives. Our results consolidate the prior studies by indicating that RCA is an important part of the retrospectives of small agile teams.

The retrospective method used in this study is similar to the prior method called postmortem review [4], which also includes the step of RCA. Such a method has been introduced as lightweight and useful for small software teams [5]. Correspondingly, Case 1 has used the method previously and found it useful. Furthermore, considering Case 2, their prior practices did not include RCA. They discussed positive and negative experiences and they tried to figure out how to make improvements in their development work activities, as recommended in the Scrum methodology [22]. However, they did not create cause-effect diagrams or otherwise register the causal structures of problems. The problems of the prior practices included informal discussions resulting in unfocused discussions and dominating team members who spoke over the others. When RCA was used in their distributed retrospective (see Section 4.2), the participants perceived that the method was better than their current practices.

Prior work has also been conducted in the area of Group Support Systems (GSS). GSSs are systems whose main aim is to help individuals to arrive at correct decisions in meetings effectively. GSSs, such as the one presented in [39], include features from three dimensions: (1) communication support, (2) process structuring and (3) information processing [40].
The features of communication support help in the information exchange between the participants [40]. The features of process structuring keep the meeting progressing according to the agenda [40]. Furthermore, the features of information processing provide access to important information, and enable sharing, aggregating, structuring, and evaluating the information [40].

The retrospective method together with ARCA-tool fulfills the three dimensions of GSS. Regarding the usefulness and ease of use of ARCA-tool, we hypothesize that the tool provides communication support [40], especially in distributed retrospectives. The tool improves the information exchange around the problems and their causes. Additionally, the tool includes the features of parallel communication, participant anonymity, and group memory [41]. Although our tool does not provide access to internal or external databases, the tool does make it possible to model the important knowledge of participants through cause-effect diagrams, voting, cause classifications, and corrective actions. Therefore, we see that the tool also provides features for information processing [40]. Finally, we hypothesize that the retrospective method provides process structure [41], as it includes the rules for communication and process steps that are steered by a facilitator. ARCA-tool also records a cause-effect diagram that is a partial record of the meeting interaction and part of process support.

The prior approach for distributed retrospectives [8], using a combination of e-mails, spreadsheets, and an audio bridge, does not provide anonymity or parallel information exchange. Sending e-mails between the participants is not an anonymous approach to exchanging information. Furthermore, using spreadsheets does not provide parallel contribution to the outcome of the retrospective. All individual spreadsheets need to be combined together.
Additionally, describing and analyzing cause-effect relationships with spreadsheets is difficult [9]. Therefore, the distributed retrospectives also require collaborative cause-effect diagrams. Thus, we conclude that the retrospective method combined with ARCA-tool makes an improvement to the approach introduced in the prior work [8]. RCA is an important part of retrospectives and ARCA-tool improves them by providing communication support and information processing.
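
To make the idea of a shared cause-effect diagram with voting more concrete, the following is a minimal Python sketch. It is not ARCA-tool's actual data model: the class names, the integer node ids, and the example problem and causes are all hypothetical and chosen only for illustration. It assumes a tree-like diagram in which each recorded cause points to the node it explains.

```python
from dataclasses import dataclass, field


@dataclass
class Cause:
    """One node of the diagram (hypothetical structure)."""
    text: str
    votes: int = 0
    # Ids of the deeper causes recorded as explanations of this node.
    caused_by: list[int] = field(default_factory=list)


class CauseEffectDiagram:
    """Illustrative in-memory diagram; node 0 is the target problem."""

    def __init__(self, target_problem: str) -> None:
        self.nodes: dict[int, Cause] = {0: Cause(target_problem)}
        self._next_id = 1

    def add_cause(self, text: str, effect_id: int) -> int:
        """Record a new cause of an existing node and return its id."""
        if effect_id not in self.nodes:
            raise KeyError(f"unknown node {effect_id}")
        cause_id = self._next_id
        self._next_id += 1
        self.nodes[cause_id] = Cause(text)
        self.nodes[effect_id].caused_by.append(cause_id)
        return cause_id

    def vote(self, cause_id: int) -> None:
        """Register one participant vote for a cause."""
        self.nodes[cause_id].votes += 1

    def root_causes(self) -> list[int]:
        """Causes for which no deeper cause has been recorded."""
        return [i for i, n in self.nodes.items() if i != 0 and not n.caused_by]

    def most_voted(self, k: int = 1) -> list[int]:
        """The k causes with the most votes (ties broken by lower id)."""
        candidates = [i for i in self.nodes if i != 0]
        return sorted(candidates, key=lambda i: (-self.nodes[i].votes, i))[:k]


# Made-up example session.
diagram = CauseEffectDiagram("Releases are late")
a = diagram.add_cause("Unclear requirements", effect_id=0)
b = diagram.add_cause("Product owner is overloaded", effect_id=a)
c = diagram.add_cause("Manual regression testing", effect_id=0)
diagram.vote(b)
diagram.vote(b)
diagram.vote(c)
print(diagram.root_causes())
print(diagram.most_voted())
```

In a distributed setting, a structure like this would have to be shared and updated for all participants at once. The sketch leaves that part out and only shows how causes, their relationships, and votes can be kept in one record.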

6.3. Evaluation of the research

This section discusses the validity of our empirical results using a validation scheme presented in [42]. Furthermore, as our results are based on the social construction of case companies, we will also use the evaluation principles of Interpretive Field Studies [43] in the validation scheme.

6.3.1. Construct validity

Construct validity reflects the extent to which the studied operational measures represent what is investigated according to the research questions [42]. The participants were experts on the current practices used in the retrospectives of their teams. Thus, we believe that they were able to compare the introduced retrospective method with their current practices. Additionally, the participants covered most of the organization members, i.e., the organization members of Case 1 and the team members of Case 2. Therefore, we believe that the research data was not biased by a homogeneous group of individuals. Instead, various interpretations about the RCA approach and ARCA-tool were captured. This enabled us to draw out multiple interpretations about the study results, an important aspect for validity introduced in [43]. Interviews and questionnaires were therefore reasonable data collection methods, which increases the construct validity [42].

However, our results are not based on the comparison of the outputs between the previous and the introduced retrospective method, as such information was not available for our purposes. Thus, even though the feedback from all case participants was highly positive, it should be noted that these evaluations are based on perceptions. Separating the effect of RCA from the use of ARCA-tool was also difficult. In the interviews with both cases and in the questionnaire of Case 2, we asked the participants to evaluate the tool and RCA approach separately.
In contrast, the questionnaire used in Case 1 asked the participants to evaluate ARCA-tool and the output of RCA thoroughly, but there were no questions about the RCA approach itself. Thus, in Case 1, separating the evaluations of the tool from the evaluations of the RCA approach was difficult. It was based on the interviews only. Therefore, regarding the RCA approach, we were not able to compare the questionnaire results between the cases.

6.3.2. External validity

External validity is concerned with whether it is possible to generalize the findings of the study and to what extent they can be generalized [42]. Contextualization has been presented as an important principle for generalizing the study results [43]. The two cases varied and thus they evaluated RCA and ARCA-tool from slightly different perspectives. This increases the external validity [42]. The participants of Case 1 were experienced with the retrospective method and ARCA-tool, but inexperienced in using them in a geographically distributed setting. In contrast, the participants of Case 2 were experienced in conducting retrospectives in a geographically distributed setting, but inexperienced in using the retrospective method and ARCA-tool. The feedback from both cases, however, was very similar.

Naturally, the evaluations of the case participants reflected the advances of the introduced retrospective method in comparison with the current practices. If the companies had previously used the existing RCA software tools, the feedback on ARCA-tool might have been different. Furthermore, we had only two cases, of which only one fully investigated the intended research questions. All of the retrospectives were conducted at the team level and the number of case participants in each retrospective was between three and five. Four teams were studied.
DeSanctis and Gallupe [44] state in their study of group decision support systems that the nature of technological support depends on three important aspects: group size, member proximity, and the task confronting the group. Our case contexts included only small groups, but the member proximity covered both extremes, i.e., face-to-face and dispersed settings [44]. Furthermore, the tasks the groups confronted included analyses of problems faced in agile software development organizations and teams. Thus, we cannot generalize our findings to organization-wide distributed heavy-weight retrospectives using different RCA methods [9] and a higher number of participants. We can only conclude that our results are likely valid in case contexts similar to ours, i.e., geographically dispersed small agile software teams using retrospectives regularly in order to create continuous learning and improvements (see Section 4.1).

We cannot conclude that distributed retrospectives can fully substitute face-to-face retrospectives either. Building trust in global software teams is crucial for success, which requires frequent communication, face-to-face meetings, and socialization [45]. The tool support for distributed retrospectives likely enables conducting retrospectives more frequently, which we assume would improve the communication. However, if the team members communicate in distributed settings only, then the risk of decreased information exchange and feedback increases [45].

Finally, considering the uniqueness of ARCA-tool, the evaluation of the prior RCA tools in Section 2.3 was not complete due to an excessive number of hits for our search strings in Google. However, we used five additional data sources. We studied all the tools listed on the two websites of useful RCA tools. Additionally, we searched for RCA tools in Sourceforge.com, but did not find any tools suitable for doing RCA. Finally, we searched for RCA tools in Google Scholar and Scopus.
The data from these six sources resulted in 35 RCA tools that we compared with our ARCA-tool. In contrast to ARCA-tool, none of the other tools matched all the seven aspects used in our comparison. However, it is still possible that a tool similar to our ARCA-tool exists. Nevertheless, to the best of the authors' knowledge, our comparison of 35 RCA software tools is the largest one in existence.

6.3.3. Reliability

Reliability is concerned with the extent to which data and analysis are dependent on a specific researcher [42]. Klein and Myers [43] state that the social tie between the researchers and participants should be critically reflected in order to evaluate the validity of results. The retrospectives were conducted by the employees of the companies. Thus, it is possible that the case participants overstated the goodness of the retrospective method in the questionnaires and interviews, as they conducted the method by themselves. It is also possible that the social tie between the researchers and participants biased the results. We controlled this risk by using triangulation in data collection [46] through observations, video recording, questionnaires, and interviews, which increases the reliability of our results. In the observations, we did not note any practical issues during the retrospectives. Additionally, our observations indicate that the case participants truly liked the retrospective method.

Furthermore, considering the data analysis, there is a slight risk of researcher bias, which is a common problem in qualitative data analysis. As the number of interviews increases, summarizing the results becomes challenging as people answer the same questions differently. We controlled this risk by using questionnaires. Similar responses from the questionnaire forms make it unlikely that researcher bias would have had a large effect on the qualitative results.
Additionally, our conclusions were based on both (1) the analysis of individual parts of research data and (2) the analysis of all research data combined together, the key principle in Interpretive Field Research, called Hermeneutic Circle [43]. Our conclusions are also in line with prior literature (see Section 2.1).
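
The questionnaire results referred to above are summarized in Table 3 as the number of respondents, the mean, the standard deviation, and the median of answers on a scale from 1 to 5. The following Python sketch shows one way such a summary could be computed. The function name and the response values are made up for illustration and are not the study's data, and the use of the sample standard deviation is an assumption, as the paper does not state which variant was used.

```python
from statistics import mean, median, stdev


def summarize(responses: list[int]) -> dict[str, float]:
    """Summarize answers given on a 1-5 scale (hypothetical helper)."""
    if not responses:
        raise ValueError("at least one response is required")
    if any(r < 1 or r > 5 for r in responses):
        raise ValueError("responses must be on the 1-5 scale")
    return {
        "n": len(responses),
        "mean": mean(responses),
        # Sample standard deviation (assumption); undefined for one answer.
        "sd": stdev(responses) if len(responses) > 1 else 0.0,
        "median": median(responses),
    }


# Made-up answers from two hypothetical teams to one questionnaire item.
team_a = [4, 5, 5]
team_b = [3, 4, 4, 5, 4]

for name, answers in [("A", team_a), ("B", team_b), ("All", team_a + team_b)]:
    s = summarize(answers)
    print(f"{name}: N={s['n']} mean={s['mean']:.2f} "
          f"sd={s['sd']:.2f} median={s['median']}")
```

With groups of three to five respondents, such summaries are sensitive to single answers, which is one reason to read them together with the interview results.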

The approach of RCA has already been introduced as valuable for retrospectives [5,28]. The prior literature did not tell the story behind our results, a threat to validity introduced in [43], but consolidates our findings. Our conclusions are explicitly derived from the results (see Section 5), as can be seen in Section

7. Conclusions and future work

This paper proposed a real-time cloud-based tool to address the problem that collocated retrospectives are infeasible in geographically distributed software teams [8]. ARCA-tool enables conducting collocated and distributed retrospectives with RCA. The most important feature of the tool is the up-to-date real-time view of the retrospective outcome. Additionally, the tool provides features for the co-creation of cause-effect diagrams, the development of improvement ideas, voting on the causes and improvement ideas, and support for organizational learning by allowing the data exploration of past retrospectives. Finally, our analysis of 35 prior RCA tools showed that none of the prior tools had all the main features of our ARCA-tool (see Section 2.3).

We evaluated the tool and RCA approach in industrial field studies with four different teams in two different software companies. Although our field study contexts differed (see Section 4.1), the results are remarkably similar in both contexts. The results indicate that using ARCA-tool in the synchronous collocated and distributed retrospectives of small agile software teams is useful and easy. The tool was perceived as useful and highly easy to use and learn. It was claimed that the tool increases the efficiency of retrospectives and helps to visualize the causes of problems. Additionally, the field study evaluated the use of RCA in synchronous distributed retrospectives.
RCA was perceived as highly useful because of its structural approach, which improves collaboration and provides deeper analysis, challenging the team members to consider the problems in-depth.

After this case study was conducted, both case organizations have continued using the RCA approach with ARCA-tool in their retrospectives. In addition, Case 1 substituted all of their collocated retrospectives with distributed retrospectives. In the future, we are planning to continue the development and evaluation of ARCA-tool and RCA, as other companies have also expressed interest in them. During the first year after the release, the ARCA-tool website has had over 63,000 page views with 1357 unique visitors (63.5% are returning visitors) with an average visiting time of slightly less than 8 min.

Our motivation for the development of ARCA-tool was academic, i.e., to develop and evaluate an open source solution, which is freely available for anyone who needs it. Obviously, ARCA-tool also has business potential through SaaS business models. However, replication studies are needed. The tool and the RCA method should be evaluated in different case contexts, including larger group sizes and various target problems. Prior literature on group decision support systems [40,41,44] could help to evaluate the tool from various important aspects. ARCA-tool was published under the MIT license in order to enable replication studies and future business applications of the tool in a range of settings.

Acknowledgements

The authors would like to thank the companies participating in the field studies and the software engineering students implementing ARCA-tool, in alphabetical order: Helin Anssi Matti, Hovi Roope, Jaanto Jari, Kekäle Mika, Kere Markus, Koistinen Joona, Laukkanen Eero, Patana Jussi, Rihtniemi Pekka, Saarinen Jerome, Sevenius Toni, Valjus Mikko, and Viitanen Jonne.

Appendix A.
Raw data of RCA tool comparison

Columns: Software; Client: native/browser; Real-time collaboration; Cause-effect diagram; Idea development; Voting; Knowledge management; Costs; From.

ARCA tool. Client: Browser; Real-time collaboration: Yes; Cause-effect diagram: Yes, graph; Idea development: Yes; Voting: Yes; Knowledge management: Yes; Costs: Free (MIT); From: -.
URL: wirca.soberit.hut.fi

Google Docs Drawing. Client: Browser; Real-time collaboration: Yes; Cause-effect diagram: Yes, graph; Idea development: Yes; Voting: Yes; Knowledge management: No; Costs: Free to use; From: Author.
URL: drive.google.com
Notes: Various. By using shapes with numbers. By using different shapes for causes and ideas. Drawing capabilities are provided, e.g., shapes with text and arrows. These can be used to create a CED graph.

TapRooT Enterprise ed. Client: Both; Real-time collaboration: (Yes); Cause-effect diagram: Yes, tree; Idea development: Yes; Voting: No; Knowledge management: Yes; Costs: Fee; From: Google, Opentube & Root Cause Live.
Notes: Native software + IE7 support (if activex components). Likely not real-time, but accessible and editable. collaboration of multiple investigators at separate locations over the net AND the user has access to edit or view all Audit/Investigation data as well as the ability to status Corrective Actions. When an Audit/Investigation is created or edited the user is provided the 7-Step Process Flow where the progress of the audit/investigation can be tracked and each technique can be viewed and edited. report incidents, analyze root causes, develop corrective actions, write and approve reports, track fixes, validate the effectiveness of the fixes, and trend performance.

REASON. Client: Browser; Real-time collaboration: No; Cause-effect diagram: No; Idea development: Yes; Voting: No; Knowledge management: Yes; Costs: Fee; From: Google #, Opentube & Root Cause Live, Scopus.
Notes: reason 9 is an online based software AND Individual investigator. the software runs in a web browser. Timeline is used instead. The whole process resulting to the issue is modeled. Many timelines can be created. The timelines result to a causal model which is a tree of causes automatically created. preserve and communicate the knowledge learned. enables you to manage and track your corrective action plans and communicates the lessons learned from your problem solving activities.

XFRACAS. Client: Browser; Real-time collaboration: (Yes); Cause-effect diagram: No; Idea development: Yes; Voting: No; Knowledge management: Yes; Costs: Fee; From: Opentube & Root Cause Live.
Notes: The system's Web-based user interface allows for easy access, collaboration and deployment throughout multiple sites, suppliers and dealers. Support for real-time system is not explicitly indicated, however, people can access to the analysis from various computers. Trees or graphs are not provided. XFRACAS allows you to categorize the incident with the Function > Failure > Effect > Cause that will map the event to a new or existing FMEA analysis. corrective action software AND configure XFRACAS to support any problem resolution methodology, from 4 to 8 steps, such as the four step DCOV process, the five step Six Sigma DMAIC process or the eight step 8D. build a knowledge base of lessons learned that will be instrumental to future troubleshooting and development efforts.

RCAT Software. Client: ?; Real-time collaboration: ?; Cause-effect diagram: Yes, tree; Idea development: ?; Voting: ?; Knowledge management: Yes; Costs: ?; From: Opentube.
URL: nsc.nasa.gov/rcat/
Notes: Create and edit a fault tree, Create and edit an event and causal factor tree. perform and document root cause analysis, identify corrective actions, perform trending, and generate data usable in precursor analysis and probabilistic risk assessment. available free to government Agencies and contractors.

PathMaker. Client: Native; Real-time collaboration: Yes; Cause-effect diagram: Yes, tree; Idea development: Yes; Voting: No; Knowledge management: Yes; Costs: Fee; From: Opentube & Root Cause Live.
URL: pathhome.asp
Notes: PathMaker's cause and effect diagram, or Ishikawa diagram tool helps users discover the root causes of problems. The tool is native software, which requires installation for each PC. use the open floor option to allow anyone using the same PathMaker project as you to enter their ideas with yours in real time AND This tool's design, based on the classic brainstorming method.., allows the team recorder to keep pace with group thinking. Ideas can be entered to the tool. We use PathMaker for our Process Improvement Program and have implemented an SPC program which has reduced our process cycle times by 25-50%. Also use for our organizational Strategic Planning process. - John Heinrich

RealityCharting (Cause link). Client: Native; Real-time collaboration: No; Cause-effect diagram: Yes, tree; Idea development: Yes; Voting: No; Knowledge management: Yes; Costs: Fee; From: Google # & Opentube.
Notes: identify effective solutions. Web-browser based application only for reports. Native software for analysis. You need to share your analysis by exporting a file. Manual updating is required. Users need to click a button to refresh the content. At the client software, a tree diagram can be created. Solutions can be entered to the tree diagram which embeds the ideas in the causes. Store RCAs and data. Maintain causal relationships. Knowledge management

140 T.O.A. Lehtinen et al. / Information and Software Technology 56 (2014) CED can be created only at the client software. Solve?? Yes, tree???? Opentube & Root Cause Live Gpw Computer Consulting, Inc. solverootcauseanalysis.htm The found link to the page does not work. Additionally, the tool cannot be found from Google. SIM Ò software (tripod beta) Native (Yes) Yes, tree Yes? Yes Fee Opentube & Root Cause Live incident-management/simrsimple-incident-analysismethod incident analysis report is automatically generated by the SIM Ò software, which can be further edited in MS- Word Support for real-time system is not explicitly indicated. the incident will be analysed by the people within the department in which the incident took place The program will ask you to indicate why the event could take place. After that, the same question will be repeated 4 more times, so that the analysis tree can indicate 5 layers of causation. The corrective actions are a unique part of the SIM Ò analysis. Provides a report which can be further edited in MS-Word. PROACT Native (Yes) Yes, tree (Yes)? Yes Fee Google #, Opentube & Root Cause Live proact_templates.html Native client installation is required. Real-time capabilities are not explicitly stated. However, it seems that the tool provides some team work features through sharing and permissions. PROACT Ò RCA provides the tools for the RCA analyst to easily document, validate, report and track findings and recommendations. PROACT automatically builds your knowledge management database of completed analyses creating (continued on next page)

141 426 T.O.A. Lehtinen et al. / Information and Software Technology 56 (2014) Appendix A. (continued) Software Client: native/browser Real-time collaboration Cause-effect diagram Idea development Voting Knowledge management Costs From Robust Root Cause Analysis Process for Collaboration, Trending, Streamlining & Standardization of Analyses AND Team Permissions - Access, Read-Only, Delete your own customized and interchangeable templates for future incident investigations. Investigation Catalyst Native No No (No) No No Free (GPL) Opentube & Root Cause Live code.google.com/p/meslib/ source/checkout Native client Installation is required. The software works only on mac. However, data entries can be made with webbrowsers. This requires using third party services. The entries need to be imported to the system. To use computers for remote data entry with Web Browsers on computers with Windows, Linux or Mac operating systems, contact Starline to set up a private server URL for your passwordprotected project files and designate an e- mail account to which data entered remotely will be forwarded for importing into Mac project work files. MES worksheet matrixes are used instead. A major difference between the MES investigation system and current The system is introduced as a solution for analyzing problems. This in turn helps to make improvements. However, making the improvements is not introduced as a part of the system. Single case reports are provided, however, you cannot combine many reports together.

142 T.O.A. Lehtinen et al. / Information and Software Technology 56 (2014) investigation paradigms is that MES uses a process model of phenomena, instead of a causation model. Blackbox (tripod beta) Native? Yes, tree (Yes)? Yes Fee Google# & Opentube incident-management/ blackbox Native client installation is required. Not explicitly stated. Blackbox automatically creates a clear and standardized report, including an incident report, a cause tree and recommendations Provides a report including recommendations and causes detected. Investigator 3 (tripod beta) Native? Yes, tree Yes? Yes Fee Google# & Opentube software/incidentmanagement/investigator-3 Native client installation is required. Investigator 3 supports all the stages of the incident investigation process, from initially identifying what happened, through the analysis process and to writing the recommendations. Investigator 3 supports a perfectly editable native Word export to make your report meet your organizations standards. Track (tripod beta) Native No No No No Yes Fee Opentube (continued on next page)

143 428 T.O.A. Lehtinen et al. / Information and Software Technology 56 (2014) Appendix A. (continued) Software Client: native/browser Real-time collaboration Cause-effect diagram Idea development Voting Knowledge management Costs From incident-management/track Native client installation is required. The tool is based on questionnaires about incidents. Questionnaires are followed with it follows questions. The software outputs a track incident report which includes the distributions of cause types and actual causes organized as a structural list. Web-based Quality Management Tool CorrectiveAction.htm Browser (works only in IE 6 or higher) A web-based quality system for Manufacturing, Automotive, OEM/ODM, Food and Drug, and Service industries Yes No Yes No Yes Fee Google # Collaborating suppliers, departments, and divisions in global scale AND Real-time corrective/ preventive tracking and reporting Forms to conduct RCA are provided. The causes are not organizer as CED. Sharing and monitoring improvement activities Managing ALL types of corrective/ preventive actions and monitor action progress online RealityCharting Ò Software Both Yes Tree Yes Yes Yes Fee Google #, Opentube & Root Cause Live software (Internet Explorer 7+ standalone client for mac and Windows) The Track Changes tool allows groups of people to work together and share constructive input with the visible notification of any addition, deletion, reposition, or text change of a cause. The solution generation process systematically moves from cause to cause allowing you to propose solutions until each cause has been reviewed. The assessment evaluates each solution against the 5 default criteria. To change a default criteria setting, select the related field and type in your own criteria entry. The Action Item Report stores automatically generated action items from evidence fields and cause path endings. ABS Consulting Root Cause Map? Yes? Yes? Yes Fee Google # & Root Cause Live

144 T.O.A. Lehtinen et al. / Information and Software Technology 56 (2014) The web-based system is designed to capture, analyze and report all adverse impacts to your organization. AND Real-time reporting and management dashboards Corrective Actions/Preventive Actions (CAPA) Management Centralized system (multiple language support) One location for all incidents, events, investigations, recommendations, root causes and action items AND Flexible classification system OSHA, EU, ABS Consulting or any other standards ROOT-CAUSE-ANALYSIS-SOFTWARE CAUSE-ANALYSIS/ Native No Tree No No No Fee Google # Native client installation is required. ThinkReliability Excel Template Native No Tree Yes No No Free (MS Excel) excel-tools.aspx MS Excel is required in order to run this template. This is an excel sheet only. Furthermore, it does not work in Google Drive and thus real-time collaboration is not an option. Google # Enablon IMS?? Tree Yes? Yes Fee Google # Creation of corrective and preventive action plans Enablon IMS meets all event (incidents, accidents, etc.) reporting, management and monitoring needs, both for individual sites and the Group as a whole. AND Management & monitoring of corrective & preventive action plans (continued on next page)

145 430 T.O.A. Lehtinen et al. / Information and Software Technology 56 (2014) Appendix A. (continued) Software Client: native/browser Real-time collaboration Cause-effect diagram Idea development Voting Knowledge management Costs From Smartdraw Native No Tree No No No Fee Google # Requires to install client software This software is used from one PC. The user can export a file, which can be opened from other PC. Only causes can be added to the diagram. Only comments can be added The software is only for drawing, not for analysis of many drawings. Set-Based Thinking Native No Graph Yes? Yes Fee Google # com/tcc-can-help/ software-features/ Based on the screenshots, the tool requires to install client software A3 reports are used to collect together a set of visual models which concisely tell the story of the ongoing discussion. packages are designed to chart the particular relationship they are analyzing just fine; but generally we need to pull together analyses from many different tools of many different relations that must all be considered when making a decision The corrective actions are called as decisions. The feature list of the tool states that identify what decisions must change to implement those remedies [of causes] The tool is used to combine knowledge from various sources (e.g. individuals). This includes results from root cause analysis and decisions made. PHRED (Browser) (Yes) (Yes) Yes? Yes Fee Google com/customerfocus.html PHRED is a web-based problem solving system. It makes it easy for you, your suppliers and contract manufacturers to enter, edit and manage problems. Problem solvers, experts and managers share a common process and information The causes are detected by using questions only. In the end, software provides a report which is a tree based diagram. PHRED takes you through outlining a solution and presenting it for agreement, sign-off and implementation. PHRED tracks the multiple implementation actions. 
The tool includes a database which is used to Share Root Cause information between people and plants. AND

146 T.O.A. Lehtinen et al. / Information and Software Technology 56 (2014) Standard reports, individually defined user query reports, management summaries and charts. Export the information into Excel or PDF. Send only the information you want to share with your customers and suppliers. BowTieXP Native No Tree (No)? Yes Fee Google software/incident-analysis/ incidentxp/rca The screenshot of the tool reveals that the software requires client installation There is no evidence that the tool is used to create corrective actions. It seems that the tool is used to detect causeand-effects only. The output of RCA can be refined and stored. Ongoing effort shall be made to examine ways in which a similar improved learning from incidents can be realized by correlating RCA with bowtie analysis. One could think about classification of events, and perhaps correlation of events to barriers in the bowtie diagrams. FMEA Software Native (No) Tree Yes No Yes Fee Google APIS_FMEA_software.html Requires client installation in order to be used. Shares the information through the web, but the analysis is conducted on a single PC. The tool provides features for developing and registering corrective actions. procedure of focusing on what can go wrong, what possibly could cause it and (continued on next page)

147 432 T.O.A. Lehtinen et al. / Information and Software Technology 56 (2014) Appendix A. (continued) Software Client: native/browser Real-time collaboration Cause-effect diagram Idea development Voting Knowledge management Costs From what are the potential effects. Quantification of the risk, taking into account the current controls, then indicates areas of weakness. It is widely used in manufacturing industries in various phases of the product life cycle. Failure causes are any errors or defects in process, design, or part, especially ones that affect the customer. Systems2win Native No Graph No No No Fee Google # solutions/brainstorming.htm Is an Excel template. Is an Excel template which is used to substitute whiteboards in brainstorming sessions ireliability Root Cause Analysis Browser No Tree Yes No (No) Fee Google root-cause-analysis/ The analysis is conducted on a single PC. Track and facilitate implementation activities The existing RCAs are stored and the user can view and refine them. However, there is no evidence that the user can share the information with other users (e.g. organization s members) FMECA Software Native No Tree (No) (No) (Yes) Fee Google fmeca.html?gclid= Require client installation The analysis is conducted on a single The screenshots Nothing indicates that corrective This feature is not explicitly stated,

148 T.O.A. Lehtinen et al. / Information and Software Technology 56 (2014) CNbprZTd9bkCFWd 7cAodG2cAjQ PC. indicate that the tool supports tree diagrams. Additionally, an enhanced hierarchy tree and tabular views AND Graphically constructed system hierarchy diagrams actions are registered with the tool. however, it is stated that build and open multiple systems and project files AND Powerful reporting and charting facilities. Rapid Problem Isolation Native No Tree No No (Yes) Fee Google /it-incident-managementtool/root-cause-analysissoftware/ download and install a small-footprint collector that communicates securely with our cloudbased application. This software is used to link technical solutions to the business application, automatically. The software is used to view the technical causeand-effect linkages only. The personnel get information about the technical solution linkage to the business solutions. However, it seems that this information needs to be shared manually. Lassale Native No Tree No No (No)? Google, Google Scholar gov/pmc/articles/ PMC419418/#!po= Screenshots indicate that a client installation is required. Additionally, it is stated that designed for Visual Basic The software is used to create cause-and-effect diagrams only. The analysis is run on a single PC. However, it is stated that The database backend is SQL Server. CA Spectrum?????? Fee Google root-cause-analysis.aspx RootCause? No (No) Yes? (Yes) Fee Google Root_Cause_Analysis.aspx It seems that the analysis is conducted on a single PC and the results can thereafter be used in future analyses. The tool uses questions answered by the user. Thereafter, a report is made. Send action items to multiple recipients and track their progress Not explicitly stated. However, Import RL6:Risk data into your root cause analysis, to reduce rework and (continued on next page)

149 434 T.O.A. Lehtinen et al. / Information and Software Technology 56 (2014) Appendix A. (continued) Software Client: native/browser Real-time collaboration Cause-effect diagram Idea development Voting Knowledge management Costs From No indication that there is a cause-andeffect diagram available. reduce the chance of errors AND Monitor your ongoing improvement in frequency of process failures with the RL6 Report Center Speechminer?????? Fee Root Cause Live speechminer/ Root Cause Analyst (Native)? (Tree)?? (No) Fee Root Cause Live Products/ RootCauseAnalyst/ RCAProductDescription. aspx Programmed in Visual Basic Seems to be a tool that uses questions which the user answers. Thereafter, the user can view the cause-andeffects as a tree. One-button generation of flowcharts & factor trees There is no evidence that the tool provides any features for refining and sharing prior analyses over the tool. However, it is stated that Import & export analyses and Factor Guides AND All reports generated in Microsoft Office format. RCA GUI Native No Tree Yes No Yes? Scopus journals.htm?articleid= &show =abstract Screenshots reveal that the software require client installation The application is used by a user, who uses the tool by asking experts over the root causes detected. Experts in the PCA manufacturing industry were questioned over the most likely root cause of the problem from The application creates a tree diagram based on the information gathered automatically from a technical system. the user can....propose it as a design change to eliminate the manufacturing defect investigated. The tool seems to be integrated to a knowledge management system. Additionally, the system provides assistance for the user based on prior knowledge, e.g.,

the software provides guidance to the user on the investigation of a defect based on previous knowledge formalized in the form of integrated models. those provided by the RCA module

No = this feature is not available in the software tool.
Yes = this feature is available in the software tool.
(No) = it is likely that this feature is not available in the software tool, but we were not able to verify that.
(Yes) = it is likely that this feature is available in the software tool, but we were not able to verify that.
? = we were not able to find any evidence on the occurrence of this feature.
Fee = the software is subject to a charge.
Free (license) = the software is free.
Free to use = using the software is free.
Google = found from our Google search "root cause analysis software".
Google# = found from our Google search "root cause analysis software free".
Opentube = found from
Root Cause Live = found from
Scopus = found from scopus.com.
Google Scholar = found from Google Scholar.

Appendix B. Questionnaire used in Case 1

1. What is your title?
2. Select the roles that describe your responsibility best
   a. Manager
   b. Product Owner
   c. Developer
   d. Something else, what?
3. How long have you worked in this role(s)?
4. How long have you worked at the company?
5. Target Problem
   Give a value (1 = very minor, 2, 3, 4, 5 = very major) that corresponds the question best
   a. Effort the company has used to try to prevent the target problem earlier
   b. The internal impact of the target problem for the company
   c. The external impact of the target problem for the company
   d. The impact of the target problem for team's communication
6. Target problem causes
   Give a value (1 = very minor, 2, 3, 4, 5 = very major) that corresponds the question best
   a. The correctness of the detected causes
   b. The correctness of the detected root causes
   c. Impact of resolving the found causes of the problems
7. Retrospective method
   Give a value (1 = very minor, 2, 3, 4, 5 = very major) that corresponds the question best
   a. The easiness to collect the causes
   b. The easiness to detect the root causes
   c. The easiness to organize the causes
   d. The easiness to detect the root causes of the target problem
   e. My own contribution in the retrospective
   f. The openness of the communication in the retrospective
   Give a value (1 = absolutely NO, 2, 3, 4, 5 = absolutely YES) that corresponds the question best
   g. Is the ARCA tool easy?
   h. Is the ARCA tool learnable?
   i. Did the ARCA tool help in finding problems and causes?
   j. Would you have found the same causes without the tool?
   k. Would the retrospective have been easier without the tool?
   l. Would the retrospective have been more effective without the tool?

Appendix C. Questionnaire used in Case 2

1. What is your title
2. Select the roles that describe your responsibility best
3. Target Problem
   Give a value that corresponds the question best [1 = very low, 2, 3, 4, 5 = very high]
   a. Effort the company has used to try to prevent the target problem (or similar ones) earlier

   b. Internal impact of the target problem for the company
4. Target problem causes
   a. Correctness of the detected causes
   b. Impact of resolving the found causes of the problems
5. Retrospective method
   Give a value that corresponds the question best [1 = very low, 2, 3, 4, 5 = very high]
   a. Easiness to collect the causes
   b. Easiness to detect the root causes
   c. My own contribution in the RCA session was
   d. Openness of the communication in the RCA session was
   e. Compared to our team's current process improvement practices, do you find using the RCA method cost efficient?
   f. Compared to our team's current process improvement practices, do you find using the RCA method easy?
   g. Compared to our team's previous practices to find causes behind problems or issues, do you find using the RCA method useful?
   h. The easiness to detect the causes of the target problem was.
   i. To solve the target problem, was detecting causes for the target problem useful?
   j. In contrast to the company's practices, was the method used to detect target problem causes useful?
   k. Was it easy to detect the target problem causes?
6. The ARCA tool
   Give a value that corresponds the question best [1 = very low, 2, 3, 4, 5 = very high]
   a. Compared to an RCA session done by using post-it notes, do you find using the online ARCA-tool cost efficient?
   b. Compared to an RCA session done by using post-it notes, do you find the online ARCA-tool easy to use?
   c. Compared to our team's previous process improvement practices, do you find the online ARCA-tool useful?
   d. Is the online ARCA tool easy to use?
   e. Is the online ARCA tool easy to learn?
   f. Did the online ARCA tool help to find problem causes?
   g. Would we have found the same problem causes without the tool?
   h. Was it difficult to organize the problem causes with the online ARCA tool?
   i. In contrast to our company practices, is the online ARCA tool cost efficient to detect process improvement targets?

References

[1] K.C. Desouza, T. Dingsøyr, Y. Awazu, Experiences with conducting project postmortems: reports versus stories, Softw. Process Improve. Pract. 10 (2005)
[2] R.J. Latino, K.C. Latino (Eds.), Root Cause Analysis: Improving Performance for Bottom-Line Results, 6000 Broken Sound Parkway NW, Suite 300 Boca Raton, FL, CRC Press,
[3] T. Dingsøyr, N.B. Moe, Ø. Nytrø, Augmenting experience reports with lightweight postmortem reviews, in: PROFES '01 Proceedings of the Third International Conference on Product Focused Software Process Improvement, 2001, pp
[4] T. Dingsøyr, Postmortem reviews: purpose and approaches in software engineering, Inf. Softw. Technol. 47 (2005)
[5] F.O. Bjørnson, A.I. Wang, E. Arisholm, Improving the effectiveness of root cause analysis in post mortem analysis: a controlled experiment, Inf. Softw. Technol. 51 (1) (2009)
[6] J.D. Herbsleb, D. Moitra, Global software development, IEEE Softw. 18 (2001)
[7] T. Jaanu, M. Paasivaara, C. Lassenius, Near-synchronity and distance: instant messaging as a medium for global software engineering, in: 2012 IEEE Seventh International Conference on Global Software Engineering, 2012, pp
[8] J. Terzakis, Virtual retrospectives for geographically dispersed software teams, IEEE Softw. 28 (2011)
[9] T.O.A. Lehtinen, M.V. Mäntylä, J. Vanhanen, Development and evaluation of a lightweight root cause analysis method (ARCA method) – field studies at four software companies, Inf. Softw. Technol. 53 (10) (2011)
[10] D.N. Card, Learning from our mistakes with defect causal analysis, IEEE Softw. 15 (1) (1998)
[11] M. Leszak, D.E. Perry, D. Stoll, A case study in root cause defect analysis, in: Proceedings of the 2000 International Conference on Software Engineering, 2000, pp
[12] R.B. Grady, Software failure analysis for high-return process improvement decisions, Hewlett-Packard J. 47 (4) (1996)
[13] P. Jalote, N. Agrawal, Using defect analysis feedback for improving quality and productivity in iterative software development, in: Proceedings of the Information Science and Communications Technology (ICICT 2005), 2005, pp
[14] B. Andersen, T. Fagerhaug (Eds.), Root Cause Analysis: Simplified Tools and Techniques, Tony A. William American Society for Quality, Quality Press, United States, Milwaukee 53203,
[15] News and notes from the Google Drive and Docs teams, Introducing Google Docs drawings, Docs Blog, 13th April
[16] M. Kalinowski, G.H. Travassos, D.N. Card, Towards a defect prevention based process improvement approach, in: Proceedings of the 34th EUROMICRO Conference on Software Engineering and Advanced Applications, Parma, Italy, 2008, pp
[17] R.G. Mays, Applications of defect prevention in software development, IEEE J. Sel. Areas Commun. 8 (1990)
[18] M. Siekkinen, G. Urvoy-Keller, E.W. Biersack, D. Collange, A root cause analysis toolkit for TCP, Comput. Netw. (2008)
[19] T. Stålhane, Root Cause Analysis and Gap Analysis – A Tale of Two Methods, EuroSPI 2004, Trondheim, Norway, 2004, pp
[20] I. Bhandari, M. Halliday, E. Tarver, D. Brown, J. Chaar, R. Chillarege, A case study of software process improvement during development, IEEE Trans. Softw. Eng. 19 (12) (1993)
[21] Z.X. Jin, J. Hajdukiewicz, G. Ho, D. Chan, Y. Kow, Using root cause data analysis for requirements and knowledge elicitation, in: International Conference on Engineering Psychology and Cognitive Ergonomics (HCII 2007), Berlin, Germany, 2007, pp
[22] K. Schwaber, J. Sutherland, Scrum Guide, Scrum Alliance,
[23] J.J. Rooney, L.N. Vanden Heuvel, Root cause analysis for beginners, Qual. Prog. 37 (7) (2004)
[24] F.T. Anbari, E.G. Carayannis, R.J. Voetsch, Post-project reviews as a key project management competence, Technovation 28 (2008)
[25] R.L. Glass, Project retrospectives, and why they never happen, IEEE Softw. 19 (2002)
[26] L. Williams, What agile teams think of agile principles, Commun. ACM 55 (2012)
[27] W.F. Boh, S.A. Slaughter, A.J. Espinosa, Learning from experience in software development: a multilevel analysis, Manage. Sci. 53 (2007)
[28] T. Stålhane, T. Dingsøyr, G. Hanssen, N. Moe, Post mortem – an assessment of two approaches, Emp. Methods Stud. Softw. Eng. (2003)
[29] T.O.A. Lehtinen, M.V. Mäntylä, What are problem causes of software projects? Data of root cause analysis at four software companies, in: ESEM '11 Proc. of the 2011 International Symposium on Empirical Software Engineering and Measurement, 2011, pp
[30] F. Lanubile, C. Ebert, R. Prikladnicki, A. Vizcaíno, Collaboration tools for global software engineering, IEEE Softw. 27 (2010)
[31] M. Jiménez, M. Piattini, A. Vizcaíno, Challenges and improvements in distributed software development: a systematic review, Adv. Softw. Eng (2009) 3.
[32] F.Q. da Silva, C. Costa, A.C.C. França, R. Prikladnicki, Challenges and solutions in distributed software development project management: a systematic literature review, in: th IEEE International Conference on Global Software Engineering (ICGSE), 2010, pp
[33] L. Williams, D. Grayson, J. Gosbee, Patient safety – incorporating drawing software into root cause analysis software, J. Am. Med. Inform. Assoc. 9 (2002) S52-S53.
[34] W. Vantine, K. Benfield, D. Pritts, K. Ballard, Evaluating and incorporating new age software technology for identifying systemic root causes, in: Joint ESA-NASA Space-Flight Safety Conference, 2002, pp
[35] L. Huertas-Quintero, P. Conway, D. Segura-Velandia, A. West, Root cause analysis support for quality improvement in electronics manufacturing, Assem. Autom. 31 (2011)
[36] S. Lee, J.F. Courtney, R.M. O'Keefe, A system for organizational learning using cognitive maps, Omega, Int. J. Manage. Sci. 20 (1992)
[37] T.C. Lethbridge, S. Elliott Sim, J. Singer, Studying software engineers: data collection techniques for software field studies, Emp. Softw. Eng. 10 (3) (2005)
[38] S. Jamieson, Likert scales: how to (ab)use them, Med. Educ. 38 (2004)
[39] J.F. Nunamaker, A.R. Dennis, J.S. Valacich, D. Vogel, J.F. George, Electronic meeting systems, Commun. ACM 34 (1991)
[40] I. Zigurs, B.K. Buckland, A theory of task/technology fit and group support systems effectiveness, MIS Quarterly (1998)
[41] A.R. Dennis, C.K. Tyran, D.R. Vogel, J.F. Nunamaker Jr., Group support systems for strategic planning, J. Manage. Inf. Syst. 14 (1997)
[42] P. Runeson, M. Höst, Guidelines for conducting and reporting case study research in software engineering, Emp. Softw. Eng. 14 (2008)
[43] H.K. Klein, M.D. Myers, A set of principles for conducting and evaluating interpretive field studies in information systems, MIS Quarterly (1999)
[44] G. DeSanctis, R.B. Gallupe, A foundation for the study of group decision support systems, Manage. Sci. 33 (1987) 5.
[45] N.B. Moe, D. Šmite, Understanding lacking trust in global software teams: a multi-case study, in: Product-Focused Software Process Improvement, Springer, 2007, pp
[46] T.D. Jick, Mixing qualitative and quantitative methods: triangulation in action, Adm. Sci. Q. 24 (1979)

Article IV

Perceived causes of software project failures – An analysis of their relationships

Timo O.A. Lehtinen, Mika V. Mäntylä, Jari Vanhanen, Juha Itkonen and Casper Lassenius

Journal of Information and Software Technology, Volume 56, Issue 6, June 2014, Pages
Elsevier B.V. Reprinted with permission.


Information and Software Technology 56 (2014) 623-643

Contents lists available at ScienceDirect

Information and Software Technology

journal homepage: www.elsevier.

Perceived causes of software project failures – An analysis of their relationships

Timo O.A. Lehtinen, Mika V. Mäntylä, Jari Vanhanen, Juha Itkonen, Casper Lassenius

Department of Computer Science and Engineering, School of Science, Aalto University, P.O. Box 19210, FI Aalto, Finland

Article info

Article history:
Received 31 May 2013
Received in revised form 13 December 2013
Accepted 17 January 2014
Available online 10 February 2014

Keywords:
Root cause analysis
Cause and effect relationships
Software project failure
Multiple case study

Abstract

Context: Software project failures are common. Even though the reasons for failures have been widely studied, the analysis of their causal relationships is lacking. This creates an illusion that the causes of project failures are unrelated.

Objective: The aim of this study is to conduct in-depth analysis of software project failures in four software product companies in order to understand the causes of failures and their relationships. For each failure, we want to understand which causes, so-called bridge causes, interconnect different process areas, and which causes were perceived as the most promising targets for process improvement.

Method: The causes of failures were detected by conducting root cause analysis. For each cause, we classified its type, process area, and interconnectedness to other causes. We quantitatively analyzed which type, process area, and interconnectedness categories (bridge, local) were common among the causes selected as the most feasible targets for process improvement activities. Finally, we qualitatively analyzed the bridge causes in order to find common denominators for the causal relationships interconnecting the process areas.

Results: For each failure, our method identified causal relationships diagrams including causes each.
All four cases were unique, albeit some similarities occurred. On average, 50% of the causes were bridge causes. Lack of cooperation, weak task backlog, and lack of software testing resources were common bridge causes. Bridge causes, and causes related to tasks, people, and methods were common among the causes perceived as the most feasible targets for process improvement. The causes related to the project environment were frequent, but seldom perceived as feasible targets for process improvement.

Conclusion: Prevention of a software project failure requires a case-specific analysis and controlling causes outside the process area where the failure surfaces. This calls for collaboration between the individuals and managers responsible for different process areas.

© 2014 Elsevier B.V. All rights reserved.

Corresponding author. Tel.: E-mail addresses: timo.o.lehtinen@aalto.fi (T.O.A. Lehtinen), mika.mantyla@aalto.fi (M.V. Mäntylä), jari.vanhanen@aalto.fi (J. Vanhanen), juha.itkonen@aalto.fi (J. Itkonen), casper.lassenius@aalto.fi (C. Lassenius).

1. Introduction

The discipline of software engineering (SE) was born in 1968 due to software project failures [1]. Preventing software project failures is the main objective of software process improvement (SPI) as it aims at lowering the costs of development work, shortening the time to market, and improving product quality [2]. When considering preventive measures, analyzing the causes of failures becomes important as they explain why the failures occur [3]. This requires understanding the causal relationships, i.e., the causes of failures and their effects. Analyzing the causal relationships between the causes helps develop effective and feasible software process improvement ideas [4,5] as it allows extracting cause and effect relationships [6] that can then be used in process improvement. While trying to understand why a failure occurs, it is important to analyze all relevant areas of work [7].
Prior studies of software project failures [5,8-24] support this, as the reported causes of failures are spread over various areas, including project management, requirements engineering, and implementation. Furthermore, prior studies indicate that the causes are interconnected [8,25], i.e., they have causal relationships with one another. Thus, in order to control the causes of failures, it is important to understand their causal relationships.

Even though software project failures and their causes are widely studied [8-18], prior studies have failed to explain and present the causal relationships between the identified causes. In

the prior studies, the common causes of failures are often presented as lists where the causes are isolated from one another. This is unlikely to reflect the real situation in software companies [8,25]. Surveys [8,12-14,16,26] and interviews [10,11,17] have been commonly used to detect the causes of failures [27], but not to explain the causal relationships between the causes. In contrast to prior studies, this study utilizes root cause analysis (RCA), which allows the causal relationships between the identified causes of project failures to be identified explicitly. RCA is a structured investigation of a problem to detect the causes that need to be controlled [28]. RCA takes the problem as an input and provides a set of perceived causes with causal relationships as an output [4]. It aims to state what the causes of the problem are, where they occur, and why they occur. This helps with software process improvement in various contexts [4,5,19,21,23,24,29-36], and across software organizations [31]. There are many RCA methods available. The method that we used to conduct RCA is called ARCA [4].

The definition of a software project failure is problematic, as the term failure is perceived to be vague and challenging to measure [27]. McLeod and MacDonell [27] characterize a software project failure as a breakdown in the software project outcome covering a wide variety of definitions. The term failure is either directly related to the outcome of the development process or it is multi-dimensional, covering technical, economic, behavioral, psychological, political, subjective, contested/negotiated, and temporal interpretations [27]. Ahmad et al. [37] claim that it may be almost impossible to find agreement about whether a project succeeded or failed. In some cases, the developers have perceived a project as a total success while the other stakeholders have perceived it as a dramatic failure [38].
Agarwal and Rathod [39] state that success and failure are related to the perceptions of project members. They conclude that the perceptions about a success or a failure are often related to fulfilling the project goals [39]. Similar claims are presented by Procaccino et al. [40]. In our terminology, a software project failure means a recognizable failure to succeed in the cost, schedule, scope, or quality goals of the project. The term "recognizable" refers to a project failure perceived as severe enough to be prevented in the upcoming projects.

In this paper, we present perceived causal relationships between the causes of project failures in four software product companies. Additionally, we discuss which causes the company personnel perceived as the most promising targets for process improvement activities, and how these differ from the other causes. We extend our prior paper on the causes of the failures of software projects [41], which resulted in a general list of separate causes of project failures.

The rest of the paper is structured as follows: Section 2 introduces the theoretical background. We present the common causes of software project failures, the law of causality, and the relationship between RCA and software process improvement. Section 3 presents our research objective, research questions, as well as how the research data was collected and analyzed. Section 4 presents the case study results through the distributions of the causes of failures and their causal relationships that interconnect the process areas. Section 5 answers the research questions and discusses the most interesting findings and threats to their validity. Finally, Section 6 states the conclusions and proposes future work on this topic.

2. Theoretical background

In this section, we first analyze the prior work to point out the common causes of software project failures. Second, we discuss the causality in software engineering.
Then, we present a brief review of RCA, which we used as a data collection method in our study. Finally, in Section 2.4 we point out the gaps in prior works.

Common causes of software project failures

In this section, we discuss the common causes of software project failures introduced in prior studies. We consider the following three questions: (1) what causes of software project failures are introduced, (2) where in the development processes do the causes occur, and (3) what is the relationship between the causes? We base our reasoning on a review of software engineering outcome factors introduced by McLeod and MacDonell [27]. The review covers a total of 177 empirical studies published in the years Additionally, we supplement the review with otherwise missing, but relevant papers on software project failures we found using the Google Scholar and Scopus databases from 1998 to Fig. 1 summarizes the common causes of failures presented in the prior studies and Section elaborates the prior work behind the figure further.

The existing software engineering literature on software project failures indicates that the causes of failures are commonly related to the project environment, tasks, methods, and people. The causes of failures occur in various processes, which include management, sales & requirements, and implementation. Furthermore, the failures are likely an effect of many interconnected causes having causal relationships with one another. However, when it comes to such relationships, there seems to be a gap in the prior studies. Thus, it is difficult to conclude how the causes of failures are interconnected.

Causes of failures and affected process areas

McLeod and MacDonell listed factors that affect the outcome of software systems development projects. These include factors related to project environment, people, methods, and tasks.
Their findings resulted in a theory indicating that the development and deployment of software systems is a multidimensional process where people and technology are interconnected [27].

The project environment characterizes the environmental conditions and organizational properties [27] that have an impact on the software project outcome. Moløkken-Østvold and Jørgensen [14] indicate that a chaotic environment is a common cause for software project overruns. In the case of software project failure, the project environment is commonly related to the project complexity [18,42,43], organizational factors [37], available assets [12,37,42-44], policies [43], structure [43], business domain [27,37,45] and technology [45].

Fig. 1. Summary of the common causes of software project failures. (Figure labels, in extraction order. People: social interaction, skills, motivation. Tasks: sales, customers, requirements, contracting, project management, quality control, development work, software testing. Methods: development work, users, top management, external agents, project team, cooperation. Environment: project complexity, available assets, policies, business domain, organizational structures, technology.)

The people related causes [27] cover social interaction [27,42,45,46], skills [13,42,45,46], and motivation [16,47]. McLeod and MacDonell [27] indicate that social interaction affects the

outcome of software projects. Moløkken-Østvold and Jørgensen [14] claim that if too many people are involved, then the risk of software project overrun increases. Lorin [17] claims that stakeholder conflicts and breakdowns in communication are common causes of failures. Egorova et al. [46] report that team spirit has a significant impact on the success of software projects. Verner et al. [16] report that if the staff is not rewarded for working long hours, their motivation decreases, resulting in an increased risk of failure. Lack of skills and lack of subject matter experts are also introduced as common causes of failures [13,46]. That may explain why social interaction among different stakeholders is also crucial. The expertise of an individual could be insufficient [45], in which case the individual needs assistance from others.

Methods used in the project, i.e., the work practices used by the project members, are common causes of failures. McLeod and MacDonell [27] argue that understanding how people conduct their work is necessary in order to improve the outcome of software projects. The causes of failures related to the methods may occur in every action conducted. These cover especially the actions of developers, users, top management, external agents, and the project team [27].

The outcomes of project tasks, including the project scope, goals, resources, and technologies, have been introduced as common factors of the software project outcome [27]. Verner and Abdullah [18] supplement this list by introducing the tasks of contracting, financing, legal, and requirements. Furthermore, the list could be extended with the findings of Nasir and Sahibuddin [42] covering the project schedule, budget, project plan, progress reports, top management support, and assignment of roles and responsibilities. The project tasks are conducted in development processes.
Considering the common sources of software project failures, McLeod and MacDonell [27] recognized the processes of requirements determination, project management, user participation, user training, and change management. Furthermore, this list could be extended with the processes of sales [14], customers [11,14,48,49], end users involvement [45], contracting [14,18], risk management [8,12,16,46], configuration management [12], quality control [46,50,51], software development [42,49], software testing [51] and subcontractor management [14,43].

Relationships between the causes of failures

The prior literature does not say much about how the causes of failures are interconnected. The common causal relationships of software project failures presented in the literature [8] are hypothetical and based on the authors' own experiences. McLeod and MacDonell [27] summarize that the factors of people and technology are interconnected through multidimensional processes. Similarly, Xiangnan et al. [25] state that software project failures are caused by interconnected internal and external causes. Cerpa and Verner [8] hypothesize that the causes of failures have causal relationships with one another, and Lehtinen et al. [4] claim that the problems and events of software engineering are interconnected through causal relationships. However, none of the articles we identified had systematically studied the interconnections between the causes of failures.

Causality of software engineering problems

Causality has been of interest to scientists and philosophers for centuries, starting from Aristotle [52], Hume [53], and recently Pearl [54]. Causality refers to the relationship between two sequential and mutually exclusive events [55], i.e., the cause and its effect [6]. Such a relationship is commonly known as a causal relationship.
Linking all causal relationships together forms a causal model, defined as a complete specification of the causal relationships that govern a given domain [56]. Focusing on causal relationships helps to structure the problem into its sub-causes, which helps make sense of its occurrence and solution [57]. This can be helpful in software process improvement [4,5,19,21,23,24,29-36].

Monteiro et al. [58] claim that software engineering processes are interconnected. The commonality of all software process models is that they describe a set of linked activities [59], e.g., the development work follows the specification work, which in turn follows the work of sales. Therefore, problems in one process area may cause problems in other areas, e.g., it is difficult to create test cases from insufficient requirements. Thus, it is reasonable to consider problem solving through the causes of the problems and related process areas. We conclude that: Software engineering problems are interrelated causally through software processes. It is valuable to study the causalities as it can help to prevent software project failures.

Root cause analysis and software process improvement

Analyzing the causes of software development problems can help improve development work [60,61]. RCA is a group work technique used to detect and analyze the causes of problems, providing a cause and effect structure by recursively identifying causes, constantly asking why [4]. Various software process improvement models, e.g., CMMI, ISO/IEC 12207, and Six Sigma [24], list RCA as a mandatory method for process optimization, and even agile methods recommend reflection meetings that can utilize RCA [19,35,62]. In general, RCA has been presented as a logical part of retrospectives [62,63] and defect prevention activities [5,6,20,21,24,30,31,64].
In the defect prevention activities, RCA has been used to detect the causes of defects from various company processes, whereas in the retrospectives, RCA has been used to detect the causes of problems internal to the project team, helping the team to improve their work practices.

Software process improvement aims to lower the costs of development work, shorten the time to market, and improve the product quality [2]. Due to the relative complexity of these three general problems, the number of potential problem causes may become extensive [30]. To avoid this problem, RCA has been applied to such relatively focused problems as a high number of a specific type of software defects [5,19-22,24,30-32,35,64,65].

Gaps in the prior work

A high number of causes of software project failures have been listed [27] and the need for understanding how the causes of failures are interconnected has been acknowledged [8]. The concept of a causal model has been defined as a complete specification of the causal relationships that govern a given domain [56]. It explains what happens, where, and why. A causal model of a software project failure would completely model the causal relationships affecting the failure. Simply providing lists of the causes of failures, as done in prior works [27], does not create a causal model as it separates the causes from each other. This paper focuses on the perceived causal relationships between the causes, taking a step towards building a causal model of software project failure.

3. Methodology

This section presents our research objective, research questions, as well as how the research data was collected and analyzed. The overall research approach is a multiple case study in four software product companies [66]. This approach was reasonable as we wanted to study real-world phenomena in a real-world context.

Research objectives and questions

Our research objective is to reveal perceived causal relationships and interconnections between process areas, and evaluate their importance for analyzing software project failures and feasibility for process improvement. To elaborate our research objective, we introduce three more detailed research questions:

RQ1: Which process areas and cause types were frequently used to explain the software project failures? We developed, evaluated, and applied the taxonomy of the perceived causes. We classified the causes by using two dimensions: process areas and cause types. The process area expresses where in the software process the cause occurs (see Fig. 2), whereas the cause type describes what the cause is.

RQ2: What causal relationships bridge the process areas? We qualitatively studied the perceived causal relationships for which the process area of the effect was different from that of the cause. We call such causes bridge causes, while the others are called local causes (see Fig. 2).

RQ3: Do the causes perceived as feasible targets for process improvement differ from the other detected causes, and if so, how? After the RCA was conducted, the company people proposed and selected causes to be processed further in software process improvement activities. We call these proposed causes and selected causes. In terms of the cause types, process areas, and interconnectedness, we compared the proposed causes and selected causes with the other detected causes.

Data collection

We used the ARCA root cause analysis method [4] to identify the perceived causes of software project failures. The ARCA method implements RCA [28], and it includes four steps common for RCA methods [4]: the problem detection, root cause detection, corrective action innovation, and documentation of the results.
In this section, we briefly introduce how the steps of the problem detection and root cause detection were conducted to explain how our research data was collected using RCA.

Problem detection

We first analyzed the project failures in each case company using a focus group with key representatives. The key representatives were senior managers, who had the power to make process changes in their companies. In each case, measurable evidence was used in the focus group to specify a target problem that has systematically caused a project failure, e.g., in the first case (Case Defects), it was shown that a high number of defects detected at the very end of the projects have systematically caused schedule overruns.

Root cause detection

Following the focus group session, we identified the causes for the selected failure in two phases: a preliminary cause collection, and a causal analysis workshop. The key representatives selected six to nine participants, who included company employees from various fields of expertise, as shown in Table 3.

In the preliminary cause collection, the RCA facilitators (researchers and one key representative) sent an e-mail to the case participants asking them to list at least five causes of the target problem. This forced the case participants to think about the problem and its causes in advance. Additionally, it helped the key representatives to select only the most important causes for further analysis in the causal analysis workshop. The responses were handled anonymously. The RCA facilitators organized the preliminary causes into a cause and effect diagram, as illustrated in Fig. 3. Based upon the cause and effect diagram, the key representatives selected the most important cause entities to be processed in the causal analysis workshop. A cause entity includes a cause and its sub-causes, which together form an entity that is perceived as reasonable to process together (see the dark and light causes in Fig. 3).
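The notion of a cause entity can be illustrated with a small data structure. The following Python sketch is only an illustration under stated assumptions: it simplifies the cause and effect diagram to a tree, the names `Cause`, `add`, and `cause_entity` are hypothetical, and the links between the example causes are invented for the example rather than taken from the case data.

```python
from dataclasses import dataclass, field


@dataclass
class Cause:
    """One node of a cause and effect diagram (simplified here to a tree)."""
    text: str
    sub_causes: list["Cause"] = field(default_factory=list)

    def add(self, text: str) -> "Cause":
        """Attach a new sub-cause that answers 'why?' for this cause."""
        child = Cause(text)
        self.sub_causes.append(child)
        return child


def cause_entity(cause: Cause) -> list[str]:
    """Return a cause together with all of its sub-causes, depth first."""
    texts = [cause.text]
    for sub in cause.sub_causes:
        texts.extend(cause_entity(sub))
    return texts


# The target problem resembles Case Defects; the links below are invented.
problem = Cause("Defects are detected at the very end of the projects")
testing = problem.add("Lack of time in software testing")
testing.add("The priority of defect fixing is too low")
problem.add("Requirements are insufficient")

# The cause entity rooted at "Lack of time in software testing".
print(cause_entity(testing))
```

Selecting a cause entity for the workshop then corresponds to picking one node and processing it together with everything below it.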
The causal analysis workshop was a time-boxed meeting of 120 min in which new causes were identified for each selected cause entity. The cause entities were processed one at a time. Each cause either deepened or widened a cause entity. Detecting new causes for a cause entity was done in three parts:

1. The case participants used 5 min to individually brainstorm new causes, writing them down on paper.
2. Each case participant presented the causes, and explained where they should be placed in the cause and effect diagram.
3. The case participants briefly discussed the cause entity's causes, trying to brainstorm more causes and to recognize whether a cause had a relationship to other causes.

After all the selected cause entities were processed, the related cause and effect diagram was analyzed as a whole. The RCA facilitators asked the case participants to point out essential causes and to discuss them. The finalized cause and effect diagram was sent to the case participants of the causal analysis workshop. The participants were asked to select causes for which they thought that corrective actions should be developed. Then, the key representatives selected five to six causes of failures to be further processed, using the judgment and analysis of the causes proposed by the case participants.

Fig. 2. Terminology used in the study.

Validity of findings

As the output of the ARCA method is based on the expert judgment of the case participants, we found it highly important to evaluate whether correct and accurate causes were detected. Triangulation of the data sources and the data collection methods

increases the reliability of the results [66-68]. Before using the ARCA method, we conducted interviews [66] with the key representatives to detect the causes of failures they perceived important. We hypothesize that the causes the key representatives underlined in the interviews should also be recognized by the case participants using the ARCA method. We compared the results of the ARCA method with the perceptions of the key representatives on the causes of failures. In each case, the case participants detected and extended most of the causes underlined by the key representatives. This comparison is presented among the case study results in Section 4. Furthermore, after using the ARCA method, we used interviews and questionnaires [69] to evaluate whether correct and accurate causes of failures were detected. In each case, the case participants and key representatives perceived that the causes detected explained the failure extensively. This validation is further described in [4], where the ARCA method was originally presented.

Fig. 3. A cause and effect diagram of the ARCA method [4].

Data analysis

In order to analyze the causes and their perceived causal relationships systematically, we developed and applied a detailed classification system. The classification system includes four dimensions for each cause: process area, type, interconnectedness, and feasibility for process improvement.

The development of the classification system was done iteratively. We started with a literature review to conclude what kinds of cause classification dimensions have previously been used in the software engineering domain [5,9,19-24,30-32,35,64,65]. We concluded that two dimensions are important: process areas and cause types. The process area expresses where the cause occurs [9,19,20,22], whereas the cause type describes what the cause is, e.g., the product [9,19,22] and human resources [9,20,21,23]. Following the literature review, we created preliminary categories for the cause types and their related process areas. We then combined the preliminary classifications with the grounded theory approach, classifying a sample of the causes, similar to the approach used in [70]. During this iteration, we modified the preliminary categories to create a final classification corresponding to the causes identified in our cases.

Process areas

Table 1 introduces the classification dimension used to describe the process areas. In the classification system, a process area describes where the cause occurs, e.g., sales & requirements or software testing. The process areas are similar to the ones found in software engineering process literature. If we compare the process areas with commonly recognized software processes such as RUP [71] or the waterfall model [72], we can see several similarities such as requirements engineering, implementation, software testing, and product release and deployment. However, there are also some differences. First, we have merged software design and implementation under the process area of implementation. It would not have been feasible to separate whether the technical problems of the product were due to poor design or implementation, because our data did not support such a division. Similarly, the process area of software testing merges test design, execution and reporting. Another difference is that we have a process area called management that gathers causes such as insufficient decision making at the top management. Such issues cannot be placed under the process area of project management [73]. Thus, the process area of the management was needed to enable a descriptive and honest presentation of the causes. The Unknown process area includes causes that cannot be classified into any specific process area, such as laziness.

Cause types

Table 2 presents the classification used to describe the types of causes.
These are used to describe what the cause is, e.g., lack of instructions & experience or lack of monitoring. The cause type characterizes the causes of failures on a general level, i.e., People, Tasks, Methods, and Environment. The cause types are based on the classification dimensions introduced in prior work [5,20,30]. We wanted to extend the cause types to provide more details on the causes of failures and thus we added the sub-types under each cause type [41]. Considering the reliability of this extended classification dimension, similar factors affecting the software project outcome have been introduced by McLeod and MacDonell [27].

Table 1. The process area categories expressing where the cause occurs.

Process area | General characterization of the detected causes | Concrete examples of detected causes
Management (MA) | Company support and the way the project stakeholders are managed and allocated to tasks | The quality of the product is of low priority in the company; Lack of managing the projects and their related interactions
Sales & requirements (S&R) | Requirements and input from customers | Too many change requests from the customers; It is assumed that a developer understands an ambiguous specification
Implementation (IM) | The design and implementation of features including defect fixing | The features are implemented without caring about quality; Too much unreported error handling
Software testing (ST) | Test design, execution, and reporting | Defects are not reported immediately; Tests are not conducted against proper requirements
Release & deployment (PD) | Releasing and deploying the product | Product installation is experienced to be difficult; Developers do not configure the features they have implemented
Unknown (UN) | Causes that cannot be focused on any specific process area | Laziness; Gratuitous work is done

Table 2. The cause types expressing what the cause of the failure is.

Type/sub-type | General characterization of the detected causes | Concrete examples of detected causes
People (P) | This cause type includes the people related causes |
Instructions & experience | Missing or inaccurate documentation and lack of individual experience | Lack of instructions when and how to verify; No knowledge on how many files have to be configured
Values & responsibilities | Poor attitude and lack of taking responsibility | People do not care if the number of bugs increases; We have a problem in our organization culture
Cooperation | Inactive, inaccurate, or missing communication | The requirements were not reviewed together enough; Miscommunication between the developers and testers
Company policies | Not following the company policies | Features are marked as done without testing; New issues are not registered
Tasks (T) | This cause type includes the task related causes |
Task output | Low quality task output | Requirements are insufficient; Management work is insufficient
Task difficulty | The task requires too much effort, or time, or it is highly challenging | Design and execution of standard tests is too difficult; It is difficult to create a comprehensive specification
Task priority | Missing, wrong, or too low task priority | The amount of features is more important than the quality; The priority of defect fixing is too low
Methods (M) | This cause type includes the methodological causes |
Work practices | Missing or inadequate work practices | Unit testing was insufficient throughout the project; Implementation is done directly in the test environment
Process | The process model is missing, unclear, vague, too heavy, or inadequate | The process for software testing is missing; The product versions are developed in parallel for too long
Monitoring | Lack of monitoring | An opaque view of the product quality during the development work; Installations are usually scattered and nobody knows which one is at use
Environment (E) | This cause type includes the environment related causes |
Existing product | Complex or badly implemented existing product | The structure of the product has decayed during the past; Nobody knows the whole product as it is an extremely large system
Resources & schedules | Wrong resources and schedules | Lack of time in software testing; Lack of time to report defects specific enough
Tools | Missing or insufficient tools | The requirement template does not force to define constraints; The version control system does not support customization
Customers & users | Customers' and users' expectations and needs | The customers' desires are not analyzed and prioritized quickly; Importance for the customers is not well defined

Dimension of interconnectedness

We qualitatively analyzed the bridge causes, i.e., the causes that directly interconnected the process areas, as shown in Fig. 2. We did this by selecting the causal relationships for which the cause and effect were in different process areas. We grouped the selected causal relationships according to the process areas, e.g., management & implementation. For each group, we explored the causal relationships by looking up the causes from the original cause and effect diagram created in the case. Thereafter, we concluded and concretized the whole path of causes and effects from the original cause and effect diagram related to the bridge causes. For example, in the first company case (Fig.
4), the values of the company managers (a local cause) were used to explain why the quality was ignored during the task prioritization (a bridge cause), which was used to explain the low priority of the defect fixing (a bridge cause), which in turn was used to explain the wrong task priorities perceived in the implementation (a local cause).

Feasibility of the causes for process improvement

We quantitatively studied the causes that were perceived as feasible targets for process improvement. During the cause classification, we registered whether a cause was selected or proposed for further processing. The causal analysis workshop, see Section 3.2.2, resulted in detected causes with cause and effect structures. After the causal analysis workshop, the case attendees were asked to propose which causes they believed needed to be solved first, i.e., proposed causes. The key representatives, i.e., senior managers having the power to make process changes in their companies, selected five to six causes that were later processed further by developing corrective actions. These causes are called selected causes. Furthermore, the selected causes were explained by their sub-causes, which were processed together with the selected causes.

During the analysis, we divided the perceived feasibility for process improvement into three categories. The highest importance is assigned to the selected causes, which were further processed by developing corrective actions for them. The selected causes represent the senior managers' perceptions of the causes of failures feasible for process improvement. The second highest importance is related to the proposed causes. They represent the participants' perceptions about which causes are feasible for process improvement. The third category consists of the causes detected but neither proposed nor selected for process improvement. We quantitatively analyzed how the causes from these three categories differed.
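The bridge-cause selection and the three feasibility categories described above can be sketched as follows. This is only an illustrative sketch: the `Cause` structure, the function names, and the process areas assigned to the example causes are assumptions made for the example, not the authors' tooling, and the rule that "selected" takes precedence over "proposed" is likewise assumed.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Cause:
    """One node of a cause and effect diagram (hypothetical structure)."""
    text: str
    process_area: str
    selected: bool = False   # chosen by the key representatives
    proposed: bool = False   # proposed by the case attendees


def bridge_relationships(edges):
    """Keep (cause, effect) pairs whose ends lie in different process areas,
    grouped by the (cause area, effect area) pair."""
    groups = defaultdict(list)
    for cause, effect in edges:
        if cause.process_area != effect.process_area:
            groups[(cause.process_area, effect.process_area)].append((cause, effect))
    return dict(groups)


def feasibility_category(cause):
    """Map a cause to one of the three categories.
    Assumption: a cause that is both proposed and selected counts as selected."""
    if cause.selected:
        return "selected"
    if cause.proposed:
        return "proposed"
    return "detected only"


# Path paraphrased from the Case Defects example; the process areas are assumed.
values = Cause("Values of the company managers", "management")
ignored = Cause("Quality ignored during task prioritization", "management")
low_priority = Cause("Low priority of defect fixing", "implementation")
wrong_priorities = Cause("Wrong task priorities", "implementation")

edges = [(values, ignored), (ignored, low_priority), (low_priority, wrong_priorities)]
bridges = bridge_relationships(edges)
for areas, pairs in bridges.items():
    print(areas, [(c.text, e.text) for c, e in pairs])
```

Only the relationship that crosses from management to implementation is kept as a bridge; the other two relationships stay local to their process areas.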
We compared the distributions for process areas, see Table 1, and types, see Table 2. Additionally, we compared the share of bridge causes with the share of other detected causes to understand the perceived importance of bridge causes for process improvement.

4. Case study results

This section introduces our industrial cases and provides the results of each case followed by a cross case analysis. The cases and their background are the same as in our prior work [4]. In the cross case analysis, we first present the distribution of the causes, followed by pictorial descriptions of the causal relationships bridging process areas as well as the types of other causes in each process area.

Fig. 4. The causes and causal relationships in Case Defects (bolded text/line indicates the selected causes; normal text/line indicates the sub-causes of the selected causes; dashed line/grey text indicates that the cause was neither a selected cause nor a sub-cause; lines with arrows and text between the process areas indicate the direction of causal relationships interconnecting the process areas).

Table 3 summarizes the company cases. For confidentiality, and to make the discussion easier for readers to follow, the names of the case companies have been replaced by pseudonyms that reflect the main causes of failures selected for analysis in the case in question. In the table, the qualitative data is based on the results from the interviews and the quantitative data is based on the questionnaires introduced in [4]. The table summarizes how the key representatives characterized the failure and how the case participants evaluated it. The similarities of the cases made them more comparable, whereas the dissimilarities consolidated the case study results in different case contexts. In each case, the software project failure was found to be highly complex and difficult to prevent. Similarly, in each case, the impact of the failure was experienced as relatively high. In contrast, the failure itself and the effort the company had employed to try to prevent it varied between the cases and between the opinions of the case attendees.
Additionally, there were differences in the roles of the case participants.

Table 3. Summary of the case contexts [4].

| | Case Defects | Case Quality | Case Complicated | Case Isolated |
|---|---|---|---|---|
| Case company | Software product company with 100 employees | Software product company with 450 employees | Software product company with 100 employees | Software product company with 110 employees |
| Project failure | Fixing and verifying defects delays the project schedules | Blocker type defects are detected in the product after release | New product installation and updating are challenging tasks | Issues lead time is sometimes intolerably long |
| Roles of the case participants | Project managers, quality managers, developers, sales personnel, N = 9 | Mostly developers, N = 9 | Project managers, testers, developers, N = 7 | Project managers, testers, developers, sales personnel, N = 6 |
| Failure characteristics | Extremely costly and complex | Not very costly, but very complex | High impact on customer relationships and complex | Extremely costly and complex |
| Difficulty of preventing the analyzed failure (a) | Average = 5.3, Standard deviation = 1.1 | Average = 5.6, Standard deviation = 0.8 | Average = 5.4, Standard deviation = 1.3 | Average = 5.5, Standard deviation = 1.0 |
| Impact of the analyzed failure (a) | Average = 5.8, Standard deviation = 1.1 | Average = 5.0, Standard deviation = 1.3 | Average = 5.6, Standard deviation = 0.9 | Average = 5.9, Standard deviation = 0.9 |

(a) Scale: 1 = very low; 2, 3, 4 = neutral; 5, 6, 7 = very high.

Case Defects

The failure and the background

Case Defects is a medium-sized international software product company with approximately 100 employees. The average size of the project organization is about seven people. The main product is a large and complex software system, released twice a year, consisting of a major and a minor release. The failure selected for deeper analysis in the case was that the product releases are often delayed due to a high number of software defects detected at the end of the development projects. The main goal of the company was to understand why defect fixing takes an intolerably long time and why the fixes are not verified quickly. The key representatives underlined that the company has continuously tried to prevent this failure during recent years. Additionally, they assumed that the failure is extremely complex and costly for the company. They claimed that the main causes of the failure are that the technical blocks in the software are too large and that employees' attitudes are not professional enough to develop high-quality software. Additionally, they assumed that increasing discipline among the developers and releasing the software in shorter cycles would help prevent the failure.

The causes and causal relationships

In Case Defects, the causes of failure were related to all process areas, as can be seen in Table 4. Software testing included the highest number of causes (37.6%), but management (24.0%), implementation (23.0%), and sales & requirements (11.6%) also included a high number of causes. Furthermore, the causes related to management (6.1%), implementation (10.8%), and software testing (11.5%) were most often proposed as targets for process improvement. The selected causes were related to the values & responsibilities and task output of management, the task priorities of implementation, and the task output of software testing. Our results indicate that Case Defects was highly focused on management, implementation, and software testing, as all of these process areas included a high number of causes that were proposed and selected. Furthermore, it seems that in Case Defects the role of release & deployment was very small, as it included only 1.6% of the detected causes and no proposed or selected causes.

Fig. 4 depicts the identified causes, and the relationships between the process areas.
In the figure, the selected causes are in bold. Normal texts and lines depict the sub-causes of the selected causes. A dashed line or grey text indicates that the cause was neither a selected cause nor a sub-cause. Looking at the figure, we can see that the project failure was caused by management ignoring the importance of software quality. This is in line with the early assumptions of the key representatives, being closely related to the attitudes of the company employees, a cause emphasized in the interviews. Our quantitative results support this conclusion, as a high number of causes related to the task output and values & responsibilities in management were detected, proposed, and selected, as shown in Table 4. These causes were interconnected to software testing and implementation, as shown in Fig. 4.

Our qualitative results indicate that lack of respect, lack of resources, and incorrect scheduling of software testing caused difficulties in verifying the fixed defects quickly. Additionally, software testing was influenced by lack of cooperation and low quality of implementation. Considering the causal relationships of implementation, software quality was systematically prioritized lower than new feature development. While the development work continued, the technical debt increased as new functionality was built upon a low quality implementation. This happened because the state of product quality was unclear during the project, as the development work and software testing did not report their work properly. Simultaneously, the feature requests made by sales & requirements and the defect reports given by software testers were unclear. Thus, the workload of the implementation was difficult to estimate and the tasks were allocated to the wrong developers with incorrect priorities.

Case Quality

The failure and the background

The second case was a medium-sized international software product company with approximately 450 employees.
The company releases new software versions regularly and its products can be characterized as complex model-based software. The failure of the case was that blocker-type defects are often detected after the product releases, which increases the costs through re-work and customer dissatisfaction. The main goal of the company was to understand why blocker-type defects were made and not detected during the project. The company has recently reacted to this failure by setting a clear goal to decrease the number of defects that leak to the customers. The key representatives characterized the failure as very complex and related to many different causes. The main causes for the failure were believed to be that new code is built on the old, low-quality code; too many different methods are used in the development work; and the lack of different hardware set-ups decreases the coverage of the software testing. They said that the failure could be best prevented by refactoring the old code. They also believed that the failure is not very severe because the customers are currently highly satisfied.

The causes and causal relationships

In Case Quality (see Table 4), the process area of software testing included the highest number of causes (38.5%). However, implementation (34.0%) and sales & requirements (19.5%) also included a relatively high number of causes. Similarly, the causes related to sales & requirements (5.3%), implementation (9.6%), and software testing (8.0%) were often proposed. These results indicate that Case Quality was highly focused on sales & requirements, implementation, and software testing. However, the selected causes were related only to implementation and software testing. These included the task priority and cooperation related to the implementation, and work practices, instructions & experience, and task output related to software testing.
Perhaps this was because the key representatives perceived that the case participants (mostly developers) were incapable of developing good corrective actions for the process area of sales & requirements. Furthermore, our results indicate that management and release & deployment played a minor role in Case Quality.

Fig. 5 shows that the failure of Case Quality arose from the uncontrollable side effects of the existing product, which were difficult to take into account during implementation and to detect during software testing. Our quantitative results supplement this conclusion by indicating a high number of causes related to the existing product in implementation and software testing, see Table 4. The assumption of the key representatives that the failure was caused by the existing product seems to be in line with this conclusion. However, the uncontrollable side effects of the existing product were mostly symptoms of other causes. This may explain why the causes related to the existing product were not selected.

Lack of cooperation caused insufficient requirements, which lowered the efficiency of the development work and software testing through missing information. This may explain why the causes related to cooperation were proposed and selected. Due to lacking instructions & experience, the people of sales & requirements were incapable of taking the information needed for implementation and software testing into account. This was caused by a lack of collaboration in reviewing the requirements. It was also claimed that the people of sales & requirements did not take enough responsibility for producing good requirements, as they assumed that the developers understand ambiguous specifications.

New feature development was prioritized higher than the quality of the existing features. This was caused by too early customer promises and a time boxed project schedule.
Thus, while development continued, the technical debt increased as new functionality was built on top of the low quality existing implementation. As a result, there was not enough time left to improve the software quality at the end of the project, which may explain why the causes of failure related to task priority of implementation were proposed and selected.

Furthermore, management did not sufficiently support learning of unit testing, nor did it allocate sufficient testing resources to make good test cases, which was experienced as an important problem. These causes had the following effects: defect detection was insufficient through weak unit testing, and junior developers did not have enough examples on how to use the existing functionality. The causes of failure related to weak unit testing were perceived as important, as such causes were often proposed and selected.

The insufficient defect detection was also caused by implementation and sales & requirements. The development tools did not help in detecting the uncontrollable side effects of the existing product, and in particular non-functional requirements were specified insufficiently and in an immeasurable way. This caused difficulty in detecting and reporting defects, since the expected level of quality was unclear. Interestingly, such causes were not selected or proposed. Perhaps the company people experienced that controlling such causes is highly difficult.

Table 4. Detected, proposed, and selected causes of the cases. For each of Case Defects, Case Quality, Case Complicated, and Case Isolated, the table reports the columns Detected %, Proposed %, and Selected %.

| Sub-type | Type | Examples from the cases | % |
|---|---|---|---|
| Management | | | |
| Inst. & exp. | P | New managers are not familiarized enough | |
| Value & resp. | P | Managers' behavior gives bad examples | |
| Comp. policies | P | Resource allocation neglected for some tasks | 0.7 |
| Work practices | M | Workload estimations were not done | |
| Task output | T | Managerial duties are conducted insufficiently | |
| Task difficulty | T | Prioritizing hundreds of issues is difficult | |
| Existing product | E | Only few people know the product technically | |
| Res. & sch. | E | Project manager is busy | |
| Cust. & users | E | SLA forces us to react to the complaints | |
| Tools | E | PM tool does not support knowledge sharing | |
| Sales & requirements | | | |
| Inst. & exp. | P | RE people do not understand the problem | |
| Value & resp. | P | Assumption that insufficient requirements are ok | |
| Comp. policies | P | Many requests are documented in one request | |
| Cooperation | P | Lack of communication with developers | |
| Work practices | M | Requirements are not verified with customers | |
| Task output | T | Requirements are documented insufficiently | |
| Task difficulty | T | Very difficult to make a perfect specification | |
| Existing product | E | The product requires new issues constantly | |
| Res. & sch. | E | Lack of resources to document the issues | |
| Cust. & users | E | Customers cannot describe their needs | |
| Tools | E | Missing fields in the requirement template | 1.1 |
| Implementation | | | |
| Inst. & exp. | P | Lack of orientation to implement features | |
| Value & resp. | P | Feature quality is ignored in the development | |
| Comp. policies | P | Definition of done is not obeyed | |
| Cooperation | P | Inactive communication with the task requester | |
| Work practices | M | Tasks are not finalized once started | |
| Process | M | No stabilization phases in the process | |
| Monitoring | M | The progress of development tasks is unclear | |
| Task output | T | The quality of code is low | |
| Task difficulty | T | Difficult to design high quality interfaces | |
| Task priority | T | Features are more important than the quality | |
| Existing product | E | Architectural dependencies hamper dev | |
| Res. & sch. | E | Lack of time to implement | |
| Cust. & users | E | Customers' requests often break the product | |
| Tools | E | Insufficient version control system | |
| Software testing | | | |
| Inst. & exp. | P | Lack of instructions on what to test | |
| Value & resp. | P | Manual testing work is frustrating | |
| Comp. policies | P | Verification guidelines are not obeyed | |
| Work practices | M | Lack of test automation | |
| Process | M | Process for software testing is missing | |
| Monitoring | M | Test coverage not reported | |
| Task output | T | Low quality of testing work | |
| Task difficulty | T | Difficult to define inoperative test cases | |
| Task priority | T | Other tasks are prioritized higher than testing | 1.5 |
| Existing product | E | Units of the old code cannot be separated | |
| Res. & sch. | E | No schedules for software testing | |
| Cust. & users | E | No testing in the production environment | |
| Tools | E | Lack of tools to detect the side effects | |
| Release & deployment | | | |
| Inst. & exp. | P | Lack of instructions on how to install the system | |
| Comp. policies | P | Blockers were detected, but release was done | |
| Work practices | M | Most of the tasks have to be conducted manually | |
| Process | M | The release process is time boxed | |
| Monitoring | M | It is not monitored which installations are in use | |
| Task output | T | Wrong files are overwritten during installation | |
| Task difficulty | T | The product installation is difficult | |
| Existing product | E | System configuration files are scattered | |
| Res. & sch. | E | Too busy to do installations carefully | |
| Cust. & users | E | Some installations are done to customers' servers | 4.2 |
| Unknown process area | | | |
| Total % | | | |
| Total P | | | |

Fig. 5. The causes and causal relationships in Case Quality (bolded text/line indicates the selected causes; normal text/line indicates the sub-causes of the selected causes; dashed line/grey text indicates that the cause was neither a selected cause nor a sub-cause; lines with arrows and text between the process areas indicate the direction of causal relationships interconnecting the process areas).

Case Complicated

The failure and the background

The third case was a medium-sized international software product company with approximately 100 employees. The main product can be characterized as a highly configurable software service.
The product is delivered to the customers through installation projects that occasionally include the development of new features. New software versions are released regularly. The failure selected for analysis in the case was that the installation projects are too challenging to be performed efficiently. Re-engineering due to unexpected defects caused by the complex software configurations and development of new features during the installation projects was common. The main goal of the company was to understand why new product installation and updating are highly challenging tasks. The company had not expended much effort to manage the failure earlier. However, the key representatives stressed that the failure has a significant impact on their customer relationships and that it is very complex to prevent. They said that the main cause for the failure was that the employees have too many different ways in which to perform a product installation. Additionally, the number of different stakeholders was considered too high with respect to the quality of communication between them. They also indicated that creating checklists and simplifying the installation process could minimize the failure.

The causes of failure and causal relationships

In Case Complicated (see Table 4), the causes of failure were related to all process areas except sales & requirements. The process area of release & deployment included the highest number of causes (52.5%), but software testing (18.2%), implementation (21.0%), and management (8.4%) also had their share. Interestingly, the number of causes related to management drops sharply when only the proposed causes are considered, while the number of causes related to the other process areas remains high. Thus, our results indicate that Case Complicated was mostly focused on release & deployment, software testing, and implementation. These process areas included a high number of the proposed and selected causes.
The selected causes included the impact of the existing product in implementation, lack of instructions & experience, task difficulty of product deployment, and task difficulty of software testing.
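The per-area shares reported for each case can be illustrated with a small sketch. It assumes that the detected, proposed, and selected percentages are all computed relative to the total number of detected causes in the case; this is inferred from how the figures are reported, not stated explicitly. The function name and the input data are hypothetical and do not reproduce the case data.

```python
from collections import Counter


def area_shares(causes):
    """causes: list of (process_area, was_proposed, was_selected) tuples.

    Returns {process_area: (detected %, proposed %, selected %)}, where every
    percentage is relative to the total number of detected causes (assumption).
    """
    total = len(causes)
    detected = Counter(area for area, _, _ in causes)
    proposed = Counter(area for area, was_proposed, _ in causes if was_proposed)
    selected = Counter(area for area, _, was_selected in causes if was_selected)
    return {
        area: tuple(round(100 * counts[area] / total, 1)
                    for counts in (detected, proposed, selected))
        for area in detected
    }


# Hypothetical input: ten detected causes in three process areas.
example = (
    [("release & deployment", True, True)]
    + [("release & deployment", True, False)]
    + [("release & deployment", False, False)] * 3
    + [("implementation", True, False)]
    + [("implementation", False, False)] * 2
    + [("software testing", False, False)] * 2
)

shares = area_shares(example)
for area, (det, prop, sel) in shares.items():
    print(f"{area}: detected {det}%, proposed {prop}%, selected {sel}%")
```

With this normalization, a process area can hold few detected causes and still have nearly all of them proposed, which is the pattern the text later describes for software testing in Case Isolated.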

Fig. 6 shows that the failure of Case Complicated arose from complex version dependencies, insufficient software testing, and difficult manual configurations during release & deployment. Our quantitative results supplement these conclusions, as we can see a high number of causes related to the impact of the existing product and task difficulty in release & deployment, and to insufficient task output of implementation and software testing, see Table 4. Interestingly, the assumption of the key representatives that the failure was caused by the employees having too many different ways in which to perform a product installation supports only the conclusion related to the difficult manual configurations. Additionally, the causes related to the quality of communication between different stakeholders were not detected in the causal analysis workshop.

The failure was driven by causal relationships between management, implementation, software testing, and release & deployment. The work practices of release & deployment were not planned, resulting in unsystematic work practices. Additionally, there was very little documentation to support the work of release & deployment. This caused a lack of information during the release & deployment, and explains why the installation work was experienced as difficult. Most of the selected and proposed causes were related to these problems.

The existing product was also relatively complex, unreliable, and required a high number of manual configurations during release & deployment. Considering the causes related to the existing product, there was a problem that newly developed features occasionally crashed the existing functionality, which may explain why such causes were selected and proposed. The problem was influenced by uncontrollable version dependencies and low priority of software quality in contrast to the amount of new features.
The version control system was used inadequately, as occasionally new software versions were not added to the version control system, and different customer versions overlapped under the same version branch. Furthermore, the company employees did not get enough relevant information related to the version history. Thus, it was difficult to conclude what configurations were relevant and for what version. The misuse of the version control system was influenced by management giving little value to the usage of the version control system. This was caused by a lack of knowledge.

Furthermore, software testing before release & deployment was insufficient. This problem was influenced by lack of time and resources. There was no feature freeze in the development process before releasing new versions. Thus, there was very little time to test the software and fix the defects before the release & deployment, which may explain why the causes related to task difficulty of software testing were selected and proposed. The developers also utilized the test environment as an active development environment, which meant that the detected defects were difficult to map to any active product version. The company product was also occasionally integrated into the server environments of the customers. Unfortunately, it was practically impossible to test these installations in a safe test environment in advance, as the customer environments varied a lot and were difficult to clone to the test environment.

Case Isolated

The failure and the background

The fourth case was a medium-sized international software product company with approximately 110 employees. The main product can be characterized as a highly complex software system. The product is delivered to customers through complex integration projects where the product is configured into the software systems of the customers.
The failure selected for analysis in the case was that the lead time of an issue is occasionally intolerably long, resulting in delays in projects. The main goal of the company was to understand why some project issues were not implemented and verified quickly. The company had not expended much effort to manage the failure earlier. However, they have tried to improve communication between the stakeholders of the company. The key representatives rated the severity of the failure as high because it had a severe financial impact: as a consequence of the failure, the projects are not finalized on time. They said that the main causes of the failure include lack of communication between the stakeholders and the way the company divides resources between the issues. Usually, an issue with fairly low priority does not get enough resources. They concluded that preventing the failure is not an easy task. This would require increasing face-to-face meetings, increasing the number of inspections, and allocating skilled project managers to be responsible for the issues.

Fig. 6. The causes and causal relationships in Case Complicated (bolded text/line indicates the selected causes; normal text/line indicates the sub-causes of the selected causes; dashed line/grey text indicates that the cause was neither a selected cause nor a sub-cause; lines with arrows and text between the process areas indicate the direction of causal relationships interconnecting the process areas).

The causes of failure and causal relationships

In Case Isolated, see Table 4, the causes were related to all process areas. However, implementation (36.2%), sales & requirements (36.0%), and management (20.9%) had significantly higher numbers of causes than software testing (3.6%) and release & deployment (2.4%). The distribution of proposed causes follows this trend. However, almost every cause from software testing (2.4%) and release & deployment (2.4%) was proposed for further processing. This indicates that the causes related to software testing and release & deployment were considered important even though they were not often underlined in the causal analysis workshop. Furthermore, most of the selected causes were related to sales & requirements, which indicates that the key representatives found these causes to be feasible targets for process improvement.

Fig. 7 shows that the failure was partially caused by lack of cooperation, task priorities, and the inflexible development process, which is in line with the conclusion of the key representatives that the failure was caused by lack of communication between the stakeholders and the way the company divides resources between the issues. However, our results also indicate that the failure was caused by company policies and lack of individual responsibility. Our quantitative results supplement the conclusions on the inflexible development process, company policies, and lack of taking individual responsibility by showing a high number of detected, proposed, and selected causes related to these cause types. In contrast, the central roles of the causes related to cooperation and task priorities are not visible when focusing only on the numbers of detected causes.

Lack of cooperation was caused by distributed team members who ignored the team. Management did not arrange project meetings or reviews. Lack of cooperation had causal relationships to all process areas. Management was strongly relying on the project management tool. Resource allocation and task planning, including task prioritization, were done solely with the tool. Unfortunately, the tool did not support the project members' communication needs, nor did it include relevant information related to task history. This resulted in the project issues being allocated to the wrong developers and testers. Another consequence was that the project members did not receive clarifying information related to their tasks. The specifications were inadequate, including duplicate project issues. Similarly, the developers and testers did not understand what the task required. Interestingly, the selected causes did not include causes related to cooperation. Perhaps the key representatives knew that such causes are highly difficult to control as they already had tried to improve cooperation.
It is also possible that such causes were already subject to ongoing process improvement.

Incorrect task priority was caused by insufficient requirements and customer coordination. Additionally, the age of project issues was ignored by management. The insufficient requirements were influenced by difficult customer requests. The customer requests were driven by external consultants who had little knowledge about the level of detail needed in development. The requests were unclear, and their immediate implementation was promised. The customer requests were not sufficiently managed, resulting in a problem that project issues were unclear with ambiguous specifications.

Fig. 7. The causes and causal relationships in Case Isolated (bolded text/line indicates the selected causes; normal text/line indicates the sub-causes of the selected causes; dashed line/grey text indicates that the cause was neither a selected cause nor a sub-cause; lines with arrows and text between the process areas indicate the direction of causal relationships interconnecting the process areas).

The company policies were not followed in sales & requirements, which was a cause both selected and proposed. It was an effect of the inflexible development process, which may explain why the causes of failure related to the process of the implementation were also selected and proposed. It was decided that all project issues have to be registered into the project management tool so that all project issues are systematically handled through the development process including specification, task prioritization, resource allocation, implementation, and software testing. Unfortunately, some customer requests were experienced as too severe to be processed through the slow development process. Instead, the customer requests were directly allocated to the developers without specifications. This caused a problem of missing information in implementation and software testing. Additionally, the people in sales & requirements modified their prior issues to decrease the lead time of their new feature requests. They claimed that adding a new feature request as part of some prior issue would decrease the lead time of the feature request, as it would be disguised as a bug of the prior issue.

Lack of individual responsibility had causal relationships to sales & requirements and implementation. It was claimed that there is a problem of lack of orientation in the implementation and sales & requirements, which was caused by systematic interruptions and resource changes during the project. It was difficult for new developers and requirement engineers to get started, because they had to continue the work of other people without proper instructions.
The systematic interruptions of the implementation and resource changes were caused by changing task priorities and lack of taking individual responsibility, which was driven by difficult tasks. The project members did not want to continue to work with tasks that some other members could complete more easily and faster. This also caused a change in priorities.

Cross case analysis

In this section, we compare our cases by focusing on the detected causes and their causal relationships bridging the process areas. The comparison is based on two questions: (1) are the cases similar in terms of the detected, proposed, and selected causes, and (2) are the common causes of failures related to similar causal relationships interconnecting the process areas?

Similarities of the causes of failures

Table 4 summarizes the detected, proposed, and selected causes in the cases. It can be seen that the distributions of causes in process areas vary heavily between the cases. This means that the causes of failures were different and dependent on the case context, e.g., in Case Quality, the problem was software testing being a highly difficult task, whereas in Case Defects, the problem was software testing being a low priority task. Despite the differences in the distributions of the causes in process areas, all of the cause types were frequent in all cases and process areas. These included causes related to People (avg. 29%, std. 6%), Tasks (avg. 26%, std. 4%), Methods (avg. 22%, std. 3%), and Environment (avg. 22%, std. 5%), as can be concluded from Table 4. Considering the sub-types of causes, seven sub-types are common between the cases and over most of the process areas. These include instructions & experience (avg. 16%, std. 4%), values & responsibilities (avg. 8%, std. 6%), work practices (avg. 16%, std. 4%), task output (avg. 16%, std. 2%), task difficulty (avg. 7%, std. 3%), existing product (avg. 7%, std. 5%), and resources & schedules (avg. 9%, std. 4%).
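The averages and standard deviations quoted above are summaries of per-case shares. A minimal sketch of that arithmetic follows, assuming each case contributes one share per cause type and assuming a population standard deviation (the text does not say which estimator was used). The function names and the example counts are hypothetical and are not the values of Table 4.

```python
from statistics import mean, pstdev


def type_shares(counts):
    """Convert absolute cause counts per type into shares that sum to 1."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("a case needs at least one detected cause")
    return {cause_type: n / total for cause_type, n in counts.items()}


def summarize_shares(cases):
    """Return {cause_type: (average share, standard deviation)} over all cases."""
    shares = [type_shares(counts) for counts in cases.values()]
    cause_types = sorted(set().union(*(s.keys() for s in shares)))
    summary = {}
    for cause_type in cause_types:
        # A type that never occurs in a case counts as a share of 0 there.
        values = [s.get(cause_type, 0.0) for s in shares]
        summary[cause_type] = (mean(values), pstdev(values))
    return summary


# Placeholder counts for two imaginary cases; NOT the values of Table 4.
example_cases = {
    "Case X": {"People": 1, "Tasks": 1},
    "Case Y": {"People": 3, "Tasks": 1},
}
example_summary = summarize_shares(example_cases)
```

Replacing `pstdev` with `statistics.stdev` would give the sample standard deviation instead, which is larger for a small number of cases.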
Another similarity between the cases is related to the selected and proposed causes. While the detected causes (a total of 630) were distributed evenly across the cause types, the selected (a total of 22) and proposed (a total of 216) causes fell mostly into the cause types of People and Tasks. In each case, the selected causes were most often related to the cause types of People (avg. 41%, std. 7%) and Tasks (avg. 45%, std. 15%). Furthermore, the proposed causes were most often related to the cause types of People (avg. 26%, std. 6%), Tasks (avg. 29%, std. 9%), and Methods (avg. 27%, std. 6%). Considering the bridge causes, it seems that the company people perceived them as feasible targets for process improvement. Fig. 8 shows that the proportion of the bridge causes increases in the proposed (average 56%) and selected causes (average 68%) when compared with the detected causes (average 50%). This indicates that the company people perceived it as feasible to control the causes related to the causal relationships interconnecting the process areas.

Common causal relationships bridging the process areas

Fig. 9 summarizes the common causes and related causal relationships bridging the process areas. In prior studies, a cause of failure has been considered common if it occurs in 60% to 80% of the software project failures, e.g., in [8,16]. Our inclusion criterion for combining the cases was that the cause and its related causal relationship occurred in at least three of our four cases, i.e., 75%. Similar causes were often related to similar causal relationships. Despite the fact that the failures selected for analysis in the cases were highly different, three common causal relationships bridging the process areas, shown in Fig. 9, stand out in our cases: Weak Task Backlog, Lack of Cooperation, and Lack of Software Testing Resources. The causal relationship Weak Task Backlog bridges sales & requirements, management, implementation, and software testing.
Management is a cause for Weak Task Backlog through incorrect decisions on task priorities, i.e., features vs. quality. Caused by lack of instructions & experience, and the high priority of new feature development, it was claimed to be difficult for managers to prioritize hundreds of opaque tasks. Sales & requirements is also a cause for Weak Task Backlog through vague requirement specifications. Caused by lack of instructions & experience, it was claimed to be difficult to write good requirement specifications. Implementation is an effect of Weak Task Backlog through vague task descriptions and incorrect task priorities. Caused by opaque specifications, it was difficult for the developers to know what their tasks required. The high priority of new feature development caused difficulties in implementation through increasing technical debt. Software testing is a cause and effect for Weak Task Backlog through missing verification criteria and vague defect reports (see footnote 1). As an effect, the testers did not know what to verify. As a cause, the defect reports registered to the task backlog were vague, complicating the work of managers when prioritizing the tasks. The causal relationship Lack of Cooperation interconnects sales & requirements, implementation, and software testing. Missing information in sales & requirements is an effect of Lack of Cooperation through the missing assistance of developers while making the specifications. Caused by the difficult existing product and the lack of instructions, the people of sales & requirements were incapable of documenting the specifications in enough detail to support the implementation and software testing. Similarly, missing information in implementation is an effect of Lack of Cooperation through the inactive assistance from the people of sales & requirements while the developers are trying to understand the task descriptions.
Furthermore, missing information in software testing is an effect of Lack of Cooperation through the inactive assistance from the people of sales & requirements and implementation while the testers try to understand what to verify. The common cause for Lack of Cooperation seems to be missing. Perhaps the managers did not force the people to work together or maybe the people ignored cooperation, as it was in Case Isolated.

Footnote 1: This cause was detected only in Cases Quality and Defects.

T.O.A. Lehtinen et al. / Information and Software Technology 56 (2014) 623-643

Fig. 8. Proportion of the bridge causes in the detected, proposed, and selected causes.

Fig. 9. Common causes and bridging causal relationships found in at least three out of the four cases (bolded text/line indicates the selected causes; normal text/line indicates the sub-causes of the selected causes; dashed line/grey text indicates that the cause was neither a selected cause nor a sub-cause; lines with arrows and text between the process areas indicate the direction of causal relationships interconnecting the process areas).

The causal relationship Lack of Software Testing Resources bridges management and software testing. Management is the cause for Lack of Software Testing Resources through the causes related to values & responsibility. In Case Defects, the testers were forced to do other tasks than testing, and thus they simply did not have enough time to do testing. In Case Quality, the managers allowed new feature development at the very end of the project schedule, and thus there were not enough testing resources available to verify that the developed features actually worked. In Case Complicated, the resource allocation of software testing was considered a failure without more detailed explanations. In Case Isolated, the tasks related to software testing were allocated to the wrong testers, who had only a little knowledge of the issue to be verified. Many of the causes were case specific. Thus, explaining the failures solely with the common causal relationships between the process areas is infeasible, but it improves the knowledge related to possible common causes of software project failures.
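The inclusion criterion described above (a causal relationship counts as common when it occurs in at least three of the four cases) can be expressed as a small filter. This is only a sketch: the function name, the data layout, and the placeholder entry are assumptions. The four case names and the fact that Lack of Software Testing Resources was reported in all four cases come from the text.

```python
def common_relationships(occurrences, min_cases=3):
    """Return the relationships observed in at least `min_cases` cases.

    `occurrences` maps a relationship name to the set of case names
    in which that relationship was detected.
    """
    return sorted(
        name for name, cases in occurrences.items() if len(cases) >= min_cases
    )


occurrences = {
    # Reported for all four cases in the text above.
    "Lack of Software Testing Resources": {
        "Defects", "Quality", "Complicated", "Isolated",
    },
    # Hypothetical placeholder, not a finding of the study.
    "Some case-specific relationship": {"Isolated"},
}

common = common_relationships(occurrences)
```

With four cases, `min_cases=3` corresponds to the 75% threshold mentioned in the text; a study with a different number of cases would need a different value.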
Analyzing the causal relationships between the causes helps us understand why the failures occur. However, the common causal relationships alone, or even together, do not fully explain any of our cases. Comparison of the case-specific results (Table 4, and Figs. 4-7) shows that each failure was also caused by different, case-specific causes having different causal relationships. Additionally, none of the common causes were either proposed or selected in every case, and causes other than the common ones were proposed in each case.

5. Discussion

In this section, we answer the research questions and compare our findings with the prior studies of common causes for software project failures. Additionally, we discuss the validity threats related to our conclusions.

Answering the research questions

RQ1: Which process areas and cause types were frequently used to explain the software project failures?

The causes of software project failures analyzed in this study were equally distributed into the types of People, Tasks, Methods, and Environment, see Section 4.5. We believe that this finding can be generalized as similar conclusions are also presented in prior studies. McLeod and MacDonell [27] summarized in an extensive survey of literature that the outcome of software system development is affected by People & Action, Development Processes and Project Content. The causes of failures were commonly related to the instructions & experience, values & responsibilities, work practices, task output, task difficulty, existing product, cooperation, and resources & schedules. On average, 81% of the detected causes in the cases (std. 2%) were related to these sub-types of causes. We believe that this finding is generalizable as it can be logically derived and prior studies [27] have resulted in similar findings. In software product companies, the projects are related to maintaining and improving the existing product.
Over the years, the existing product grows, increasing the complexity of both the product and the project tasks. This leads to an increasing need for instructions & experience, values & responsibilities, cooperation, resources & schedules, and improved work practices. McLeod and MacDonell [27] state that development practices and the project outcome are influenced by domain knowledge, experience, values & beliefs, communication & social skills, and motivation.

Considering the process areas, the causes were related to management, sales & requirements, implementation, software testing, and release & deployment. The cause distributions over process areas varied between the cases, see Table 4, which indicates that the process areas where the causes occur are dependent on the case context. However, despite the differences between the cases, their commonality was that each of them was influenced by insufficient management, as indicated by Verner et al. [16], and by the problems of software testing and implementation. In the prior studies of the common causes of failures, management and sales & requirements are mentioned most frequently. For example, inaccurate estimating and schedule planning [8,10,11,16,45,46], overoptimistic status reporting [10], insufficient quality control [10,16,46], unrealistic schedule pressures [8,10,16], lack of top management support [11,13,16,45], weak project manager [13,46], task priorities [13], and insufficient development process [14,16] are all part of management, which influences software project failures. Furthermore, unrealistic expectations of customers [8,11,16,18], lack of customer support when gathering the requirements [8,11,14,16], changing scope [8,11,13,16,45], scope creep [14], failure to specify appropriate measures [18], and inadequate requirements [11,13,14,18,46] indicate that there are many improvement opportunities in sales & requirements too. Interestingly, prior studies have recognized, but not emphasized, the central role of implementation and software testing in causing software project failures. In our cases, these two process areas included the highest number of causes and they had a central role in each failure.
It was claimed that the quality of the implementation was low. The work was not well reported and the defect density was high. These problems were influenced by the values & responsibility and by prioritizing the implementation of features higher than quality. Other problems of the implementation were related to the technical debt and low maintainability of the existing (legacy) product. Furthermore, it was also claimed that the quality of software testing was low. The defect reports were vague and a high number of defects were not detected. Software testing was claimed to be a difficult task. It was challenged by the existing product and lack of resources & schedules. Software testing and implementation are essential parts of software engineering. However, despite this, the prior studies have not introduced these two as highly common reasons for project failures. We hypothesize that this is due to the data collection methods. Using the RCA workshops allowed many individuals to participate and encouraged deeper thinking, whereas factor-based large-scale surveys [27] yield answers to premade questions that are filled out by a single company employee. Furthermore, we studied product companies rather than bespoke software projects, which might have caused these differences.

RQ2: What causal relationships bridge the process areas?

A high number of the causes of failures in implementation, software testing, and release & deployment were interconnected to the output of management and sales & requirements. This finding contributes to and consolidates the prior studies as it indicates that software project failures are caused by management work and sales & requirements, see Fig. 9. For example, the lack of resources in software testing was claimed to be caused by management, and similarly, the lack of instructions in implementation was claimed to be caused by sales & requirements.
Thus, in order to prevent the failures, should we focus on management and sales & requirements only? Management was influenced by sales & requirements and software testing, which means that solving the problems related to management requires improvements in these process areas. Task difficulty, lack of instructions & experience, work practices and values & responsibility were the types of causes related to management. These types were used to explain the insufficient task output including the wrong prioritization of implementation work and lack of resources & schedules in software testing. However, solving these two problems requires improvements in sales & requirements and software testing. In three cases, the participants claimed that the requirements were vague. In two cases, the participants claimed that the test reports were insufficient. The managers were doing resource allocation and prioritization with inadequate information. Furthermore, in one case, the participants noted that the service level agreement with the customers forced the managers to prioritize the customer requests higher than fixing defects. As an effect, severe defects and low priority customer wishes were transposed. The low priority of software quality was perceived as a problem in each case. Sales & requirements was influenced by implementation. This means that solving the problems of sales & requirements requires improvements in implementation. The insufficient task output of sales & requirements was caused by instructions & experience, work practices, task difficulty, existing product, values & responsibilities, company policies, and customers & users. Making good requirements was likely very difficult in the case companies. The existing products were complex and required extensive technical knowledge. Additionally, the managers, developers, and testers had somewhat different needs, which meant that extensive practical knowledge was needed while documenting the requirements. 
In three cases, the participants noticed that the existing product was so complex that it was likely that nobody knew the system fully. It was also claimed that the customers and users cannot express the features they really need and that the customer requests are not well analyzed by the company people. Furthermore, in one case, the participants claimed that the process for handling the customer requests was ignored. Considering these challenges, it was claimed that the people at the sales & requirements were not well instructed, especially because of the lack of cooperation with the implementation. Lack of Cooperation, Weak Task Backlog, and Lack of Software Testing Resources indicated causal relationships common to the software project failures of our cases. Obviously, these three are also related to one another. The people in sales & requirements, implementation, and software testing did not work enough together. In two cases, the participants used the problem of lack of cooperation to explain why the output of sales & requirements was insufficient for the developers and testers. In two cases, the problem was relevant between the developers and testers. Respectively, in two cases, the defect reports were claimed as vague, which made it difficult for the managers to monitor the status of ongoing project and for the developers to fix the defects. Furthermore, in each case, the managers prioritized new features higher than defect fixing, which was partially caused by the insufficient output of sales & requirements and software testing, but also because of the values of the managers. The values of the managers also explained the lack of software testing resources, a problem in each case. Furthermore, in each case, the participants used the low priority of defect fixing to explain why the implementation resulted in low product quality. 
While considering what the managers could have done better, enabling the cooperation and critically monitoring the task backlog would have been important. This is because many of the causes of failures were interconnected to the cooperation and task backlog. Considering the values of the managers related to the defect fixing, we should understand that technical debt is not always a negative thing for the software product company [74].
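The notion of a bridge cause used in this discussion can be made concrete with a small graph sketch. It assumes, as a simplification, that a cause is a bridge cause when it takes part in at least one cause-effect link whose two ends lie in different process areas; the class, the function names, and the shortened cause labels are hypothetical and only paraphrase examples mentioned in the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cause:
    name: str
    process_area: str


def bridge_causes(edges):
    """Return the causes involved in a link that crosses process areas.

    `edges` is an iterable of (cause, effect) pairs of Cause objects.
    """
    bridges = set()
    for cause, effect in edges:
        if cause.process_area != effect.process_area:
            bridges.add(cause)
            bridges.add(effect)
    return bridges


def bridge_share(causes, edges):
    """Share of the given causes that are bridge causes (0.0 if none given)."""
    causes = set(causes)
    if not causes:
        return 0.0
    return len(bridge_causes(edges) & causes) / len(causes)


# Simplified, paraphrased examples; not the full causal maps of the cases.
values = Cause("values & responsibility", "management")
resources = Cause("lack of testing resources", "software testing")
debt = Cause("technical debt", "implementation")
quality = Cause("low implementation quality", "implementation")

edges = [(values, resources), (debt, quality)]
share = bridge_share([values, resources, debt, quality], edges)
```

Computing `bridge_share` separately for the detected, proposed, and selected causes of a case would give the kind of comparison that Fig. 8 reports.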

Dependencies over various process areas have been introduced [58] and causal relationships between the common causes of software project failures have been considered [8]. Our results indicate that many of the common causes of failures are not isolated. Instead, they explain one another through the bridge causes. For example, in Case Isolated, vague requirements were explained by the lack of collaboration, which was explained by stakeholder conflicts. Prior work has presented the same causes as separate [17]. In order to correctly react to the failures, we need to understand not only the causes of failures, but also the actual mechanism of how the multiple causes together manifest as a failure. More attention should be given to analyzing the causal relationships between the causes. Considering the failures of our cases, it would be too general to state that the failures were caused by the lateness or inaccurate scope estimations only, as presented in [75]. Instead, our case companies needed to understand why the lateness and inaccuracy occurred. The quality of implementation and software testing had a central role in the failures. Additionally, management and sales & requirements explained the problems occurring in implementation, software testing, and release & deployment. Furthermore, management and sales & requirements were influenced by the complex existing product and organizational issues including motivation and collaboration. In order to prevent the failures, there was a need to manage the existing product including its requirement debt, test debt, architectural debt, and documentation debt. Additionally, there was a need to improve the collaboration and motivation among the company people.

RQ3: Do the causes perceived as feasible targets for process improvement differ from the other detected causes, and if so, how?
Considering the distributions of the causes in the process areas, it seems that the proposed and selected causes do not differ from the other detected causes. The process areas that include the highest number of detected causes also include the highest number of proposed and selected causes. Similarly, it seems that the process area that includes the highest number of causes in one case does not include the highest number of causes in the other cases. This indicates that the importance of a specific process area for process improvement, e.g., management, is dependent on the case context. Our results show that the causes of failures perceived as feasible targets for process improvement are related to the cause types of Tasks, People, and Methods. Most of the selected causes are related to Tasks (45%) and People (41%). Furthermore, when looking at causes proposed for process improvement, the types of Tasks (29%), People (26%), and Methods (27%) were more frequent than Environment (18%). A comparison of the selected and proposed causes with the detected causes shows that the share of the causes related to Tasks, People and Methods increases. Furthermore, the comparison of the selected causes with the proposed causes shows that the selected causes are related to People considerably more often than to Methods. This indicates that the key representatives perceived that the improvements need to be focused on the company employees, whereas the case participants perceived that the work practices should also be improved. Xiangnan et al. [25] used the concept of internal and external causes to indicate the causes of failures that are under the control of the project team. They claimed that the internal causes are under the control of the project team and they include the project manager, project team, process & technology, and completion of project delivery results.
Instead, the external causes are not under the control of the project team and they include the customers and causes related to other stakeholders. This may explain why only a few of the proposed and none of the selected causes included causes related to customers. Furthermore, the causes related to customers belong to the sub-type of Environment. The other subtypes of Environment were rarely proposed or selected. Perhaps such causes are more difficult, or even impossible, for the company to control compared with those related to Tasks, Methods and People. Considering the causal relationships interconnecting the process areas, we hypothesize that the causes of failures perceived as feasible targets for process improvement point out and explain weaknesses between the process areas. The causes of failures were distributed into various process areas at each case whereas the failures surfaced at implementation, software testing, and release & deployment. This means that you must be willing to make process changes outside of the process areas where the failure surfaces. Comparison of proposed and selected causes with the other detected causes shows that the proportion of the bridge causes increases in the proposed and selected causes. This means that the company people found such causes as feasible targets for the process improvement. Perhaps this is because these causes cannot be controlled solely by the people in the process area where the failure surfaces, as indicated by Keil et al. [11]. For example, the failure in Case Defects, i.e., Fixing and verifying defects delay the project schedules, surfaced at implementation and software testing. However, the failure of Case Defects cannot be prevented by making improvements only to the implementation and software testing, as the failure was also caused by sales & requirements and management. 
This was also the case in the other companies.

Implications

In this section, we present the implications of our results and provide recommendations for future work. We start with a discussion on the implications for the causal analysis of software project failures. Thereafter, in Section 5.2.2, we discuss the applicability of the RCA method for causal analysis. Finally, we discuss the implications of our results for software outcome prediction models.

Causes of software project failures

We propose that studying the relationships between the factors affecting the outcome of software projects requires modeling the software development as a system where process areas are interconnected multi-directionally. McLeod and MacDonell [27] claimed that people and technology are interconnected through multidimensional processes. Our results indicate that a software project failure is an effect of various important causes bridging the process areas together. Considering software product development as a set of linked activities [59], the direction of cause and effect relationships over the bridge causes follows the logical workflow from sales & requirements to verification and software release. Thus, the detection of bridge causes could start by considering problems in the workflow items (e.g., requirements, resource allocations, estimations, developed features, test reports, etc.). Studying the problems in the workflow items, however, requires explaining why the problems occur. In the cases of this study, the process areas were bridged together multi-directionally. The insufficient task output of process areas created one direction, but insufficient social and informative interaction between people in different process areas created another direction, which was more exploratory and unforeseeable. The bridge causes related to the interaction between people explained the bridge causes related to the task output.
Thus, explaining the problems in workflow items required that possible bridge causes and local causes in all process areas were considered. We also noted that software project failures are different and dependent on the case context. Therefore, the data collection should not be limited to the common causes of failures, because such causes alone do not cover the case specific problems comprehensively enough. For example, the common causes of software project failures introduced in Section 2.1 would not have covered the project failures in Case Complicated (see Section 4.3) where the process areas of release & deployment, implementation work and software testing had a central role. Although many of the causes of failures were common in the cases of this study, a high number of cause and effect relationships were case specific. Thus, a case specific data collection and analysis was needed.

Root cause analysis

We used RCA in the data collection. Therefore, considering the outcome of RCA in the context of software project failures, our results contribute to the research on RCA. It seems that in the case of software project failures, RCA helps to explain the perceptions of domain experts on what happened, where it happened, and why it happened. Prior studies have used questionnaires to conclude the causes of project failures [27]. The difference between these two approaches is that RCA provides information about the perceived cause and effect relationships. On the other hand, a questionnaire helps to generalize the findings as it can be used to collect information from a high number of subjects. We hypothesize that future work aiming to understand the causes of software project failures should utilize both heavy data collection for statistical analysis and RCA as described in this paper. RCA could be used to model the problem domain whereas questionnaires could be used to generalize it. Considering the data analysis, the advantage and disadvantage of RCA is a high number of detected causes. The high number of causes helps to reveal process improvement targets [4]. However, this also makes it somewhat challenging to conclude the causal mechanisms related to the failure under analysis.
Thus, the data analysis techniques for RCA should be a part of future work. In order to reveal the causal mechanisms, we found it useful to classify the detected causes into four dimensions: process area, type, interconnectedness, and feasibility for process improvement (see Section 3.3). This made it possible to model the bridge causes and local causes. Additionally, we were able to characterize the causes perceived as the most important for process improvement. We hypothesize that our analysis approach is useful when RCA is applied to software project failures. However, this approach needs further validation. The efficiency and ease-of-use of RCA were evaluated by the case participants of the companies. The participants considered RCA as more efficient than the prior practices used in the companies. Additionally, they perceived RCA as easy-to-use. The analysis resulted in tens of high quality corrective actions, which resulted in process changes in the companies. Thus, we hypothesize that RCA is a useful practice to detect process improvement opportunities in software product companies. Considering the limitations, we used the ARCA method to conduct RCA. Therefore, the evaluation results of RCA are also limited to the work practices of ARCA. The detailed evaluations of the participants can be found in our prior paper [4], which also includes the development of ARCA and a literature review about prior RCA methods.

Prediction models

Prediction models have been used to predict the likelihood of success and failure in ongoing software projects [76,77]. The models are used to consider corrective actions for a failure before the failure occurs [76]. They take the state of risk factors as an input and provide the likelihood of a failure as an output [76,77]. The knowledge about the current state of risk factors is based on questionnaires [76,77]. Thus, the accuracy of prediction models is dependent on the accuracy of the used questionnaire.
The accuracy of prediction models is also dependent on the causal model used in the likelihood calculation. The causal model is based on prior statistical evidence about the correlations between the risk factors [77,78]. We believe that our results provide useful domain knowledge for prediction models. Alaeddini and Dogan [79] introduced an approach where the output of RCA is used with the Bayesian network in order to improve the real-time identification of the causes of failures. Cheng and Greiner [80] claimed that one way to improve the prediction accuracy is to use domain knowledge when creating the causal model. Such knowledge includes the order of causes and effects, forbidden links, and cause and effect relationships [80]. Steck and Tresp [78] presented a similar idea by stating that learning the structure of data provides better information about the domain than basing the reasoning on correlations and distance measures only. Our results provide rich knowledge about the situation in a particular case, including cause and effect relationships between the risk factors and the process areas where a specific risk factor causes a problem. Thus, we assume that our results could improve the prediction models. First, the bridge causes and local causes could be used to extend the questionnaires of risk factors used in the prediction models. Second, they could be used to consolidate the potential cause and effect relationships found by using statistical correlations. Third, their types and process areas (see Table 4) could help to consider various important aspects while making the questionnaires. Considering the risk factors used in prior works [48,77], it seems that our results include more factors particularly in the process areas of implementation, software testing, and release & deployment. By extending the prior works with these missing process areas, it could be possible to detect bridge causes from the requirements and management by using statistical methods. 
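The second suggestion above, using RCA results to consolidate cause and effect relationships found through statistical correlations, can be sketched as follows. This is a hypothetical illustration rather than a method from the cited works: correlations give undirected pairs of risk factors, and a direction is assigned only when the RCA output contains the link in exactly one direction. The function name and the factor labels A, B, and C are assumptions.

```python
def orient_with_rca(correlated_pairs, rca_edges):
    """Assign directions to correlated factor pairs using RCA findings.

    `correlated_pairs` is an iterable of (factor, factor) tuples without
    a direction. `rca_edges` is a set of (cause, effect) tuples reported
    in RCA. Returns (oriented, unresolved): directed edges supported by
    RCA, and pairs for which RCA gives no link or links in both directions.
    """
    oriented, unresolved = [], []
    for a, b in correlated_pairs:
        forward = (a, b) in rca_edges
        backward = (b, a) in rca_edges
        if forward and not backward:
            oriented.append((a, b))
        elif backward and not forward:
            oriented.append((b, a))
        else:
            unresolved.append((a, b))
    return oriented, unresolved


# Placeholder factor names; not risk factors taken from the cited models.
pairs = [("A", "B"), ("C", "A"), ("B", "C")]
rca = {("A", "B"), ("A", "C"), ("B", "C"), ("C", "B")}
oriented, unresolved = orient_with_rca(pairs, rca)
```

Pairs reported in both directions stay unresolved on purpose, since the cases above contain process areas that act as both a cause and an effect of the same relationship.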
We conclude that the outcome of prediction models is important for RCA and vice versa. We also believe that RCA could be used alongside the questionnaire in order to improve the knowledge about the current state of the project under analysis, as presented in [63]. RCA is not limited to specific risk factors, as premade questionnaires are, e.g., [77]. Instead, the outcome of RCA reflects the true domain knowledge, limited by the experience of the group of participants.

Threats to validity

This section discusses the validity of our empirical results using a validation scheme presented by [67]. We will present the construct validity, the reliability, and the external validity of the study.

Construct validity

Construct validity indicates whether the studied operational measures really represent what is investigated according to the research questions [67]. The validity of the RCA outcome has been criticized as relying strongly on the assumptions and critical thinking of practitioners [81]. Using the ARCA method creates a threat to the construct validity, as the research data relied on human input. This means that the causes of failures and their causal relationships were perceived and thus depended on the experience, awareness, memory, expertise, and analytical capabilities of the case participants. In our prior work [4], we evaluated the accuracy, correctness, and usefulness of the detected causes of this study by utilizing interviews, questionnaires, and observations. Even though our results indicate that the outcome of the ARCA method was correct, we were not able to prove that the detected causes and their causal relationships were 100% correct and completely covered the failures. It should be noted that this problem is also relevant in other studies utilizing RCA, interviews, or surveys.

T.O.A. Lehtinen et al. / Information and Software Technology 56 (2014)

In order to analyze the causes of failures and their causal relationships systematically, we developed a detailed cause classification system. Regardless of our effort to make the classification system as comprehensive as possible, classifying the causes of failures always blurs the dissimilarities and simultaneously highlights the similarities. This means that there is a risk that using the classification system creates systematic errors. This validity threat was controlled by developing the classification system based on the detected causes. Thus, our cause classifications likely correspond to the detected causes accurately.

Reliability

Reliability indicates the extent to which the data and analysis are dependent on a specific researcher [67]. Considering the reliability of the classification system, inter-rater agreement of the classifications was analyzed by using Kappa values for a randomly selected 10% of the cause statements of each case, resulting in the classification of 62 causes by the second author of this paper. The Kappa value for the process area dimension was 0.65, which can be interpreted as good agreement between the raters. The Kappa value for the type dimension was 0.55, which can be interpreted as moderate agreement between the raters. As far as we know, the reliability of cause classification systems has not previously been reported. In comparison, higher Kappa values of 0.80 [82], 0.76 [82], and 0.79 [83] have been achieved for the classification systems used in code reviews. We think that the classification systems of code review defects are more mature and that code review defects are simpler to classify, as the possible defects are limited to the expressiveness of the programming languages, while the causes of failures were expressed in natural language, which is richer but also more opaque, reducing the agreement level between raters.
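To make the agreement measure concrete, the following minimal sketch computes a Kappa value for two raters who classify the same cause statements. It assumes Cohen's kappa, which is the usual choice for two raters; the text above states only that Kappa values were used. The function name and the labels in the example are hypothetical and are not data from the study.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters who labelled the same items, in the same order."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty set of items")
    n = len(rater_a)
    # Observed agreement: share of items that received the same label from both raters.
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: sum over labels of the product of the raters' marginal shares.
    freq_a = Counter(rater_a)
    freq_b = Counter(rater_b)
    p_chance = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if p_chance == 1.0:
        return 1.0  # both raters used one identical label for every item
    return (p_observed - p_chance) / (1.0 - p_chance)


# Hypothetical classifications of eight cause statements (assumption, not study data).
first_rater = ["People", "Tasks", "Methods", "People",
               "Environment", "Tasks", "Methods", "People"]
second_rater = ["People", "Tasks", "People", "People",
                "Environment", "Methods", "Methods", "People"]

kappa = cohens_kappa(first_rater, second_rater)
```

The measure corrects the raw share of identical classifications for the agreement that would be expected by chance, which is why a classification system with more categories does not automatically yield a lower value.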
Additionally, our classification system includes a higher number of categories and dimensions. The qualitative analysis of the causal relationships interconnecting the process areas creates a threat to the reliability, because the qualitative analysis was based on the interpretations of the first author. As already presented, the inter-rater agreement for the process area dimension was good. This means that the causes analyzed qualitatively likely covered most of the causal relationships bridging the process areas. Furthermore, the causes bridging two process areas were mostly similar and their number was low (around one to five). Thus, it was not difficult to summarize the relationships when possible.

External validity

External validity indicates whether it is possible to generalize the findings of the study [67]. All of our cases varied and thus considered the causes of failures and their causal relationships from different perspectives, e.g., the case companies, the case attendees, and the case failures varied. This increases the external validity. There were also similarities between the cases, which made them more comparable while consolidating the case study results, e.g., each failure was highly complex, similar stakeholders were present at each case, and the same amount of effort (120 min) was used at each causal analysis workshop. Considering the results on the types of the causes of failures and the importance of bridge causes for process improvement, we believe that the external validity is high (see Section 5.1). However, as discussed earlier, the specific causes detected in the cases varied. Therefore, the results on the common causes of failures need further validation. The software projects studied in this paper were medium-sized (less than 100 people involved), which means that we cannot conclude that our results are valid in small-sized (a few people) or large-sized projects (hundreds of people).
Additionally, the projects that we studied were geographically distributed and most of them were conducted in European countries only. Therefore, we cannot generalize our findings to collocated projects or to cultures other than Western ones. Furthermore, the software development processes used in the projects varied from more traditional to less traditional processes. Thus, we cannot limit our results to traditional waterfall-based software development processes or modern agile methods either. Our study is based on only four cases conducted in product companies, and as far as we know, there are no prior studies on the causal relationships between the causes of software project failures. Thus, it was difficult to compare our findings with the findings of prior studies. Therefore, in order to increase the external validity of our results, replication studies are needed in different case contexts, including projects of different sizes, cultures, geographical distribution, software development methods, and failures.

6. Conclusions

This paper makes four contributions. First, our results indicate that there is no single root cause for software project failures, as also claimed in [16]. Instead, a software project failure is the result of several causes; in our cases, we had causes per analyzed failure. Furthermore, these causes form a network where the causes are connected to each other. Additionally, the causes come from many process areas. This matches prior works (see Section 2.1) arguing that software project failures are influenced by social and technical causes, which are spread over various process areas. However, in the prior works, analyses of the causal relationships between the causes of failures are missing, an oversight that we have tried to correct in this paper.
Second, our results consolidate the prior works by showing that a software project failure is a result of a multidimensional process where people, tasks, methods, and project environment are interconnected. Lack of cooperation, weak task backlog, and lack of software testing resources were common bridge causes for the failures of our cases. These causes interconnected the process areas of management, sales & requirements, implementation, software testing, and release & deployment. Furthermore, the prior studies have commonly recognized the process areas of management, requirements engineering, and implementation as causing failures. We found that software testing also has a central role in software project failures. Based on our industrial experience, it seems unlikely that the case companies in this paper would have had more problems in software testing than companies on average. We hypothesize that the discrepancies between this work and prior studies are caused by the data collection methods employed. Third, the bridge causes and the causes related to tasks, people, and methods were particularly common among the causes perceived as the most feasible targets for process improvement. The causes of software project failures were equally distributed into the types of People, Tasks, Methods, and Environment. However, the causes related to Environment were seldom perceived as feasible targets for process improvement. We also found a high number of bridge causes interconnecting the process areas, 50% on average. The bridge causes had an even higher share of the causes perceived as feasible for process improvement, 68% on average.
For software practitioners, this means that to fix weaknesses in a specific process area, one must be willing to make changes outside of the process areas where the failure surfaces, e.g., the failure of ineffective testing might be most effectively fixed with improvements to collaboration between developers and sales people rather than hiring more testers. Fourth, our results indicate that the causes of failures and their causal relationships are diverse and depend on the case context.

Despite the observation that some of the causes of failures were common between the cases, the detailed analysis showed that a high number of the causes were case specific. Thus, a case-specific analysis is likely needed every time a failure occurs.

Considering future work, we believe that the research methods used in this study allow constructing the story behind the data, which is a key component also in more formal works on causal reasoning [54,84]. We see that future works aiming to understand the causes of software project failures should utilize both heavy data collection for statistical analysis and RCA as depicted in this paper. Replication studies are needed to increase the external validity of our findings and to test our hypotheses. More empirical research is needed to better understand the complicated mechanisms and relationships of causes leading to software project failures. Furthermore, for industry it is recommended that process improvement is done with teams that represent different process areas, as causes interconnecting process areas were often perceived as the most important to fix.

References

[1] P. Naur, B. Randel, Software Engineering: A Report on a Conference Sponsored by the NATO Science Committee, NATO,
[2] A. Burr, M. Owen (Eds.), Statistical Methods for Software Quality: Using Metrics for Process Improvement. ITP A Division of International Thomson Publishing Inc.,
[3] J.J. Rooney, L.N. Vanden Heuvel, Root cause analysis for beginners, Qual. Prog. 37 (7) (2004)
[4] T.O.A. Lehtinen, M.V. Mäntylä, J. Vanhanen, Development and evaluation of a lightweight root cause analysis method (ARCA method) – field studies at four software companies, Inform. Softw. Technol. 53 (10) (2011)
[5] D.N. Card, Learning from our mistakes with defect causal analysis, IEEE Softw. 15 (1) (1998)
[6] R. Chillarege, I.S. Bhandari, J.K. Chaar, M.J. Halliday, D.S. Moebus, B.K. Ray, M. Wong, Orthogonal defect classification – a concept for in-process measurements, IEEE Trans. Softw. Eng. 18 (11) (1992)
[7] K. Ishikawa (Ed.), Introduction to Quality Control, JUSE Press Ltd., 3A Corporation, Shoei., 6-3, Sarugaku-cho 2-chome, Chiyoda-ku, Tokyo 101, Japan,
[8] N. Cerpa, J.M. Verner, Why did your project fail?, Commun. ACM 52 (2009)
[9] J. Dye, T. van der Schaaf, PRISMA as a quality tool for promoting customer satisfaction in the telecommunications industry, Reliab. Eng. Syst. Saf. 75 (3) (2002)
[10] J. Capers, Social and technical reasons for software project failures, Crosstalk J. Defense Softw. Eng. 6 (2006) 4–9.
[11] M. Keil, P.E. Cule, K. Lyytinen, R.C. Schmidt, A framework for identifying software project risks, Commun. ACM 41 (11) (1998)
[12] K.A. Demir, A survey on challenges of software project management, in: Proceedings of the 2009 International Conference on Software Engineering Research Practice, 2009, pp
[13] L.A. Kappelman, R. McKeeman, L. Zhang, Early warning signs of IT project failure: the dominant dozen, Inform. Syst. Manage. 23 (2006)
[14] K. Moløkken-Østvold, M. Jørgensen, A comparison of software project overruns – flexible versus sequential development models, IEEE Trans. Softw. Eng. 31 (9) (2005)
[15] K. Moløkken-Østvold, M. Jørgensen, A review of survey on software effort estimation, in: Proc. ACM-IEEE Int'l Symp. Empirical Software Eng. (ISESE 2003), 2003, pp
[16] J. Verner, J. Sampson, N. Cerpa, What factors lead to software project failure, in: Proceedings of Research Challenges in Information Science (RCIS 2008), 2008, pp
[17] L.J. May, Major causes of software project failures, Crosstalk J. Defense Softw. Eng. 11 (1998)
[18] J.M. Verner, L.M. Abdullah, Exploratory case study research: outsourced project failure, Inf. Softw. Technol. 54 (2012)
[19] R.B. Grady, Software failure analysis for high-return process improvement decisions, Hewlett-Packard J. 47 (4) (1996)
[20] J. Jacobs, J. Van Moll, P. Krause, R. Kusters, J. Trienekens, A. Brombacher, Exploring defect causes in products developed by virtual teams, Inf. Softw. Technol. 47 (2005)
[21] M. Leszak, D.E. Perry, D. Stoll, A case study in root cause defect analysis, in: Proceedings of the 2000 International Conference on Software Engineering, 2000, pp
[22] T. Nakashima, M. Oyama, H. Hisada, N. Ishii, Analysis of software bug causes and its prevention, Inf. Softw. Technol. 41 (1999)
[23] T. Stålhane, Root cause analysis and gap analysis – a tale of two methods, in: EuroSPI 2004, Trondheim, Norway, 2004, pp
[24] M. Kalinowski, G.H. Travassos, D.N. Card, Towards a defect prevention based process improvement approach, in: Proceedings of the 34th EUROMICRO Conference on Software Engineering and Advanced Applications, Parma, Italy, 2008, pp
[25] L. Xiangnan, L. Hong, Y. Weijie, Analysis failure factors for small & medium software projects based on PLS method, in: The 2nd IEEE International Conference on Information Management and Engineering (ICIME), 2010, pp
[26] X. Lu, H. Liu, W. Ye, Analysis failure factors for small & medium software projects based on PLS, in: Proceedings of Information Management and Engineering (ICIME 2010), 2010, pp
[27] L. McLeod, S.G. MacDonell, Factors that affect software systems development project outcomes: a survey of research, ACM Comput. Surv. 43 (2011)
[28] R.J. Latino, K.C. Latino (Eds.), Root Cause Analysis: Improving Performance for Bottom-Line Results, CRC Press, 6000 Broken Sound Parkway NW, Suite 300, Boca Raton, FL,
[29] F.O. Bjørnson, A.I. Wang, E. Arisholm, Improving the effectiveness of root cause analysis in post mortem analysis: a controlled experiment, Inf. Softw. Technol. 51 (1) (2009)
[30] P. Jalote, N. Agrawal, Using defect analysis feedback for improving quality and productivity in iterative software development, in: Proceedings of the Information Science and Communications Technology (ICICT 2005), 2005, pp
[31] R.G. Mays, Applications of defect prevention in software development, IEEE J. Sel. Areas Commun. 8 (1990)
[32] S.O. Al-Mamory, H. Zhang, Intrusion detection alarms reduction using root cause analysis and clustering, Comput. Commun. 32 (2) (2009)
[33] M. Siekkinen, G. Urvoy-Keller, E.W. Biersack, D. Collange, A root cause analysis toolkit for TCP, Comput. Netw. (2008)
[34] A. Traeger, I. Deras, E. Zadok, DARC: dynamic analysis of root causes of latency distributions, in: SIGMETRICS '08, Annapolis, Maryland, USA, 2008, pp
[35] I. Bhandari, M. Halliday, E. Tarver, D. Brown, J. Chaar, R. Chillarege, A case study of software process improvement during development, IEEE Trans. Softw. Eng. 19 (12) (1993)
[36] Z.X. Jin, J. Hajdukiewicz, G. Ho, D. Chan, Y. Kow, Using root cause data analysis for requirements and knowledge elicitation, in: International Conference on Engineering Psychology and Cognitive Ergonomics (HCII 2007), Berlin, Germany, 2007, pp
[37] W. Al-Ahmad, K. Al-Fagih, K. Khanfar, K. Alsamara, S. Abuleil, H. Abu-Salem, A taxonomy of an IT project failure: root causes, Int. Manage. Rev. 5 (2009)
[38] R.L. Glass, Evolving a new theory of project success, Commun. ACM 42 (1999)
[39] N. Agarwal, U. Rathod, Defining success for software projects: an exploratory revelation, Int. J. Project Manage. 24 (2006)
[40] J.D. Procaccino, J.M. Verner, K.M. Shelfer, D. Gefen, What do software practitioners really think about project success: an exploratory study, J. Syst. Softw. 78 (2005)
[41] T.O.A. Lehtinen, M.V. Mäntylä, What are problem causes of software projects? Data of root cause analysis at four software companies, in: ESEM '11 Proc. of the 2011 International Symposium on Empirical Software Engineering and Measurement, 2011, pp
[42] M. Nasir, S. Sahibuddin, Critical success factors for software projects: a comparative study, Sci. Res. Essays 6 (2011)
[43] N.H. Arshad, A. Mohamed, Z. Matnor, Risk factors in software development projects, in: Proceedings of the 6th WSEAS Int. Conf. on Software Engineering, Parallel and Distributed Systems, 2007, pp
[44] M. Tarawneh, H. AL-Tarawneh, A. Elsheikh, Software development projects: an investigation into the factors that affect software project success/failure in Jordanian firms, in: First International Conference on the Applications of Digital Information and Web Technologies, ICADIWT, 2008, pp
[45] K. El Emam, A.G. Koru, A replicated survey of IT software project failures, Softw. IEEE 25 (2008)
[46] E. Egorova, M. Torchiano, M. Morisio, Actual vs. perceived effect of software engineering practices in the Italian industry, JSS 83 (2010)
[47] R.C. Mahaney, A.L. Lederer, The effect of intrinsic and extrinsic rewards for developers on information systems project success, Proj. Manage. J. 37 (2006) 42.
[48] J. Drew Procaccino, J.M. Verner, S.P. Overmyer, M.E. Darter, Case study: factors for early prediction of software development success, Inf. Softw. Technol. 44 (2002)
[49] N. Cerpa, M. Bardeen, B. Kitchenham, J. Verner, Evaluating logistic regression models to estimate software project outcomes, Inf. Softw. Technol. 52 (2010)
[50] C. Jones, Software tracking: the last defense against failure, Crosstalk J. Defense Softw. Eng. 21 (2008).
[51] R. Kaur, J. Sengupta, Software process models and analysis on failure of software development projects, Int. J. Sci. Eng. Res. 2 (2011) 2–3.
[52] M.P. Álvarez, The four causes of behavior: Aristotle and Skinner, Int. J. Psychol. Psychol. Ther. 9 (2009)
[53] D. Hume, A Treatise of Human Nature [1739], Clarendon Press, Oxford,
[54] J. Pearl (Ed.), Causality: Models Reasoning, and Inference, Cambridge University Press, United States of America,
[55] C.W. Granger, Some recent development in a concept of causality, J. Econ. 39 (1988)
[56] D. Galles, J. Pearl, Axioms of causal relevance, Artif. Intell. 97 (1997) 9–43.
[57] C. Eden, Analyzing cognitive maps to help structure issues or problems, Eur. J. Oper. Res. (2004)
[58] P. Monteiro, R.J. Machado, R. Kazman, C. Henriques, Dependency analysis between CMMI process areas, PROFES, LNCS 6156 (2010)
[59] Y. Wang, G. King, Software Engineering Processes: Principles and Applications, CRC Press LLC,
[60] K. Lyytinen, D. Robey, Learning failure in information systems development, Info Syst. (1999)
[61] D.L. Cooke, Learning from incidents, in: Proceedings of the 21st International Conference of the System Dynamics Society, New York, NY, USA, 2003, pp
[62] T. Dingsøyr, Postmortem reviews: purpose and approaches in software engineering, Inf. Softw. Technol. 47 (2005)
[63] B. Collier, T. DeMarco, P. Fearey, A defined process for project post mortem review, Softw. IEEE 13 (1996)
[64] D.N. Card, Defect-causal analysis drives down error rates, Qual. Time 10 (4) (1993)
[65] A. Gupta, J. Li, R. Conradi, H. Rönneberg, E. Landre, A case study comparing defect profiles of a reused framework and of applications reusing it, Empir. Softw. Eng. 14 (2) (2008)
[66] R.K. Yin (Ed.), Case Study Research: Design and Methods, SAGE Publications, United States of America,
[67] P. Runeson, M. Höst, Guidelines for conducting and reporting case study research in software engineering, Empir. Softw. Eng. 14 (2008)
[68] T.D. Jick, Mixing qualitative and quantitative methods: triangulation in action, Adm. Sci. Q. 24 (1979)
[69] W. Foddy (Ed.), Constructing Questions for Interviews and Questionnaires, Cambridge University Press, Hong Kong by Colorcraft,
[70] S. Salinger, L. Plonka, L. Prechelt, A coding scheme development methodology using grounded theory for qualitative analysis of pair programming, in: 19th Annual Psychology of Programming Workshop, Joensuu, 2007, pp
[71] I. Jacobson, G. Booch, J. Rumbaugh, The Unified Software Development Process, Addison-Wesley,
[72] W. Royce, Managing the development of large software systems, Proc. IEEE WESCON 26 (August) (1970)
[73] CMMI Product Team, Capability Maturity Model Integration, Version 1.2, CMMI for Development CMU/SEI-2006-TR-008, ESC-TR ,
[74] P. Conroy, Technical debt: where are the shareholders' interests?, IEEE Softw. 29 (2012)
[75] T. DeMarco, All late projects are the same, IEEE Softw. (2011)
[76] F. Reyes, N. Cerpa, A. Candia-Véjar, M. Bardeen, The optimization of success probability for software projects using genetic algorithms, J. Syst. Softw. 84 (2011)
[77] Y. Takagi, O. Mizuno, T. Kikuno, An empirical approach to characterizing risky software projects based on logistic regression analysis, Empir. Softw. Eng. 10 (2005)
[78] H. Steck, V. Tresp, Bayesian belief networks for data mining, in: Proceedings of the 2nd Workshop on Data Mining Und Data Warehousing Als Grundlage Moderner Entschidungsunterstuezender Systeme, DWDW99, Sammelband, Universität Magdeburg,
[79] A. Alaeddini, I. Dogan, Using Bayesian networks for root cause analysis in statistical process control, Expert Syst. Appl. (2011)
[80] J. Cheng, R. Greiner, Comparing Bayesian network classifiers, in: Proceedings of the Fifteenth Conference on Uncertainty in Artificial Intelligence, 1999, pp
[81] A. Ayad, Critical thinking and business process improvement, J. Manage. Dev. 29 (2010)
[82] K. El Emam, I. Wieczorek, The repeatability of code defect classifications, in: ISSRE '98 Proceedings of the Ninth International Symposium on Software, Reliability Engineering,
[83] M.V. Mäntylä, C. Lassenius, What types of defects are really discovered in code reviews?, IEEE Trans. Softw. Eng. 35 (2009)
[84] R.M. Daniel, M.G. Kenward, S.N. Cousens, B.L. De Stavola, Using causal diagrams to guide analysis in missing data problems, Pubmed 21 (2012)


Article V

An experimental comparison of using cause-effect diagrams and simple memos in software project retrospectives

Timo O.A. Lehtinen, Mika V. Mäntylä, Juha Itkonen and Jari Vanhanen

Journal of Systems and Software (2014), 26 pages, in revision.


An experimental comparison of using cause-effect diagrams and simple memos in software project retrospectives

Timo O.A. Lehtinen¹, Mika V. Mäntylä, Juha Itkonen, Jari Vanhanen

Department of Computer Science and Engineering, Aalto University School of Science, P.O. BOX 19210, FI-00076, Aalto, Finland

Fax: (Software Business and Engineering Institute) Tel (Timo Lehtinen) timo.o.lehtinen@aalto.fi

Abstract

Root cause analysis (RCA) is a recommended practice in retrospectives, and the cause-effect diagram (CED) is a commonly recommended technique for RCA. Our objective is to evaluate whether CED improves the outcome of RCA and the perceptions of retrospective participants. We conducted a controlled experiment with eleven student software project teams using a two-by-two crossover design, resulting in a total of 22 experimental units. Two techniques for visualizing underlying causes were compared: CED and a simple memo, i.e., a structural list of causes. We used the output of RCA, questionnaires, and group interviews in order to compare the two techniques. CED increased the total number of causes with a medium effect size. CED also increased the links between causes, thus suggesting a more structured analysis of problems. Furthermore, the participants perceived that CED improved organizing and outlining the detected causes. The implication of our results is that using CED in the RCA of retrospectives is recommended, yet not mandatory, as the groups also performed quite well with the structural list. CED is visually more attractive and more effective than the structural list, but it is somewhat harder to read and requires specific software tools, which increases the burden of adoption.

Key words: Root Cause Analysis, Retrospective, Post Mortem Analysis, Cause-Effect Diagram, Controlled Experiment

1. Introduction

Root cause analysis (RCA) is used in software project retrospectives, which are a recommended practice, for example, in the Scrum software development method (Schwaber and Sutherland 2011). In retrospectives, individuals work together in order to create an understanding of what worked well in the prior project and what could be improved (Bjørnson, Wang, and Arisholm 2009). RCA helps in capturing the lessons learned from individuals (Lehtinen, Mäntylä, and Vanhanen 2011) and aims to state what the perceived problem causes are and where they occur (Lehtinen and Mäntylä 2011; Lehtinen et al. 2014a). Furthermore, RCA can be a part of project retrospectives, but it can also be a part of continuous software process optimization as recommended by the CMMI model (Software Engineering Institute 2010).

A cause-effect diagram (CED) is a commonly recommended technique for RCA (Bjørnson, Wang, and Arisholm 2009; Lehtinen, Mäntylä, and Vanhanen 2011; Anbari, Carayannis, and Voetsch 2008; Dingsøyr 2005). The diagram is used to register and visualize the outcome of RCA, i.e., the underlying causes of the problem. Its objective is to ease the detection and communication of the underlying causes and their causal structures. However, there are no studies comparing the use of CED with the use of simple memos, which represent the most straightforward approach to documenting retrospectives as they require no special tools, i.e., a word processor or just a pencil and paper is enough. The use of simple memos can be thought of as a natural baseline with which graphical diagrams, such as the CED, should be compared. In our previous work, we operated with software organizations that have used simple memos about the problems instead of CEDs (Lehtinen, Mäntylä, and Vanhanen 2011; Lehtinen et al. 2014b). Thus, reporting and visualizing the causal structures of a problem do not necessarily require CED, and the benefits of CED have not been investigated in previous work.
Our research problem is the following: Is CED really needed in the RCA of software project retrospectives, and if so, why? To contribute to the research problem we organized a controlled student experiment as part of a software engineering capstone project course, where students conduct software projects in industrial like environment. We

180 compared the outcome of RCA and the perceptions of retrospective participants between a CED and a structural list technique. The rest of the paper is structured as follows. Section 2 introduces the related work, which includes using RCA in the retrospectives of software projects. Additionally, we will present how the CED and structural list techniques can be used in RCA to visualize and organize the causes of problems. At the end of the section, gaps in the existing research are presented. Section 3 presents the research objectives, questions, and methods. We will also introduce the research context, research hypotheses, the used retrospective method (Bjørnson, Wang, and Arisholm 2009) and the experiment design including the treatments, response variables, and controlling the undesired variation. Section 4 presents the study results. Furthermore, we will answer the research questions and discuss the validity threats in Section 5. Section 6 summarizes our findings and suggests future work on the topic. 2. Related work We start this section by presenting the concept of RCA in retrospectives. Thereafter, Section 2.2 introduces CED techniques which are commonly recommended in RCA. Thereafter, Section 2.3 presents the structural list technique which is claimed useful in order to detect the causes of problems during RCA. Section 2.4 concludes the gap in the research Root cause analysis of retrospectives Retrospectives are aimed to facilitate learning from occurred problems. In retrospectives, the team members use RCA to detect the underlying causes for the detected problems. At the beginning of retrospectives, the team members list problems they have faced during the project or milestone (Bjørnson, Wang, and Arisholm 2009). Thereafter, the team members select important problems to be further analyzed with RCA (Bjørnson, Wang, and Arisholm 2009). Next, the causes of the problems are detected (Bjørnson, Wang, and Arisholm 2009). 
This can be done by constantly asking "why?" for every cause detected (Lehtinen, Mäntylä, and Vanhanen 2011), e.g., by using the Five Whys technique (Andersen and Fagerhaug 2006). As the causes are detected, they are organized into a CED (Bjørnson, Wang, and Arisholm 2009). The ultimate output of RCA is the causal structure of problems (Lehtinen et al. 2014a; Stålhane et al. 2003).

Software project retrospectives have been introduced as synchronous face-to-face meetings (Dingsøyr 2005; Dingsøyr, Moe, and Nytrø 2001), but today's company practices seem to favor distributed settings (Terzakis 2011). Similarly, CEDs have been introduced as useful for retrospectives (Bjørnson, Wang, and Arisholm 2009), but existing company practices seem to favor simple memos (Lehtinen, Mäntylä, and Vanhanen 2011; Lehtinen et al. 2014b). Software tool support for collaborative cause-effect diagramming is also missing (Lehtinen et al. 2014b), and therefore conducting RCA in distributed settings is currently challenging with the methods introduced in prior studies (Stålhane et al. 2003). Thus, in terms of the tool support, we should determine how to visualize the outcome of RCA.

Cause-effect diagrams

Cause-effect diagrams are the most frequently used techniques in RCA. They are commonly used to register and visualize the causal structures of problems. Various techniques for drawing a CED have been introduced, e.g., a fishbone diagram (Andersen and Fagerhaug 2006; Burnstein 2003; Stevenson 2005; Ishikawa 1990), a fault tree diagram (Andersen and Fagerhaug 2006), a directed graph (Bjørnson, Wang, and Arisholm 2009), a matrix diagram (Nakashima et al. 1999), a scatter chart (Andersen and Fagerhaug 2006), a logic tree (Latino and Latino 2006), and a causal factor chart (Rooney and Vanden Heuvel 2004). However, only a few of them are utilized in the retrospectives of software projects. These include the fishbone diagram (Bjørnson, Wang, and Arisholm 2009; Andersen and Fagerhaug 2006; Stålhane et al. 2003; Burnstein 2003; Stevenson 2005; Stålhane 2004) and the directed graph (Bjørnson, Wang, and Arisholm 2009; Lehtinen, Mäntylä, and Vanhanen 2011; Lehtinen et al. 2014b).

The fishbone diagram applies a tree structure where the causes of problems are organized into premade classes of causes (Lehtinen, Mäntylä, and Vanhanen 2011). In contrast, the directed graph applies a network structure where the causes of problems are organized solely based on their cause and effect relationships (Lehtinen, Mäntylä, and Vanhanen 2011). In the context of software project retrospectives, the use of the fishbone diagram has been compared with the directed graph, and it has been claimed that the directed graph outperforms the fishbone diagram (Bjørnson, Wang, and Arisholm 2009). This means that the outcome of RCA is at least somewhat dependent on the technique used to visualize the causes of problems. The directed graph increases the number of detected causes (Bjørnson, Wang, and Arisholm 2009). It also improves the analysis by increasing the number of
hubs, which are defined as causes that are related to more than one problem (Bjørnson, Wang, and Arisholm 2009). The strict hierarchy and weak layout of the fishbone diagram are among its main weaknesses (Bjørnson, Wang, and Arisholm 2009). Another problem of the fishbone diagram is the tree structure (Lehtinen, Mäntylä, and Vanhanen 2011). The tree structure forces the same cause to be duplicated under many effects, whereas in the network structure only the references to the effects are duplicated (Lehtinen, Mäntylä, and Vanhanen 2011). Thus, in the network structure, the number of cause statements remains as low as possible.

Structural lists

A structural list is an alternative to CED. It is used to register and visualize the causal structures of problems as simple memos, i.e., textual notations of problems that include the representation of causes and their effects. Ammerman (1998) presented an RCA technique called Causal Factor List. He claims that listing the causes in a computer file helps in detecting the root causes of problems. Drawing a CED requires writing down cause statements and using graphical nodes and edges to interconnect the detected causes (Dingsøyr, Moe, and Nytrø 2001). In contrast, listing the causes requires only that the cause statements are written down and simultaneously placed under one another. Additionally, making a structural list of causes does not require specific software tools for RCA, as CEDs do (Lehtinen, Mäntylä, and Vanhanen 2011; Lehtinen et al. 2014b). Thus, it can be easily adapted to distributed software project retrospectives where the participants are geographically distributed, a research problem introduced by Stålhane et al. (2003). Furthermore, the retrospective outcome and the perceptions of participants utilizing a structural list have rarely been compared with the use of CED (Stålhane et al. 2003; Stålhane 2004).
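The difference between duplicating cause statements and duplicating only references can be illustrated with a small sketch. The causal structure below is invented for illustration, and the function names (`causes_by_effect`, `structural_list`) are hypothetical; the sketch also assumes the structure contains no cycles.

```python
from collections import defaultdict

# Hypothetical causal structure: each cause maps to the effects it explains.
# "problem" is the analysed problem; all names are invented for illustration.
EDGES = {
    "late delivery": ["problem"],
    "poor estimates": ["late delivery"],
    "unclear requirements": ["late delivery", "poor estimates"],  # two effects
}


def causes_by_effect(edges):
    """Invert cause -> effects into effect -> list of direct causes."""
    inverted = defaultdict(list)
    for cause, effects in edges.items():
        for effect in effects:
            inverted[effect].append(cause)
    return inverted


def structural_list(edges, root="problem"):
    """Render the structure as an indented list, repeating shared causes."""
    inverted = causes_by_effect(edges)
    lines = []

    def walk(effect, depth):
        for cause in inverted.get(effect, []):
            lines.append("  " * depth + "- " + cause)
            walk(cause, depth + 1)  # assumes the structure is acyclic

    walk(root, 0)
    return lines


listing = structural_list(EDGES)
graph_statements = len(EDGES)   # one node per cause in the directed graph
list_statements = len(listing)  # one line per occurrence in the list
```

In this toy structure the directed graph holds each cause once, while the list repeats "unclear requirements" under both effects it explains, so the list contains more cause statements than the graph.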
In our prior study (Lehtinen, Mäntylä, and Vanhanen 2011), we criticized the feasibility of using the structural list technique in RCA. We claimed that in the context of software engineering, using that technique makes the analysis difficult because of the high number of detected causes (Lehtinen, Mäntylä, and Vanhanen 2011). However, our conclusions were mostly based on assumptions. Still, the structural list does share a practical problem with the fishbone diagram: when a cause explains more than one effect, the same cause needs to be placed under many effects. This means that when the structural list is used in RCA, the workload increases because causes need to be written down more than once (Lehtinen, Mäntylä, and Vanhanen 2011). However, the comparison between the fishbone diagram and the directed graph (Bjørnson, Wang, and Arisholm 2009) is not enough for determining the effectiveness of the structural list, because the fishbone diagram utilizes a different visual structure than the structural list.

Gap in the research

The prior studies have failed to address the question of whether the use of CED outperforms simple memos formulated as a structural list (Ammerman 1998) during the RCA of retrospectives. Instead, the prior studies have indicated that the effectiveness of RCA is dependent on the technique used to visualize the causes of problems (Bjørnson, Wang, and Arisholm 2009; Lehtinen, Mäntylä, and Vanhanen 2011). Yet those studies compare two different visualization techniques rather than comparing CEDs directly with simple memos. Comparison with simple memos is important because memos are the most straightforward to use and they are used in industry (Lehtinen, Mäntylä, and Vanhanen 2011; Lehtinen et al. 2014b). Making memos does not require drawing nodes and arrows between the causes of problems, as CEDs do. Therefore, memos do not require specific software tools either (Lehtinen, Mäntylä, and Vanhanen 2011; Lehtinen et al. 2014b).
Thus, it is possible that a memo in the form of a structural list is a more effective technique than CED. The results of Ottensooser et al. (2012), who compared the use of textual and graphical notations for interpreting business process descriptions, support this idea. On the other hand, it is also possible that it is precisely the arrows and nodes of CEDs that improve the retrospective outcome and the perceptions of participants, as they help to visualize the causal structures of problems. The prior studies on organizational learning systems and cognitive maps support this view (Lee, Courtney, and O'Keefe 1992).

3. Research methods

In this section, we introduce the research goals and present how the research data was collected and analyzed in this controlled experiment (Juristo and Moreno 2003). Research objectives and questions are introduced in Section 3.1. Thereafter, the research context is presented in Section 3.2. In Section 3.3, we introduce the experimental design, including the retrospective method used, the treatments, the response variables, and the control of undesired variation. Section 3.4 introduces the data collection and analysis methods.

3.1. Research objectives and questions

Our objective is to evaluate whether CED improves the outcome of RCA and the perceptions of retrospective participants when compared with writing down a structural list of the causes of problems analyzed in software project retrospectives. The comparison is based on two cause and effect structuring techniques, i.e., a directed graph (Bjørnson, Wang, and Arisholm 2009; Lehtinen, Mäntylä, and Vanhanen 2011) and a structural list (Ammerman 1998). Based on the prior studies in the context of software projects (Bjørnson, Wang, and Arisholm 2009; Lehtinen, Mäntylä, and Vanhanen 2011), the directed graph is claimed to be the optimal CED technique in the RCA of retrospectives. We compare the number and causal structures of detected causes, considering both the total number of causes and the number of causes with specific characteristics. We also compare the perceptions of participants about the techniques. The research aims to answer the following comparative questions:

RQ1: Is there a difference between the techniques in terms of the outcome of RCA?
  RQ1a: Is there a difference in the number of the detected causes?
  RQ1b: Is there a difference in the structures of the detected causes?
  RQ1c: Is there a difference in the characteristics of the detected causes?
RQ2: Is there a difference between the techniques in terms of the perceptions of retrospective participants?
  RQ2a: Is there a difference in the preferred technique?
  RQ2b: How do the retrospective participants evaluate and describe the techniques?

3.2. Research context

Since the early 1980s, Aalto University has provided a capstone project course for computer science students (Vanhanen, Lehtinen, and Lassenius 2012). During the course, the students develop software for external customers in teams. The software development for each customer is arranged as a software project lasting five months. Each student spends approximately 150 hours on the project.
Based on our experiences and the course feedback, the students are highly committed to the projects. The project teams have a total of seven to nine student members. These include a project manager, a quality manager, a software architect, and four to six developers. There are no freshman students in the course. The managers are M.Sc. level students whereas the developers are B.Sc. level students. Many students already have years of experience in industrial software development.

The teams are required to follow a process framework defined by the course. The process framework divides the projects into three timeboxed iterations, each lasting six to seven weeks. The process framework combines practices from both agile and plan-driven process models. These can be adapted to sprints, iteration planning, iteration demos, backlogs, weekly stand-ups, retrospectives, pair programming, continuous integration, risk management, effort estimation and realization, use cases, test-case based functional testing, and more rigorous quality assurance. Each team is responsible for planning and using a development process that follows the process framework (Vanhanen, Lehtinen, and Lassenius 2012).

The use of students as study subjects has been discussed in the software engineering literature, e.g., (Svahnberg, Aurum, and Wohlin 2008; Berander 2004; Carver et al. 2003; Runeson 2003; Höst, Regnell, and Wohlin 2000). Runeson (2003) discussed the differences between using freshman students, graduate level students, and industry personnel as study subjects. He concluded that graduate level students are feasible subjects for revealing improvement trends, but infeasible for revealing the absolute levels of improvements (Runeson 2003). Berander (2004) explained that the applicability of using students as study subjects is dependent on their experience and commitment.
He also claims that the use of students as representatives of professionals is more appropriate in software projects than in classroom settings (Berander 2004). Similar conclusions are also given by Carver et al. (2003).

The experiment was conducted in the retrospectives of eleven project teams out of fourteen during the academic year [...]. Participation in the experiment was voluntary for the project teams. The team members did not know the objective of the experiment in advance.

The research context was feasible for studying the improvement trend over the use of CED and the structural list in the software project retrospectives of small teams. Most of the student subjects were graduate level students who were experienced in software development and committed to their software projects. Thus, in the retrospectives, they were able to consider software project problems that were relevant to their teams. The course projects were also similar to real projects. Thus, many challenges faced by the student teams were industrially relevant, as we concluded in our prior study (Vanhanen, Lehtinen, and Lassenius 2012). These included challenges related to team building, team members, project requirements, project management, and quality assurance. The customers were also committed to their projects, and they paid a fee to the university when they got a student project. Thus, the students were required to develop software that was truly
needed by the customers. Additionally, a similar research context has previously been used to conduct a somewhat similar comparison (Bjørnson, Wang, and Arisholm 2009).

3.3. Experiment design

For the participating project teams (see Section 3.2), we provided the retrospective methodologies and controlled the retrospective settings. The course framework required the teams to conduct a retrospective at the end of the second and third iterations. The retrospective method and the effort used were fixed (see Section 0). Thus, our design had two experimental units (retrospectives) for each participating project team, meaning 22 experimental units in total.

The experiment followed a single factor paired design with a single blocking variable (Juristo and Moreno 2003). The factor that we examined was the technique used to visualize and organize the causes of problems. The factor had two alternatives: CED and a structural list. Both of these treatments were applied by each team, but in different retrospectives, starting in a randomized order. Figure 1 introduces the CED technique and Figure 2 introduces the structural list technique. In CED, arrows are drawn between the causes of the problem. In the structural list, the causal structure is instead visualized using bullet lists. Furthermore, if a cause affects more than one effect, multiple arrows are drawn from the cause when using CED (see causes 8 and 16). With the structural list, such a cause needs to be duplicated under each effect it explains.

The blocking variable that we were not able to eliminate was the project phase in which the retrospectives were conducted. The first retrospective was conducted in the middle of the project (Iteration 2) and the second at the end of the project (Iteration 3). We balanced our experiment design in order to take the project phase into account in the analysis. Table 1 summarizes the experiment design, including the distribution of teams in the treatments and the project phase.
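The paired design described above, in which every team receives both treatments in a randomized starting order, can be sketched as follows. The source does not state how the randomization was carried out, so the independent random choice per team, the seed, and the function name `assign_orders` are assumptions made only for illustration.

```python
import random

TREATMENTS = ("CED", "structural list")


def assign_orders(teams, seed=None):
    """Give every team both treatments, randomising which one comes first."""
    rng = random.Random(seed)
    plan = {}
    for team in teams:
        first = rng.choice(TREATMENTS)
        second = TREATMENTS[1] if first == TREATMENTS[0] else TREATMENTS[0]
        plan[team] = {"Iteration 2": first, "Iteration 3": second}
    return plan


teams = [f"T{i}" for i in range(1, 12)]  # eleven participating teams
plan = assign_orders(teams, seed=1)      # the seed is an arbitrary illustration

# Each (team, phase) pair is one experimental unit: 11 teams x 2 phases = 22.
units = [(team, phase, treatment)
         for team, phases in plan.items()
         for phase, treatment in phases.items()]
```

Because each team acts as its own control, differences between teams cancel out in a paired analysis, while the randomized order spreads the effect of the project phase over both treatments.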
The starting order of treatments was randomized for each team. As a result, six teams used CED and five teams used the structural list in the first retrospective (Iteration 2). Correspondingly, six teams used the structural list and five teams used CED in the second retrospective (Iteration 3). This randomization balanced the potential effects of the blocking variable related to the project phase. Furthermore, our data analyses were conducted as paired analyses comparing the differences between the treatments inside each team, which mitigates the effects of differences between teams.

Figure 1. The CED technique

Figure 2. The structural list technique (indentation below the second level was lost in extraction; deeper items are given in their original order)

The Problem
- Cause 1
  o Cause 2
    Cause 4, Cause 5, Cause 6, Cause 7, Cause 8, Cause 9
  o Cause 3
    Cause 10, Cause 16
- Cause 11
  o Cause 12
  o Cause 13
    Cause 8, Cause 15, Cause 16, Cause 17, Cause 18
  o Cause 14
    Cause 19

Table 1. Distribution of treatments (A = CED, B = the structural list) into 22 experimental units

Team (T)    T1  T2  T3  T4  T5  T6  T7  T8  T9  T10  T11
Phase I2    A   A   B   A   A   A   B   B   B   A    B
Phase I3    B   B   A   B   B   B   A   A   A   B    A

Retrospective method

The retrospective method, summarized in Figure 3, started with a short introduction to the method. We explained to the participants how the steps of Problem Detection and Root Cause Analysis would be conducted in the retrospective. Our method follows the postmortem analysis method introduced by Bjørnson et al. (2009) who
claimed that such a retrospective method is lightweight and feasible for small software project teams. The method consists of two separate steps, which are introduced below.

In the first step (Problem Detection), the participants were asked to write down problems that had had a negative impact on reaching the project goals. Thereafter, each participant introduced the problems to the others. The problems were registered and projected on the wall by the first author, who acted as a scribe. Similar problems were grouped together by the participants. Thereafter, the participants voted for two problems to be analyzed with RCA. The first step was timeboxed to about 30 minutes.

The second step (Root Cause Analysis) was conducted for both of the voted problems separately, lasting 40 minutes for each problem. First, each participant alone wrote down causes for the voted problem (5 minutes). Thereafter, they presented the causes to the others, who simultaneously brainstormed more causes (15 minutes). The facilitator registered all detected causes immediately in a cause and effect structure shown on the wall. These two phases were repeated once more for the same voted problem. The second voted problem was processed thereafter.

Figure 3. The retrospective method used in the study. Retrospective (120 min). Step 1, Problem Detection: Phase 1 (5 min), Phase 2 (25 min). Step 2, Root Cause Analysis (for both voted problems): Phase 3 (5 min), Phase 4 (15 min), Phase 5 (5 min), Phase 6 (15 min). Phase labels: Write down problems; Present and group; Vote two problems; Write down underlying causes; Present and Brainstorm; Write down underlying causes; Present and Brainstorm.

Response variables and research hypothesis

Figure 4 introduces the taxonomy used to clarify our research hypotheses. The figure draws a simple causal structure for a problem. The problem is placed on the left side of the figure while its causes are placed on the right side. The causes are organized based on their cause and effect relationships.
Theoretically, each cause creates an effect (or effects), which itself can be a cause or the problem, and it is affected by its sub-cause(s). In the figure, the causes placed next to the problem are the effects of their sub-causes placed on the right side of the diagram. In order to simplify our terminology, each cause, effect, and sub-cause explaining why the problem occurs is a cause of the problem. Furthermore, our terminology introduces the term depth level, which indicates the shortest distance from the cause to the problem. The distance is the number of causes in the causal structure from the cause to the problem. Additionally, the size of a depth level indicates the number of causes organized at the same depth level. We can see that the size of Depth Level 1 is 2. Finally, a hub cause (Bjørnson, Wang, and Arisholm 2009) refers to a cause that affects more than one effect, and a single cause refers to a cause that affects exactly one effect.

Figure 4. Taxonomy used to clarify our research hypotheses (labels: Problem, Effect, Cause; Depth Level 1, Size = 2; Depth Level 2, Size = 2; Single Cause, Hub Cause)
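The terms defined above can be made concrete with a short sketch. The causal structure and the cause names (`c1` to `c4`) are invented for illustration, and the function name `depth_levels` is hypothetical; the depth level is computed as the shortest distance to the problem using a breadth-first search.

```python
from collections import Counter, deque

# Hypothetical structure: cause -> effects it explains ("problem" is the root).
EDGES = {
    "c1": ["problem"],
    "c2": ["problem"],
    "c3": ["c1"],
    "c4": ["c1", "c2"],  # explains two effects, so it is a hub cause
}


def depth_levels(edges, root="problem"):
    """Shortest distance from every cause to the problem (breadth-first)."""
    direct_causes = {}
    for cause, effects in edges.items():
        for effect in effects:
            direct_causes.setdefault(effect, []).append(cause)
    depth = {root: 0}
    queue = deque([root])
    while queue:
        effect = queue.popleft()
        for cause in direct_causes.get(effect, []):
            if cause not in depth:  # the first visit is the shortest path
                depth[cause] = depth[effect] + 1
                queue.append(cause)
    del depth[root]  # the problem itself is not a cause
    return depth


depth = depth_levels(EDGES)
size_of_depth_level = Counter(depth.values())  # size of each depth level
hub_causes = [c for c, effects in EDGES.items() if len(effects) > 1]
single_causes = [c for c, effects in EDGES.items() if len(effects) == 1]
```

In this toy structure both depth levels have size 2, and one cause is a hub while the other three are single causes.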

Table 2 summarizes the response variables, our research hypotheses, and the measurements that we used. It has been claimed that the more problem causes are detected, the more effective the retrospective method is (Bjørnson, Wang, and Arisholm 2009). In the terminology of this paper, the response variable called method effectiveness (ME) indicates the number of problem causes detected. It is a simple indicator that counts the number of causes detected while ignoring their actual content and related causal structures. For example, there are 19 causes in Figures 1 and 2. Thus, the ME would be 19 for both figures. Our hypothesis was that the retrospective method utilizing CED results in a higher ME than the one utilizing the structural list. We based this hypothesis on prior studies that have commonly recommended using CEDs in RCA (see Section 2.2).

Causal structure indicates the cause and effect structure of the causes of the problem. There are two response variables related to the causal structure, i.e., the size of depth level (SoDL) and the number of hub causes (NoH) (Bjørnson, Wang, and Arisholm 2009) (see Figure 4). The function SoDL(x) indicates the number of causes registered at depth level x, whereas the NoH value indicates the number of detected causes that explain more than one effect.

Our hypothesis was that the return value of SoDL(x) generally increases from one depth level to the next. This hypothesis was based on our prior experiences of the output of RCA in an industrial software project context (Lehtinen and Mäntylä 2011). In RCA, the detection of causes starts with the detection of a few first-level causes (Andersen and Fagerhaug 2006), which thereafter evolves into the detection of higher-level causes (Andersen and Fagerhaug 2006), resulting in an increasing number of detected problems and causes at the higher depth levels.
We also hypothesized that the return value of SoDL(x) increases more with CED than with the structural list. This hypothesis was based on our understanding of the visual structure of CED. In contrast to the structural list, CED uses graphical nodes and edges (see Figure 1), which help the participants to focus on the detected causes. Additionally, CED utilizes a network structure, which keeps the causal structure clean and simple. Thus, we assumed that higher numbers of causes are detected at the higher depth levels when CED is used. The return value of SoDL(x) is measured by counting the number of causes at the corresponding depth level x.

Furthermore, our hypothesis was that the NoH value is higher when CED is used. In CED, arrows are drawn between the cause and its effects. In the structural list, the cause instead needs to be duplicated under the effects it explains. Thus, the number of cause statements is lower in CED than in the equivalent structural list. Therefore, there is less distraction in the causal structure when CED is used, and it is likely easier to detect the different effects that a cause explains. We think that the more hub causes there are, the more extensively the causal relationships have been analyzed. This is because hub causes create interconnections between larger ensembles of causes rather than between a few individual causes. The NoH value is measured by calculating the percentage of causes that were used to explain more than one effect.

Characteristics of detected causes (CDC) indicate the distribution of the detected causes among process areas and cause types. Our hypothesis was that the CDC is not dependent on the treatments. We based this hypothesis on the fact that neither of the treatments steers the participants to consider some specific project areas or cause types.
We believed that the CDC was mostly dependent on the teams and the problems analyzed, not on the techniques used to organize and visualize the problems and their causes. CDC is measured by using a classification system for the detected causes. We compared the distributions of causes in cause classes over the treatments.

Perceptions of participants (PP) reflect the evaluations of the participants on the treatments. Considering the PP, our initial hypothesis was that the participants prefer CED to be used in retrospectives. This hypothesis was based on prior studies that have commonly recommended using CEDs in RCA (see Section 2.2). We used a questionnaire (see Appendix 1) after each retrospective to measure the perceptions of participants. Additionally, after both treatments had been conducted, we used another questionnaire (see Appendix 2) combined with a group interview in order to conclude which treatment the participants preferred and why.

Table 2. Response variables, research hypotheses, and related measurements used

Response variable | Research hypothesis | Measurement
Method Effectiveness (ME) | ME with Diagram > ME with List | The number of causes
Causal Structure: Size of Depth Levels (SoDL) | SoDL(n+1) > SoDL(n) > ... > SoDL(2) > SoDL(1) | The number of causes at different depth levels
Causal Structure: Number of Hub causes (NoH) | NoH with Diagram > NoH with List | The percentage of causes that were used to explain more than one effect
Characteristics of Detected Causes (CDC) | CDC with Diagram = CDC with List | Distributions of classified causes
Perceptions of Participants (PP) | PP with Diagram > PP with List | Questionnaires and group interviews

Controlling undesired variation

We assumed that it was highly possible that the project phase in which the retrospective was conducted had an impact on the retrospective outcome. We also assumed that the retrospective outcome is highly dependent on the team. In order to balance the effects of these variables, the treatment of each team was randomly assigned in the first phase. In addition, we applied both treatments to each team and used paired analysis to mitigate the variations between teams.

We ensured that the retrospective settings were similar in each experimental unit. Therefore, six context variables were controlled. The context variables included the retrospective goal, the number and roles of the participants, the language used, the physical settings, and the retrospective facilitator. We also identified and measured three confounding variables, since we had no control over how the teams and the project topics were organized. The confounding variables included the voted problems (see Section 0), team members' motivation, and team spirit.

We controlled the goal of each retrospective. This was important because the problems related to software projects and the number and characteristics of their underlying causes vary (Lehtinen and Mäntylä 2011). Thus, our study results were dependent on the problems analyzed. We controlled this issue by requiring each team to analyze a common endemic problem that occurs frequently during the projects, i.e., why it is challenging to reach the project goals (Vanhanen, Lehtinen, and Lassenius 2012).

The number and roles of retrospective participants were controlled. This was important because we believe that the number and causal structures of the causes of a problem are dependent on the number of participants. A high deviation in the number of participants between the treatments would likely have biased the study results.
We decided that each retrospective had to include four to seven participants, as suggested in (Lehtinen, Mäntylä, and Vanhanen 2011). Additionally, the maximum deviation in the number of participants between the two retrospectives of each team was limited to +/- 1. Similarly, the roles of the participants were controlled. It was decided that at least two out of the three people in the management roles of the team had to be present at both retrospectives.

The language used was controlled. This was important because we believe that the team members' contribution is dependent on the language used. People are likely to be more active speakers when they use their mother tongue, and thus the output of retrospectives is also dependent on the language used. It was decided that the teams had to use the same language in both treatments.

Every retrospective was conducted in similar physical conditions. We took care that the infrastructure used to register and visualize the problems and their causes did not change between the retrospectives, i.e., the laptop, the software tools (Mindjet and MS Word), and the projector. This was important because the screen resolution, margins, zoom level, etc. could otherwise have biased the study results through varying visualization capabilities. Similarly, the meeting room settings, including the room size, lighting, and location, remained similar.

We also controlled the facilitator of the retrospectives. The first author of this paper steered each retrospective and acted as the scribe for each team. This was important because it allowed us to control the skills of the facilitator. The first author had prior experience of steering RCA and he was also familiar with the software tools used.

Three confounding variables were measured in order to verify that dramatic changes in the working of the team did not happen between the retrospectives. The confounding variables included the voted problems (see Section 0), team members' motivation, and team spirit.
Considering the voted problems, we compared the problems the retrospective participants selected for RCA in each treatment. This was important because it allowed us to evaluate whether the differences between the treatments may have been caused by the different problems analyzed. Furthermore, considering the team members' motivation and team spirit, we used a questionnaire after each retrospective, as introduced in Section [...]. This was also important because it allowed us to evaluate whether the differences between the treatments were caused by varying motivation or team spirit. We asked the participants to evaluate their personal effort, their team's effort, the openness in communication, and the team spirit in each retrospective. We also asked them to evaluate 1) whether some participants purposefully left some important causes out of their attention and 2) whether the participants did not dare to name all the detected causes publicly.

3.4. Data collection and analysis

In this section, we introduce the methods we used in the data collection and analysis. In summary, the data collection was based on triangulation, which increases the validity of the study results (Yin 1994; Runeson and Höst 2008; Jick 1979). We used the output of RCA in statistical analyses of the method effectiveness and causal structures of the treatments (see Section 3.4.1). Additionally, we used the output of RCA to analyze whether the characteristics of detected causes remained similar over the treatments (see Section 3.4.2). Furthermore, we combined statistical methods with qualitative methods in order to evaluate the perceptions of participants about the
treatments. We asked the participants to provide feedback by using questionnaires (see Section 3.4.3) and group interviews (see Section 3.4.4). Each retrospective and group interview was video recorded so that the interviews could be transcribed and the retrospectives analyzed further if needed.

3.4.1. Method effectiveness and causal structures

The method effectiveness was analyzed with the paired-samples two-tailed t-test with the alpha level [...]. We compared the number of detected causes in the retrospectives of each team. Each cause was counted only once, i.e., the duplicate cause statements were removed. As the number of retrospective participants varied by +/- 1, we also compared the number of detected causes per number of participants. We also analyzed the method effectiveness by comparing the average, minimum, lower quartile, median, upper quartile, and maximum number of detected causes between the treatments.

The causal structures were analyzed by comparing the size of depth levels and the number of hub causes between the treatments. In the comparison, we used the paired-samples two-tailed t-test with the alpha level [...]. Between the treatments of each team, we analyzed whether CED systematically results in larger sizes of depth levels than the structural list technique. Furthermore, we also analyzed whether CED systematically results in a larger proportion of hub causes.

Using the t-test was reasonable because the number of detected causes in the treatments was normally distributed between the teams. This conclusion was based on the Shapiro-Wilk test and the analysis of related Q-Q plots. We also tested whether the numbers of causes at the depth levels were normally distributed. The number of causes was normally distributed from the first to the sixth depth level. Furthermore, we evaluated the standardized effect size for the systematic differences between the treatments by using Cohen's d (Cohen 1988).
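The paired comparison and the effect size described in this subsection can be sketched with the standard library only. The cause counts below are invented, the function names (`paired_t`, `cohens_d`) are hypothetical, and the pooled standard deviation is the common textbook form, which is assumed here to match the calculation used in the study.

```python
import math
import statistics

# Invented cause counts for five teams, one value per team and treatment.
ced_counts = [110, 95, 120, 101, 99]
list_counts = [100, 97, 105, 90, 95]


def paired_t(a, b):
    """Return the paired-samples t statistic and its degrees of freedom."""
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    t = statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(n))
    return t, n - 1


def cohens_d(a, b):
    """Difference of means divided by the pooled standard deviation."""
    n1, n2 = len(a), len(b)
    s1, s2 = statistics.stdev(a), statistics.stdev(b)
    pooled = math.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))
    return (statistics.mean(a) - statistics.mean(b)) / pooled


t_stat, df = paired_t(ced_counts, list_counts)
d = cohens_d(ced_counts, list_counts)
```

The two-tailed p-value would then be read from a t distribution with `df` degrees of freedom; if SciPy is available, `scipy.stats.ttest_rel` returns both the statistic and the p-value directly.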
The effect size was calculated by dividing the difference between the means of the treatments by their pooled standard deviation. The effect size results were interpreted in the following way: d < 0.2 (small), d ≈ 0.5 (medium), and d > 0.8 (large) (Cohen 1988). The following formula was used to calculate Cohen's d, where $\bar{x}$ is the sample mean, $n$ is the sample size, and $s$ is the standard deviation (Kampenes et al. 2007):

$$ d = \frac{\bar{x}_1 - \bar{x}_2}{s_p}, \qquad s_p = \sqrt{\frac{(n_1 - 1)\,s_1^2 + (n_2 - 1)\,s_2^2}{n_1 + n_2 - 2}} $$

3.4.2. Characteristics of detected causes

We evaluated the characteristics of each detected cause (there were a total of 2247 causes) in order to evaluate whether the causes of problems detected in the retrospectives of each team remained similar between the treatments. We classified the detected causes by using a classification system developed for analyzing the characteristics of the causes of software project problems, introduced in our prior studies (Lehtinen and Mäntylä 2011; Lehtinen et al. 2014a). The classification system divides the causes based on their types and process areas. In the classification system, a process area (a total of 6 process area variables) expresses where the cause occurs (see Table 3), whereas a cause type (a total of 14 cause type variables) describes what the cause is (Lehtinen and Mäntylä 2011) (see Table 4). The combination of the process area with the cause type results in a characteristic of the cause (a total of 6 x 14 = 84 characteristics). For example, if the cause is classified into the management work process area and its type is classified as values & responsibility, the characteristic of the cause is values & responsibility in the management work.

In order to evaluate whether the characteristics of the causes were similar between the treatments, we calculated the correlation between the numbers of causes with the same characteristic over the treatments. The correlation was calculated between the treatments of each team and between all teams combined together.
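A minimal sketch of this comparison is given below. The source does not name the correlation coefficient, so Pearson's r is assumed; including characteristics with zero causes in the vectors is also an assumption. The classified causes are invented, and the function names (`count_vector`, `pearson`) are hypothetical.

```python
import math
from collections import Counter
from itertools import product

PROCESS_AREAS = ["MA", "S&R", "IM", "ST", "PD", "UN"]
CAUSE_TYPES = [
    "Instructions & Experiences", "Values & Responsibilities", "Cooperation",
    "Company Policies", "Task Output", "Task Difficulty", "Task Priority",
    "Work Practices", "Process", "Monitoring", "Existing Product",
    "Resources & Schedules", "Tools", "Customers & Users",
]
# Every (process area, cause type) pair is one characteristic: 6 x 14 = 84.
CHARACTERISTICS = list(product(PROCESS_AREAS, CAUSE_TYPES))


def count_vector(classified_causes):
    """Count causes per characteristic, keeping zero counts in fixed order."""
    counts = Counter(classified_causes)
    return [counts[c] for c in CHARACTERISTICS]


def pearson(x, y):
    """Pearson correlation; undefined if either vector is constant."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Invented classifications for the two retrospectives of one team.
ced_causes = ([("MA", "Cooperation")] * 4 + [("IM", "Task Difficulty")] * 2
              + [("ST", "Tools")])
list_causes = ([("MA", "Cooperation")] * 3 + [("IM", "Task Difficulty")] * 2
               + [("S&R", "Customers & Users")])

r = pearson(count_vector(ced_causes), count_vector(list_causes))
```

The same function could be applied to the count vectors of all teams combined, which corresponds to the second comparison mentioned in the text.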
The closer the correlation is to 1, the more similar the characteristics are.

3.4.3. Data from questionnaires

The analyses of the perceptions of participants were partially based on questionnaires. Questionnaire 1 (see Appendix 1) was used for both treatments separately. Our aim was to evaluate whether similar parts of the treatments were evaluated similarly. We also evaluated whether different parts of the treatments, i.e., the technique used to organize and visualize the causes, were evaluated differently. Furthermore, after the second retrospective, the participants were asked to compare the treatments by using Questionnaire 2 (see Appendix 2). Our aim was to evaluate which treatment the participants prefer the most in the RCA of retrospectives.

Questionnaire 1 included 19 questions covering all phases of the retrospective method. We asked the participants to evaluate the method used to collect the causes of problems. We also asked them to evaluate the method used to organize the causes.

Table 3 Process areas of the classification system express where the causes occur (Lehtinen and Mäntylä 2011)

| Process Area | General characterization of the detected causes |
|---|---|
| Management Work (MA) | Company support and the way the project stakeholders are managed and allocated to tasks. |
| Sales & Requirements (S&R) | Requirements and input from customers. |
| Implementation Work (IM) | The design and implementation of features including defect fixing. |
| Software Testing (ST) | Test design, execution, and reporting. |
| Release & Deployment (PD) | Releasing and deploying the product. |
| Unknown (UN) | Causes that cannot be focused on any specific process area. |

Table 4 Cause types of the classification system express what the causes are (Lehtinen and Mäntylä 2011)

| Type / Sub-type | General characterization of the detected causes |
|---|---|
| People (P) | This cause type includes the people related causes |
| Instructions & Experiences | Missing or inaccurate documentation and lack of individual experience. |
| Values & responsibilities | Bad attitude and lack of taking responsibility. |
| Cooperation | Inactive, inaccurate, or missing communication. |
| Company Policies | Not following the company policies. |
| Tasks (T) | This cause type includes the task related causes |
| Task Output | Low quality task output. |
| Task Difficulty | The task requires too much effort, or time, or it is highly challenging. |
| Task Priority | Missing, wrong, or too low task priority. |
| Methods (M) | This cause type includes the methodological causes |
| Work Practices | Missing or inadequate work practices. |
| Process | The process model is missing, unclear, vague, too heavy, or inadequate. |
| Monitoring | Lack of monitoring. |
| Environment (E) | This cause type includes the environment related causes |
| Existing Product | Complex or badly implemented existing product. |
| Resources & Schedules | Wrong resources and schedules. |
| Tools | Missing or insufficient tools. |
| Customers & Users | Customers' and users' expectations and needs. |
Additionally, the questions included statements about the treatments which the participants were supposed to either agree or disagree with. The scale in each question was ordinal and symmetric, e.g., 1=very bad, 2, 3, 4=neutral, 5, 6, 7=very good. We assumed that the evaluations of the treatments vary only in the specific questions about the method used to organize the causes. This was because the causes were organized differently, but collected similarly, in both treatments (see Section 0).

We compared the treatments by using the Wilcoxon Signed Rank Test with alpha level 0.05 over the evaluations of individual respondents. We also used the Bonferroni correction to calculate the required level of statistical significance. There were a total of 19 questionnaire items. Therefore, the Bonferroni correction gives that statistical significance requires p < 0.0026 (0.05/19). The evaluations of participants who were not present at both retrospectives (10 of 61 participants) were excluded from the comparison.

Questionnaire 2 included statements about both retrospectives which the participants were asked to either agree or disagree with. The statements compared the treatments. The scale of the questionnaire was ordinal and symmetric (1=fully disagree, 2, 3, 4=neutral, 5, 6, 7=fully agree). We compared the share of participants who disagreed with the statements to those who agreed with them. The evaluations of participants who were not present at both retrospectives (10 of 61 participants) were excluded from the comparison.

3.4.4. Data from group interviews

In order to consolidate the results from the questionnaires and create a deeper understanding of the perceptions of participants in both treatments, we carried out a group interview with each participating team after the second retrospective. The interview took place immediately after the participants had answered the questionnaires. We did not want to focus the interviews on any specific questions.
Instead, we wanted to create an understanding of what the participants thought about the treatments on a general level. The group interview was open ended (Yin 1994) and it was started by asking "Which of the used techniques do you prefer the most in the RCA of retrospectives?" Thereafter, depending on the answers of the participants, the interviewer (the first author) asked clarifying questions about the treatments, e.g., "Why do you prefer the structural list as a more feasible technique?"

The interviews were transcribed and thereafter coded by the first author. Additionally, the interviews were translated into English. After the interviews were transcribed into a literal form, they were carefully scrutinized. Thereafter, we created categories that conceptualized the comments of the participants. The first author created preliminary categories, which were thereafter reviewed by the other authors.

The open coding technique (Flick 2006) was used to analyze how the participants described the treatments. As suggested in (Flick 2006), we started the qualitative analysis by recognizing the units of meaning, i.e., concepts that reflected the reasoning given in the comments (single words and short sequences of words from the comments). For example, there was a comment "with CED it is easier to outline the aggregation of causes". This comment resulted in a concept: "supports outlining aggregations". Similar concepts were grouped together. Thereafter, all comments were attached to the concepts. The comments were classified line-by-line to the concepts we recognized, as recommended in (Flick 2006). Simultaneously, the comments were divided between the treatments. Thus, we were able to compare how the participants described the treatments on the conceptualized level.

In order to compare the comments on a more abstract level, we continued the analysis procedure by recognizing categories that linked the concepts together (Flick 2006). This was done by pondering the potential meaning of the concepts for retrospectives. For example, we assumed that the concepts "supports outlining aggregations" and "supports thinking" would affect the sense making while the participants try to understand the causes of problems in retrospectives.
Thus, a category "sense making" was created and the corresponding concepts were linked under it.

The treatments were compared based on the categories and concepts that we recognized. We compared the treatments in order to recognize the concepts that were unique to one treatment and those that were common to both. This helped us to compare and generalize how the treatments were described, which in turn helped us to form hypotheses about the study results concerning the method effectiveness and causal structures. Additionally, this helped us in interpreting the evaluation results from the questionnaires. Furthermore, we compared the number of groups and comments on the related concepts, as this indicated how common the perceptions were among the participants.

4. Results

In this section, we present the study results. We start in Section 4.1 by introducing the quantitative results on the output of the treatments. These include the comparison of the method effectiveness, causal structures, and characteristics of detected causes. Thereafter, in Section 4.2, we introduce how the participants evaluated and described the treatments.

4.1. Output of root cause analysis

In this section, we present the results regarding the output of RCA when applying the two alternative treatments. Table 5 summarizes the retrospectives of each team. It shows that the specific focus of the retrospectives remained mostly similar in each team.
Table 5 Statistics about the retrospectives

| Team | L (CED) | Voted problems (CED) | L (SL) | Voted problems (SL) |
|---|---|---|---|---|
| 1 | F | Co-operation, management | F | Co-operation, management |
| 2 | F | Scope, quality | F | Quality, scope |
| 3 | E | Scope, development | E | Co-operation, management |
| 4 | F | Scope, quality | F | Quality, scope |
| 5 | F | Co-operation, customer | F | Quality, customer |
| 6 | F | Tasks, motivation | F | Motivation, skills |
| 7 | F | Scope, task monitoring | F | Task monitoring, scope |
| 8 | E | Process, skills | E | Process, skills |
| 9 | F | Management, co-operation | F | Co-operation, management |
| 10 | E | Requirements, risk management | E | Requirements, skills |
| 11 | F | Co-operation, management | F | Co-operation, management |

#=indicates whether the treatment was conducted in the first (1) or second (2) retrospective, L=used language (F=Finnish, E=English), p=the number of participants, c=the number of detected causes, c/p=the average number of detected causes per participant

4.1.1. Method effectiveness

Table 6 presents the descriptive statistics of the number of detected causes divided into the treatments. These include the average (Mean), standard deviation (Std), minimum (Min), lower quartile (Q1), median (Med), upper quartile (Q3), and maximum (Max). The table views the statistics from the team and individual levels. The team level compares the treatments by using the number of detected causes in each team. The individual level, in contrast, compares the treatments by using the average number of detected causes per participant in each team. Figure 5 draws the boxplots for the number of causes at the team level and Figure 6 presents the boxplots for the average number of detected causes per participant.

The descriptive statistics indicate that CED outperformed the structural list (SL) in the method effectiveness (see Table 6, and Figures 5 and 6). CED resulted in 107 detected causes on average, whereas the structural list resulted in 94 detected causes. The mean difference and the 95% confidence interval are 12.8 and ±13.8, respectively. The effect size between the treatments is medium (Cohen's d=0.57, p=0.065). When analyzing the method effectiveness on the team level, CED outperformed the structural list in 9 out of 11 teams (see Table 5 for details).

When we normalize the number of detected causes by the number of participants, we find that in CED the average number of detected causes per participant was 19.8 compared with 17.2 in the structural list. The mean difference and the 95% confidence interval are 2.5 and ±2.69, respectively. The effect size is medium (Cohen's d=0.52, p=0.065). Furthermore, when analyzing the average number of detected causes per participant in a team, CED outperformed the structural list in 8 out of 11 teams (see Table 5 for details).
Thus, whether or not we normalize for the number of participants, CED provides a medium effect size in the number of detected causes (Cohen's d=0.57 or d=0.52), but the difference is not statistically significant (alpha level 0.05) due to the small sample size (n=22).

Table 6 Descriptive statistics of the number of detected causes between the treatments (columns: Focus, Treatment, Mean, Std., Min, Q1, Med, Q3, Max; rows: Team SL, Team CED, Individual SL, Individual CED)

Figure 5. Boxplot of the number of causes in the treatments

Figure 6. Boxplot of the average number of causes per participant in the treatments
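The effect size and the paired comparison reported in this subsection can be sketched as follows. This is a minimal illustration, not the analysis script of the study: the cause counts are made-up numbers for 11 hypothetical teams, the pooled standard deviation is assumed to be the usual two-sample form weighted by degrees of freedom, and only the t statistic is computed, because the Python standard library does not provide the t distribution needed for the p-value.

```python
from math import sqrt
from statistics import mean, stdev


def cohens_d(a, b):
    """Difference between the means divided by the pooled standard deviation."""
    n1, n2 = len(a), len(b)
    s1, s2 = stdev(a), stdev(b)  # sample standard deviations (n - 1 denominator)
    pooled = sqrt(((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2))
    return (mean(a) - mean(b)) / pooled


def paired_t_statistic(a, b):
    """t statistic and degrees of freedom of the paired-samples t-test."""
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    return mean(diffs) / (stdev(diffs) / sqrt(n)), n - 1


# Made-up numbers of detected causes per team (one value per team and treatment).
ced = [120, 95, 110, 130, 88, 101, 140, 99, 105, 92, 97]
sl = [100, 90, 112, 105, 85, 95, 120, 80, 99, 94, 84]

d = cohens_d(ced, sl)
t, df = paired_t_statistic(ced, sl)
print(f"Cohen's d = {d:.2f}, paired t = {t:.2f} (df = {df})")
```

The same two functions apply unchanged to the normalized values (causes per participant), since only the input lists differ.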

4.1.2. Causal structures

Considering the causal structures, Figure 7 shows the average size of the depth levels (SoDL), see Section 3.3.2. With CED, the SoDL increases between the first and third depth levels. With the structural list, in contrast, the SoDL increases only between the first and second depth levels. The differences between the treatments in the size of the first (p=0.293, Cohen's d=-0.51) and second (p=0.811, Cohen's d=0.12) depth levels are not statistically significant. The effect sizes are medium and small, respectively. The difference in the size of depth level three, however, is statistically significant (p=0.020) and the effect size is large (Cohen's d=1.01). Thus, it is possible that CED allows creating causal structures that have more causes starting from the third level than the ones created with the structural list. The effect size for the difference in the total number of detected causes summed from the third to the last depth level is medium (Cohen's d=0.64, p=0.07). However, the differences between the treatments in the number of detected causes at the later depth levels (four to nine) are not statistically significant.

Figure 8 presents a boxplot of the percentage of hub causes (NoH) in both treatments. A hub cause is a cause that explains more than one effect, see Section 3.3.2. When comparing the total number of hub causes between the treatments, the t-test gives a large and significant difference (p=0.010, Cohen's d=1.42). On average, 7% (std. 3%) of the detected causes were hub causes when CED was used. In contrast, the average share of hub causes was only 3% (std. 2%) when the structural list was used.

Figure 7. Summary of the average number of causes (a total of 2247 detected causes) at depth levels (a total of nine depth levels)

Figure 8. Boxplot of the share (%) of hub causes from all detected causes in the treatments

4.1.3. Characteristics of detected causes

Figure 9 indicates that similar causes were detected in both treatments. For example, in both treatments the top cause was the output of management work (n=106 for the structural list, n=107 for CED). The figure compares the characteristics of all detected causes (see Section 3.4.2) divided between the treatments. The data is ordered by the number of causes with the same characteristic, from the most to the least frequent characteristic in CED. Figure 10 has the same data as Figure 9 and it illustrates the linear correlation of the number of causes with the same characteristics between the treatments. Each point in Figure 10 represents the number of causes with the same characteristic in both treatments. The X-axis shows the number of causes with a certain characteristic in the structural list and the Y-axis shows the number of causes with the same characteristic in CED. The shares of detected causes with similar characteristics correlate strongly between the treatments (Pearson's r=0.896, p<0.001). This means that the characteristics of the detected causes did not depend significantly on the treatments.
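The two causal-structure measures used in Section 4.1.2, the size of depth levels and the share of hub causes, can be illustrated with a small sketch. It rests on assumptions that Section 3.3.2 may define differently: a causal structure is represented as a list of (cause, effect) pairs, the depth level of a cause is taken to be its shortest distance from the analyzed problem, and a hub cause is a cause linked to more than one effect. The cause names are invented.

```python
from collections import Counter, defaultdict, deque


def depth_level_sizes(edges, problem):
    """Number of causes at each depth level, found by breadth-first search."""
    causes_of = defaultdict(list)
    for cause, effect in edges:
        causes_of[effect].append(cause)
    depth = {problem: 0}
    queue = deque([problem])
    while queue:
        node = queue.popleft()
        for cause in causes_of[node]:
            if cause not in depth:  # keep the shortest distance only
                depth[cause] = depth[node] + 1
                queue.append(cause)
    sizes = Counter(level for level in depth.values() if level > 0)
    return dict(sorted(sizes.items()))


def hub_share(edges):
    """Share of causes that explain more than one effect."""
    effects_of = defaultdict(set)
    for cause, effect in edges:
        effects_of[cause].add(effect)
    hubs = [c for c, effects in effects_of.items() if len(effects) > 1]
    return len(hubs) / len(effects_of)


# Invented causal structure: each pair reads "cause explains effect".
PROBLEM = "late delivery"
edges = [
    ("unclear requirements", PROBLEM),
    ("too little testing", PROBLEM),
    ("no customer contact", "unclear requirements"),
    ("tight schedule", "too little testing"),
    ("tight schedule", "unclear requirements"),
    ("poor planning", "tight schedule"),
]

print(depth_level_sizes(edges, PROBLEM))
print(f"share of hub causes: {hub_share(edges):.0%}")
```

In this toy structure "tight schedule" is the only hub cause, because it explains two different effects. A plain indented list can express such a cause only by repeating it under each effect, which is one possible reading of why the treatments differ in the share of hub causes.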

Figure 9. Distribution of causes among their characteristics (y-axis: number of causes, N=2247; x-axis: characteristics of causes, N=84; series: CED and SL)

Figure 10. Linear correlation on the distributions of cause characteristics between the treatments (a point in the figure represents the same cause characteristic with both treatments)

4.2. Feedback of participants

In this section, we present the analysis of the most relevant questionnaire data in terms of the research questions. Next, we present the participants' evaluations of the methods after each treatment, their comparisons of the two treatments, and the findings from the group interviews.

4.2.1. Evaluations after each treatment

Table 7 summarizes the results from Questionnaire 1, which had four topics. This questionnaire was given after both the first and the second retrospective. For both treatments, the evaluations were highly similar considering how the causes of problems were collected, i.e., Topic 1. Furthermore, no differences were detected in Topic 3, the general usefulness of the retrospective, or in Topic 4, which measured the social atmosphere of the retrospective. Topic 2 of the survey studied how the detected causes were organized, and there we found some differences between the methods. The participants preferred CED when asked about the technique used to organize the causes (see Table 7, ID 2.1), and the Wilcoxon Signed Rank Test (WSRT) showed that the difference between the treatments is highly statistically significant (p=0.001). The participants also thought that getting the big picture of the problem causes was easier with CED (see Table 7, ID 2.2). However, the difference is not statistically significant (WSRT p=0.089). Finally, the participants saw no difference between the treatments in the ease of registering problem causes (see Table 7, ID 2.3) (WSRT p=0.464).

4.2.2. Comparison of the treatments

At the end of the second retrospective, the participants were asked to compare the treatments by using Questionnaire 2, see Table 8.
Questionnaire 2 included statements about the retrospectives (first or second session), which the participants were supposed to agree or disagree with on a 7-point ordinal scale from fully disagree
