
MTAT.03.243 Software Engineering Management Lecture 12: SPI & Empirical Methods - Part B Dietmar Pfahl Spring 2013 email: dietmar.pfahl@ut.ee

Structure of Lecture 12
- Feedback on Project
- SPC
- Six-Sigma
- Notes on Experimental Design
- Exercise
- Homework 4
- Literature

Your feedback is appreciated! Please fill in the questionnaire (10 min).


Basics of Statistical Process Control
- Statistical Process Control (SPC): monitoring the production process to detect and prevent poor quality
- Sample: subset of items produced, used for inspection
- Control charts: show whether the process is within statistical control limits (UCL = upper control limit, LCL = lower control limit)

Variability
- Random (common causes): inherent in a process; can be eliminated only through improvements in the system
- Non-random (special causes): due to identifiable factors; can be modified through operator or management action

Statistical Process Control means:
- understanding the process,
- understanding the causes of variation, and
- eliminating the sources of special cause variation.

Usage of control charts (Source: Florac & Carleton, 1999):
1. Select process
2. Identify product or process characteristics that describe process performance
3. Select the appropriate type of control chart
4. Measure process performance over a period of time
5. Use appropriate calculations based on measurement data to determine center lines and control limits for performance characteristics
6. Plot measurement data on control charts
7. Are all measured values within limits and distributed randomly around the center lines?
8. If yes: process is stable; continue measuring
9. If no: process is not stable
10. Identify and remove assignable causes

Control Chart Patterns (signals of an out-of-control process):
- 8 consecutive points on one side of the center line
- 8 consecutive points up or down across zones
- 14 points alternating up or down
- 2 out of 3 consecutive points in zone C but still inside the control limits
- 4 out of 5 consecutive points in zone C or B
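The first pattern above (8 consecutive points on one side of the center line) can be sketched as a simple run-rule check. This is a minimal illustration, not code from the lecture; the function and parameter names are made up.

```python
# Minimal sketch of one Western Electric style run rule:
# flag a point when it completes a run of 8 consecutive points
# on the same side of the center line.

def runs_on_one_side(points, center, run_length=8):
    """Return indices where a run of `run_length` consecutive
    points on one side of the center line is completed."""
    flagged = []
    run = 0    # length of the current one-sided run
    side = 0   # +1 above center, -1 below, 0 at start / on the line
    for i, x in enumerate(points):
        s = 1 if x > center else (-1 if x < center else 0)
        if s != 0 and s == side:
            run += 1
        else:
            side, run = s, (1 if s != 0 else 0)
        if run >= run_length:
            flagged.append(i)
    return flagged
```

The other patterns (runs across zones, alternation, zone counts) would be checked analogously, each as its own rule over the plotted points.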

Detecting out-of-control situations Source: Western Electric (1958)

Common questions for investigating an out-of-control process (1):
- Are there differences in the measurement accuracy of the instruments/methods used?
- Are there differences in the methods used by different personnel?
- Is the process affected by the environment? Has there been a significant change in the environment?
- Is the process affected by predictable conditions? Example: tool wear.
- Were any untrained personnel involved in the process at the time?

Common questions for investigating an out-of-control process (2):
- Has there been a change in the source for input to the process? Example: plans, specs, information.
- Is the process affected by employee fatigue?
- Has there been a change in policies or procedures? Example: maintenance procedures.
- Is the process adjusted frequently?
- Did the samples come from different parts of the process? Shifts? Individuals?
- Are employees afraid to report bad news?
One should treat each "yes" answer as a potential source of a special cause.


Ishikawa Chart Example: Change Request Process
Effect: it takes too long to process software change requests. Causes, by branch:
- Collection: problem reports not logged in properly; information missing from problem reports
- Evaluation: cannot isolate the software artifact(s) containing the problem; cannot determine what needs to be done to fix the problem; cannot replicate the problem
- Resolution: change control board meets only once a week; delays in approving changes; takes time to make changes; must reconfigure baselines
- Closure: change decisions not released in a timely manner; delays in shipping changes and releases; delays en-route


Type of chart depends on type of measure:
- Attribute: a product characteristic that can be evaluated with a discrete response (good - bad; yes - no)
- Variable: a product characteristic that is continuous and can be measured (e.g., complexity, length)
Types of charts:
- Attributes: p-chart, c-chart
- Variables: x-bar-chart (means), R-chart (range)

Control Charts for Attributes
- p-chart: uses the proportion defective in a sample
- c-chart: uses the number of defects in an item

p-chart
UCL = p-bar + z * sigma_p
LCL = p-bar - z * sigma_p
where:
z = number of standard deviations from the process average
p-bar = sample proportion defective; an estimate of the process average
sigma_p = standard deviation of the sample proportion = sqrt( p-bar * (1 - p-bar) / n )

p-chart Example: 20 samples of 100 pairs of jeans each

SAMPLE   NUMBER OF DEFECTIVES   PROPORTION DEFECTIVE
1        6                      0.06
2        0                      0.00
3        4                      0.04
...      ...                    ...
20       18                     0.18
Total    200

p-chart Example (cont.)
p-bar = total defectives / total sample observations = 200 / (20 * 100) = 0.10
UCL = p-bar + z * sqrt( p-bar * (1 - p-bar) / n ) = 0.10 + 3 * sqrt( 0.10 * (1 - 0.10) / 100 ) = 0.190
LCL = p-bar - z * sqrt( p-bar * (1 - p-bar) / n ) = 0.10 - 3 * sqrt( 0.10 * (1 - 0.10) / 100 ) = 0.010

p-chart Example (cont.): plot of the 20 sample proportions against sample number, with center line p-bar = 0.10, UCL = 0.190, and LCL = 0.010.
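The p-chart calculation for the jeans example can be sketched in a few lines; this is a minimal illustration assuming z = 3, and the function name is made up, not from the lecture.

```python
import math

# Sketch of the p-chart limit calculation from the jeans example:
# 20 samples of n = 100 each, 200 defectives in total.

def p_chart_limits(total_defectives, num_samples, sample_size, z=3):
    p_bar = total_defectives / (num_samples * sample_size)   # process average
    sigma_p = math.sqrt(p_bar * (1 - p_bar) / sample_size)   # std. dev. of sample proportion
    return p_bar, p_bar + z * sigma_p, p_bar - z * sigma_p

p_bar, ucl, lcl = p_chart_limits(200, 20, 100)
# p_bar = 0.10, UCL = 0.190, LCL = 0.010, matching the slide
```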

c-chart
UCL = c-bar + z * sqrt(c-bar)
LCL = c-bar - z * sqrt(c-bar)
where:
sigma_c = sqrt(c-bar)
c-bar = average number of defects per sample

c-chart (cont.): number of defects in 15 samples

SAMPLE   NUMBER OF DEFECTS
1        12
2        8
3        16
...      ...
15       15
Total    190

c-bar = 190 / 15 = 12.67
UCL = c-bar + z * sqrt(c-bar) = 12.67 + 3 * sqrt(12.67) = 23.35
LCL = c-bar - z * sqrt(c-bar) = 12.67 - 3 * sqrt(12.67) = 1.99

c-chart (cont.): plot of the defect counts against sample number, with center line c-bar = 12.67, UCL = 23.35, and LCL = 1.99.
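The c-chart limits for this example follow the same pattern; a minimal sketch using the totals above (190 defects over 15 samples) and z = 3, with an illustrative function name:

```python
import math

# Sketch of the c-chart limit calculation for the example above.

def c_chart_limits(total_defects, num_samples, z=3):
    c_bar = total_defects / num_samples        # average defects per sample
    sigma_c = math.sqrt(c_bar)                 # sigma_c = sqrt(c-bar)
    return c_bar, c_bar + z * sigma_c, c_bar - z * sigma_c

c_bar, ucl, lcl = c_chart_limits(190, 15)
# c_bar ~ 12.67, UCL ~ 23.35, LCL ~ 1.99, as on the slide
```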

Control Charts for Variables
- Mean chart (x-bar-chart): uses the average of a sample
- Range chart (R-chart): uses the amount of dispersion in a sample

x-bar Chart
x-double-bar = ( x-bar_1 + x-bar_2 + ... + x-bar_k ) / k
UCL = x-double-bar + A2 * R-bar
LCL = x-double-bar - A2 * R-bar
where x-double-bar = average of the sample means, R-bar = average sample range, and A2 = a tabulated control chart factor that depends on the sample size

x-bar Chart Example (cont.): plot of the 10 sample means against sample number, with center line x-double-bar = 5.01, UCL = 5.08, and LCL = 4.94.

R-Chart
R-bar = (sum of sample ranges) / k
UCL = D4 * R-bar
LCL = D3 * R-bar
where R = range of each sample, k = number of samples, and D3, D4 = tabulated control chart factors that depend on the sample size

R-Chart Example (cont.): plot of the 10 sample ranges against sample number, with center line R-bar = 0.115, UCL = 0.243, and LCL = 0.
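A sketch of the x-bar and R chart limit calculations. The factor values A2 = 0.577, D3 = 0, and D4 = 2.114 are an assumption taken from the standard control-chart factor tables for samples of size n = 5 (the slides do not state n); with the slides' x-double-bar = 5.01 and R-bar = 0.115 they reproduce the plotted limits. The function name is illustrative.

```python
# Sketch of x-bar chart and R chart limits using tabulated factors.
A2, D3, D4 = 0.577, 0.0, 2.114   # standard factors for sample size n = 5

def xbar_r_limits(x_double_bar, r_bar):
    return (x_double_bar + A2 * r_bar,   # x-bar chart UCL
            x_double_bar - A2 * r_bar,   # x-bar chart LCL
            D4 * r_bar,                  # R chart UCL
            D3 * r_bar)                  # R chart LCL

limits = xbar_r_limits(5.01, 0.115)
# ~ (5.08, 4.94, 0.243, 0.0), matching the slides
```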

Required Sample Size
- Attribute charts require larger sample sizes: 50 to 100 parts in a sample
- Variable charts require smaller samples: 2 to 10 parts in a sample


Motorola and Six-Sigma
Mikel J. Harry: Ph.D., Arizona State University, 1984; M.A., Ball State University, 1981; B.S., Ball State University, 1973

Six-Sigma Key:
- σ = standard deviation
- µ = center of the distribution (shifted 1.5σ from its original, on-target location)
- ±3σ and ±6σ show the specifications relative to the original target

Six-Sigma Conceptually, the sigma level of a process or product is where its customer-driven specifications intersect with its distribution. A centered six-sigma process has a normal distribution with mean=target and specifications placed 6 standard deviations to either side of the mean. At this point, the portions of the distribution that are beyond the specifications contain 0.002 ppm of the data (0.001 on each side). Practice has shown that most manufacturing processes experience a shift (due to drift over time) of 1.5 standard deviations so that the mean no longer equals target. When this happens in a six-sigma process, a larger portion of the distribution now extends beyond the specification limits: 3.4 ppm. Source: SEI http://www.sei.cmu.edu/str/descriptions/sigma6_body.html

How to Calculate Six-Sigma? Far Right Tail Probabilities

Z     P{Z to oo}     Z     P{Z to oo}     Z     P{Z to oo}      Z     P{Z to oo}
2.0   0.02275        3.0   0.001350       4.0   0.00003167      5.0   2.867 E-7
2.1   0.01786        3.1   0.0009676      4.1   0.00002066      5.5   1.899 E-8
2.2   0.01390        3.2   0.0006871      4.2   0.00001335      6.0   9.866 E-10
2.3   0.01072        3.3   0.0004834      4.3   0.00000854      6.5   4.016 E-11
2.4   0.00820        3.4   0.0003369      4.4   0.000005413     7.0   1.280 E-12
2.5   0.00621        3.5   0.0002326      4.5   0.000003398     7.5   3.191 E-14
2.6   0.004661       3.6   0.0001591      4.6   0.000002112     8.0   6.221 E-16
2.7   0.003467       3.7   0.0001078      4.7   0.000001300     8.5   9.480 E-18
2.8   0.002555       3.8   0.00007235     4.8   7.933 E-7       9.0   1.129 E-19
2.9   0.001866       3.9   0.00004810     4.9   4.792 E-7       9.5   1.049 E-21

Six-Sigma: P( x >= (6 - 1.5)σ ) = P( Z >= 4.5 ) = 0.000003398 = 3.398 / 1,000,000
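The tail probabilities in the table can be reproduced from the standard normal survival function, P{Z >= z} = 0.5 * erfc(z / sqrt(2)); a minimal sketch (the helper name is made up):

```python
import math

# Far-right-tail probability of the standard normal distribution.
def upper_tail(z):
    return 0.5 * math.erfc(z / math.sqrt(2))

# Six-sigma with the 1.5-sigma shift: P{Z >= 6 - 1.5} = P{Z >= 4.5}
ppm = upper_tail(4.5) * 1_000_000
# ~ 3.4 defects per million opportunities, as in the table
```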

Six-Sigma and the ±1.5σ Shift
(Figure: a run chart depicting a +1.5σ drift in a 6σ process. USL and LSL are the upper and lower specification limits; UNL and LNL are the upper and lower natural tolerance limits.)

Six-Sigma Average industry in the US runs at four sigma, which corresponds to 6210 defects per million opportunities. Depending on the exact definition of "defect" in payroll processing, for example, this sigma level could be interpreted as 6 out of every 1000 paychecks having an error. As "four sigma" is the average current performance, there are industry sectors running above and below this value. Internal Revenue Service (IRS) phone-in tax advice, for instance, runs at roughly two sigma, which corresponds to 308,537 errors per million opportunities. Again, depending on the exact definition of defect, this could be interpreted as 30 out of 100 phone calls resulting in erroneous tax advice. ("Two Sigma" performance is where many noncompetitive companies run.) On the other extreme, domestic (U.S.) airline flight fatality rates run at better than six sigma, which could be interpreted as fewer than 3.4 fatalities per million passengers - that is, fewer than 0.00034 fatalities per 100 passengers [Harry 00], [Bylinsky 98], [Harrold 99].
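The sigma levels quoted above can be recovered from the defects-per-million-opportunities (DPMO) figures by inverting the normal distribution and adding the conventional 1.5σ shift; a minimal sketch (the helper name is made up):

```python
from statistics import NormalDist

# Convert DPMO to a sigma level, using the conventional 1.5-sigma shift.
def sigma_level(dpmo, shift=1.5):
    return NormalDist().inv_cdf(1 - dpmo / 1_000_000) + shift

# sigma_level(6210)   -> ~4.0  (average US industry, per the slide)
# sigma_level(308537) -> ~2.0  (IRS phone-in tax advice)
# sigma_level(3.4)    -> ~6.0  (six-sigma performance)
```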

Six-Sigma Assumptions:
- Normal distribution
- Process mean shift of 1.5σ from nominal is likely
- Process mean and standard deviation are known
- Defects are randomly distributed throughout units
- Parts and process steps are independent


Experimental Designs
http://www.socialresearchmethods.net/kb/design.php
Group = set of experimental units (subjects)

Experimental Designs (cont'd)
One-group designs (within-group):
- Post-test: X O
- Pre-test and post-test: O X O
- Interrupted time-series: O O X O O O X O X O
with O = observation (measurement), X = treatment (intervention)
Multiple-group designs (between-groups):
- with or without random sampling / assignment
- with or without blocking
- balanced or unbalanced
Factorial designs:
- nested vs. crossed factors
- interaction between factors

Experimental Designs: Random Assignment /1
Definition [Pfl94]: Randomization is the random assignment of subjects to groups or of treatments to experimental units, so that we can assume independence (and thus validity) of results.
Rationale for randomization [Pfl94]: Sometimes the results of an experimental treatment can be affected by the time, the place, or unknown characteristics of the participants (= experimental units / subjects). These uncontrollable factors can have effects that hide or skew the results of the controllable variables. To spread and diffuse the effects of these uncontrollable or unknown factors, you can assign the order of treatments randomly, assign the participants to each treatment randomly, or assign the location of each treatment randomly, whenever possible.

Experimental Designs: Random Assignment /2 Randomization is a prerequisite for a controlled experiment!
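The random assignment of subjects to treatment groups described above can be sketched in a few lines; this is a minimal illustration, with made-up function and subject names.

```python
import random

# Minimal sketch of random assignment: shuffle the subjects,
# then deal them round-robin to the treatment groups.
def randomize(subjects, treatments, seed=None):
    rng = random.Random(seed)
    pool = list(subjects)
    rng.shuffle(pool)
    return {t: pool[i::len(treatments)] for i, t in enumerate(treatments)}

groups = randomize(range(12), ["A", "B"], seed=42)
# every subject lands in exactly one group, 6 per treatment
```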

Experimental Designs: Blocking /1
Definition [Pfl94]: Blocking (stratification) means allocating experimental units to blocks (strata) or groups so that the units within a block are relatively homogeneous.
Rationale for blocking [Pfl94]: The blocked design captures the anticipated variation in the blocks by grouping like varieties, so that this variation does not contribute to the experimental error.

Experimental Designs: Blocking /2
Example [Pfl94]: Suppose you are investigating the comparative effects of two design techniques, A and B, on the quality of the resulting code. The experiment involves teaching the techniques to twelve developers and measuring the number of defects found per thousand lines of code to assess the code quality. It may be the case that the twelve developers graduated from three universities. It is possible that the universities trained the developers in very different ways, so that the effect of being from a particular university can affect the way in which a design technique is understood or used. To eliminate this possibility, three blocks can be defined so that the first block contains all developers from university X, the second block those from university Y, and the third block those from university Z. Then, the treatments are assigned at random to the developers from each block. If the first block has six developers, you would expect three to be assigned to design method A and three to method B, for instance.
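The blocked design from this example can be sketched as random assignment within each block. This is a minimal illustration: the developer names (d1 ... d12) and the function name are made up, and the block sizes follow the example (six developers from university X, three each from Y and Z).

```python
import random

# Within each block, shuffle the units and deal the treatments
# round-robin, so treatments are balanced per block.
def blocked_assignment(blocks, treatments, seed=None):
    rng = random.Random(seed)
    assignment = {}
    for block, units in blocks.items():
        pool = list(units)
        rng.shuffle(pool)
        for i, unit in enumerate(pool):
            assignment[unit] = treatments[i % len(treatments)]
    return assignment

blocks = {"X": ["d1", "d2", "d3", "d4", "d5", "d6"],
          "Y": ["d7", "d8", "d9"],
          "Z": ["d10", "d11", "d12"]}
plan = blocked_assignment(blocks, ["A", "B"], seed=1)
# within block X: three developers get method A and three get method B
```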

Experimental Designs: Blocking /3
(Figure: outcome distributions without vs. with blocking.) Less variance increases statistical power (for the same mean difference).

Experimental Designs: Balancing
Definition [Pfl94]: Balancing is the blocking and assigning of treatments so that an equal number of subjects is assigned to each treatment, wherever possible.
Rationale for balancing [Pfl94]: Balancing is desirable because it simplifies the statistical analysis, but it is not necessary. Designs can range from being completely balanced to having little or no balance.

Experimental Designs: Factorial Designs
Definition: The design of an experiment can be expressed by explicitly stating the number of factors and how they relate to the different treatments. Expressing the design in terms of factors tells you how many different treatment combinations are required. Example: factor 1 with levels L1 and L2 crossed with factor 2 with levels LA, LB, LC, and LD requires 2 x 4 = 8 treatment combinations.
- Crossed design: two factors, F1 and F2, in a design are said to be crossed if each level of each factor appears with each level of the other factor.
- Nested design: factor F2 is nested within factor F1 if each meaningful level of F2 occurs in conjunction with only one level of factor F1.

Experimental Designs: Interaction Effects
Example: measuring the time to code a program module with or without using a reusable repository.
- Case 1: no interaction between factors
- Case 2: interaction effect: the effect on time to code (factor 1) depends (also) on the size of the module (factor 2)

Experimental Designs: Crossed vs. Nested
- Crossed design (each level of each factor appears with each level of the other factor): useful for looking at two factors, each with two or more conditions. Example: design method (A1, A2) crossed with tool usage (B1 = yes, B2 = no).
- Nested design (each level of one factor occurs entirely in conjunction with one level of another factor): useful for investigating one factor with two or more conditions; the nested factors are similar, but not necessarily identical. Example: tool usage (yes / no) nested within design method A1 and, separately, within design method A2.
A proper nested or crossed design may reduce the number of cases to be tested.

Experimental Designs: Design Selection
Flow chart for selecting an experimental design [Pfl95].
[Pfl95] S. L. Pfleeger: "Experimental Design and Analysis in Software Engineering", Annals of Software Engineering, vol. 1, pp. 219-253, 1995. Also appeared as: S. L. Pfleeger: "Experimental Design and Analysis in Software Engineering", Parts 1 to 5, Software Engineering Notes, 1995 and 1996.


Exercise: Assessing the Quality of Reported Experiments
- Materials: checklist and paper
- Work individually
- Make sure you take notes on the rationale for your assessment
- After ca. 25 min, compare with your neighbour
- Report to class


Homework 4 Assignment
- Work in pairs
- 2 phases (A and B); Deadline A: Mon, 6 May, 17:00; Deadline B: Wed, 15 May, 17:00
- 3 tasks; Phase A: Tasks 1 & 2; Phase B: Task 3
- Task 1: Assess the quality of one of the two assigned papers (pick only one!)
- Task 2: Design a controlled experiment (pick one RQ!)
- Task 3: Review two designs of your peers


Literature on Empirical Methods in SE
- T. Dybå, B. A. Kitchenham, M. Jørgensen (2004): "Evidence-Based Software Engineering for Practitioners", IEEE Software
- S. Easterbrook et al. (2008): "Selecting Empirical Methods for Software Engineering Research", Chapter 11 (pp. 285-311) in F. Shull, J. Singer, D. I. K. Sjøberg (eds.): Advanced Topics in Empirical Software Engineering, Springer London (ISBN 13: 978-1-84800-043-8)
- A. Endres, D. Rombach (2003): A Handbook of Software and Systems Engineering: Empirical Observations, Laws and Theories, Addison-Wesley
- S. L. Pfleeger (1995-96): "Experimental Design and Analysis in Software Engineering", Parts 1 to 5, Software Engineering Notes
- H. Robinson, J. Segal, H. Sharp (2007): "Ethnographically-Informed Empirical Studies of Software Practice", Information and Software Technology, 49(6), pp. 540-551
- W. L. Wallace (1971): The Logic of Science in Sociology, New York: Aldine
- R. K. Yin (2002): Case Study Research: Design and Methods, Sage, Thousand Oaks

Next Lecture
Topic: Industry presentation by Artur Assor (Nortal): "Rebuilding development infrastructure in Nortal" (tentative title)
For you to do: start working on Homework 4