Selection of Software Estimation Models Based on Analysis of Randomization and Spread Parameters in Neural Networks

Cuauhtémoc López-Martín 1, Arturo Chavoya 2, and María Elena Meda-Campaña 3
1, 2, 3 Information Systems Department, CUCEA, Guadalajara University, Jalisco, Mexico
1 cuauhtemoc@cucea.udg.mx, 2 achavoya@cucea.udg.mx, 3 emeda@cucea.udg.mx

Abstract - Neural networks (NN) have proven useful for estimating software development effort. A NN can be classified according to its architecture; the feedforward neural network (FFNN) and the general regression neural network (GRNN) are two such architectures. A FFNN relies on randomization to be trained, whereas a GRNN relies on a spread parameter for the same purpose. Both randomization and the spread parameter influence the accuracy of the models when they are used for estimating the development effort of software projects. Hence, in this study, accuracy is analyzed based on executions of NN involving random numbers and spread values. This study used two separate samples, one for training the models and the other for validating them (219 and 132 projects, respectively). All projects were developed applying practices based on the Personal Software Process (PSP). The results suggest that an analysis of the randomization and spread parameters should be considered in both the training and validation processes when selecting a suitable neural network model.

Keywords: software development effort estimation, neural networks, randomization, spread parameter, statistical regression

1 Introduction

An inadequate estimation of the development effort of software projects could lead to poor planning, low profitability and, consequently, products of poor quality [9]. There are several techniques for estimating development effort, which can be classified as intuition-based or model-based.
The former are partly based on non-mechanical and unconscious processes, and the means of deriving an estimate are not explicit and therefore not repeatable [12], whereas model-based techniques can be classified as statistical or computational intelligence techniques. Fuzzy logic, genetic algorithms, genetic programming and neural networks belong to the computational intelligence techniques.

Neural networks have been applied in several fields, such as accounting, finance, health, medicine, engineering, manufacturing and marketing [10]. In software development effort estimation, the feedforward network is the most commonly used neural network [7]. When neural networks have been applied, they have presented the following weaknesses [7]:

1. Some characteristics, such as sample size or number of variables, are not clearly reported.
2. The statistical techniques have not been optimally used.
3. The determination of the parameters of the neural networks lacks clarity.
4. Results obtained from the model building processes are not validated on a new data set that was not used for building the models.

Each of these four problems has been considered in this study as follows:

1. Two data samples were used: one comprising 219 projects developed by 71 persons from 2005 to 2009, and the other comprising 132 projects developed by 38 persons during 2010. Both samples were obtained under the same experimental design (described in Section 2). The dependent variable is the development effort, whereas the independent variables are related to size and people factors, which are described in Section 1.1.
2. The multiple linear regression equation is generated from a global analysis (based on the coefficient of determination) as well as from an individual analysis of its parameters (Section 4) to select the significant independent variables that explain the development effort (dependent variable).
This practice has been suggested in [1] and [10].
3. The GRNN has a parameter named SPREAD that influences its accuracy. Accuracy values are analyzed for several SPREAD values (Section 5). In addition, since training a FFNN involves randomization, its executions are analyzed in Section 6.
4. The analysis of the models is based on the two main stages followed when a prediction model is used [4]: (1) model adequacy checking, or model verification (estimation stage), which determines whether the model is adequate to describe the observed (actual) data; and, if so, (2) model validation using new data, that is, the prediction stage (Sections 5 and 6).

Data for this study were obtained by applying a disciplined software development process: the Personal Software Process (PSP), whose practices and methods have been used by thousands of software engineers for delivering quality products on predictable schedules [5].

1.1 Data description of software projects

Source lines of code (LOC) remain favored by many models [14]. There are two measures of source code size: physical source lines and logical source statements. The count of physical lines gives the size in terms of the physical length of the code as it appears when printed [11].

In this study, two of the independent variables are New and Changed (N&C) code and Reused code, both counted as physical lines of code (LOC). N&C is composed of added and modified code. The added code is the LOC written during the current programming process, while the modified code is the LOC changed in the base program when modifying a previously developed program. The base program is the total LOC of the previous project, while the reused code is the LOC of previously developed programs that are used without any modification.

A coding standard should establish a consistent set of coding practices that is used as a criterion when judging the quality of the produced code. Hence, it is necessary to always use the same coding and counting standards. The software projects of this study followed those two guidelines.

After product size, people factors (such as experience on applications), platforms, languages and tools have the strongest influence in determining the amount of effort required to develop a software product [2]. Programming language experience, measured in months, is used as a third independent variable in this study.

Because the projects of this study were developed in an academic environment, the effort was measured in minutes, as in [16].

1.2 Accuracy criterion

There are several criteria to evaluate the accuracy of estimation models. A common criterion for the evaluation of prediction models has been the Magnitude of Relative Error (MRE).
In several papers, a mean MRE (MMRE) ≤ 0.25 has been considered acceptable. The accuracy criterion for evaluating the models of this study is the Magnitude of Error Relative to the estimate (MER), defined as follows:

$$\mathrm{MER}_i = \frac{|\text{Actual Effort}_i - \text{Estimated Effort}_i|}{\text{Estimated Effort}_i}$$

The MER value is calculated for each observation i whose effort is estimated. The aggregation of MER over multiple observations (N) can be achieved through the mean (MMER) as follows:

$$\mathrm{MMER} = \frac{1}{N}\sum_{i=1}^{N}\mathrm{MER}_i$$

The accuracy of an estimation technique is inversely proportional to its MMER. In [15], MMER gave better results than MMRE for selecting the best model; this is the reason for using MMER in this study.

2 Experimental design

The experiment was done inside a controlled environment having the following characteristics:

1. All developers had experience in software development at the enterprises where they worked.
2. All developers were enrolled in a postgraduate program related to computer science.
3. Each developer wrote seven project assignments, but only four of them were selected per developer. The first three programs were not considered because they differed in their process phases and in their logs, whereas the last four programs were based on the same logs and on the following phases: plan, design, design review, code, code review, compile, testing and postmortem.
4. Each developer selected his/her own imperative programming language, whose coding standard had the following characteristics: each compiler directive, variable declaration, constant definition, delimiter, assignment statement and flow control statement was written on its own line of code.
5. Developers had already received at least one formal course on the object-oriented programming language that they selected for use throughout the assignments, and they had good programming experience in that language. The sample of this study only involved developers whose programs were coded in C++ or Java.
6. Because this study was an experiment, and with the aim of reducing bias, we did not inform the developers of our experimental goal.
7. Developers filled out a spreadsheet for each task and submitted it electronically for examination.
8. Each course group had no more than fifteen developers.
9. Since a coding standard should establish a consistent set of coding practices that is used as a criterion when judging the quality of the produced code [16], it is necessary to always use the same coding and counting standards. The programs developed in this study followed these guidelines. All of them conformed to the counting standard depicted in Table 1.
10. Developers were constantly supervised and advised about the process.
11. The code written in each program was designed by the developers to be reused in subsequent programs.
12. The developed programs had a complexity similar to that of the programs suggested in [16].
13. The data used in this study come only from developers whose data for all seven exercises were correct, complete and consistent.
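The accuracy criterion of Section 1.2 can be made concrete with a small numerical example. The following is a minimal Python sketch of MER and MMER as defined there; the function names and the effort values are hypothetical, and the MRE function uses the usual definition (error divided by the actual effort), which the text above names but does not spell out.

```python
from statistics import mean


def mer(actual_effort, estimated_effort):
    """Magnitude of Error Relative to the estimate for one project."""
    return abs(actual_effort - estimated_effort) / estimated_effort


def mre(actual_effort, estimated_effort):
    """Magnitude of Relative Error: same numerator, divided by the actual effort."""
    return abs(actual_effort - estimated_effort) / actual_effort


def mmer(actual_efforts, estimated_efforts):
    """Mean MER over N observations."""
    return mean(mer(a, e) for a, e in zip(actual_efforts, estimated_efforts))


# Hypothetical efforts in minutes (not taken from the samples of this study).
actual = [120.0, 90.0, 200.0]
estimated = [100.0, 100.0, 160.0]

example_mmer = mmer(actual, estimated)  # (0.20 + 0.10 + 0.25) / 3
```

Because MER divides by the estimate rather than by the actual effort, the same absolute error yields a different value under each criterion, for example 20/100 under MER and 20/120 under MRE for the first project.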

Table 1. Counting standard

Count type                    Type
Physical/logical              Physical

Statement type                Included
Executable                    Yes
Nonexecutable:
  Declarations                Yes (one per text line)
  Compiler directives         Yes (one per text line)
  Comments and blank lines    No
Delimiters: { and }           Yes

3 Neural networks

An artificial neural network, or simply a neural network (NN), is a technique of computing and signal processing that is inspired by the processing done by a network of biological neurons [13]. The basis for the construction of a neural network is the artificial neuron, which implements a mathematical model of a biological neuron. There is a variety of tasks that a neural network can be trained to perform. The most common tasks are pattern association, pattern recognition, function approximation, automatic control, filtering and beam-forming.

The neuron model and the architecture of a neural network describe how a network transforms its input into an output. Two or more neurons can be combined in a layer, and a particular network can contain one or more such layers [6]. Two kinds of neural networks are briefly described in the following two sections.

3.1 Feedforward neural network (FFNN)

The input to an artificial neuron is a vector of numeric values $x = \{x_1, x_2, \ldots, x_j, \ldots, x_m\}$. The neuron receives the vector and perceives each value, or component of the vector, with a particular independent sensitivity called weight, $w = \{w_1, w_2, \ldots, w_j, \ldots, w_m\}$. Upon receiving the input vector, the neuron first calculates its internal state v and then its output value y. The internal state v is calculated as the sum of the inner product of the input vector and the weight vector, plus a numerical value b called bias, as follows:

$$v = x \cdot w + b = \sum_{j=1}^{m} x_j w_j + b$$

This function is also known as the transfer function. The output of the neuron is a function of its internal state, $y = \Phi(v)$. This function is also known as the activation function.
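The transfer and activation functions just defined can be sketched for a single neuron as follows. This is a minimal illustration only: the function names, the choice of the logistic and hyperbolic tangent activations, and the numeric values are assumptions that do not come from the study.

```python
import math


def internal_state(x, w, b):
    """Transfer function: v = sum_j x_j * w_j + b."""
    if len(x) != len(w):
        raise ValueError("input and weight vectors must have the same length")
    return sum(x_j * w_j for x_j, w_j in zip(x, w)) + b


def logistic(v):
    """Activation mapping any internal state into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-v))


def neuron_output(x, w, b, activation=logistic):
    """Output y = Phi(v) of a single artificial neuron."""
    return activation(internal_state(x, w, b))


# Hypothetical input, weights and bias chosen only for illustration.
x = [1.0, 2.0, 3.0]
w = [0.5, -0.25, 0.1]
b = -0.3

v = internal_state(x, w, b)  # 0.5 - 0.5 + 0.3 - 0.3, zero up to rounding
y_logistic = neuron_output(x, w, b)
y_tanh = neuron_output(x, w, b, activation=math.tanh)
```

Passing the activation as a parameter keeps the transfer function separate from the scaling step, so the same internal state can be mapped into different output intervals.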
The main task of the activation function is to scale all possible values of the internal state into a desired interval of output values, for instance [0, 1] or (-1, 1).

A feedforward network consists of layers of neurons: an input layer, an output layer and, optionally, one or more hidden layers between them. After the network receives its input vector, the signal is processed layer by layer until the output layer emits an output vector as response. Neurons in the same layer process the signal in parallel. In a feedforward network, the signals between neurons always flow from the input layer toward the output layer.

A neural network learns by adjusting its parameters, which are the values of the biases and weights of its neurons. Some neural networks learn constantly during their application, but most have two distinct periods: a training period and an application period. During the training period, the network processes inputs while adjusting its parameters, guided by a learning algorithm, in order to improve its performance. Once the performance is acceptably accurate, the training period is completed. The parameters of the network are then fixed to the learned values, and the network starts its period of application for the intended task. In the present work, a feedforward neural network with one hidden layer is applied for function approximation.

3.2 General regression neural network (GRNN)

The architecture of a GRNN is the following [3]: the input units provide all the $X_i$ variables to all the neurons in the second layer. The pattern units are dedicated to receiving as input the outputs of the set of input neurons. When a new vector X is entered into the network, it is subtracted from the stored vector representing each cluster center. Either the squares or the absolute values of the differences are summed and fed into a nonlinear activation function.
The activation function normally used is the exponential function. The output of the pattern units is passed on to the summation units. The summation units perform a dot product between a weight vector and a vector composed of the signals from the pattern units. The summation unit that generates an estimate of $f(X)K$ sums the outputs of the pattern units weighted by the number of observations each cluster center represents. The summation unit that estimates $Y f(X)K$ multiplies each value from a pattern unit by the sum of the samples $Y^j$ associated with cluster center $X^i$. The output unit merely divides $Y f(X)K$ by $f(X)K$ to yield the desired estimate of Y. When estimation of a vector Y is desired, each component is estimated using one extra summation unit, which uses as its multipliers the sums of samples of the component of Y associated with each cluster center $X^i$.

4 Significant variables from statistical regression analysis

From a sample of 219 projects, the following multiple linear regression equation considering New and Changed (N&C) code, Reused code and Programming Language Experience (PLE) was generated:

$$\text{Effort} = 62.5307 + (1.1025 \times \text{N\&C}) - (0.189257 \times \text{Reused}) - (0.477072 \times \text{PLE})$$

This equation has a coefficient of determination $r^2 = 0.51$, which corresponds to an acceptable value in software estimation according to [16]. ANOVA for this equation showed a statistically significant relationship between the variables at the 99% confidence level.

To determine whether the model could be simplified, an analysis of the parameters of the multiple linear regression was done. In the results of this analysis (Table 2), the highest p-value among the independent variables was 0.0027, belonging to reused code. Since this p-value is less than 0.05, reused code is statistically significant at the 95% confidence level. Consequently, the independent variable of reused code was not removed and is kept in the models evaluated in the following sections.

Table 2. Individual analysis of parameters

Parameter   Estimate    Standard error   t-statistic   p-value
Constant    62.5307     4.6836           13.3509       0.0000
N&C         1.1025      0.0766           14.3819       0.0000
Reused      -0.189257   0.0623           -3.0361       0.0027
PLE         -0.477072   0.1028           -4.6364       0.0000

5 Analysis of GRNN spread parameter

In the GRNN, a parameter named SPREAD was empirically changed until a suitable value was obtained. If the spread is small, the GRNN function is very steep, so that the neuron with the weight vector closest to the input will have a much larger output than the other neurons. The GRNN then tends to respond with the target vector associated with the nearest input vector. As the spread becomes larger, the slope of the GRNN function becomes smoother and several neurons can respond to an input vector. The network then acts as if it were taking a weighted average between the target vectors whose input vectors are closest to the new input vector. As the spread becomes even larger, more and more neurons contribute to the average, with the result that the network function becomes smoother [6].
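The effect of the spread described above can be illustrated with a small GRNN written from scratch. This is a minimal sketch, assuming a Gaussian kernel in which the spread plays the role of the smoothing parameter of [3] and every training project is its own cluster center; the toolbox cited in [6] may scale the kernel differently, so the spread values used here are not comparable with those analyzed in this study. The function name and the data are hypothetical.

```python
import math


def grnn_predict(x_new, train_inputs, train_targets, spread):
    """Estimate the target for x_new as a kernel-weighted average of targets."""
    sq_dists = [
        sum((a - b) ** 2 for a, b in zip(x_new, pattern))
        for pattern in train_inputs
    ]
    # Pattern units: exponential activation of the summed squared differences.
    weights = [math.exp(-d / (2.0 * spread ** 2)) for d in sq_dists]
    denominator = sum(weights)
    if denominator == 0.0:
        # All kernels underflowed: fall back to the nearest stored pattern.
        nearest = min(range(len(sq_dists)), key=sq_dists.__getitem__)
        return train_targets[nearest]
    # Summation units and output unit: weighted sum divided by sum of weights.
    numerator = sum(w * y for w, y in zip(weights, train_targets))
    return numerator / denominator


# Hypothetical one-variable data: size in LOC -> effort in minutes.
train_inputs = [[10.0], [20.0], [60.0]]
train_targets = [50.0, 80.0, 200.0]
query = [22.0]

narrow = grnn_predict(query, train_inputs, train_targets, spread=1.0)
wide = grnn_predict(query, train_inputs, train_targets, spread=1000.0)
```

With the narrow spread the estimate is essentially the target of the nearest stored project (80 minutes), whereas with the very wide spread it approaches the plain average of all targets (110 minutes), which mirrors the steep versus smooth behaviour described in the text.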
The values tested for SPREAD were 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 20, 25, 30, 35, and 40. To select a suitable GRNN based on its spread value, it is necessary to know how the GRNN behaves when it is applied to a new dataset; that is, a low spread value could overfit the network, so that when the GRNN is applied to new data it could obtain a larger (worse) MMER instead of a better one. Table 3 shows the MMER values in both the verification and the validation stages as the spread value increases. It shows that, as the spread value increases, the MMER in the validation stage generally improves until the spread value equals 13 (where the MMER reaches its best value, 0.23). From a spread value of 25 onward, the MMER worsens. Hence, we considered the GRNN with a spread value of 13 to be the suitable one.

Table 3. Analysis of MMER by stage based on GRNN spread value

SPREAD value   Verification (219 projects)   Validation (132 projects)
5              0.14                          0.28
6              0.16                          0.26
7              0.18                          0.25
8              0.20                          0.25
9              0.21                          0.24
10             0.22                          0.24
11             0.23                          0.24
12             0.23                          0.26
13             0.24                          0.23
14             0.24                          0.23
15             0.25                          0.23
20             0.26                          0.23
25             0.28                          0.24
30             0.29                          0.24
35             0.29                          0.25
40             0.30                          0.26

6 Analysis of FFNN randomization

A feedforward network with one layer of hidden neurons is sufficient to approximate any function with a finite number of discontinuities on any given interval [13]. Three neurons were used in the input layer of the network: one receives the number of N&C lines of code, the second receives the number of reused lines of code, and the third receives the developer's programming language experience in months. The output layer consists of only one neuron, which gives the estimated effort.

The set of 219 software projects was used to train the network. This group of projects was randomly separated into three subgroups: training, validation and testing. The training group contained 60% of the projects.
The input-output pairs of data for these projects were used by the network to adjust its parameters. The next 20% of the data were used to validate the results and identify the point at which the training should stop. The remaining 20% of the data were used as testing data, to make sure that the network performed well with data that were not present during parameter adjustment. These percentages were chosen as suggested in [32].

The number of neurons in the hidden layer was optimized empirically: 1, 2, 3, 4, 5, 10, 15, 20, 25 and 30 neurons were used for training the network. Ten executions were done for each number of neurons because this kind of network involves a random process. The optimized Levenberg-Marquardt algorithm [8] was used to train the network. Table 4 presents the MMER obtained by execution.
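The random 60/20/20 separation and the grid of repeated executions described above can be sketched as follows. This is only an illustration under stated assumptions: the function names are hypothetical, identifying each execution by a random seed is an assumption about how the randomness could be controlled, and the training step itself (Levenberg-Marquardt) is deliberately left out.

```python
import random


def split_projects(n_projects, seed, train_frac=0.6, val_frac=0.2):
    """Randomly partition project indices into training, validation and testing."""
    rng = random.Random(seed)
    indices = list(range(n_projects))
    rng.shuffle(indices)
    n_train = round(train_frac * n_projects)
    n_val = round(val_frac * n_projects)
    train = indices[:n_train]
    validation = indices[n_train:n_train + n_val]
    testing = indices[n_train + n_val:]
    return train, validation, testing


def execution_plan(hidden_sizes, n_executions):
    """List every (hidden neurons, seed) pair, one per training execution."""
    return [(h, seed) for h in hidden_sizes for seed in range(n_executions)]


hidden_sizes = [1, 2, 3, 4, 5, 10, 15, 20, 25, 30]
plan = execution_plan(hidden_sizes, n_executions=10)

train, validation, testing = split_projects(219, seed=0)
```

For 219 projects the rounding gives 131, 44 and 44 projects in the three subgroups, and the plan contains 100 executions (ten per hidden-layer size). Fixing a seed per execution makes each run reproducible while still letting the accuracy vary across executions.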

Table 4. MMER by execution for different numbers of neurons in the hidden layer

Neurons in     Execution                                                   Best
hidden layer   1     2     3     4     5     6     7     8     9     10    MMER
1              0.26  0.25  0.25  0.26  0.25  0.27  0.25  0.25  0.26  0.25  0.25
2              0.24  0.25  0.25  0.26  0.26  0.26  0.26  0.26  0.26  0.25  0.24
3              0.25  0.24  0.25  0.25  0.25  0.28  0.25  0.28  0.25  0.25  0.24
4              0.27  0.26  0.25  0.24  0.26  0.25  0.25  0.25  0.25  0.25  0.24
5              0.26  0.25  0.26  0.26  0.25  0.24  0.25  0.24  0.26  0.29  0.24
10             0.25  0.26  0.27  0.25  0.24  0.25  0.24  0.25  0.28  0.26  0.24
15             0.24  0.38  0.25  0.26  0.25  0.25  0.25  0.25  0.27  0.24  0.24
20             0.25  0.25  0.25  0.25  0.24  0.32  0.24  0.26  0.25  0.25  0.24
25             0.33  0.28  0.38  0.36  0.24  0.28  0.30  0.25  0.25  0.24  0.24
30             0.26  0.24  0.33  0.42  0.25  0.24  0.35  0.28  0.24  0.25  0.24

Table 4 shows that a best MMER of 0.24 was obtained with every number of neurons from 2 to 30. Considering that more neurons mean more computation, we analyzed the frequency of the MMER values to select the final number of neurons to be used in this study. Table 6 shows that the higher the number of neurons, the higher the MMER dispersion.

Table 6. Frequency of MMER by number of neurons in the hidden layer (a dash indicates that no execution obtained that MMER)

        Number of neurons in the hidden layer
MMER    1    2    3    4    5    10   15   20   25   30
0.24    -    1    1    1    2    2    2    2    2    3
0.25    6    3    7    6    3    4    5    6    2    2
0.26    3    6    -    2    4    2    1    1    -    1
0.27    1    -    -    1    -    1    1    -    -    -
0.28    -    -    2    -    -    1    -    -    2    1
0.29    -    -    -    -    1    -    -    -    -    -
0.30    -    -    -    -    -    -    -    -    1    -
0.32    -    -    -    -    -    -    -    1    -    -
0.33    -    -    -    -    -    -    -    -    1    1
0.35    -    -    -    -    -    -    -    -    -    1
0.36    -    -    -    -    -    -    -    -    1    -
0.38    -    -    -    -    -    -    1    -    1    -
0.42    -    -    -    -    -    -    -    -    -    1
Total   10   10   10   10   10   10   10   10   10   10

Based on the data from Table 6, we selected as the suitable neural network the one with three neurons in the hidden layer, since it had the highest frequency (seven of ten executions) of a single MMER value (0.25). This trained FFNN was then applied to the other data set of 132 projects, obtaining an MMER = 0.24.

7 Conclusions and future research

This research has analyzed the effect that randomization and the spread parameter have on the selection of the best neural network model. Accuracy was measured with the Mean Magnitude of Error Relative to the estimate (MMER). Two kinds of neural networks were analyzed.
The randomization involved in a FFNN showed that the higher the number of neurons, the higher the MMER dispersion. Regarding the GRNN spread parameter, our analysis showed that, to select a suitable GRNN, it is necessary to know how the GRNN behaves when it is applied to a new dataset; knowing its accuracy on the training data alone is not sufficient. This analysis was carried out in a software development estimation context, based on projects developed in a controlled environment and following a disciplined process.

Future work will analyze the relationship between the statistical characteristics of the data and the accuracy of estimation models.

8 Acknowledgement

The authors of this paper would like to thank CUCEA of Guadalajara University, Jalisco, México, the Programa de Mejoramiento del Profesorado (PROMEP), and the Consejo Nacional de Ciencia y Tecnología (Conacyt).

9 References

[1] B. A. Kitchenham, E. Mendes and G. H. Travassos, Cross versus Within-Company Cost Estimation Studies: A Systematic Review, IEEE Transactions on Software Engineering, Vol. 33, No. 5, 2007, pp. 316-329.
[2] B. Boehm, Ch. Abts, A. W. Brown, S. Chulani, B. K. Clark, E. Horowitz, R. Madachy, D. Reifer and B. Steece, COCOMO II, Prentice Hall, 2000.
[3] D. F. Specht, A General Regression Neural Network, IEEE Transactions on Neural Networks, Vol. 7, No. 3, 1991.
[4] D. Montgomery and E. Peck, Introduction to Linear Regression Analysis, John Wiley, 2001.
[5] D. Rombach, J. Münch, A. Ocampo, W. S. Humphrey and D. Burton, Teaching disciplined software development, Journal of Systems and Software, Elsevier, 2008, pp. 747-763.
[6] H. Demuth, M. Beale and M. Hagan, MatLab Neural Network Toolbox 6, User's Guide, 2008.
[7] H. Park and S. Baek, An empirical validation of a neural network model for software effort estimation, Journal of Expert Systems with Applications, Elsevier, Vol. 35, 2008, pp. 929-937.
[8] L. Finschi, An Implementation of the Levenberg-Marquardt Algorithm, Eidgenössische Technische Hochschule Zürich, 1996.
[9] M. Jørgensen, A Preliminary Theory of Judgment-based Project Software Effort Predictions, IRNOP VIII, Project Research Conference, ed. by Lixiong Ou and Rodney Turner, Beijing, Publishing House of Electronic Industry, 2006, pp. 661-668.
[10] M. Paliwal and U. A. Kumar, Neural networks and statistical techniques: A review of applications, Journal of Expert Systems with Applications, Vol. 36, 2009, pp. 2-17. doi:10.1016/j.eswa.2007.10.005
[11] R. E. Park, Software Size Measurement: A Framework for Counting Source Statements, Software Engineering Institute, Carnegie Mellon University, 1992.
[12] S. Grimstad and M. Jørgensen, Inconsistency of expert judgment-based estimates of software development effort, Journal of Systems and Software, Elsevier, Vol. 80, 2007, pp. 1770-1777.
[13] S. Haykin, Neural Networks: A Comprehensive Foundation, Second edition, Prentice Hall, 1998.
[14] S. G. MacDonell, Software source code sizing using fuzzy logic modelling, Elsevier, Volume 45, Issue 7, 2003, pp. 389-404. doi:10.1016/S0950-5849(03)00011-9
[15] T. Foss, E. Stensrud, B. Kitchenham and I. Myrtveit, A Simulation Study of the Model Evaluation Criterion MMRE, IEEE Transactions on Software Engineering, Vol. 29, No. 11, 2003.
[16] W. Humphrey, A Discipline for Software Engineering, Addison Wesley, 1995.