Computational Idealizations in Software Intensive Science. A Comment on Symons & Horner's paper. (Draft)


Nicola Angius
Dipartimento di Storia, Scienze dell'Uomo e della Formazione, Università degli Studi di Sassari, Italy
nangius@uniss.it

Abstract: This commentary on John Symons and Jack Horner's paper, besides sharing its main argument, challenges the authors' claim that the lack of an effective method to evaluate software intensive systems is a distinguishing feature of software intensive science. It is underlined here how analogous methodological limitations characterise the evaluation of empirical systems in non-software intensive sciences. The authors' claim that formal methods establish the correctness of computational models rather than of the represented program is compared with the empirical adequacy problem typifying the model-based reasoning approach in physics. And the remark that testing all the paths of a software intensive system is unfeasible is related to the problem of enumerative induction in the justification of empirical law-like hypotheses in non-software intensive sciences.

Keywords: Software Intensive Science, Computational Models, Scientific Method

The increasing exploitation of software systems in scientific inquiry calls for deeper philosophical investigation of the epistemological and methodological issues it raises. The paper "Software Intensive Science" by John Symons and Jack Horner sheds light on one of those issues, to which philosophers have paid scant attention so far. The concern is that the high path complexity of widespread programs, engendered by conditional program statements, introduces epistemologically significant differences between the practice of software intensive science (SIS) and that of non-software intensive science (NSIS). The target article argues that it is not feasible to apply conventional statistical inference theory (CSIT), commonly used to evaluate empirical hypotheses in traditional NSIS, to evaluate error distributions within the software used in SIS.

One of the major merits of this study is, in my view, that of highlighting how the explanatory and predictive capabilities of law-like statements in SIS (or of the scientific theory from which they are derived) depend on the reliability of the software systems involved. In the case of a computer simulation, the yielded predictions rely not only on the equations representing the evolution of the target system but also on the software involved in computing the equations' solutions (and on the correctness of the hardware instantiating that software). And in the case of a program drawing regularities from a set of theoretical assumptions, the obtained law-like statements are hypotheses that might not be consistent with observed phenomena if a faulty program derived them. Scientific knowledge of empirical systems in SIS thus leans on the knowledge one acquires about the software systems involved, and the computer science problem of evaluating the correctness of programs becomes essential in SIS. According to the authors, there is no effective method that can characterise error distributions within programs except for Software Testing, which, however, cannot be exhaustive for the software systems in use today (and hence is not effective). The problem of program correctness introduces into current computer-aided scientific inquiries an amount of uncertainty that marks a fundamental difference between non-software intensive science and software intensive science.

In the following, I would like to weaken the authors' conclusion by discussing the premise that there is no effective method available to examine software intensive systems. Even if CSIT cannot be applied to program code, the techniques currently developed by computer science to tackle the correctness problem can nonetheless be put, from an epistemological and methodological point of view, on a par with other common methods used to evaluate scientific theories and hypotheses in NSIS.

THE EMPIRICAL ADEQUACY PROBLEM IN SOFTWARE INTENSIVE SCIENCE

Software code can be examined both statically, by means of so-called formal methods (Monin and Hinchey 2003), and dynamically, the main techniques being those of Software Testing (Ammann and Offutt 2008) taken into consideration by the authors. Static code analysis provides an a priori examination of programs that, in contrast with dynamic code analysis, does not require examining the executions of the software intensive system involved. As such, formal verification does provide an effective method of analysis. While Theorem Proving is affected by undecidability limitations (Van Leeuwen 1990), Model Checking (Clarke et al. 1999) supplies a decidable, depth-first search algorithm able to check exhaustively, within a reasonable time complexity, a model representing the potential executions of the examined program. The target article seems to underestimate the potential of formal methods in theoretical computer science and their methodological impact on the philosophy of computing.

The authors object that, in formal methods, the evaluation problem is only shifted from the software intensive system to a representation of that system, namely the computational model: whether the model is an adequate representation of the program remains to be settled, and doing so requires an intractable time complexity comparable to that of testing all paths. This is certainly true. However, evaluating the empirical adequacy of the computational models involved in SIS is not that different from the problem of evaluating the empirical adequacy of the representational models used in NSIS (van Fraassen 1980). According to the model-based reasoning approach to science (Magnani et al. 1999), complex empirical systems are studied by means of simplified representations of them, usually idealised models, and proofs are accomplished within those models.
Consistently with the authors' objection, model-based results are to be acknowledged as model-based hypotheses, since they have to be reinterpreted in the target system in order to be evaluated. Model-based hypotheses have characterised research in physics since Galileo's study of uniformly accelerated motion (Hughes 1997). What distinguishes the model-based reasoning approach in SIS from that in NSIS is that, in SIS, the reasoning involves the use of software intensive systems, and a problem arises as to how to examine such systems. In this I agree with the authors.

Let us consider the case of Systems Biology studying some bio-chemical process. Current trends in bioinformatics provided by so-called executable biology (Fisher and Henzinger 2007) represent a biological system by means of a program, the examination of which provides predictions concerning the phenomena under examination. This is a typical SIS approach in biology, and knowledge of the target biological system depends on the knowledge one acquires about the program representing that system; in computer science terms, such knowledge is the correctness of the program with respect to a set of property specifications. Model Checking provides a formal tool to prove the correctness of programs, one that has also been applied to the analysis of cell systems in Systems Biology (Fisher and Henzinger 2007, 1240).

The objection I would like to address to the target article's authors is that the model checking approach in systems biology, as in any other SIS context, can be methodologically put on a par with model-based reasoning approaches in NSIS. Property specifications are used to express, usually by means of temporal logic formulas, a set of behavioural properties that the biological system is supposed to exhibit. A state transition system is used to represent, in principle, all potential executions of the program describing the bio-chemical phenomena of interest. To avoid state space explosion, an abstract version of the same model is obtained by applying common state space reduction techniques, mostly data abstractions (Kesten and Pnueli 2000). The model checking algorithm finally checks whether the temporal formulas hold in the abstract model, that is, whether the required behaviours belong to the behaviours allowed by the model.

As Symons and Horner underline, we still do not know whether the software examined is a fair instantiation of the abstract model. Indeed, data abstraction might produce what are known as false positives, i.e., paths in the abstract model that do not correspond to any of the actual program's executions (Clarke et al. 2000). Temporal formulas verified in an abstract model are thus akin to model-based hypotheses that need to be justified on the basis of observed executions (Angius 2013b). The authors claim that, in order to evaluate whether a given program is a correct instantiation of the computational model, it is necessary to perform the unfeasible task of testing all the program's executions and comparing them with the model's paths. This is not how computer scientists evaluate the empirical adequacy of abstract state transition systems. In case the model checking algorithm terminates with a positive answer, i.e. the temporal logic formula holds in the model, it yields what is called a set of witnesses, that is, a set of paths satisfying the formalised property specification. Those paths are compared with actual executions to exclude that they correspond to false positives; in the latter case, the model is refined. If the algorithm ends with a negative answer, a set of counterexamples is exhibited, showing paths that violate the checked temporal formula. A tester will then try to make the program execute those incorrect paths. If they are actually observed, the program is debugged; if they are not, the model is refined accordingly.
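To make the procedure just described concrete, the sketch below shows explicit-state checking of an invariant property ("always P") over a toy finite state-transition system, returning a counterexample path when the property is violated. It is only a minimal illustration of the general idea: the transition system, the property, and the function names are hypothetical, and real model checkers such as those cited above work on temporal logic formulas over abstracted state spaces rather than on a hand-written transition relation.

```python
# Minimal sketch (hypothetical example): explicit-state checking of an
# invariant ("always P") over a toy finite state-transition system.
from collections import deque

def check_invariant(initial_states, successors, holds):
    """Breadth-first search over the reachable states of the model.

    Returns (True, None) if `holds` is satisfied in every reachable
    state, and otherwise (False, path), where `path` leads from an
    initial state to a violating state, i.e. a counterexample to the
    checked property.
    """
    frontier = deque((s, [s]) for s in initial_states)
    visited = set(initial_states)
    while frontier:
        state, path = frontier.popleft()
        if not holds(state):
            return False, path            # property violated: counterexample
        for nxt in successors(state):
            if nxt not in visited:
                visited.add(nxt)
                frontier.append((nxt, path + [nxt]))
    return True, None                      # property holds in the whole model

# Toy abstract model: a counter that wraps around at 4. The (false)
# specification states that the counter never reaches the value 3.
ok, counterexample = check_invariant(
    initial_states=[0],
    successors=lambda s: [(s + 1) % 4],
    holds=lambda s: s != 3,
)
print(ok, counterexample)   # False [0, 1, 2, 3]
```

The returned path plays the role of the counterexamples discussed above: it is the object a tester would then try to reproduce on the actual program.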
Note that, in both cases, not all the paths of the state transition system need to be compared with actual executions, so the overall testing time complexity is not that of testing all paths in Software Testing. This process is known as abstraction refinement (Clarke et al. 2000) and characterises many model-based reasoning approaches in NSIS as well (Angius 2013a). My conclusion is that in NSIS too, as in Galileo's study of accelerated motion, the models used as surrogates of the physical system to be examined contain some degree of uncertainty, due to the abductive nature of models in science (Magnani 2004). They are abstract, or even idealised, hypothetical structures whose adequacy cannot be fully evaluated, but which can be refined so as to yield successful predictions concerning the phenomena of interest. This happens because one cannot test all the behaviours even of a physical system. Both empirical systems and reactive computational systems (i.e. systems continuously interacting with their environment, such as those involved in SIS) may be characterised by an infinite set of runs. I believe that the main thesis of the target article should be weakened by saying that the introduction of software into science adds further (though methodologically not different) sources of uncertainty, insofar as a further representational element is introduced in SIS: one has to consider the adequacy of the computational model with respect to the program and the empirical adequacy of the program with respect to the represented empirical system.

THE PROBLEM OF INDUCTION IN SOFTWARE INTENSIVE SCIENCE

Let us now turn to the target article's methodological analysis concerning the dynamic analysis of programs provided by Software Testing. The authors develop an argument showing that only partial knowledge of software intensive systems is available, because CSIT cannot be applied to program code and because conditionality prevents one from testing all the runs of a program containing a reasonable number of instructions. I completely agree: the coverage problem in Software Testing is one of the main difficulties software engineers have to deal with (Ammann and Offutt 2008). The conclusion is that, since CSIT is commonly used to evaluate error distributions in NSIS domains, this marks another fundamental difference between SIS and NSIS.

I would like again to weaken the authors' conclusion by underlining how the epistemological limits of Software Testing can be considered instances of the problem of justifying empirical hypotheses and theories (Glymour 1980) obtained without the help of software. Observing all the behaviours of an empirical system in order to test a theory, or to justify a hypothesis derived from that theory, is unattainable as well; indeed, the classical problem of enumerative induction has been concerned, since Hume's critique of causal relations, with the impossibility of observing all occurrences of the events correlated by a hypothesis. In Software Testing, since all potential executions cannot be observed, only the executions that are significant for the specifications of interest are tested. In particular, only those runs that violate the specifications are taken into consideration (Ammann and Offutt 2008, p. 11). The target paper stresses that, even in this case, there remain executions that are not analysed. However, one is here in a situation similar to that in which only those behaviours of an empirical system that would falsify a given hypothesis are observed: both scientific experiments and software tests are theory-laden, insofar as the tested behaviours are only those that are likely to falsify the hypothesis or the specification respectively (Angius 2014). The authors suggest that software tests are not exploratory (Franklin 1989) and that many executions remain untested; yet the same happens with scientific experiments in NSIS: a scientific experiment is, by definition, a set of biased observations that are not exhaustive (Bunge 1998, 281-291).
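The scale of the coverage problem invoked at the beginning of this section can be conveyed with a back-of-the-envelope calculation. The figures below (one thousand independent conditionals, one microsecond per test run) are hypothetical and serve only to illustrate the order of magnitude of the path explosion; they are not measurements reported in the target article.

```python
# Hypothetical back-of-the-envelope calculation: a program with n
# independent two-way conditionals has 2**n distinct execution paths,
# so exhaustive path testing quickly becomes unfeasible.
import math

n_conditionals = 1_000                      # hypothetical program size
seconds_per_test = 1e-6                     # optimistic cost of one test run
seconds_per_year = 3600 * 24 * 365

log10_paths = n_conditionals * math.log10(2)
log10_years = log10_paths + math.log10(seconds_per_test / seconds_per_year)

print(f"distinct paths:     about 10^{log10_paths:.0f}")
print(f"exhaustive testing: about 10^{log10_years:.0f} years")
```

Even under these optimistic assumptions, testing all paths would take vastly longer than any physically available time, which is precisely the point of the target article's argument about conditionality.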
Once more, dynamic code analysis in SIS adds further sources of uncertainty that nonetheless have a methodological counterpart in NSIS. CSIT is used in science to provide statistical evaluations, expressed in terms of probabilistic statements, of empirical hypotheses. Those hypotheses are assessed by considering a sample extracted from the empirical system: as such, the evaluation process does not require observing all potential behaviours of the system under examination. Some of the system's behaviours remain unknown. The same seems to hold for software intensive systems. Even though CSIT cannot be successfully applied to software code, other statistical techniques are available and commonly used in software engineering to provide error distributions in a given program's code. To give a brief example, in Software Reliability (Brocklehurst and Littlewood 1992) the dependability of a program is defined in terms of the probability that a specified set of failures will be observed in the future. That probability is defined in terms of a program fault distribution function assigning to each time element of a time interval the probability that a fault is executed. The error distribution function is calibrated by letting the program run and observing the actual failure times; the probabilities involved increase or decrease as new executions are observed. The software reliability estimation process thus involves a Bayesian confirmation of hypotheses about software intensive systems, which characterises common statistical approaches in science (Angius 2014).
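A minimal sketch of what such a calibration might look like is given below, under simplifying assumptions that are mine and not Brocklehurst and Littlewood's: inter-failure times are treated as exponentially distributed with an unknown rate, and a conjugate Gamma prior on that rate is updated as actual failure times are observed. The observed times and prior parameters are invented for illustration.

```python
# Toy sketch of Bayesian reliability estimation (illustrative
# assumptions only): exponential inter-failure times with a Gamma
# prior on the unknown failure rate, updated from observed data.
import math

def posterior_failure_rate(prior_shape, prior_rate, interfailure_times):
    """Conjugate update: a Gamma(shape, rate) prior with an exponential
    likelihood yields a Gamma posterior; returns its mean as a point
    estimate of the failure rate (failures per hour)."""
    shape = prior_shape + len(interfailure_times)
    rate = prior_rate + sum(interfailure_times)
    return shape / rate

# Hypothetical inter-failure times (hours) observed while running the
# program; later failures arrive more slowly as faults are removed.
observed = [12.0, 30.0, 45.0, 80.0]
lam = posterior_failure_rate(prior_shape=1.0, prior_rate=10.0,
                             interfailure_times=observed)

# Probability of observing at least one failure in the next 24 hours,
# using the posterior mean rate as a point estimate.
p_fail_24h = 1 - math.exp(-lam * 24)
print(f"estimated failure rate: {lam:.4f} per hour")
print(f"P(failure within 24 h): {p_fail_24h:.2f}")
```

As new executions are observed, the posterior, and with it the predicted failure probability, is revised; this is the sense in which the estimation process behaves like a Bayesian confirmation of hypotheses about the software intensive system.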

It remains true that, both in the scientific application of CSIT and in Software Reliability, error distributions are attained with respect to given hypotheses or desired specifications respectively. Behaviours of the examined system that are not involved with, or implied by, the hypotheses or specifications under consideration remain unobserved and, as such, unknown.

CONCLUSIONS

It is undeniable that path complexity prevents one from attaining comprehensive knowledge of software intensive systems. However, this does not mark an epistemological difference with respect to the analysis of empirical systems in science. In both cases, complex systems are examined via a simplified representation of them, i.e. a model, and the predictions yielded by those models are justified by performing model-guided experiments on their target systems. Consequences stemming from data not comprised within those models remain unknown. Software intensive systems introduce into science, along with computational power, further idealizations and assumptions. The interesting research offered by this paper suggests that the predictive accuracy of the empirical regularities formulated in SIS relies on ceteris paribus conditions concerning the correctness of the programs involved.

References

Ammann, P., & Offutt, J. (2008). Introduction to Software Testing. Cambridge University Press.
Angius, N. (2013a). Abstraction and idealization in the formal verification of software systems. Minds and Machines, 23(2), 211-226.
Angius, N. (2013b). Model-based abductive reasoning in automated software testing. Logic Journal of IGPL, 21(6), 931-942.
Angius, N. (2014). The problem of justification of empirical hypotheses in software testing. Philosophy & Technology. doi: 10.1007/s13347-014-0159-6.
Brocklehurst, S., & Littlewood, B. (1992). New ways to get accurate reliability measures. IEEE Software, 34-42.
Bunge, M. A. (1998). Philosophy of Science. Vol. 2, From Explanation to Justification. New Brunswick, NJ: Transaction Publishers.
Clarke, E. M., Grumberg, O., & Peled, D. A. (1999). Model Checking. Cambridge, MA: The MIT Press.
Clarke, E., Grumberg, O., Jha, S., Lu, Y., & Veith, H. (2000). Counterexample-guided abstraction refinement. In Computer Aided Verification (pp. 154-169). Springer Berlin Heidelberg.

Fisher, J., & Henzinger, T. A. (2007). Executable cell biology. Nature biotechnology, 25(11), 1239-1249. Franklin, A. (1989). The Neglect of Experiment. Cambridge: Cambridge University Press. Glymour, C. (1980). Theory and Evidence. Princeton: Princeton University Press. Hughes, R. I. (1997). Models and representation. Philosophy of Science, S325-S336. Kesten, Y., & Pnueli, A. (2000). Control and data abstraction: Cornerstones of the practical formal verification. Software Tools and Technology Transfer, 2(4), 328 342. Magnani, L. (2004). Model based and manipulative abduction in science. Foundation of Science, 9, 219 247. Magnani, L., Nersessian, N., & Thagard, P. (Eds.). (1999). Model-based reasoning in scientific discovery. Springer. Monin, J. F., & Hinchey, M. G. (2003). Understanding formal methods. Berlin: Springer. Van Fraassen B. C. (1980), The Scientific Image, Oxford: Oxford University Press. Van Leeuwen, J. (1990). Handbook of Theoretical Computer Science. Volume B: Formal Models and Semantics. Elsevier and MIT Press. 6