Computational Idealizations in Software Intensive Science. A Comment on Symons & Horner's paper. (Draft)


Nicola Angius
Dipartimento di Storia, Scienze dell'Uomo e della Formazione, Università degli Studi di Sassari, Italy
nangius@uniss.it

Abstract: This commentary on John Symons and Jack Horner's paper, besides sharing its main argument, challenges the authors' claim that the lack of an effective method to evaluate software intensive systems is a distinguishing feature of software intensive science. It is underlined here how analogous methodological limitations characterise the evaluation of empirical systems in non-software intensive sciences. The authors' claim that formal methods establish the correctness of computational models rather than of the represented program is compared with the empirical adequacy problem typifying the model-based reasoning approach in physics. And the remark that testing all the paths of a software intensive system is unfeasible is related to the problem of enumerative induction in the justification of empirical law-like hypotheses in non-software intensive sciences.

Keywords: Software Intensive Science, Computational Models, Scientific Method

The increasing exploitation of software systems in scientific inquiry calls for deeper philosophical investigation of the epistemological and methodological issues it raises. The paper "Software Intensive Science" by John Symons and Jack Horner sheds light on one of those issues, to which philosophers have paid scant attention so far. The concern is that the high path complexity of widespread programs, engendered by conditional program statements, introduces epistemologically significant differences between the practice of software intensive science (SIS) and that of non-software intensive science (NSIS). The target article argues that it is not feasible to apply conventional statistical inference theory (CSIT), commonly used to evaluate empirical hypotheses in traditional NSIS, to evaluate error distributions within the software used in SIS.

One of the major merits of this study is, in my view, that of highlighting how the explanatory and predictive capabilities of law-like statements in SIS (or of the scientific theory from which they are derived) depend on the reliability of the software systems involved. In the case of a computer simulation, the yielded predictions rely not only on the equations representing the evolution of the target system but also on the software involved in computing the equations' solutions (and on the correctness of the hardware instantiating that software). And in the case of a program drawing regularities from a set of theoretical assumptions, the obtained law-like statements are hypotheses that might not be consistent with observed phenomena if a faulty program derived them. Scientific knowledge of empirical systems in SIS thus leans on the knowledge one acquires about the software systems involved, and the computer science problem of evaluating the correctness of programs becomes essential in SIS. According to the authors, there is no effective method that can characterise error distributions within programs except for Software Testing, which, however, cannot be exhaustive for the software systems in use today (and hence is not effective). The problem of program correctness introduces into current computer-aided scientific inquiries an amount of uncertainty that marks a fundamental difference between non-software intensive science and software intensive science.

In the following, I would like to weaken the authors' conclusion by discussing the premise that there is no effective method available to examine software intensive systems. Even if CSIT cannot be applied to program code, the techniques currently developed by computer science to tackle the correctness problem can nonetheless be put, from an epistemological and methodological point of view, on a par with other common methods used to evaluate scientific theories and hypotheses in NSIS.

THE EMPIRICAL ADEQUACY PROBLEM IN SOFTWARE INTENSIVE SCIENCE

Software code can be examined both statically, by means of so-called formal methods (Monin and Hinchey 2003), and dynamically, the main techniques being those of Software Testing (Ammann and Offutt 2008) taken into consideration by the authors. Static code analysis provides an a priori examination of programs that, in contrast with dynamic code analysis, does not require examining the executions of the software intensive system involved. As such, formal verification does provide an effective method of analysis. While Theorem Proving is affected by undecidability limitations (Van Leeuwen 1990), Model Checking (Clarke et al. 1999) supplies a decidable, depth-first search algorithm able to check exhaustively, within a reasonable time complexity, a model representing the potential executions of the examined program. The target article seems to underestimate the potential of formal methods in theoretical computer science and their methodological impact on the philosophy of computing.

The authors object that, in formal methods, the evaluation problem is only shifted from the software intensive system to a representation of that system, namely the computational model: whether the model is an adequate representation of the program remains to be settled, and doing so requires an intractable time complexity comparable to that of testing all paths. This is certainly true. However, evaluating the empirical adequacy of the computational models involved in SIS is not that different from the problem of evaluating the empirical adequacy of the representational models used in NSIS (van Fraassen 1980). According to the model-based reasoning approach to science (Magnani et al. 1999), complex empirical systems are studied by means of simplified representations of them, usually idealised models, and proofs are accomplished within those models.
Consistently with the authors' objection, model-based results are to be acknowledged as model-based hypotheses, since they have to be reinterpreted in the target system in order to be evaluated. Model-based hypotheses have characterised research in physics since Galileo's study of uniformly accelerated motion (Hughes 1997). What distinguishes the model-based reasoning approach in SIS from that in NSIS is that, in SIS, the reasoning involves the use of software intensive systems, and a problem arises as to how to examine such systems. In this I agree with the authors.

Let us consider the case of Systems Biology studying some bio-chemical process. Current trends in bioinformatics provided by so-called executable biology (Fisher and Henzinger 2007) represent a biological system by means of a program, the examination of which provides predictions concerning the phenomena under examination. This is a typical SIS approach in biology, and knowledge of the target biological system depends on the knowledge one acquires about the program representing that system; in computer science terms, such knowledge is the correctness of the program with respect to a set of property specifications. Model Checking provides a formal tool to prove the correctness of programs, one that has also been applied to the analysis of cell systems in Systems Biology (Fisher and Henzinger 2007, 1240).

The objection I would like to address to the target article's authors is that the model checking approach in systems biology, as in any other SIS context, can be methodologically put on a par with model-based reasoning approaches in NSIS. Property specifications are used to express, usually by means of temporal logic formulas, a set of behavioural properties that the biological system is supposed to exhibit. A state transition system is used to represent, in principle, all potential executions of the program describing the bio-chemical phenomena of interest. To avoid state space explosion, an abstract version of the same model is obtained by applying common state space reduction techniques, mostly data abstractions (Kesten and Pnueli 2000). The model checking algorithm finally checks whether the temporal formulas hold in the abstract model, that is, whether the required behaviours belong to the behaviours allowed by the model.

As Symons and Horner underline, we still do not know whether the software examined is a fair instantiation of the abstract model. Indeed, data abstraction might produce what are known as false positives, i.e., paths in the abstract model that do not correspond to any of the actual program's executions (Clarke et al. 2000). Temporal formulas verified in an abstract model are thus akin to model-based hypotheses that need to be justified on the basis of observed executions (Angius 2013b). The authors claim that, in order to evaluate whether a given program is a correct instantiation of the computational model, it is necessary to perform the unfeasible task of testing all the program's executions and comparing them with the model's paths. This is not how computer scientists evaluate the empirical adequacy of abstract state transition systems. In case the model checking algorithm terminates with a positive answer, i.e. the temporal logic formula holds in the model, it yields what is called a set of witnesses, that is, a set of paths satisfying the formalised property specification. Those paths are compared with actual executions to exclude that they correspond to false positives; in the latter case, the model is refined. If the algorithm ends with a negative answer, a set of counterexamples is exhibited, showing paths that violate the checked temporal formula. A tester will then try to make the program execute those incorrect paths. If they are actually observed, the program is debugged; if they are not, the model is refined accordingly.
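To make the procedure just described concrete, the sketch below shows explicit-state checking of an invariant property ("always P") over a toy finite state-transition system, returning a counterexample path when the property is violated. It is only a minimal illustration of the general idea: the transition system, the property, and the function names are hypothetical, and real model checkers such as those cited above work on temporal logic formulas over abstracted state spaces rather than on a hand-written transition relation.

```python
# Minimal sketch (hypothetical example): explicit-state checking of an
# invariant ("always P") over a toy finite state-transition system.
from collections import deque

def check_invariant(initial_states, successors, holds):
    """Breadth-first search over the reachable states of the model.

    Returns (True, None) if `holds` is satisfied in every reachable
    state, and otherwise (False, path), where `path` leads from an
    initial state to a violating state, i.e. a counterexample to the
    checked property.
    """
    frontier = deque((s, [s]) for s in initial_states)
    visited = set(initial_states)
    while frontier:
        state, path = frontier.popleft()
        if not holds(state):
            return False, path            # property violated: counterexample
        for nxt in successors(state):
            if nxt not in visited:
                visited.add(nxt)
                frontier.append((nxt, path + [nxt]))
    return True, None                      # property holds in the whole model

# Toy abstract model: a counter that wraps around at 4. The (false)
# specification states that the counter never reaches the value 3.
ok, counterexample = check_invariant(
    initial_states=[0],
    successors=lambda s: [(s + 1) % 4],
    holds=lambda s: s != 3,
)
print(ok, counterexample)   # False [0, 1, 2, 3]
```

The returned path plays the role of the counterexamples discussed above: it is the object a tester would then try to reproduce on the actual program.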
Note that, in both cases, not all the paths of the state transition system need to be compared with actual executions, so the overall testing time complexity is not that of testing all paths in Software Testing. This process is known as abstraction refinement (Clarke et al. 2000) and characterises many model-based reasoning approaches in NSIS as well (Angius 2013a). My conclusion is that in NSIS too, as in Galileo's study of accelerated motion, the models used as surrogates of the physical system to be examined contain some degree of uncertainty, due to the abductive nature of models in science (Magnani 2004). They are abstract, or even idealised, hypothetical structures whose adequacy cannot be fully evaluated, but which can be refined so as to yield successful predictions concerning the phenomena of interest. This happens because one cannot test all the behaviours even of a physical system. Both empirical systems and reactive computational systems (i.e. systems continuously interacting with their environment, such as those involved in SIS) may be characterised by an infinite set of runs. I believe that the main thesis of the target article should be weakened by saying that the introduction of software into science adds further (though methodologically not different) sources of uncertainty, insofar as a further representational element is introduced in SIS: one has to consider the adequacy of the computational model with respect to the program and the empirical adequacy of the program with respect to the represented empirical system.

THE PROBLEM OF INDUCTION IN SOFTWARE INTENSIVE SCIENCE

Let us now turn to the target article's methodological analysis concerning the dynamic analysis of programs provided by Software Testing. The authors develop an argument showing that only partial knowledge of software intensive systems is available, because CSIT cannot be applied to program code and because conditionality prevents one from testing all the runs of a program containing a reasonable number of instructions. I completely agree: the coverage problem in Software Testing is one of the main difficulties software engineers have to deal with (Ammann and Offutt 2008). The conclusion is that, since CSIT is commonly used to evaluate error distributions in NSIS domains, this marks another fundamental difference between SIS and NSIS.

I would like again to weaken the authors' conclusion by underlining how the epistemological limits of Software Testing can be considered instances of the problem of justifying empirical hypotheses and theories (Glymour 1980) obtained without the help of software. Observing all the behaviours of an empirical system in order to test a theory, or to justify a hypothesis derived from that theory, is unattainable as well; indeed, the classical problem of enumerative induction has been concerned, since Hume's critique of causal relations, with the impossibility of observing all occurrences of the events correlated by a hypothesis. In Software Testing, since all potential executions cannot be observed, only the executions that are significant for the specifications of interest are tested. In particular, only those runs that violate the specifications are taken into consideration (Ammann and Offutt 2008, p. 11). The target paper stresses that, even in this case, there remain executions that are not analysed. However, one is here in a situation similar to that in which only those behaviours of an empirical system that would falsify a given hypothesis are observed: both scientific experiments and software tests are theory-laden, insofar as the tested behaviours are only those that are likely to falsify the hypothesis or the specification respectively (Angius 2014). The authors suggest that software tests are not exploratory (Franklin 1989) and that many executions remain untested; yet the same happens with scientific experiments in NSIS: a scientific experiment is, by definition, a set of biased observations that are not exhaustive (Bunge 1998, 281-291).
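The scale of the coverage problem invoked at the beginning of this section can be conveyed with a back-of-the-envelope calculation. The figures below (one thousand independent conditionals, one microsecond per test run) are hypothetical and serve only to illustrate the order of magnitude of the path explosion; they are not measurements reported in the target article.

```python
# Hypothetical back-of-the-envelope calculation: a program with n
# independent two-way conditionals has 2**n distinct execution paths,
# so exhaustive path testing quickly becomes unfeasible.
import math

n_conditionals = 1_000                      # hypothetical program size
seconds_per_test = 1e-6                     # optimistic cost of one test run
seconds_per_year = 3600 * 24 * 365

log10_paths = n_conditionals * math.log10(2)
log10_years = log10_paths + math.log10(seconds_per_test / seconds_per_year)

print(f"distinct paths:     about 10^{log10_paths:.0f}")
print(f"exhaustive testing: about 10^{log10_years:.0f} years")
```

Even under these optimistic assumptions, testing all paths would take vastly longer than any physically available time, which is precisely the point of the target article's argument about conditionality.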
Once more, dynamic code analysis in SIS adds further sources of uncertainty that nonetheless have a methodological counterpart in NSIS. CSIT is used in science to provide statistical evaluations, expressed in terms of probabilistic statements, of empirical hypotheses. Those hypotheses are assessed by considering a sample extracted from the empirical system: as such, the evaluation process does not require observing all potential behaviours of the system under examination. Some of the system's behaviours remain unknown. The same seems to hold for software intensive systems. Even though CSIT cannot be successfully applied to software code, other statistical techniques are available and commonly used in software engineering to provide error distributions in a given program's code. To give a brief example, in Software Reliability (Brocklehurst and Littlewood 1992) the dependability of a program is defined in terms of the probability that a specified set of failures will be observed in the future. That probability is defined in terms of a program fault distribution function assigning to each time element of a time interval the probability that a fault is executed. The error distribution function is calibrated by letting the program run and observing the actual failure times; the probabilities involved increase or decrease as new executions are observed. The software reliability estimation process thus involves a Bayesian confirmation of hypotheses about software intensive systems, which characterises common statistical approaches in science (Angius 2014).
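A minimal sketch of what such a calibration might look like is given below, under simplifying assumptions that are mine and not Brocklehurst and Littlewood's: inter-failure times are treated as exponentially distributed with an unknown rate, and a conjugate Gamma prior on that rate is updated as actual failure times are observed. The observed times and prior parameters are invented for illustration.

```python
# Toy sketch of Bayesian reliability estimation (illustrative
# assumptions only): exponential inter-failure times with a Gamma
# prior on the unknown failure rate, updated from observed data.
import math

def posterior_failure_rate(prior_shape, prior_rate, interfailure_times):
    """Conjugate update: a Gamma(shape, rate) prior with an exponential
    likelihood yields a Gamma posterior; returns its mean as a point
    estimate of the failure rate (failures per hour)."""
    shape = prior_shape + len(interfailure_times)
    rate = prior_rate + sum(interfailure_times)
    return shape / rate

# Hypothetical inter-failure times (hours) observed while running the
# program; later failures arrive more slowly as faults are removed.
observed = [12.0, 30.0, 45.0, 80.0]
lam = posterior_failure_rate(prior_shape=1.0, prior_rate=10.0,
                             interfailure_times=observed)

# Probability of observing at least one failure in the next 24 hours,
# using the posterior mean rate as a point estimate.
p_fail_24h = 1 - math.exp(-lam * 24)
print(f"estimated failure rate: {lam:.4f} per hour")
print(f"P(failure within 24 h): {p_fail_24h:.2f}")
```

As new executions are observed, the posterior, and with it the predicted failure probability, is revised; this is the sense in which the estimation process behaves like a Bayesian confirmation of hypotheses about the software intensive system.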

It remains true that, both in the scientific application of CSIT and in Software Reliability, error distributions are attained with respect to given hypotheses or desired specifications respectively. Behaviours of the examined system that are not involved with, or implied by, the hypotheses or specifications under consideration remain unobserved and, as such, unknown.

CONCLUSIONS

It is undeniable that path complexity prevents one from attaining comprehensive knowledge of software intensive systems. However, this does not mark an epistemological difference with respect to the analysis of empirical systems in science. In both cases, complex systems are examined via a simplified representation of them, i.e. a model, and the predictions yielded by those models are justified by performing model-guided experiments on their target systems. Consequences stemming from data not comprised within those models remain unknown. Software intensive systems introduce into science, along with computational power, further idealizations and assumptions. The interesting research offered by this paper suggests that the predictive accuracy of the empirical regularities formulated in SIS relies on ceteris paribus conditions concerning the correctness of the programs involved.

References

Ammann, P., & Offutt, J. (2008). Introduction to Software Testing. Cambridge University Press.
Angius, N. (2013a). Abstraction and idealization in the formal verification of software systems. Minds and Machines, 23(2), 211-226.
Angius, N. (2013b). Model-based abductive reasoning in automated software testing. Logic Journal of IGPL, 21(6), 931-942.
Angius, N. (2014). The problem of justification of empirical hypotheses in software testing. Philosophy & Technology. doi: 10.1007/s13347-014-0159-6.
Brocklehurst, S., & Littlewood, B. (1992). New ways to get accurate reliability measures. IEEE Software, 34-42.
Bunge, M. A. (1998). Philosophy of Science. Vol. 2, From Explanation to Justification. New Brunswick, NJ: Transaction Publishers.
Clarke, E. M., Grumberg, O., & Peled, D. A. (1999). Model Checking. Cambridge, MA: The MIT Press.
Clarke, E., Grumberg, O., Jha, S., Lu, Y., & Veith, H. (2000). Counterexample-guided abstraction refinement. In Computer Aided Verification (pp. 154-169). Springer Berlin Heidelberg.

Fisher, J., & Henzinger, T. A. (2007). Executable cell biology. Nature biotechnology, 25(11), 1239-1249. Franklin, A. (1989). The Neglect of Experiment. Cambridge: Cambridge University Press. Glymour, C. (1980). Theory and Evidence. Princeton: Princeton University Press. Hughes, R. I. (1997). Models and representation. Philosophy of Science, S325-S336. Kesten, Y., & Pnueli, A. (2000). Control and data abstraction: Cornerstones of the practical formal verification. Software Tools and Technology Transfer, 2(4), 328 342. Magnani, L. (2004). Model based and manipulative abduction in science. Foundation of Science, 9, 219 247. Magnani, L., Nersessian, N., & Thagard, P. (Eds.). (1999). Model-based reasoning in scientific discovery. Springer. Monin, J. F., & Hinchey, M. G. (2003). Understanding formal methods. Berlin: Springer. Van Fraassen B. C. (1980), The Scientific Image, Oxford: Oxford University Press. Van Leeuwen, J. (1990). Handbook of Theoretical Computer Science. Volume B: Formal Models and Semantics. Elsevier and MIT Press. 6