Guest Editorial

Why Is the One-Group Pretest-Posttest Design Still Used?

Clinical Nursing Research, 2016, Vol. 25(5), 467-472. © The Author(s) 2016. Reprints and permissions: sagepub.com/journalsPermissions.nav. DOI: 10.1177/1054773816666280. cnr.sagepub.com

Thomas R. Knapp, EdD, FAAN (1,2)

1 University of Rochester, NY, USA
2 The Ohio State University, Columbus, USA

Corresponding Author: Thomas R. Knapp, Professor Emeritus of Education, University of Rochester, 145 Rockingham St., Rochester, NY 14620, USA. Email: tknapp5@juno.com

Abstract

The one-group pretest-posttest pre-experimental design has been widely criticized, yet it continues to be used in some clinical nursing research studies. This editorial explains what is wrong with the design, suggests reasons for its continued use, and offers some recommendations regarding what can be done about it.

Keywords: experimental design, causality, graduate education

More than 50 years ago, Donald Campbell and Julian Stanley (1963) carefully explained why the one-group pretest-posttest pre-experimental design (Y1 X Y2) is a very poor choice for testing the effect of an independent variable X on a dependent variable Y measured at Time 1 and Time 2. Their reasons ranged from obvious matters, such as the absence of a control group, to technical considerations, such as regression toward the mean. Yet that design continues to be used in clinical nursing research. After briefly summarizing some of the things that Campbell and Stanley (hereinafter referred to as C&S) said were wrong with this design, I will try
to suggest some reasons for its survival in the clinical nursing research literature.

But first I would like to emphasize at the outset one other weakness that is not on the C&S list: the design provides no basis for any sort of helpful inference, statistical or scientific, even if the sample used in the study has been randomly selected (which is rarely the case). Suppose there is a statistically significant difference (change) between the pretest and posttest results. What can you say? You can't say that there is a statistically significant effect of X on Y, because there is no random assignment to experimental and control groups (there is no control group). The difference is what it is, and that's that. If the sample is random, you could construct a confidence interval around the difference, but that wouldn't help in inferring anything about the effect of X.

Threats to Its Internal Validity and External Validity

C&S use the term "threats" to indicate uncontrolled matters that could affect Y instead of, or in addition to, X. They also use the term "internal validity" as synonymous with causal interpretability. Some of the threats to the internal validity of the one-group pretest-posttest design are as follows:

1. History: While the participants are being exposed to X, some other event occurring at the same time could be the cause of the change in Y.

2. Maturation: If there is a long time between T1 for Y1 and T2 for Y2, the participants have grown older and possibly more or less healthy, which might account for any change in Y.

3. Testing: If the posttest is a cognitive test that is the same test as the pretest, the questions might be familiar, and therefore now easier; if the scores improve from pretest to posttest, the improvement could be a practice effect rather than a treatment effect.

4.
Instrumentation: Instrumentation is a threat regarding the scoring or rating of the pre-experimental and post-experimental measurements. If the posttest performance is evaluated by someone different from, and more stringent than, the person who scored the pretest, the posttest measurements could be lower even if there were no treatment effect. The same threat could be posed by a mechanical or electrical instrument's loss of precision from Time 1 to Time 2.

5. Statistical Regression: If the participants are well below average in the population of interest, and have been selected on that basis, they must perform better, on average, on the posttest than on the pretest
as an artifact of the elliptical shape of the scatter diagram for a positive relationship between pretest and posttest scores (which is usually the case). They have no other way to go but up, so to speak. This threat is also a problem for participants selected because they are well above average: they have nowhere to go but down.

C&S use the term "external validity" as synonymous with generalizability. Two such threats for the one-group pretest-posttest design are as follows:

1. Interaction of Testing and X: The pretest may sensitize participants to the treatment, so that the generalizability of the findings might extend only to pretested populations.

2. Interaction of Selection and X: Experiments are rarely carried out on random samples of participants, which makes generalizations to other potential participants difficult if not impossible. The measuring instrument(s) used is (are) also rarely randomly sampled from a set of equally appropriate instruments, thereby further restricting generalizability.

Some Possible Reasons for Its Survival

Perhaps some researchers in disciplines such as nursing, medicine, and public health have not heard about the C&S cautions. I personally doubt it, for three reasons: (a) I am familiar enough with graduate curricula in nursing to know that C&S has indeed been used in research design courses in many schools and colleges of nursing; (b) discussions (sometimes dangerously close to plagiarism) of the C&S designs appear in several textbooks in the health sciences; and (c) the Google prompt campbell stanley experimental design (without the quotation marks) returns about 180,000 entries, not all of which refer to social scientific research. The prompt campbell stanley one-group pretest posttest design (again without the quotation marks) returns about 25,000 entries, several of which refer to clinical research studies.
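The statistical regression threat described above can be made concrete with a short simulation. The following sketch (mine, not part of the original editorial; all numbers are invented for illustration) selects participants who score well below average on a pretest and shows that their posttest mean is reliably higher even though no treatment is applied at all:

```python
import random

random.seed(42)

# Each participant has a stable true score; each test adds independent
# measurement error. No treatment effect is simulated anywhere.
N = 10_000
true_score = [random.gauss(50, 10) for _ in range(N)]
pretest = [t + random.gauss(0, 5) for t in true_score]
posttest = [t + random.gauss(0, 5) for t in true_score]

# Select only participants who scored well below average on the pretest.
selected = [i for i in range(N) if pretest[i] < 40]

mean_pre = sum(pretest[i] for i in selected) / len(selected)
mean_post = sum(posttest[i] for i in selected) / len(selected)

# The selected group "improves" purely because their pretest errors were
# negative on average, while their posttest errors are not: regression
# toward the mean, with no treatment involved.
print(f"pretest mean of low scorers:  {mean_pre:.1f}")
print(f"posttest mean of low scorers: {mean_post:.1f}")
```

In a one-group pretest-posttest study of such a group, this artifactual gain would be indistinguishable from a treatment effect.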
Perhaps some researchers find random assignment to treatment and control groups difficult to carry out, for practical and/or ethical reasons. As an obvious example, one cannot randomly assign elementary schoolchildren to a reading program or to no program in order to study the change in their understanding of the directions for taking low-dose aspirin.

Perhaps some researchers are subject to pressure from colleagues and/or superiors to give the experimental treatment to everybody. The Sinclair
Lewis (1925) novel Arrowsmith provides a good example of that with respect to an untried serum. A researcher who might otherwise argue for a better design might not be willing to spend the political capital necessary to overturn an original decision to go with the Y1 X Y2 approach.

Perhaps some researchers want to conserve personal effort by using the one-group design; having a control group to contend with is much more work.

Perhaps some researchers don't care whether the difference is attributable to X; all they might care about is whether things get better or worse between pretest and posttest, not why.

Perhaps some researchers use the design in a negative way: if X is hoped to produce an increase in Y from pretest to posttest, and a decrease is observed instead, any hypothesis regarding a positive change is not supported by the data, no matter how big or small that decrease is.

Perhaps some researchers consider the use of this design a pilot effort (for a main study that might or might not follow).

Perhaps some researchers feel that the time between pretest and posttest is often so short (a measure of Y, a brief exposure to X, and another measure of Y) that if there's any change in Y, it must be X that did it.

Perhaps some researchers not only don't care about causality but are interested primarily in individual changes (John lost 5 points, Mary gained 10 points, etc.), even if the gains and losses cancel each other out. The raw data for a Y1 X Y2 design show that nicely.

Perhaps some researchers are so eager to get a paper published that they'll try almost anything, including the use of a weak design.

Can the Design Be Salvaged?

There have been several suggestions for improving upon the one-group pretest-posttest design to make it more defensible as a serious approach to experimentation.
One suggestion (Glass, 1965) was to use a more complicated design that is capable of separating maturation and testing effects from the treatment effect. Another approach (Johnson, 1986) was to randomly assign participants to the various measurement occasions surrounding the treatment (e.g., pretest, posttest, post-posttest) and compare the findings for those subgroups within the one-group context. A third variation was to incorporate a double pretest before implementing the treatment: if the difference between either pretest and the posttest is much greater than the difference between the two pretests, additional support is provided for the effect of X. Marin, Marin, Perez-Stable, Otero-Sabogal, and Sabogal (1990) actually used that design in their study of the effect of an anti-smoking campaign.
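The double-pretest logic can be sketched in a few lines. The scores below are hypothetical (invented for illustration; they are not data from the Marin et al. study): the change over the untreated interval between the two pretests estimates drift from maturation, testing, and the like, against which the pretest-to-posttest change is compared.

```python
# Hypothetical scores for one group measured on three occasions:
# two pretests (no treatment in between) and one posttest (after treatment).
pretest1 = [12, 15, 11, 14, 13, 16, 12, 15]
pretest2 = [13, 15, 12, 14, 14, 16, 13, 15]
posttest = [18, 20, 17, 19, 18, 21, 17, 20]

def mean(xs):
    return sum(xs) / len(xs)

# Change across the no-treatment interval: an estimate of background drift
# (maturation, testing/practice effects, etc.).
baseline_drift = mean(pretest2) - mean(pretest1)

# Change across the treatment interval.
treatment_change = mean(posttest) - mean(pretest2)

print(f"baseline drift:   {baseline_drift:.2f}")    # -> 0.50
print(f"treatment change: {treatment_change:.2f}")  # -> 4.75
```

A treatment change much larger than the baseline drift, as here, lends additional (not conclusive) support to the effect of X; the design still cannot rule out, for example, a history event coinciding with the treatment interval.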
But all of those approaches pale in comparison with having two groups (one experimental, one control) to which participants are randomly assigned, with each group pretested and posttested; that is, C&S Design 4:

R  Y1  X  Y2
R  Y3     Y4

If you have the random assignment, you can even do without the pretest, using their Design 6:

R  X  Y1
R     Y2

C&S prefer Design 6 to Design 4 in any event because it has greater generalizability; both are equally strong for assessing causality.

What Can Be Done to Minimize Its Use?

It's all well and good to complain about the misuse or overuse of the one-group pretest-posttest design. It's much more difficult to try to fix the problem. I have only the following three relatively mild recommendations:

1. Every graduate program (master's and doctoral) in nursing should include a required course in the design of experiments in which the C&S chapter is one of the adopted readings, with particular emphasis placed upon the section dealing with the one-group pretest-posttest design. (C&S use the notation O1 X O2 rather than Y1 X Y2, where the O stands for an observation on the dependent variable Y; but in my opinion Y1 X Y2 is much more straightforward.)

2. Thesis and dissertation committees should take a much stronger stance against the one-group design. The best people to insist upon that are those who serve as statistical consultants in nursing colleges and departments.

3. Editors of, and reviewers for, nursing research journals should automatically reject a manuscript in which this design plays the principal role.

A Historical Note Regarding the C&S Work

As indicated in the References section that follows, Experimental and Quasi-Experimental Designs for Research on Teaching first appeared as a chapter in a set of papers devoted to educational research. It received such
acclaim that it was reprinted (essentially intact) as a paperback book published in 1966, but without the words "on teaching" (undoubtedly in the hope of attracting a larger market, which it indeed did). It has gone in and out of print many times.

Declaration of Conflicting Interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) received no financial support for the research, authorship, and/or publication of this article.

References

Campbell, D. T., & Stanley, J. C. (1963). Experimental and quasi-experimental designs for research on teaching. In N. L. Gage (Ed.), Handbook of research on teaching (pp. 171-246). Chicago, IL: Rand McNally. (Reprinted in 1966 under the title Experimental and quasi-experimental designs for research.)

Glass, G. V. (1965). Evaluating testing, maturation, and treatment effects in a pretest-posttest quasi-experimental design. American Educational Research Journal, 2, 83-87.

Johnson, C. W. (1986). A more rigorous quasi-experimental alternative to the one-group pretest-posttest design. Educational and Psychological Measurement, 46, 585-591.

Lewis, S. (1925). Arrowsmith. New York, NY: Harcourt Brace.

Marin, B. V., Marin, G., Perez-Stable, E. J., Otero-Sabogal, R., & Sabogal, F. (1990). Cultural differences in attitudes toward smoking: Developing messages using the theory of reasoned action. Journal of Applied Social Psychology, 20, 478-493.

Author Biography

Thomas R. Knapp is Professor Emeritus of Education at the University of Rochester and Professor Emeritus of Nursing at The Ohio State University. His specializations are statistics, measurement, and research design.