THE MEASUREMENT OF READING SPEED AND THE OBLIGATION TO GENERALIZE TO A POPULATION OF READING MATERIALS


Journal of Reading Behavior, 1971-72, Vol. 4, No. 3, Summer

Gerald R. Miller and Edmund B. Coleman*

Abstract

Although most studies of reading behavior have little scientific value if their conclusions have to be restricted to the specific materials that were used in the experiment, reading researchers have seldom used designs that would enable them to generalize beyond the particular letters, words, sentences, and so on they chanced to use. Data from an experiment by Carver are used to show that it is therefore likely that many experiments could not be replicated if different samples of materials were drawn. Evidence is also given that reading speed, if measured in a fine-grained unit such as letters per second, does not increase as passages become more difficult, but is a constant across a range that extends from first-grade texts to technical prose.

Carver measured reading speed in letters per second¹ for four passages and interpreted his results as "Evidence for the Invalidity of the Miller-Coleman Readability Scale." As he noted, the authors of the present paper did not call the scale he criticizes a readability scale. The title of our article was "A Set of Thirty-six Passages Calibrated for Complexity," and only two sentences suggested readability as one among several uses for the passages. Other researchers began using them as a readability scale and gave them that label. At any rate, an argument as to whether the scale is valid is profitless; a more sensible question is whether it is useful, and that will be determined, not by argument, but by putting it to good use.

Two other points in Carver's article are more interesting and justify study. First, he is to be commended for criticizing the word as a unit in measuring reading speed. We will show that future research on reading speed should be conducted in letters per second and that many conclusions of previous research must be dismissed as artifacts of an inadequate measure. A second conclusion also justifies further study and provides the alternate title for the present article. Carver offers his Figure 2 as a reliable picture of the relation between letters read per second and difficulty.

* Dr. Miller and Dr. Coleman are on the Faculty of The University of Texas at El Paso.

¹ The casual reader of Carver's article may wonder just where he used letters per second. The term Carver actually used was "equivalent words per minute." Carver calculates equivalent words by dividing 6 into the total number of letters and spaces (including spaces taken up by punctuation marks) in the passage. Except for some slight difference introduced by his including punctuation marks, "equivalent words" is simply the number of letters divided by 5.
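As a present-day illustration of the conversion described in footnote 1 (not part of Carver's study or the present one; the passage text and reading time below are invented), the sketch computes Carver-style "equivalent words" and the letters-per-second measure advocated in this article.

```python
# Illustrative only: not from Carver's study or the present one.

def equivalent_words(passage: str) -> float:
    """Carver-style 'equivalent words': total characters (letters, spaces,
    and punctuation) divided by 6 -- roughly the number of letters divided by 5."""
    return len(passage) / 6.0


def letters_per_second(passage: str, reading_time_sec: float) -> float:
    """Fine-grained speed measure used throughout this article."""
    letters = sum(ch.isalpha() for ch in passage)
    return letters / reading_time_sec


if __name__ == "__main__":
    passage = "When I was a little girl, I lived in a small house by the sea."  # invented
    time_sec = 4.0                                                             # invented
    print(f"equivalent words:   {equivalent_words(passage):.2f}")
    print(f"letters per second: {letters_per_second(passage, time_sec):.2f}")
```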

"Notice that each group had roughly equal rates for the first three paragraphs with a considerable drop in rate for the most difficult paragraph. The roughly parallel nature of the functions for these two experimental groups suggests that the nature of this function is reliable." This conclusion affords an excellent example of the pitfalls in generalizing from an inadequate language sample.

The major point of the present article is that in most experiments in reading, the researcher is under a strong obligation to show that his conclusions can be generalized to a population of reading materials. A researcher draws a representative sample of Ss and performs significance tests to show that his conclusions could be replicated using a different sample of Ss; that his results were not determined by idiosyncrasies of the particular Ss he chanced to use. In an experiment in reading, it is equally important to show that conclusions can be generalized beyond the particular letters, words, sentences, paragraphs, books, etc. the researcher happened to use. Experimental designs and significance tests that permit such generalization have been discussed elsewhere (Coleman, 1964). If Carver's design had permitted such a test, even an approximate one, he would have seen that his conclusions were restricted to four passages. We will show that, as he says, the nature of the function in his Figure 2 is reliable, but only if one uses the same four passages he used.

Conclusions that are restricted to four passages are sometimes, as in Carver's case, quite deceptive, and they are always severely limited in value. They are as limited as conclusions that are restricted to a particular four individuals. In fact, in reaching his conclusion that a person's speed declines for exceedingly difficult reading material, Carver did not even have the guidance of a sample of N = 4; he was generalizing from a single case. It was only when his Ss read one particular passage that reading speed in letters per second dropped.

It will be instructive to repeat Carver's experiment using a larger language sample. The conclusions will certainly be more valuable, and it is not unlikely that they will be somewhat different. Sticht (1971) repeated Carver's experiment using a larger language population, all 36 passages. He measured speed of oral reading in syllables per second, and he did not find any point where reading speed declined. Experiment 1 will check whether Sticht's finding holds true in silent reading and when speed is measured in letters per second. If it is verified, it will add weight to the criticism of research on reading speed that uses words as the unit. The data will also show why Carver found a sudden decline in letters read per second, and why this finding

and many others in the field of reading must be attributed to nothing more important than idiosyncrasies of the particular materials that chanced to be used to represent the language population.

Experiment 1

Experiment 1 measured reading speed across a scale of difficulty that extended from first-grade material to very difficult technical prose. Each of 83 Ss read a different nine-passage scale of difficulty. Speed was measured in letters per second.

Subjects. The Ss were 83 undergraduates enrolled in freshman statistics classes.

Materials. The materials were the 36 passages scaled by Miller and Coleman (1967), by Aquino (1969), and by Sticht (1971). The 36 passages were rank-ordered from easy to difficult according to total cloze score as reported by Aquino, and then a nine-passage scale of difficulty was constructed by randomly drawing a passage from the first four, a second passage from the second four, a third from the third four, and so on. Three other nine-passage scales were constructed from the 36 passages. Eighty-three scales were constructed in this manner, providing a different scale of difficulty for each S. Each scale was shuffled to randomize the order of presentation and stapled into a booklet.

Procedure. The Ss were tested in groups and were told, "Read each passage carefully enough to take a test on it. Do not turn the page and begin reading until I give the signal. When you have finished, stop, read your time from the timer, and record it on the passage." The signal to begin was then given and the timer started. The procedure was repeated for the other eight passages. After the class finished all nine passages, they were told that the instruction to prepare themselves for a test had been given only to insure careful reading and that no test would be given.

Results

In Figure 1, letters read per second are plotted for each of the 36 passages. The passages are arranged from easy to difficult based on the sum of their ranks according to five measures of complexity, specifically, all those except IG (information gain) that were reported by Aquino (1969). This rank-ordering is slightly more reliable than the one she published. Its major discrepancy with hers concerns Passage 8, which we ranked as No. 14 ("When I was a little girl..."). Two passages differed by three ranks, one by two ranks, twelve by one rank, and the other nineteen agreed exactly with hers. The conclusions are obvious from visual inspection. When reading speed is measured in letters per second, it is constant across a range of difficulty that stretches from first-grade material to the most difficult technical prose.
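As a present-day sketch of the scale construction described under Materials (not the authors' procedure or code; the exact handling of repeated draws across the 83 Ss is not fully specified in the text, so this is one plausible reading), the code below partitions the rank-ordered passages into blocks of four, shuffles within each block, reads off four disjoint nine-passage scales, and repeats until 83 shuffled scales are available.

```python
import random

def four_disjoint_scales(ranked_passages, rng=random):
    """Shuffle within each block of four rank-ordered passages and read off
    four disjoint nine-passage scales, one passage per block in each scale."""
    blocks = [list(ranked_passages[i:i + 4]) for i in range(0, len(ranked_passages), 4)]
    for block in blocks:
        rng.shuffle(block)
    scales = [[block[j] for block in blocks] for j in range(4)]
    for scale in scales:
        rng.shuffle(scale)   # randomize the order of presentation within each booklet
    return scales

ranked = list(range(1, 37))          # passages 1-36, already ordered easy to hard
scales = []
while len(scales) < 83:              # one nine-passage scale per S
    scales.extend(four_disjoint_scales(ranked))
scales = scales[:83]
print(scales[0])                     # e.g. a shuffled list of nine passage numbers
```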

[Figure 1. Letters Per Second Plotted Against Passage Difficulty. The Horizontal Dashed Line, Being an Average, Is a Close Approximation to the Functional Relation Between Speed and Difficulty. The legend distinguishes the mean reading speed for each single passage, our speeds for the same four passages used by Carver, the average four-point function, and Carver's four-point function.]

Generalizing to a Language Population. More formal analysis will serve to illustrate the danger of generalizing from an inadequate language sample. Recall that Carver, who used Passages 10, 25, 33 (Aquino's No. 32), and 36, concluded that although letters read per second is relatively constant throughout most of the range of difficulty, it drops sharply for exceedingly difficult material. His reading speeds are also plotted in Figure 1. Our speeds for the same four paragraphs are encircled. Note that if we had restricted ourselves to Carver's sample of passages, we too might have concluded that letters per second drop for the most difficult material. Sticht (1971) also reports that his reading speeds for these four passages were similar to Carver's. As Carver says, the functional relation in his Figure 2 is reliable; it is reliable, however, only in a limited and grossly deceptive sense. It is reliable only if research on reading speed is restricted to the same four passages he chanced to use.

The agreement among Sticht, Carver, and ourselves suggests that there are reliable differences in the speed of reading different passages. This difference can be tested for significance by considering each of the 36 passages to be one category in the independent variable. Each S read nine passages, but since each S read a different nine, it is not possible to extract the variance due to Ss. The data, therefore, were analyzed as a simple randomized design. This has the effect of pooling the subject variance with the interaction variance in the error term. Such pooling increases the probability of a Type II error, but with our large sample, this was not a problem. The difference between the means of the 36 passages accounted for 9.16 percent of the total variance, which is significant beyond .01 (F = 2.05; df = 35, 710).
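For readers who want to see the form of the analysis just described, here is a minimal present-day sketch (Python with NumPy and SciPy, not part of the original study): the 36 passages are treated as the levels of a single factor, subject variance is left pooled in the error term, and the fraction of total variance accounted for by passage means is reported. The reading-speed data are simulated placeholders, so the printed numbers will not reproduce the values reported above.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Simulated stand-in data: for each of the 36 passages, the letters-per-second
# scores of the Ss who happened to read it (values are placeholders only).
speeds_by_passage = [rng.normal(loc=20.0, scale=4.0, size=int(n))
                     for n in rng.integers(18, 24, size=36)]

# Simple randomized (one-way) design: each passage is one level of the
# independent variable; subject variance stays pooled in the error term.
f_stat, p_value = stats.f_oneway(*speeds_by_passage)

grand = np.concatenate(speeds_by_passage)
ss_between = sum(len(g) * (g.mean() - grand.mean()) ** 2 for g in speeds_by_passage)
ss_total = ((grand - grand.mean()) ** 2).sum()

print(f"F = {f_stat:.2f}, p = {p_value:.4f}")
print(f"Passage means account for {100 * ss_between / ss_total:.2f}% of the total variance")
```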

This means that there are idiosyncrasies within certain of the passages that cause them to be read more slowly or more quickly than other passages. For instance, Carver's most difficult passage, the one that led him to conclude that there is a drop in reading speed, contains words such as antebrachials, algesiometer, and thermoesthesiometer. It is not unreasonable to assume that the S's reading speed slowed as he laboriously sounded out these unfamiliar words.

The major point of the present article is that in Carver's design the idiosyncrasies of passages must be conceptualized as the chance variance of a sample; each of his four passages must be conceptualized as representing a population of passages at that level of difficulty. It follows that a significance test must be performed that treats these idiosyncrasies as sampling variance. If it is an F-test, for example, its error term must include this variance. Perhaps a different treatment of the data can highlight the dangers of failing to perform such a test.

[Figure 2. Letters Per Second Plotted Against Passage Difficulty. Each of the Nine Scales Shows a Different Functional Relation Between Speed and Difficulty.]

Carver constructed his four-passage scale by selecting only four of the 36 passages. There are 58,905 possible four-passage scales he could have constructed from the 36. In Figure 2, letters per second is plotted for nine of the possibilities. The nine were not selected by picking and choosing only scales that illustrate our point; they were selected systematically by choosing passages 1, 10, 19, and 28 for the first scale, passages 2, 11, 20, and 29 for the second, passages 3, 12, 21, and 30 for the third, and so on. Study Figure 2 and note that if Carver had chanced to use passages 8, 17, 26, and 35, he would have concluded that letters read per second increase as passages become more difficult. Passages 9, 18, 27, and 36 would have led him to the opposite conclusion. More generally, there is no reason to prefer the four-passage combination that Carver actually used to any of the combinations illustrated in Figure 2, or to any of thousands of other combinations. The variety of shapes in Figure 2 shows with an eloquence louder than words that a four-point function such as Carver's can be embarrassingly deceptive. A function with so few points can always be deceptive unless the experiment is designed so that the function can be generalized across a population of scales as well as the customary population of Ss.

One experimental design that would have enabled Carver to generalize his four-point function across both populations simply confounds the two sampling variables (Coleman, 1964, p. 225). That is, he would have had each of his Ss read a different four-passage scale. The dotted line in Figure 1 shows the four-point function he would probably have obtained. The first point represents mean reading speed for the first nine passages, the second point represents the second nine passages, and so on. One can conceptualize any nine passages as a sample representing a population of passages at that mean level of difficulty. The differences between the four means account for only 0.58% of the variance in reading speed and are, of course, not significant, not even for 83 passage-subject combinations (F = 1.45; df = 3, 742; p > .05). In brief, this more adequate design would have led Carver to conclude that reading speed in letters per second is a constant across the entire range of difficulty.

The Measurement of Reading Speed. In Figure 3, reading speed is plotted in words per second and also in syllables per second and in morphemes per second. Note that although the number of words read per second declines as passages become more difficult, the number of syllables and morphemes is constant across the entire range of difficulty. The 36 passages, which cover the full range of difficulty of English prose, are all 150 words in length, but their numbers of letters range from 560 to 872, their numbers of syllables from 173 to 299, and their numbers of morphemes from 165 to 253. The correlations of these variables with cloze score are .90, .90, and .88.
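To make the comparison of units concrete, the short sketch below expresses one invented timed reading in each of the four units discussed here. The counts are placeholders chosen from the ranges just reported for the 36 passages; in practice, syllable and morpheme counts come from hand counts rather than from code.

```python
# Invented example: a single 150-word passage read silently in 45 seconds.
# Counts are placeholders within the ranges reported for the 36 passages
# (letters 560-872, syllables 173-299, morphemes 165-253).
passage_counts = {"words": 150, "letters": 700, "syllables": 230, "morphemes": 200}
reading_time_sec = 45.0

for unit, count in passage_counts.items():
    print(f"{unit:>10} per second: {count / reading_time_sec:5.2f}")

# A difficult 150-word passage contains more letters, syllables, and morphemes
# than an easy one, so words per second falls with difficulty even when the
# finer-grained rates remain constant.
```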

[Figure 3. Reading Speed Plotted Against Passage Difficulty for Words, Morphemes, and Syllables. Each Unit Gives a Slightly Different Functional Relation Between Speed and Difficulty.]

In brief, a hundred words of difficult prose is difficult in large part simply because it contains more letters, syllables, and morphemes to be read. The correlations between the number of letters in a passage and its numbers of syllables and morphemes are .97 and .98, respectively. Thus, since letters can be counted more reliably, letters per second is probably the most useful measure of reading speed.

Experiment 2

It would seem likely that the shape of Figure 1 could be changed by different instructions. We experimented with several and found this to be true, although one could argue that our instructions had to be sufficiently explicit to change the process from reading to skimming or to studying. At one extreme, if we told the S that he was to continue reading a passage until it was completely understood, some Ss spent five to ten times as much time per letter on the more difficult passages; but it is, of course, an unusual use of reading speed to say that a passage, parts of which are read 10 times, is read at a slower speed than one that is read through only once. It is worthwhile to report results from less extreme instructions. Six Ss read all 36 passages under the following instructions: "Read these passages as though they were something you were interested in and wanted to understand, but not as though you were going to be tested on the contents. You can spend as much time on each one as you like, reading it or parts of it as many times as you like. Record the total amount of time that you spent on each passage."

[Figure 4. Reading Time Per Passage Plotted Against Difficulty for Six Different Readers. Circles Give Times for Four Ss Who Did Not Increase Their Time for Difficult Passages. The Joined Lines Give Times for Two Ss Who Did, Apparently by Reading Difficult Passages Two, Three, or Four Times.]

Individual times for each of the six Ss are plotted in Figure 4. Note that even these instructions had little effect on most of the Ss. Apparently, instructions have to be extreme, extensive, or explicit before Ss will adapt their reading speed (or, more accurately, their reading time) to the difficulty of the passage.

Discussion

This article raises serious questions about a considerable proportion of all research that has been published in reading. Carver's overlooking his obligation to generalize to a language population is not an isolated case. Of the thousands of experiments published in reading, only a handful have been analyzed (or could have been) so that the reader could tell whether they could be replicated with a different set of materials. A thoughtful perusal of the many curves in Figure 2 suggests that some of the conflicting results that abound in the field could be traced back to this oversight in experimental design. Apparently, most researchers assume that the significance tests which indicate that a reading experiment can be replicated with a different sample of Ss also indicate that it could be replicated with a different sample of reading materials. This is not so. It is important that the readers and editors of journals in reading familiarize themselves with designs and significance tests that predict replication with different materials. The article mentioned earlier (Coleman, 1964) discusses the problem in a simplified fashion and also cites several more mathematically sophisticated discussions.

The present article raises even more serious questions about much of the previous research in reading speed, most of which has used the word as the unit. Figures 1 and 3 show clearly that when reading speed is measured in a more finely grained unit such as letters, syllables, or

morphemes, it is constant across a range of difficulty that extends from first-grade texts to the most difficult technical prose in the language. This suggests that it would pay to reexamine much of the research on reading speed, certainly any that reported a change in speed as a function of difficulty of material. It is likely that many such results can be dismissed as due to nothing more important than the fact that the difficult prose contained a larger volume of material to be read.

The question of whether the set of 36 passages is valid or invalid as a readability scale will not be settled by argument, but by whether or not the scale can be put to good use. It should not, however, be called the Miller-Coleman Readability Scale. In the first place, the 36 passages have also been scaled by others along other dimensions. Aquino (1969) scaled them for word-for-word recall and also by having judges sort and sub-sort them according to judged difficulty. Sticht (1971) used 10 judges who scaled them using a direct magnitude estimate of difficulty. It would be just as appropriate to call the scale the Aquino-Sticht Scale. In the second place, the 36 passages can be rank-ordered into not one, but at least 126 scales. In addition to the three measures of Aquino and Sticht, the passages have been scaled for three sorts of cloze scores and for Information Gain (IG). A researcher could rank-order them according to any one of seven indices or according to any of 126 possible combinations that fit his own criteria of readability. But of all possible criteria, Figure 1 shows that Carver's suggestion, a measure based on reading speed, is least likely to be useful. A measure that fails to discriminate between first-grade material and almost incomprehensibly difficult prose is surely of limited value in defining readability.

References

AQUINO, M. The validity of the Miller-Coleman readability scale. Reading Research Quarterly, 1969, 4, 342-357.

CARVER, R. P. Evidence for the invalidity of the Miller-Coleman readability scale. Journal of Reading Behavior, 1972, 4 (3).

COLEMAN, E. B. Generalizing to a language population. Psychological Reports, 1964, 16, 219-226.

MILLER, G. R. and COLEMAN, E. B. A set of thirty-six passages calibrated for complexity. Journal of Verbal Learning and Verbal Behavior, 1967, 6, 851-854.

STICHT, T. G. Learning by listening in relation to aptitude, reading, and rate-controlled speech: additional studies. Technical Report 71-5. Alexandria, Virginia: Human Resources Research Organization, 1971.