Using Curriculum-Based Measurement to Predict Performance on State Assessments in Reading

School Psychology Review, 2004, Volume 33, No. 2, pp. 193-203

Using Curriculum-Based Measurement to Predict Performance on State Assessments in Reading

Margaret T. McGlinchey, Kalamazoo Regional Educational Service Agency
Michael D. Hixson, Central Michigan University

Abstract. The present study investigated the correlation and predictive value of curriculum-based measurement (CBM) procedures for performance on the Michigan Educational Assessment Program's (MEAP) fourth grade reading assessment. A 1-minute oral reading sample was used to predict MEAP performance 2 weeks later. A positive correlation was established between the two measures. The positive and negative predictive power of the reading sample was higher than the base rates of failing and passing the MEAP. This relationship was demonstrated across 1,362 students during 8 years of MEAP testing. The results support the use of CBM for monitoring reading progress and for establishing which students are at risk for low reading skills and for failing state tests.

Increasing pressures from new initiatives such as charter schools, vouchers, and private schools have established a highly competitive market in which accountability and assessment have become both the catchphrase and the promise. Schools are faced with the challenge of developing assessment practices that are both meaningful to parents and useful as measures of true progress. A good assessment should do all this and also answer important questions about instruction: Is it working? Does it need to be adjusted? Did the adjustment work? If an assessment asks only "Did it work?", it comes too late. The assessment must therefore be sensitive to the effects of instruction and be meaningful to teachers.
Curriculum-based measurement (CBM) is a set of specific measurement methods for assessing student progress over time and for identifying students in need of additional instructional support and/or further diagnostic testing (Howell & Nolet, 1999). It provides the framework from which a problem-solving approach to instruction can be established. CBM has been extensively researched (Marston & Magnusson, 1985; Shinn, 1989; Shinn & Good, 1993) and found to be a reliable and valid indicator of student skill level (Deno, Mirkin, & Chiang, 1982; Marston, 1989; Shinn, Good, Knutson, Tilly, & Collins, 1992). In one of the initial CBM validity studies conducted in reading, Deno, Mirkin, and Chiang (1982) examined five different measures that could be used on a frequent basis to monitor reading progress. Passage reading from the basal reader was one of the progress-monitoring measures studied. These measures were correlated with criterion tests of reading and published norm-referenced tests. Deno and colleagues found that the 1-minute reading probe from the child's basal reader was a valid measure of reading skill. Correlation coefficients ranged from .73 to .91, with most coefficients above .80. A number of other criterion-related validity studies have been conducted since this study. Marston (1989) summarized the results of these studies and found that correlation coefficients between oral reading rates and different measures of global reading skills ranged from .63 to .90, again with most coefficients falling above .80. Other criterion-related validity studies have compared reading fluency measures with different basal reading criterion-referenced mastery tests and with teachers' holistic ratings of reading ability (Marston & Deno, 1982, as cited in Marston, 1989). These studies provided additional evidence for the validity of curriculum-based reading measures as estimates of global reading proficiency in the elementary grades. More recent research in the area of CBM validity (see Good & Jefferson, 1998, for a review) continues to provide evidence that oral reading rate is a valid measure of reading ability for elementary students. Construct validity studies have provided additional support for the discriminant and treatment validity of oral reading rate (Marston, 1989; Shinn & Habedank, 1992). Research has also demonstrated CBM's usefulness in identifying children for special education (Marston, Mirkin, & Deno, 1984; Shinn, 1986; Shinn & Habedank, 1992), establishing and monitoring progress toward IEP goals (Fuchs & Shinn, 1989), monitoring progress in remedial programs (Shinn, 1989), and designing instruction (Shapiro, 1996).

Author Note. We would like to thank Galen Alessi and Ruth Ervin, who reviewed this manuscript. We would also like to thank all of the staff at Kalamazoo Public Schools and the students in the school psychology program at Western Michigan University who made this research possible. The research reported does not necessarily reflect the views of the Kalamazoo Public Schools. Correspondence concerning this article should be addressed to Margaret T. McGlinchey, Instructional Center, Kalamazoo Regional Educational Service Agency, 1819 East Milham Rd., Kalamazoo, MI 49002-3035; E-mail: mmcglinc@kresanet.org

Copyright 2004 by the National Association of School Psychologists, ISSN 0279-6015
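The fluency metric underlying these probes is simple: words read correctly in one minute (words attempted minus errors). As a rough sketch of the scoring arithmetic (the helper names here are illustrative, not from the paper), including the median-of-three-probes convention used in later norming:

```python
from statistics import median

def words_correct_per_minute(words_attempted: int, errors: int) -> int:
    """Score a 1-minute oral reading probe: words read minus errors."""
    return max(words_attempted - errors, 0)

def median_probe_score(probe_scores: list[int]) -> int:
    """CBM norming often reports the median of three 1-minute probes."""
    return median(probe_scores)

# Hypothetical student: three 1-minute probes from grade-level passages.
scores = [
    words_correct_per_minute(104, 3),   # 101 WCPM
    words_correct_per_minute(98, 2),    # 96 WCPM
    words_correct_per_minute(112, 5),   # 107 WCPM
]
print(median_probe_score(scores))  # → 101
```

Reporting the median rather than the mean keeps a single unusually easy or hard passage from dominating the score.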
In addition to being an alternative to traditional test-and-place models (Deno, 1985), CBM has more recently been applied in general education settings as a method to document academic gains in basic skills (Fuchs & Fuchs, 1992; Marston, Deno, Kim, Diment, & Rogers, 1995). Using CBM to design instructional programs has resulted in greater achievement in reading, spelling, and math (Fuchs, 1993; Fuchs & Fuchs, 1986; Fuchs, Fuchs, Hamlett, & Stecker, 1990; Shinn, 1995). Recent web-based applications of CBM, such as the Dynamic Indicators of Basic Early Literacy Skills (DIBELS; Official DIBELS Homepage, n.d.), allow for the development of schoolwide models of reading intervention for all students (Simmons et al., 2002). Current federal initiatives such as No Child Left Behind and Reading First have increased demands for early identification and intervention. These initiatives have also increased demands for state and district accountability. Most states use some type of comprehensive state reading assessment in third or fourth grade to evaluate school districts. However, these assessments provide too little information, too late. If CBM can predict performance on state assessments, schools may have a tool to monitor the progress of all children toward this long-range goal. To date, only one study has assessed the utility of CBM oral reading rates to predict performance on a state-mandated fourth grade reading test (Stage & Jacobsen, 2001). One hundred seventy-three students were administered CBM oral reading probes in the fall, winter, and spring of fourth grade. These scores were then compared to students' performance on the spring administration of the Washington Assessment of Student Learning (WASL) reading assessment. The results of this study indicated that the reading rate scores improved the prediction of WASL performance above that based on the base rates of students passing and failing the WASL.
Purpose of the Present Study

The findings of Stage and Jacobsen (2001) were important because they provided evidence that CBM could improve the prediction of state test performance for school districts, which would then permit early identification and intervention. However, because only one state assessment was studied for 1 year, questions of generalization can be raised. The present study is a replication of Stage and Jacobsen (2001) with a different state fourth grade reading test, across 8 years, with a much larger sample of students and a more diverse student population. This study investigated the predictive validity of a CBM reading probe in relation to performance on the Michigan Educational Assessment Program's fourth grade reading test. The state test was selected because it is the measure connected to the high stakes for school districts, and it is viewed as a comprehensive measure of reading skill. Although there is substantial evidence of CBM validity, a relationship to state assessments may provide school districts the practical incentive to adopt this sound research-based practice. If CBM is sensitive in this regard, it could be used to monitor progress toward, and predict future performance on, the state assessment, and to assist in establishing appropriate benchmarks. Once these benchmarks are established at each grade level, ongoing progress monitoring for struggling students may assist the teacher in adjusting instruction as needed to prepare the student for the eventual state test.

Method

Participants

The study took place at one elementary school in an urban school district for 7 of the 8 years (1994-2001), and in Year 4 (1997-1998) across the whole school district's fourth grade. The school district had an enrollment of approximately 11,000 students, including approximately 6,200 elementary students. There were 14 elementary buildings with fourth grade students. Across the district, the non-Caucasian population was 52%, and free and reduced lunch status (an indicator of socioeconomic need) was 60%. The school that was involved in the study for all 8 years was a large K-6 building, with a student enrollment between 450 and 520 students each year. Fourth grade general and special education students participated during each of the school years (range = 55-139 students per year). Across the 8 years, a total of 1,362 students participated in the study.
Because of family moves, absences, and test waivers, some students did not have both a reading rate and a MEAP score. In those cases, the student was excluded from the study. Across the 8 years, 115 students were excluded due to incomplete information. Table 1 summarizes demographic information for all participants in the study each year.

Materials

CBM probe. Three passages were randomly selected from the district's basal fourth grade reading text, the Macmillan Connections Reading Program (Arnold & Smith, 1987). CBM school district norming procedures outlined by Shinn (1989) were followed. Due to the large number of participants and time constraints, only one 1-minute probe was used in the first 5 years. In the last 3 years, because of increased staff support, three 1-minute probes were used during each norming period and the median score was reported. Each passage was screened using the Fry (1977) readability formula to ensure that all passages were at a fourth grade reading level.

Michigan Educational Assessment Program (MEAP). The MEAP is a testing program based on Michigan's Essential Goals and Objectives for Education, as approved by the State Board of Education. The program assesses reading, math, writing, science, and social studies skills at 4th, 7th, and 11th grades. The section of the MEAP used in the present study was the fourth grade reading assessment, which is a group-administered, untimed test with standard directions and procedures. The test is administered over a 2-day period, with each period lasting approximately 50 minutes. Student performance on two reading passages is included in the score each year. One passage is a story selection, and the other is an informational selection. Each passage selection has 20 multiple-choice questions. The questions are classified as one of three different constructing-meaning item types: Intersentence, Text, or Beyond Text (Michigan State Board of Education, 1999).
Of these comprehension items, Intersentence questions assess the comprehension of material from two to three sentences. Text questions assess the comprehension of a larger section of the text, and Beyond Text questions assess the student's ability to integrate personal experiences with text information. Each year, the test summary reports from the state department have classified the majority of questions as Text questions.

Table 1
Demographic Information of Participating Sample Across Years Assessed

                            1994-  1995-  1996-  1997-  1998-  1999-  2000-  2001-
Characteristic                95     96     97     98     99     00     01     02   Total
General demographics
  Students assessed          143     72     71    932     66     73     59     61    1477
  Students excluded (a)        4      4      7     89      5      0      4      2     115
  Total sample               139     68     64    843     61     73     55     59    1362
School demographics
  Free/reduced lunch         66%    68%    74%    60%    72%    70%    84%    75%     64%
  Special education           6%     7%     3%     6%     6%     4%    16%     8%      6%
Sex
  Female                     45%    41%    48%    48%    48%    48%    51%    56%     48%
  Male                       55%    59%    52%    52%    52%    52%    49%    44%     52%
Ethnicity
  African American           45%    42%    55%    45%    46%    48%    49%    47%     46%
  American Indian             0%     0%     0%     0%     4%     1%     2%     3%     <1%
  Asian American              0%     0%     0%     1%     0%     0%     0%     0%      1%
  Caucasian                  54%    58%    44%    48%    48%    44%    47%    47%     49%
  Hispanic                    1%     0%     2%     7%     2%     7%     2%     2%      5%

(a) Students without CBM or MEAP data were excluded from the data analysis.

MEAP test items are based on the Michigan Essential Goals and Objectives for Reading and Mathematics (Michigan Department of Treasury, 2001). A committee composed of teachers reviews test items, and the questions are then pilot tested with students. The committee's job is to ensure the content validity of the test, and the pilot testing is done for the purpose of item analysis. Cronbach's alpha was used to assess the internal consistency of the responses to each of the two reading

selections from the 1998-1999 MEAP reading test. The reliability for each selection was over .80. Data were not available for any other year (Michigan Department of Treasury, 2001). MEAP scores are based on how well the student answers the multiple-choice questions for the story selection and the informational selection. Raw scores are converted to scaled scores, and a scaled score of 300 or above on each selection is required for a Satisfactory score. If a child scores 300 or above on only one selection, a Moderate score is earned. Scores of 299 or below on both of the reading selections result in a Low score. The MEAP is described as a criterion-referenced test, but the cut scores, or criteria for a Satisfactory score, are established based on raw scores of the passage during pilot testing. This process of test development changes the raw score required to receive a scaled score of 300 across passages and years of MEAP administrations (Michigan State Board of Education, 1999).

Procedure

School psychologists, paraprofessionals, and school psychology interns were trained to administer and score the CBM reading probes as described by Shinn (1989) across the 8 years. The training consisted of instruction, modeling, and practice administrations until each assessor scored within one word of the instructor in terms of words read correctly per minute. All assessors were given a packet containing the CBM passage, administration directions, and a script. All students were administered the same reading passage(s) in the 2 weeks prior to the administration of the MEAP. In the first 3 years, the probe was administered in September and the MEAP was taken in October. In Years 4 through 8, the date of the MEAP administration changed to February and the probes were administered in late January. The CBM reading assessments were conducted in a hallway outside the child's classroom. Two chairs and, in some cases, tables were set up in the hallway.
The probes were administered during instructional blocks so the halls were quiet and nondistracting. Procedures outlined by Shinn (1989) were followed for administering and scoring the CBM assessments. The child's regular teacher typically administered the MEAP test, but in some cases a principal or instructional specialist administered the test in small groups. Students identified for special education may have had an accommodation that allowed for individualized administration or breaking the test into smaller units. A 2-week testing window was established in which all schools were expected to complete both the reading and math portions of the MEAP.

Interscorer Reliability

Due to the high number of students tested, reliability measures were not feasible each year. Therefore, all assessors were held to a high criterion of agreement during training (i.e., +/- one word on each passage). In addition, during the 2001-2002 school year, reliability data were collected for 11 (18%) of the fourth grade students. Two observers listened to a student read and scored the passages independently. The passage scoring was compared word for word for agreement in scoring. Agreements were divided by agreements plus disagreements to yield an index of reliability. Interobserver agreement was .96 or above on each passage scored.

Data Analysis

Individual student data were analyzed, and diagnostic efficiency statistics were used to determine the accuracy of the reading rate cut score (Elwood, 1993; Stage & Jacobsen, 2001). One hundred WCPM was selected as the cut score due to previous research identifying this level as a cut score (Fuchs & Deno, 1982; Hasbrouck & Tindal, 1992; Stage & Jacobsen, 2001). Five statistical measurements were used to determine diagnostic accuracy:

1. Sensitivity: the percentage of students who failed the MEAP (scored less than Satisfactory) who read less than 100 WCPM.

2. Specificity: the percentage of students who passed the MEAP who read 100 WCPM or greater.

3. Positive predictive power: the probability that a student reading less than 100 WCPM will score less than Satisfactory on the MEAP.

4. Negative predictive power: the probability that a student reading greater than or equal to 100 WCPM will score Satisfactory on the MEAP.

5. Overall correct classification: the percentage of agreement between the WCPM cut score and state assessment performance.

Results

Test-Retest Reliability

The oral reading fluency scores obtained 2 weeks prior to the MEAP were correlated with three other CBM reading fluency scores to provide information on the stability of the measure. Seventy students from Year 4 (1997-1998) were included in this analysis. These samples were collected 2 months prior to, 2 weeks prior to, and 3 months after the main reading probe. Correlation coefficients were .87, .95, and .91, respectively.

Criterion-Related Validity

The means and standard deviations for the MEAP and reading rate scores were calculated each year. The concurrent, criterion-related validity of the CBM reading probe was also examined each year by correlating the reading rate scores with the MEAP raw scores (the criterion variable). Each year was analyzed separately to ensure that the results were consistent and not unduly influenced by a given year. The means and standard deviations for the MEAP raw scores and reading rate scores are listed in Table 2. The correlation between reading rate and MEAP scores was fairly consistent across all 8 years (range = .49 to .81), with the lowest coefficient occurring in the 1998-1999 school year (r = .49). For every year, the probability of obtaining a correlation that large or larger given a true null hypothesis (i.e., no relationship between reading rate and MEAP score) was less than .001.

Relationship Between Reading Rate and MEAP

The relationship between reading rate and achieving a Satisfactory score on the MEAP is illustrated in Figure 1.
This figure represents the cumulative percentage of students who passed the MEAP; in other words, the percentage of all students who passed at or below a given reading rate. For example, at 10 WCPM, no students at that reading rate or below passed the MEAP. Twenty-six percent of students who passed the MEAP read less than 100 WCPM; alternatively, 74% of students who passed the MEAP read 100 WCPM or greater. The cumulative percentage of students who passed begins to rise visibly above 50 WCPM. The acceleration in the percentage of students who passed was highest within the approximate range of 100 to 150 WCPM, at which point the rate of increase begins to diminish.

Figure 1. Cumulative percentage of students earning MEAP Satisfactory scores across different CBM scores. [Figure not reproduced; axes were CBM ORF score (0-250 WCPM) and percentage of students (0-100%).]

Table 2
Descriptive and Inferential Statistics Across Years Assessed

                    1994-  1995-  1996-  1997-  1998-  1999-  2000-  2001-
Statistic             95     96     97     98     99     00     01     02   Total
MEAP M raw score    11.3   13.6   12.7   14.9   17.0   12.5   11.9   13.2   14.2
MEAP SD              4.8    3.7    4.5    3.8    2.5    3.8    5.0    4.3    4.2
WCPM M                72     93     93    104    102     95     87     87     98
WCPM SD               37     45     44     45     33     36     42     39   43.7
Correlation (r)      .77    .69    .74    .63    .49    .65    .81    .76    .67
t-test on r
  df                 137     66     62    841     59     71     53     57   1360
  t                14.1*   7.7*   8.8*  23.5*   4.3*   7.1*  10.3*   8.8*  32.8*

* p < .001.

Diagnostic Efficiency Statistics

The diagnostic efficiency statistics for all 8 years are presented in Table 3. Using 100 WCPM as the cut score, the specificity of the cut score for identifying students who did achieve Satisfactory scores was 74%. The sensitivity of the cut score for identifying students who did not achieve Satisfactory scores on the MEAP was 75%. The base rate of not achieving a Satisfactory score was 54%. The positive predictive power of the cut score (the probability of correctly identifying students who scored below Satisfactory) was 77%, an improvement in prediction above the base rate. The negative predictive power of the cut score (the probability of correctly identifying students who achieved Satisfactory scores) was 72%, again an improvement above the base rate; the base rate of achieving a Satisfactory score was 46%. The overall correct classification (correct classifications/N) using 100 WCPM as the cutoff was 74%. Cohen's kappa (1960), which corrects for chance agreements, was calculated to provide another measure of diagnostic efficiency (see Stage & Jacobsen, 2001, for a description of the calculation). Kappa was equal to .48, which means that the diagnostic efficiency of the CBM cutoff score for classifying students was 48% above chance.

Table 3
Diagnostic Efficiency Statistics for Achieving a Satisfactory Score on the Michigan Educational Assessment Program's Fourth Grade Reading Test Using 100 Words Read Correctly Per Minute (WCPM) as the Cutoff

                      Fail (Below Satisfactory)   Pass (Satisfactory)   Total
< 100 WCPM                  n = 545 (a)               n = 167 (b)       n = 712
>= 100 WCPM                 n = 185 (c)               n = 465 (d)       n = 650
Total                       n = 730                   n = 632           N = 1362

Note. Sensitivity = a/(a + c) = 75%; Specificity = d/(b + d) = 74%; Positive Predictive Power = a/(a + b) = 77%; Negative Predictive Power = d/(c + d) = 72%; MEAP Failure Base Rate = (a + c)/N = 54%; MEAP Pass Base Rate = (b + d)/N = 46%; Overall Correct Classification = (a + d)/N = 74%.

As the diagnostic efficiency statistics indicated, 72% of students who read at least 100 WCPM passed the MEAP. The cut score can be raised or lowered, depending on the level of confidence in which a school district is interested. When the cut score is high, the percentage of students at or above it receiving a Satisfactory score is also high, but many students reading less than that also receive Satisfactory scores. For example, at 140+ WCPM, recommended as a mastery level by Howell and Nolet (1999), 84% of students received a Satisfactory score, but 39% of students who read less than 140 WCPM also received a Satisfactory score. A higher cut score generally increases the probability of predicting a Satisfactory score, but also decreases the probability of predicting a failure.

Discussion

The purpose of this study was to investigate the predictive value of oral reading fluency for performance on a state reading assessment, the MEAP. The results indicate a moderately strong relationship between oral reading rates and MEAP performance.
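As an arithmetic check, the inferential statistics in Table 2 and the classification statistics in Table 3 can be reproduced from the published summary numbers. A minimal sketch (note that recomputing t from the rounded correlation coefficients can differ from the table in the last digit):

```python
import math

def t_from_r(r: float, df: int) -> float:
    """t statistic for H0: rho = 0, given a correlation r and degrees of
    freedom df (the test reported in Table 2)."""
    return r * math.sqrt(df) / math.sqrt(1 - r ** 2)

print(round(t_from_r(0.77, 137), 1))  # → 14.1, matching 1994-1995 in Table 2

# Table 3 cell counts: a = fail & <100 WCPM, b = pass & <100 WCPM,
# c = fail & >=100 WCPM, d = pass & >=100 WCPM.
a, b, c, d = 545, 167, 185, 465
n = a + b + c + d                 # 1362 students

sensitivity = a / (a + c)         # 0.747 -> 75%
specificity = d / (b + d)         # 0.736 -> 74%
ppv = a / (a + b)                 # positive predictive power, 0.765 -> 77%
npv = d / (c + d)                 # negative predictive power, 0.715 -> 72%
occ = (a + d) / n                 # overall correct classification, 0.742 -> 74%

# Cohen's kappa: observed agreement corrected for the agreement expected
# by chance from the marginal totals.
p_chance = ((a + b) * (a + c) + (c + d) * (b + d)) / n ** 2
kappa = (occ - p_chance) / (1 - p_chance)
print(round(kappa, 2))  # → 0.48, as reported
```

The kappa computation makes the "48% above chance" interpretation concrete: the marginals alone would produce about 50% agreement, and the cut score recovers 48% of the remaining room for improvement.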
The results were consistent across 1,362 students and 8 years, despite variations in the MEAP and in the time of administration. The MEAP is viewed as a comprehensive measure of reading skill. Furthermore, it is the standard by which schools in Michigan are judged. Although there is substantial validity evidence for CBM connected to standardized tests, to date there is little CBM validity research connected to mandated state assessments. In the current political climate, state tests drive much of the educational decision making for school districts. The results of this study link an effective research-based practice to one of the political pressures affecting school districts today.

Similar to the results from Stage and Jacobsen (2001), oral reading fluency improved the prediction of performance on a state fourth-grade reading assessment above that based on the base rates of passing and failing. For unknown reasons, the correlation between MEAP scores and reading rate was higher in the present study than in the Stage and Jacobsen (2001) study (.67 vs. .44), which compared reading rate to the WASL. This may be due to differences in the state tests or to differences in the populations used in the two studies. However, the current study extends the findings of Stage and Jacobsen (2001) by including a much larger percentage of low-SES and non-Caucasian students, a much larger sample size, and a multiple-year design.

Correlation coefficients were generally stable across the school years. With the exception of 1998-1999, the coefficients ranged from .63 to .81. The coefficients in the last 3 years, when three 1-minute probes were used (range = .65 to .81), were similar to the coefficients in the first 5 years, when only a single 1-minute probe was used (range = .49 to .77). This finding supports the previous suggestion by Shinn (1989) that using a single 1-minute probe in norming will not substantially affect the stability of the measure.

CBM reading probes are indicators of skill level in a complex domain involving many component skills. It would be a mistake to assume that developing oral reading rates or fluency is sufficient. However, these data do have implications for instructional and assessment practices. First, of 1,362 district students, only 48% were at a level considered mastery (i.e., 100 WCPM), as defined by previous research (Fuchs & Deno, 1982). Although reading fluency may not be sufficient, it is an essential component of the reading repertoire, and instructional time should be allotted for this critical skill (National Reading Panel, 2000). It incorporates a blend of essential reading skills, such as phonics, word recognition, word meaning, and context clues, and as a measure, it is sensitive to changes in these skill areas.

This relationship begins to set a target on a brief, repeatable measure that predicts performance on a high-stakes assessment. More importantly, this measure can be used on a repeated basis to formatively assist the teacher in making important instructional decisions and adjustments, and to evaluate the effect of those adjustments or interventions for individual students. As a screening assessment, it is possible to identify students who are on track and making appropriate progress versus those in need of intensive intervention.
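A screening rule of the kind described can be sketched in a few lines. The 100 WCPM benchmark is the cutoff examined in this study; the lower "strategic" boundary and the function name are illustrative assumptions, not values from the article:

```python
# Hypothetical screening rule. BENCHMARK (100 WCPM) is the cutoff studied
# here; STRATEGIC is an illustrative boundary, not taken from the article.
BENCHMARK = 100
STRATEGIC = 70

def screen(wcpm: int) -> str:
    """Classify an oral reading fluency score into a support tier."""
    if wcpm >= BENCHMARK:
        return "on track"
    if wcpm >= STRATEGIC:
        return "targeted intervention"
    return "intensive intervention"

# Repeated administration lets a teacher evaluate an intervention formatively:
fall, winter = 62, 88
print(screen(fall), "->", screen(winter))
```

The point is not the particular thresholds but the repeatability: the same brief measure, administered across the year, supports both the initial screening decision and the formative evaluation of whatever intervention follows.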
This screening process can begin as early as preschool using DIBELS and can continue throughout the elementary school years based on individual student need. One of the schools in this district successfully used CBM scores to evaluate the effects of a number of interventions, such as class-size reductions, peer tutoring, curriculum adaptations, and individual tutoring (Hixson & McGlinchey, 2002). For the practicing school psychologist, expertise in this assessment process would allow for expansion of the consultant role, especially with regard to the current national Reading First initiative. Curriculum-based measurement data can assist with instructional decision making at a number of levels: district, school, grade, and individual. The method cuts across general and special education, prereferral and referral processes, and IEP progress monitoring. What is common and most important across all of these applications is the use of these measures to inform instruction.

Limitations

Interobserver agreement data were collected during only 1 of the 8 years. The assessors were trained to criterion in the CBM administration procedures and given packets with explicit instructions. Based on this training and previous research evidence about the reliability of the measures, the authors are confident that the ORF scores were reliable each year, but yearly interobserver agreement data would have provided a further level of confidence in the accuracy of the ORF scores.

Other limitations relate to the MEAP itself. Validity studies have not been conducted on the MEAP. Therefore, it is unknown how well MEAP scores correlate with other validated measures of reading. Another concern is the variability in testing conditions for the MEAP. Although the authors have no knowledge of incorrect or inappropriate test administration conditions, because test administration practices are not observed by the state, the degree to which standard conditions are met is unknown.
The MEAP, like many state assessments, is a high-stakes test, so there is pressure on students, teachers, and administrators to obtain high scores. This pressure, and the number of assessors and students assessed, could affect the implementation of standard test conditions. The issue of fidelity with standardized test procedures is not unique to the MEAP and is a variable in any school group assessment.

A second limitation is that the nature and degree to which some special education students included in the study received test accommodations is unknown. Across the years of the study, policy regarding the inclusion of special education students in state assessments, acceptable accommodations, and documentation of such accommodations gradually developed. Acceptable accommodations varied slightly across school years but generally included individualized test administration and extended time limits. Documentation of students who received accommodations was not available, but the authors, who functioned as school psychologists in the district, knew of few students who received accommodations, so the actual number is probably much smaller than the 6% of special education students included in the study.

A final limitation is the unknown degree to which the present results are applicable to reading assessments in other states, although they do replicate the results of Stage and Jacobsen (2001) on the Washington state fourth-grade reading test.

Conclusion

In summary, the present study adds to the substantial research base of studies that have established CBM reading probes as a valid assessment of reading skill. The fact that a simple, efficient, and repeatable measure can be used to predict performance on a state assessment should be encouraging to educators looking for methods to measure reading progress and adjust instruction. Further, CBM can be a powerful tool in assisting schools in their preparation for state assessments, thereby improving instruction. This is particularly true for districts with a high percentage of children from low-income backgrounds, as in the current study. These results may provide a rationale for school districts to adopt an empirically sound, highly efficient assessment practice (i.e., CBM).

References

Arnold, V. A., & Smith, C. B. (1987). Macmillan connections reading program. New York: Macmillan Publishing Company.
Cohen, J. (1960). A coefficient of agreement for nominal scales.
Educational and Psychological Measurement, 20, 37-46.
Deno, S. L. (1985). Curriculum-based measurement: The emerging alternative. Exceptional Children, 52, 219-232.
Deno, S. L., Mirkin, P. K., & Chiang, B. (1982). Identifying valid measures of reading. Exceptional Children, 49, 36-45.
Elwood, R. W. (1993). Psychological tests and clinical discriminations: Beginning to address the base rate problem. Clinical Psychology Review, 13, 409-419.
Fry, E. (1977). Fry's readability graph: Clarifications, validity, and extension to level 17. Journal of Reading, 21, 242-252.
Fuchs, L. S. (1993). Enhancing instructional programming and student achievement with curriculum-based measurement. In J. Kramer (Ed.), Curriculum-based measurement (pp. 65-104). Lincoln, NE: Buros Institute of Mental Measurements.
Fuchs, L., & Deno, S. (1982). Developing goals and objectives for educational programs [Teaching guide]. Minneapolis, MN: Institute for Research in Learning Disabilities, University of Minnesota.
Fuchs, L. S., & Fuchs, D. (1986). Effects of systematic formative evaluation on student achievement: A meta-analysis. Exceptional Children, 53, 199-208.
Fuchs, L. S., & Fuchs, D. (1992). Identifying a measure for monitoring student reading progress. School Psychology Review, 21, 45-58.
Fuchs, L. S., Fuchs, D., Hamlett, C. L., & Stecker, P. M. (1990). The role of skills analysis in curriculum-based measurement in math. School Psychology Review, 19, 6-22.
Fuchs, L. S., & Shinn, M. R. (1989). Writing CBM IEP objectives. In M. R. Shinn (Ed.), Curriculum-based measurement: Assessing special children (pp. 132-154). New York: Guilford Press.
Good, R. H., III, & Jefferson, G. (1998). Contemporary perspectives on curriculum-based measurement validity. In M. R. Shinn (Ed.), Advanced applications of curriculum-based measurement (pp. 61-88). New York: Guilford Press.
Hasbrouck, J. E., & Tindal, G. (1992). Curriculum-based oral reading fluency norms for students in grades 2 through 5. Teaching Exceptional Children, 24(3), 41-44.
Hixson, M. D., & McGlinchey, M. T. (2002). Curriculum-based measurement reading scores as dynamic indicators of basic reading skills. Journal of Precision Teaching and Celeration, 18, 10-21.
Howell, K. W., & Nolet, V. (1999). Curriculum-based evaluation: Teaching and decision making (3rd ed.). Belmont, CA: Wadsworth.
Marston, D. (1989). Curriculum-based measurement: What it is and why do it? In M. R. Shinn (Ed.), Curriculum-based measurement: Assessing special children (pp. 19-78). New York: Guilford Press.
Marston, D., Deno, S. L., Kim, D., Diment, K., & Rogers, D. (1995). Comparison of reading intervention approaches for students with mild disabilities. Exceptional Children, 62, 20-37.
Marston, D., & Magnusson, D. (1985). Implementing curriculum-based measurement in special and regular education settings. Exceptional Children, 52, 266-276.
Marston, D. B., Mirkin, P., & Deno, S. (1984). Curriculum-based measurement: An alternative to traditional screening, referral, and identification. Journal of Special Education, 18, 109-117.
Michigan Department of Treasury. (2001). Design and validity of the MEAP test. Retrieved April 11, 2002, from http://www.meritaward.state.mi.us/mma/design.htm
Michigan State Board of Education. (1999). 1999 update of the Essential Skills Reading Test blueprint: Michigan Educational Assessment Program. Lansing, MI: Michigan State Board of Education.
National Reading Panel. (2000). Teaching children to read: An evidence-based assessment of the scientific research literature on reading and its implications for reading instruction (NIH Publication No. 00-4769). Washington, DC: U.S. Department of Health and Human Services.
Official DIBELS homepage. (n.d.). Retrieved April 27, 2004, from http://dibels.uoregon.edu/
Shapiro, E. S. (1996). Academic skills problems: Direct assessment and intervention (2nd ed.). New York: Guilford Press.
Shinn, M. R. (1986). Does anyone really care what happens after the refer-test-place sequence: The systematic evaluation of special education program effectiveness. School Psychology Review, 15, 49-58.
Shinn, M. R. (Ed.). (1989). Curriculum-based measurement: Assessing special children. New York: Guilford Press.
Shinn, M. R. (1995). Curriculum-based measurement and its use in a problem-solving model. In A. Thomas & J. Grimes (Eds.), Best practices in school psychology III (pp. 547-568). Washington, DC: National Association of School Psychologists.
Shinn, M. R., & Good, R. H. (1993). CBA: An assessment of its current status and prognosis for its future. In J. J. Kramer (Ed.), Curriculum-based measurement (pp. 139-178). Lincoln, NE: Buros Institute of Mental Measurements.
Shinn, M. R., Good, R. H., Knutson, N., Tilly, W. D., & Collins, V. L. (1992). Curriculum-based measurement reading fluency: A confirmatory analysis of its relation to reading. School Psychology Review, 21, 459-479.
Shinn, M. R., & Habedank, L. (1992). Curriculum-based measurement in special education problem identification and certification decisions. Preventing School Failure, 36, 11-15.
Simmons, D. C., Kame'enui, E. J., Good, R. H., Harn, B. A., Cole, C., & Braun, D. (2002). Building, implementing, and sustaining a beginning reading improvement model: Lessons learned school by school. In M. R. Shinn, H. M. Walker, & G. Stoner (Eds.), Interventions for academic and behavior problems II: Preventive and remedial approaches (pp. 537-569). Bethesda, MD: National Association of School Psychologists.
Stage, S. A., & Jacobsen, M. D. (2001). Predicting student success on a state-mandated performance-based assessment using oral reading fluency. School Psychology Review, 30, 407-419.

Margaret T. McGlinchey received her PhD in School Psychology from Western Michigan University in 1988 and is an Instructional Consultant at Kalamazoo Regional Educational Service Agency. Her primary research interests are in the areas of curriculum-based measurement, systems analysis in education, and school consultation.

Michael Hixson received his PhD in Applied Behavior Analysis from Western Michigan University in 1999 and is Assistant Professor in School Psychology at Central Michigan University. His primary research interests are in curriculum-based measurement, precision teaching, applied behavior analysis, and behavior development.