EVALUATION OF THE WATERFORD EARLY READING PROGRAM IN KINDERGARTEN


EVALUATION OF THE WATERFORD EARLY READING PROGRAM IN KINDERGARTEN, 2005-06

STEPHEN POWERS, PH.D. AND CONNIE PRICE-JOHNSON, M.A.
CREATIVE RESEARCH ASSOCIATES
23 E. BROADWAY, SUITE 221
TUCSON, ARIZONA 85719
(520) 884-8667
SVPOWERS@AOL.COM

JANUARY 2007

ACKNOWLEDGEMENTS

We would like to express our appreciation to the Tucson Unified School District's Office of Accountability and Research for their cooperation in retrieving relevant data from their database and for their expertise in merging and organizing the data for our analysis. This evaluation study would not have been possible without their cooperation.

Stephen Powers, Ph.D. and Connie Price-Johnson, M.A.
Creative Research Associates, Inc.

ABSTRACT

Background

The Waterford Early Reading Program (WERP), a technology-based program for the early elementary grades, was provided through Arizona all-day kindergarten funds to kindergarten students in 15 Title I elementary schools in the Tucson Unified School District (TUSD) in the 2005-06 school year. The purpose of this study is to evaluate the reading achievement of the kindergartners in the schools with the WERP and in a comparison group of 15 schools in the same district. The schools where the WERP was implemented are identified in this report as Schools A-L. The comparison schools are identified as Schools M-AA.

Research Design

This evaluation was a comparison-group (quasi-experimental) study involving a treatment (WERP) implemented in the 15 Title I schools with the highest percentages of students on free/reduced lunch. A comparison group of 15 schools was selected from those with the next highest percentages of students on free/reduced lunch; these schools did not receive the WERP. Both matching techniques and statistical controls were used to make the groups similar in the analysis. The Dynamic Indicators of Basic Early Literacy Skills (DIBELS) Initial Sounds Fluency, Letter Naming Fluency, Word Use Fluency, Phoneme Segmentation Fluency, and Nonsense Word Fluency subscales and the district's Core Curriculum Standard Assessment (CCSA) reading test were given as pretests and posttests during the school year. In addition, the amount of time each kindergartner used the WERP computer software was extracted from the software and used in the analysis.

Statistical Analysis

Dependent-samples t tests were used to determine gains for the WERP and comparison groups, and gain score analysis was used to compare these gains between the groups. Analysis of covariance was used to adjust the posttest means for differences on the students' pretest means. Data were disaggregated by school, gender, ethnicity, pretest achievement quartile, primary home language, and English language learner (ELL) status in order to determine patterns of achievement among these groups.

Major Findings

The WERP kindergartners consistently outperformed the comparison group kindergartners on all outcome measures. Comparison school kindergartners did make substantial, and in some cases outstanding, gains from pretest to posttest; however, the gains of the WERP kindergartners were substantially and significantly greater. Effect sizes of gains favored the WERP kindergartners, as did effect sizes comparing the posttest achievement of the two groups.

WERP gains were greater for males in the WERP program than for males in the comparison group, and for females in the WERP than for females in the comparison group. WERP gains were greater for White, Hispanic, African American, Native American, and Asian kindergartners than for their counterparts in the comparison group, and the gains of White, African American, Hispanic, and Asian WERP kindergartners exceeded the gains of White kindergartners in the comparison group.

WERP gains of kindergartners with a primary home language of English, Spanish, or another language were greater than those of their counterparts in the comparison group. WERP gains of kindergartners with a primary home language of Spanish were greater than the gains of English primary home language kindergartners in the comparison group; that is, WERP Spanish home language students who were learning English reading skills outperformed the comparison group's English primary home language students.

WERP kindergartners in all four quartile levels of reading pretest achievement outperformed the comparison students, with the largest gains in the top (fourth) quartile. WERP English language learners outperformed comparison group English language learners, and WERP English language learners with emergent reading skills outperformed the non-English language learners (proficient English speakers) in the comparison group.

Usage of the WERP software was significantly correlated with the reading outcome measures and with the pretest-to-posttest gains: the greater the use of WERP content, the greater the reading gains.

TABLE OF CONTENTS

ACKNOWLEDGEMENTS
ABSTRACT
BACKGROUND AND PURPOSE
    Background
    Purpose
METHODS
    Study Setting
    Research Design
    Study Population
    Statistical Analyses
    Measures of Outcomes
RESULTS
    Effect Estimates of the Intervention
    Intervention Effects on Subgroups
    WERP Usage Effects
SUMMARY AND DISCUSSION
APPENDIX A: READING SCORES BY SCHOOL
APPENDIX B: READING PERCENTILES BY SCHOOL
APPENDIX C: READING SCORES BY SCHOOL FOR ALL STUDENTS
REFERENCES
ABOUT THE AUTHORS

TABLES

Table 1. Ethnicity of Kindergartners
Table 2. Kindergartners in the WERP Evaluation
Table 3. Kindergartners' Usage Minutes with the WERP
Table 4. Administration of DIBELS and CCSA Reading, 2005-06
Table 5. Gains on All Outcome Measures (WERP Students With 1,100 or More Usage Minutes)
Table 6. Gains on All Outcome Measures (All Students)
Table 7. Gains on All Outcome Measures (Students With 90 or More Days Attendance)
Table 8. Effect Size on All Outcome Measures (WERP Students With 1,100 or More Usage Minutes)
Table 9. Effect Size on All Outcome Measures (All Students)
Table 10. Effect Size on All Outcome Measures (Students With 90 or More Days Attendance)
Table 11. ANCOVA and Effect Sizes on All Outcome Measures (WERP Students With 1,100 or More Usage Minutes)
Table 12. ANCOVA and Effect Sizes on All Outcome Measures (All Students)
Table 13. ANCOVA and Effect Sizes on All Outcome Measures (Students With 90 or More Days Attendance)
Table 14. WERP + Reading First and Comparison School Means and Gains
Table 15. WERP + Reading First and Comparison Gains on All Outcome Measures
Table 16. Males and Females on DIBELS Total Reading Score
Table 17. Ethnic Groups on DIBELS Total Reading Score
Table 18. Ethnic Groups on DIBELS Total Reading Score Grouped by Ethnicity
Table 19. Primary Home Languages on DIBELS Total Reading Score
Table 20. Four Achievement Quartiles on the DIBELS Total Reading Score
Table 21. ELL and Non-ELL Students on DIBELS Total Reading Score
Table 22. ANCOVA of WERP ELL and Non-ELL Students
Table 23. Correlations of WERP Usage, Achievement, and Gains
Table 24. DIBELS Total Reading Score by WERP Usage Level
Table 25. DIBELS Total Reading Scores by School
Table 26. DIBELS Total Reading Local Percentiles by School
Table 27. All Students on DIBELS Total Reading Score
Table 28. Schools Ranked by Pretest Means on the DIBELS Total Reading Score

BACKGROUND AND PURPOSE

Background

The importance of early reading interventions has been argued by many researchers (Finn, Rotherham, & Hokanson, 2001; National Association for the Education of Young Children [NAEYC] & International Reading Association [IRA], 1998). Finn et al. (2001) noted the problem of an achievement gap, especially among ethnic groups, and how this gap widens as the years pass. The value of technology in the early grades and its integration with instruction has also been widely noted (NAEYC, 1996). Walberg (2001), a well-known evaluator, reported after reviewing the Waterford Early Reading Program (WERP) that it was "spectacularly effective for beginning readers who initially scored in the lower third of the group" (p. 11).

The WERP is research-based and uses technology to teach young children to read, write, and keyboard. The program has three levels (Level 1, Level 2, and Level 3) and a separate Phonological Awareness component designed for K-2. Of particular interest to the Tucson Unified School District (TUSD) is that the Waterford Institute (2002a) has specified how the WERP addresses the requirements of the No Child Left Behind legislation in the major areas of emergent reading skills, as well as the skills assessed by the Dynamic Indicators of Basic Early Literacy Skills (DIBELS) (Waterford, 2002b), which is used in TUSD to assess reading in kindergarten. The WERP was implemented in the kindergartens of 15 TUSD schools in the 2005-06 school year, and it is from this year of implementation that the data for the present study come.

Purpose

The purpose of this study was to evaluate the effectiveness of the WERP in the kindergartens of TUSD Title I schools by comparing the pretest-to-posttest reading achievement of the kindergartners during the 2005-06 school year with that of 15 comparison schools that did not receive the program.

METHODS

Study Setting

TUSD is the largest school district in the Tucson area and the second largest in Arizona. It is a multiethnic school district with over 60,000 students, 3,700 teachers, and 200 administrators.

Research Design

The present study used a non-equivalent group, quasi-experimental design. The treatment was the WERP software, which was loaded onto classroom computers in the 15 Title I schools with the greatest percentages of students on free/reduced lunch and the greatest percentages of English language learners. The comparison schools were drawn from the schools next in line to be eligible for Title I funds, with the next highest percentages of students on free/reduced lunch. Issues of missing data (Allison, 2001; Little & Rubin, 2002; McKnight, McKnight, & Figueredo, 2007) were considered, but the relationships between missing data and other variables were minor, and therefore no substitutions or imputations were made.

Treatment

The treatment was the WERP software, loaded onto classroom computers to supplement the district reading program. The WERP also includes teacher manuals, videos, worksheets, and other classroom and take-home materials. Students rotated through the four to six computers in each classroom for 15 minutes at a time, as recommended by the Waterford Institute. This study addresses implementation only to the extent that students used the WERP software.

Study Population

Treatment and Comparison Groups

WERP schools. Fifteen elementary schools were slated to receive the WERP. One of these opted instead for the Waterford Early Math & Science program, and in two additional schools it was impossible to extract the WERP usage data from the computers. The 12 remaining schools (740 kindergarten students) used the WERP software in their kindergartens as planned, and usage data were available. These schools are identified in this report as Schools A-L. In two of these (Schools I and L), the low level of program usage resulted in their exclusion from most of the analyses.
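As a concrete illustration of the selection rule described under Research Design (treatment schools drawn from the top of the free/reduced-lunch ranking, comparison schools from the next tier down), here is a minimal sketch using hypothetical data; the school names and percentages are invented:

```python
# Sketch of the school-selection rule (hypothetical data): rank schools by
# percent of students on free/reduced lunch; the highest-ranked schools
# receive the treatment, and the next tier serves as the comparison group.

def split_treatment_comparison(schools, n_treatment, n_comparison):
    """Return (treatment, comparison) school-name lists after ranking by
    free/reduced-lunch percentage, highest first."""
    ranked = sorted(schools, key=lambda s: s["frl_pct"], reverse=True)
    treatment = [s["name"] for s in ranked[:n_treatment]]
    comparison = [s["name"] for s in ranked[n_treatment:n_treatment + n_comparison]]
    return treatment, comparison

# Toy example: four schools, two per group (the study used 15 per group).
schools = [
    {"name": "A", "frl_pct": 95},
    {"name": "B", "frl_pct": 80},
    {"name": "C", "frl_pct": 90},
    {"name": "D", "frl_pct": 70},
]
treatment, comparison = split_treatment_comparison(schools, 2, 2)
# treatment -> ["A", "C"]; comparison -> ["B", "D"]
```

The key design feature this captures is that the comparison group is adjacent to the treatment group on the ranking variable, not randomly sampled, which is why the report pairs the design with matching and statistical controls.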

Comparison Schools

The comparison group schools are identified in this report as Schools M-AA; 1,480 kindergarten students participated in the comparison schools.

In the WERP group, 51% were male and 49% female. In the comparison group, 50% were male and 50% female. The primary home language of the WERP group was 48% English, 49% Spanish, and 3% other languages. In the comparison group, 68% used English as a primary language, 29% used Spanish, and 3% used another language. See Table 1 for the ethnic composition of the WERP and comparison groups.

Table 1. Ethnicity of Kindergartners

                      WERP             Comparison
Ethnicity             N      %         N       %
African American      17     4.7       106     7.2
Asian                 8      2.2       22      1.5
Hispanic              297    83.0      1008    68.1
Native American       21     5.9       60      4.1
White                 15     4.2       284     19.2
Total                 358    100.0     1480    100.0

Note. WERP students selected with 1,100 minutes (six months) or more usage of the WERP.

Figure 1. Ethnicity of Kindergartners [bar chart of student ethnicity, in percent, for the WERP and comparison groups].

Filtering of Students

Intent-to-treat analysis (ITT). Research designs with randomization of treatment and control students often use ITT analyses, in which all randomly assigned students are entered into the analysis. This avoids the bias inherent in analyzing only students who are compliant with the research design (Ellenberg, 1996). Although students were not randomly

assigned, the ITT approach of assessing all students present in the program was followed, along with other groupings. Efficacy subset analyses were also conducted for kindergartners with 1,100 minutes of WERP usage and for those with 90 or more days of attendance.

The ITT group for the present study comprised all kindergarten students present at the beginning of the 2005-06 school year who were pretested. This distinction is important because in the public schools students enter classrooms throughout the school year and are not necessarily those who began the project at the beginning of the year. In the analyses, only students with both a pretest and a posttest were included. This reduced the number of students in the study to 334 in the WERP schools and 1,211 in the comparison schools. See Table 2.

Filtered by 1,100 minutes of WERP usage. Students with at least 1,100 minutes of usage of the WERP comprised the 1,100-minutes group. This level of usage represents six months of using the program three times a week for 15 minutes at a time and was deemed sufficient to ensure an effect on student learning. This criterion excluded students from Schools I and L, as mentioned above, reducing the number of schools to 10 and the number of WERP students in most of the analyses to 358. These WERP student gains were compared with those of the comparison group, which did not participate in the WERP and therefore had no minutes of usage.

Filtered by 90 days attendance. The gains of all WERP and comparison group kindergartners with 90 or more days of attendance were compared. This criterion eliminated students from both groups who had poor attendance.

Table 2. Kindergartners in the WERP Evaluation

Group         Total    90 Days    1,100 Mins    Pre-Posttest
WERP          740      636        358           334
Comparison    1480     1183       1480*         1211
Total         2220     1871       1838          1545

Note. *Comparison-group students did not use the Waterford program, so the number in the 1,100 Mins column represents the non-Waterford students used in the comparative analyses.
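The three analysis groupings can be sketched as follows, assuming the usage criterion is 1,100 minutes (six months at three 15-minute sessions per week) and the attendance criterion is 90 days; the records and field names are hypothetical:

```python
# Sketch of the three analysis groupings (hypothetical records).
# ITT: all pretested students with both a pretest and a posttest;
# efficacy subsets: >= 1100 usage minutes, or >= 90 days attendance.

students = [
    {"id": 1, "pre": 4.0, "post": 23.0, "usage_min": 1500, "attend_days": 120},
    {"id": 2, "pre": 6.0, "post": 18.0, "usage_min": 300,  "attend_days": 95},
    {"id": 3, "pre": 5.0, "post": None, "usage_min": 1200, "attend_days": 40},
]

def itt(rows):
    """Intent-to-treat analysis set: every student with both tests."""
    return [r for r in rows if r["pre"] is not None and r["post"] is not None]

def usage_subset(rows, min_minutes=1100):
    """Efficacy subset: ITT students with sufficient WERP usage."""
    return [r for r in itt(rows) if r["usage_min"] >= min_minutes]

def attendance_subset(rows, min_days=90):
    """Efficacy subset: ITT students with adequate attendance."""
    return [r for r in itt(rows) if r["attend_days"] >= min_days]
```

In this toy data, student 3 drops out of every grouping for lacking a posttest, and student 2 additionally drops out of the usage subset, mirroring how the study's group sizes shrink from the ITT counts to the filtered counts.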

Statistical Analyses

Whole-Group Analyses

The statistical analyses included paired and independent samples t tests, gain score analysis, analysis of covariance, and effect size analysis. The students were grouped as 1) WERP students with at least 1,100 minutes of program usage, 2) the intent-to-treat group (all students), and 3) students with at least 90 days of attendance.

Gain Score Analysis. The pretest-to-posttest gains were tested with paired samples t tests to determine whether they were statistically significant. In addition, the WERP and comparison group gains were compared with independent samples t tests to determine whether they differed significantly.

Analysis of Covariance. Analysis of covariance (ANCOVA) was used to adjust the posttest means of the WERP and comparison groups for differences on the pretest. Following Winer (1971) and Kirk (1968), two requirements of ANCOVA were examined. First, significant differences between the groups on the pretest would justify using ANCOVA to adjust for those differences; the pretest means differed significantly on Initial Sounds Fluency, Letter Naming Fluency, Word Use Fluency, and Phoneme Segmentation Fluency, which supports the use of ANCOVA for these outcome measures. Second, ANCOVA assumes homogeneity of regression across the groups being compared; statistically significant heterogeneity of regression was found only on Word Use Fluency and Phoneme Segmentation Fluency, so the ANCOVA results for those measures are presented with a caveat.

Effect Size Analysis. Effect size analysis was completed to compare the WERP and comparison groups following standard methods (Cohen, 1977), with .20 considered a small effect, .50 a medium effect, and .80 a large effect.

Subgroup Analyses

In addition to the analyses of the WERP and comparison groups as a whole, several subgroup analyses were carried out:

The gains of three schools using both the WERP and the Reading First program were compared to those of three comparison schools with nearly the same pretest mean reading scores.

Pretest-to-posttest gains of male and female kindergartners in the WERP and comparison schools were compared.

Pretest-to-posttest gains of African American, Asian, Hispanic, Native American, and White kindergartners in the WERP and comparison schools were compared.

Pretest-to-posttest gains of kindergartners with English, Spanish, and other primary home languages in the WERP and comparison schools were compared.

Pretest-to-posttest gains of WERP and comparison group kindergartners in four reading achievement quartiles of the pretest were compared in order to examine reading gains at different ability levels.

Pretest-to-posttest gains of ELL kindergartners and non-ELL (English-proficient) kindergartners in the WERP and comparison groups were compared.

WERP Usage Effects

Correlational analyses between the total minutes of usage, reading achievement, and reading gains of the WERP students were completed to examine the relationship between usage of the WERP and its effectiveness. In addition, WERP students were categorized according to the number of minutes they used the WERP software, and their pretest-to-posttest reading gains were computed for each of the seven levels of usage.

Measures of Outcomes

Usage minutes. Staff of Pearson Digital Learning, which markets the Waterford Institute's products, collected the number of minutes each student used the WERP directly from the computers in the classrooms. TUSD's Office of Accountability and Research matched these records with student test scores and eliminated personal identifiers before the records were analyzed for this study. Only students with sufficient exposure to the WERP (i.e., 1,100 minutes, or six months) were included in most of the analyses.
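The usage-gain correlational analysis amounts to a Pearson correlation between minutes of WERP use and reading gains. A minimal sketch with invented data (the study's actual correlations are reported in Table 23):

```python
from math import sqrt
from statistics import mean

def pearson_r(x, y):
    """Pearson correlation, e.g. between WERP usage minutes and gains."""
    mx, my = mean(x), mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))
    return num / den

# Hypothetical usage minutes and pretest-to-posttest gains for five students.
usage = [200, 600, 1100, 1500, 1900]
gains = [8.0, 12.0, 18.0, 22.0, 27.0]
r = pearson_r(usage, gains)  # strongly positive for these illustrative data
```

A positive r of this kind is what underlies the report's dose-response claim that more WERP content predicts larger reading gains.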
Table 3 shows the number of students who used Level 1, Level 2 or the Phonological Awareness component of the WERP for any amount of time, and the range of minutes a single student spent on that level.

Table 3. Kindergartners' Usage Minutes with the WERP

Usage                                        N      Minutes
Reading Level 1: total minutes in course     7      2175
Reading Level 2: total minutes in course     23     2585
Phonological Awareness: total minutes        74     962
Total of all usage minutes                          725-43

Dynamic Indicators of Basic Early Literacy Skills (DIBELS). The DIBELS, developed by researchers and specialists in early childhood education at the University of Oregon (Good & Kaminski, 2002), is a standardized assessment administered by TUSD to all kindergartners in the district three times a year and sent to the developers of the test for scoring. Scores are reported as raw scores and local percentiles. The DIBELS is composed of five subscales:

Initial Sounds Fluency
Letter Naming Fluency
Word Use Fluency
Phoneme Segmentation Fluency
Nonsense Word Fluency

Good and Kaminski (2002) reported psychometric research on the properties of the DIBELS. In summary, they report alternate-form and test-retest reliabilities and predictive and concurrent validities of the subscales ranging from .36 to .91, with a median reliability of .66. The content of the subscales is carefully described, and the constructs are related to the subscales, supporting a high degree of content validity. The DIBELS subscales were judged adequate for the present study. The Waterford Institute (2002b) provided a detailed analysis of how the WERP activities are assessed by the DIBELS, as well as how the WERP addresses the requirements of the No Child Left Behind law (Waterford, 2002a).

For the purposes of the present study, the average of the five DIBELS subscales was computed to provide an overall measure of pretest and posttest reading achievement, the Total Reading score, for each kindergarten student. The internal consistency (alpha) reliability of this score was .79.
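As an illustration of how a composite score and its alpha reliability can be computed, here is a minimal sketch with toy data (not the study's scores):

```python
from statistics import mean, pvariance

def total_reading(subscale_scores):
    """Total Reading: the average of a student's DIBELS subscale scores."""
    return mean(subscale_scores)

def cronbach_alpha(subscales):
    """Internal-consistency (alpha) reliability.
    subscales: one list of scores per subscale, aligned by student."""
    k = len(subscales)
    totals = [sum(scores) for scores in zip(*subscales)]
    item_variance = sum(pvariance(s) for s in subscales)
    return k / (k - 1) * (1 - item_variance / pvariance(totals))
```

Alpha rises as the subscales covary: the term `item_variance / pvariance(totals)` shrinks when students who score high on one subscale also score high on the others, which is the pattern behind the report's .79.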
Only students who completed all administered subscales were entered into the Total Reading score.

Core Curriculum Standard Assessment (CCSA) Reading Test. The CCSA reading test was developed by TUSD for district use (TUSDStats, n.d.). It parallels the criterion-based Arizona's Instrument to Measure Standards (AIMS) and is given in the grades where the AIMS is not. The CCSA places kindergartners in four levels of achievement (0, 1, 2, 3), which

correspond to the AIMS levels of Falls Far Below, Approaches, Meets, and Exceeds. Scores of 2 and 3 (Meets and Exceeds) are considered passing, or mastery of the content. TUSD teachers gave the CCSA in the fall and spring, so it served as a pretest and posttest along with the DIBELS.

Administration of reading measures in 2005-06. Table 4 shows the times during the school year when the DIBELS subtests and the CCSA were administered.

Table 4. Administration of DIBELS and CCSA Reading, 2005-06

Measure                           Fall 2005    Winter    Spring 2006
DIBELS
  Initial Sounds Fluency          X            X
  Letter Naming Fluency           X            X         X
  Word Use Fluency                X            X         X
  Phoneme Segmentation Fluency                 X         X
  Nonsense Word Fluency                        X         X
  Total Reading Score             X                      X
CCSA Reading
  Reading Performance             X                      X

RESULTS

Effect Estimates of the Intervention

Gain Score Analyses. The pretest and posttest scores were analyzed using paired samples t tests to determine whether there were significant gains. In addition, the gains themselves were compared for the WERP and comparison groups using independent samples t tests. This was the most straightforward analysis of the gains of the two groups. Criticism of gain score analysis has focused on instances in which a low pretest score can give the appearance of great gains and great program impact while ignoring regression toward the mean (Linn, 1981). To address these concerns, ANCOVA was used for additional analyses.

Findings. WERP students with 1,100 or more minutes of WERP software use outperformed comparison students on all reading outcome measures. The gains were substantial, statistically significant, and consistent.

In the ITT group (all students), the WERP students also outperformed the comparison students on all reading outcome measures. These gains were statistically significant and consistent, but the WERP advantage was somewhat smaller than for the WERP students with 1,100 usage minutes. Because the ITT analysis included all students without filtering for 1,100 usage minutes, this may account for the difference between the two analyses.

In the 90-day attendance group, the WERP gains were again greater than the comparison group gains: the WERP students consistently and significantly outperformed the comparison students on all reading outcome measures.

Summary. The three separate analyses reported above were undertaken to determine the effectiveness of the WERP relative to the comparison group. The comparison group was of higher socioeconomic status, as measured by the percentage of students on free/reduced lunch, and had a higher percentage of English-proficient students. In spite of these differences, the WERP students outperformed the comparison students consistently, significantly, and in many cases substantially in each of the three analyses: 1) WERP students with 1,100 usage minutes versus the comparison group, 2) the ITT group with all students in both groups, and 3) both groups filtered for at least 90 days of attendance during the 2005-06 school year.
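The gain-score machinery behind these findings can be sketched in a few lines: a paired t test for each group's pretest-to-posttest gain, an independent-samples t test comparing the two groups' gains (Welch's form here; the report may have used the pooled-variance form), and Cohen's d for effect size. The data below are illustrative only:

```python
from math import sqrt
from statistics import mean, stdev

def paired_t(pre, post):
    """Paired-samples t statistic for pretest-to-posttest gains."""
    diffs = [b - a for a, b in zip(pre, post)]
    return mean(diffs) / (stdev(diffs) / sqrt(len(diffs)))

def welch_t(gains_a, gains_b):
    """Independent-samples t (Welch's form) comparing two groups' gains."""
    ma, mb = mean(gains_a), mean(gains_b)
    va, vb = stdev(gains_a) ** 2, stdev(gains_b) ** 2
    return (ma - mb) / sqrt(va / len(gains_a) + vb / len(gains_b))

def cohens_d(group_a, group_b):
    """Cohen's d with a pooled SD; .20/.50/.80 are the small/medium/large
    benchmarks the report cites (Cohen, 1977)."""
    na, nb = len(group_a), len(group_b)
    pooled = sqrt(((na - 1) * stdev(group_a) ** 2 +
                   (nb - 1) * stdev(group_b) ** 2) / (na + nb - 2))
    return (mean(group_a) - mean(group_b)) / pooled
```

Note the division of labor: the paired t asks whether a group gained at all, the independent t asks whether the WERP group gained more than the comparison group, and d expresses that difference in standard-deviation units.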

Table 5. Gains on All Outcome Measures (WERP Students With 1,100 or More Usage Minutes)

Measure / Group | N | Pretest M (SD) | Posttest M (SD) | Gain | t | p
DIBELS: ISF
  WERP | 334 | 4.87 (5.71) | 23.50 (13.88) | 18.63 | 24.87 | .000
  Comparison | 1218 | 6.42 (6.82) | 17.53 (12.24) | 11.11 | 31.86 | .000
  WERP vs. Comparison gain difference: 7.52***
DIBELS: LNF
  WERP | 334 | 4.44 (8.17) | 43.47 (16.32) | 39.03 | 44.92 | .000
  Comparison | 1155 | 6.30 (10.15) | 40.89 (16.36) | 34.59 | 76.84 | .000
  WERP vs. Comparison gain difference: 4.44***
DIBELS: WUF
  WERP | 325 | 3.57 (7.15) | 32.93 (20.53) | 29.36 | 26.85 | .000
  Comparison | 998 | 4.94 (10.40) | 32.34 (20.87) | 27.40 | 40.55 | .000
  WERP vs. Comparison gain difference: 1.96 a
DIBELS: PSF
  WERP | 355 | 21.05 (15.93) | 46.43 (15.50) | 25.38 | 30.72 | .000
  Comparison | 1219 | 17.10 (15.87) | 39.34 (18.69) | 22.24 | 46.70 | .000
  WERP vs. Comparison gain difference: 3.14**
DIBELS: NWF
  WERP | 355 | 18.26 (14.61) | 38.67 (20.59) | 20.41 | 24.81 | .000
  Comparison | 1217 | 14.66 (15.20) | 31.46 (20.21) | 16.80 | 38.21 | .000
  WERP vs. Comparison gain difference: 3.61***
DIBELS: Total Reading
  WERP | 334 | 10.62 (6.98) | 33.59 (12.42) | 22.97 | 48.80 | .000
  Comparison | 1211 | 10.22 (8.29) | 29.32 (13.33) | 19.10 | 71.46 | .000
  WERP vs. Comparison gain difference: 3.87***
TUSD: CCSA Reading
  WERP | 311 | 1.09 (.49) | 2.68 (.64) | 1.59 | 38.00 | .000
  Comparison | 1263 | 1.07 (.59) | 2.41 (1.02) | 1.34 | 46.44 | .000
  WERP vs. Comparison gain difference: .25***
Note. ISF = Initial Sounds Fluency, LNF = Letter Naming Fluency, WUF = Word Use Fluency, PSF = Phoneme Segmentation Fluency, NWF = Nonsense Word Fluency. *p < .05. **p < .01. ***p < .001, from independent t tests comparing gains. a p = .142.

Figure 2. Gains on All Outcome Measures (pretest and posttest means for the Waterford and comparison groups: DIBELS Initial Sounds Fluency, Letter Naming Fluency, and Word Use Fluency; values as in Table 5).

Figure 2. Gains on All Outcome Measures (continued: DIBELS Phoneme Segmentation Fluency, Nonsense Word Fluency, and Total Reading; values as in Table 5).

Figure 2. Gains on All Outcome Measures (continued: CCSA Reading; values as in Table 5).

Table 6. Gains on All Outcome Measures (All Students)

Measure / Group | N | Pretest M (SD) | Posttest M (SD) | Gain | t | p
DIBELS: ISF
  WERP | 640 | 5.26 (5.99) | 20.81 (13.41) | 15.55 | 29.25 | .000
  Comparison | 1218 | 6.42 (6.82) | 17.53 (12.24) | 11.11 | 31.86 | .000
  WERP vs. Comparison gain difference: 4.44***
DIBELS: LNF
  WERP | 630 | 4.02 (7.39) | 41.99 (17.15) | 37.97 | 58.67 | .000
  Comparison | 1155 | 6.30 (10.15) | 40.89 (16.36) | 34.59 | 76.84 | .000
  WERP vs. Comparison gain difference: 3.38***
DIBELS: WUF
  WERP | 584 | 3.73 (8.18) | 31.29 (19.82) | 27.56 | 34.72 | .000
  Comparison | 998 | 4.94 (10.40) | 32.34 (20.87) | 27.40 | 40.55 | .000
  WERP vs. Comparison gain difference: .16
DIBELS: PSF
  WERP | 670 | 20.44 (16.52) | 44.43 (16.78) | 23.99 | 38.83 | .000
  Comparison | 1219 | 17.10 (15.87) | 39.34 (18.69) | 22.24 | 46.70 | .000
  WERP vs. Comparison gain difference: 1.75*
DIBELS: NWF
  WERP | 670 | 15.66 (14.10) | 34.59 (20.10) | 18.93 | 33.52 | .000
  Comparison | 1217 | 14.66 (15.20) | 31.46 (20.21) | 16.80 | 38.21 | .000
  WERP vs. Comparison gain difference: 2.13**
DIBELS: Total Reading
  WERP | 636 | 10.08 (7.34) | 31.11 (12.58) | 21.03 | 62.16 | .000
  Comparison | 1211 | 10.22 (8.30) | 29.32 (13.33) | 19.10 | 71.46 | .000
  WERP vs. Comparison gain difference: 1.93***
TUSD: CCSA Reading
  WERP | 625 | 1.08 (.48) | 2.62 (.75) | 1.54 | 45.65 | .000
  Comparison | 1263 | 1.07 (.59) | 2.41 (1.02) | 1.34 | 46.44 | .000
  WERP vs. Comparison gain difference: .20***
Note. ISF = Initial Sounds Fluency, LNF = Letter Naming Fluency, WUF = Word Use Fluency, PSF = Phoneme Segmentation Fluency, NWF = Nonsense Word Fluency. *p < .05. **p < .01. ***p < .001, from independent t tests comparing gains.

Table 7. Gains on All Outcome Measures (Students With 90 or More Days Attendance)

Measure / Group | N | Pretest M (SD) | Posttest M (SD) | Gain | t | p
DIBELS: ISF
  WERP | 640 | 5.26 (5.99) | 20.81 (13.41) | 15.55 | 29.25 | .000
  Comparison | 1188 | 6.35 (6.79) | 17.55 (12.24) | 11.20 | 31.69 | .000
  WERP vs. Comparison gain difference: 4.35***
DIBELS: LNF
  WERP | 628 | 4.03 (7.40) | 42.09 (17.07) | 38.06 | 58.95 | .000
  Comparison | 1152 | 6.31 (10.16) | 40.92 (16.35) | 34.61 | 76.83 | .000
  WERP vs. Comparison gain difference: 3.45***
DIBELS: WUF
  WERP | 582 | 3.75 (8.19) | 31.31 (19.80) | 27.56 | 34.72 | .000
  Comparison | 995 | 4.95 (10.41) | 32.36 (20.86) | 27.41 | 40.50 | .000
  WERP vs. Comparison gain difference: .15
DIBELS: PSF
  WERP | 661 | 20.50 (16.54) | 44.38 (16.85) | 23.88 | 38.51 | .000
  Comparison | 1210 | 17.18 (15.87) | 39.47 (18.62) | 22.29 | 46.61 | .000
  WERP vs. Comparison gain difference: 1.59*
DIBELS: NWF
  WERP | 661 | 15.76 (14.13) | 34.68 (20.08) | 18.92 | 33.21 | .000
  Comparison | 1208 | 14.74 (15.30) | 31.57 (20.20) | 16.83 | 38.10 | .000
  WERP vs. Comparison gain difference: 2.09**
DIBELS: Total Reading
  WERP | 636 | 10.08 (7.34) | 31.11 (12.58) | 21.03 | 62.16 | .000
  Comparison | 1183 | 10.19 (8.26) | 29.51 (13.31) | 19.32 | 71.85 | .000
  WERP vs. Comparison gain difference: 1.71***
TUSD: CCSA Reading
  WERP | 613 | 1.08 (.48) | 2.65 (.70) | 1.57 | 48.84 | .000
  Comparison | 1197 | 1.07 (.58) | 2.53 (.90) | 1.46 | 56.68 | .000
  WERP vs. Comparison gain difference: .11**
Note. ISF = Initial Sounds Fluency, LNF = Letter Naming Fluency, WUF = Word Use Fluency, PSF = Phoneme Segmentation Fluency, NWF = Nonsense Word Fluency. *p < .05. **p < .01. ***p < .001, from independent t tests comparing gains.

Effect Size Analyses

Effect size analysis was used to compare the WERP and comparison groups following standard effect size methods (Cohen, 1977). Effect sizes of .20 were considered small, .50 medium, and .80 large (Cohen, 1977). Both the differences between the effect sizes and the ratios of the effect sizes were examined.

Findings

WERP students with 1,100 minutes or more of WERP software use outperformed the comparison students as measured by the effect sizes of their gains on all reading outcome measures. The differences in gain effect sizes were substantial, consistent, and larger than .20. Four of the effect size differences were small, one was medium, and the differences in effect sizes for the DIBELS Total Reading score and the CCSA reading test were medium (.62) to large (.85).

In the ITT group (all students), the WERP effect sizes were greater than the comparison effect sizes. The differences in effect sizes were small to moderate for four comparisons. The DIBELS Total Reading score and the CCSA reading test showed effect size differences of .42 and .53.

In the 90-day attendance group, the WERP effect sizes were greater than the comparison effect sizes. The differences in effect sizes were small to moderate for all but three of the DIBELS reading scales. The DIBELS Total Reading score and the CCSA reading test showed effect size differences of .38 and .32.

Summary

The three separate analyses reported above were undertaken to determine the effectiveness of the WERP on student achievement when compared to a comparison group. The WERP students consistently outperformed the comparison students, with greater effect sizes. Most of the effect size differences were at or above the criterion for a small effect size of .20.

Table 8. Effect Size on All Outcome Measures (WERP Students With 1,100 or More Usage Minutes)

Measure / Group | N | Mean Gain | Gain SD | ES | ES Ratio
DIBELS: ISF
  WERP | 334 | 18.63 | 13.70 | 1.36 | 1.49
  Comparison | 1218 | 11.11 | 12.17 | .91 |
  WERP vs. Comparison ES difference: .45
DIBELS: LNF
  WERP | 334 | 39.03 | 15.88 | 2.46 | 1.09
  Comparison | 1155 | 34.59 | 15.30 | 2.26 |
  WERP vs. Comparison ES difference: .20
DIBELS: WUF
  WERP | 325 | 29.36 | 19.72 | 1.49 | 1.16
  Comparison | 998 | 27.40 | 21.34 | 1.28 |
  WERP vs. Comparison ES difference: .21
DIBELS: PSF
  WERP | 355 | 25.38 | 15.57 | 1.63 | 1.22
  Comparison | 1219 | 22.24 | 16.63 | 1.34 |
  WERP vs. Comparison ES difference: .29
DIBELS: NWF
  WERP | 355 | 20.41 | 15.50 | 1.32 | 1.20
  Comparison | 1217 | 16.80 | 15.33 | 1.10 |
  WERP vs. Comparison ES difference: .22
DIBELS: Total Reading
  WERP | 334 | 22.97 | 8.60 | 2.67 | 1.30
  Comparison | 1211 | 19.10 | 9.30 | 2.05 |
  WERP vs. Comparison ES difference: .62
TUSD: CCSA Reading
  WERP | 311 | 1.59 | .74 | 2.15 | 1.65
  Comparison | 1263 | 1.34 | 1.03 | 1.30 |
  WERP vs. Comparison ES difference: .85
Note. N = Number, M = Mean Gain, GainSD = Gain Standard Deviation, ES = Effect Size of Gain, Ratio = Ratio of WERP ES to comparison ES. The pretest-posttest effect size is the mean gain divided by the standard deviation of the gain (Walberg, 2001).
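The effect size used here is simple enough to verify by hand: the mean gain divided by the standard deviation of the gains, with the groups then compared by the difference and the ratio of their effect sizes. A short sketch reproducing the DIBELS ISF row of Table 8 from its reported summary statistics:

```python
# Summary statistics from the DIBELS: ISF row of Table 8.
werp_gain, werp_sd = 18.63, 13.70   # WERP mean gain and gain SD
comp_gain, comp_sd = 11.11, 12.17   # comparison mean gain and gain SD

# Pretest-posttest effect size: mean gain / SD of the gain (Walberg, 2001).
es_werp = werp_gain / werp_sd
es_comp = comp_gain / comp_sd

es_diff = es_werp - es_comp         # judged against Cohen's .20/.50/.80 criteria
es_ratio = es_werp / es_comp

print(round(es_werp, 2), round(es_comp, 2), round(es_diff, 2), round(es_ratio, 2))
# → 1.36 0.91 0.45 1.49
```

The output matches the ES, ES difference, and ES ratio reported for that row.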

Figure 3. Effect Size on All Outcome Measures (Waterford and comparison gain effect sizes for ISF, LNF, WUF, PSF, NWF, Total Reading, and CCSA; values as in Table 8).

Table 9. Effect Size on All Outcome Measures (All Students)

Measure / Group | N | Mean Gain | Gain SD | ES | ES Ratio
DIBELS: ISF
  WERP | 640 | 15.55 | 13.45 | 1.16 | 1.27
  Comparison | 1218 | 11.11 | 12.17 | .91 |
  WERP vs. Comparison ES difference: .25
DIBELS: LNF
  WERP | 630 | 37.97 | 16.24 | 2.33 | 1.03
  Comparison | 1155 | 34.59 | 15.30 | 2.26 |
  WERP vs. Comparison ES difference: .07
DIBELS: WUF
  WERP | 584 | 27.56 | 19.18 | 1.44 | 1.13
  Comparison | 998 | 27.40 | 21.34 | 1.28 |
  WERP vs. Comparison ES difference: .16
DIBELS: PSF
  WERP | 670 | 23.99 | 15.99 | 1.50 | 1.12
  Comparison | 1219 | 22.24 | 16.63 | 1.34 |
  WERP vs. Comparison ES difference: .16
DIBELS: NWF
  WERP | 670 | 18.93 | 14.62 | 1.29 | 1.17
  Comparison | 1217 | 16.80 | 15.33 | 1.10 |
  WERP vs. Comparison ES difference: .19
DIBELS: Total Reading
  WERP | 636 | 21.03 | 8.53 | 2.47 | 1.20
  Comparison | 1211 | 19.10 | 9.30 | 2.05 |
  WERP vs. Comparison ES difference: .42
TUSD: CCSA Reading
  WERP | 625 | 1.54 | .84 | 1.83 | 1.41
  Comparison | 1263 | 1.34 | 1.03 | 1.30 |
  WERP vs. Comparison ES difference: .53
Note. ISF = Initial Sounds Fluency, LNF = Letter Naming Fluency, WUF = Word Use Fluency, PSF = Phoneme Segmentation Fluency, NWF = Nonsense Word Fluency. N = Number, M = Mean Gain, GainSD = Gain Standard Deviation, ES = Effect Size of Gain, Ratio = Ratio of WERP ES to comparison ES. The pretest-posttest effect size is the mean gain divided by the standard deviation of the gain (Walberg, 2001).

Table 10. Effect Size on All Outcome Measures (Students With 90 or More Days Attendance)

Measure / Group | N | Mean Gain | Gain SD | ES | ES Ratio
DIBELS: ISF
  WERP | 640 | 15.55 | 13.45 | 1.16 | 1.26
  Comparison | 1188 | 11.20 | 12.18 | .92 |
  WERP vs. Comparison ES difference: .24
DIBELS: LNF
  WERP | 628 | 38.06 | 16.18 | 2.35 | 1.04
  Comparison | 1152 | 34.61 | 15.29 | 2.26 |
  WERP vs. Comparison ES difference: .09
DIBELS: WUF
  WERP | 582 | 27.56 | 19.16 | 1.44 | 1.13
  Comparison | 995 | 27.41 | 21.35 | 1.28 |
  WERP vs. Comparison ES difference: .16
DIBELS: PSF
  WERP | 661 | 23.88 | 15.95 | 1.50 | 1.12
  Comparison | 1210 | 22.29 | 16.64 | 1.34 |
  WERP vs. Comparison ES difference: .16
DIBELS: NWF
  WERP | 661 | 18.92 | 14.65 | 1.29 | 1.17
  Comparison | 1208 | 16.83 | 15.35 | 1.10 |
  WERP vs. Comparison ES difference: .19
DIBELS: Total Reading
  WERP | 636 | 21.03 | 8.53 | 2.47 | 1.18
  Comparison | 1183 | 19.32 | 9.24 | 2.09 |
  WERP vs. Comparison ES difference: .38
TUSD: CCSA Reading
  WERP | 613 | 1.57 | .80 | 1.96 | 1.20
  Comparison | 1197 | 1.46 | .89 | 1.64 |
  WERP vs. Comparison ES difference: .32
Note. ISF = Initial Sounds Fluency, LNF = Letter Naming Fluency, WUF = Word Use Fluency, PSF = Phoneme Segmentation Fluency, NWF = Nonsense Word Fluency. N = Number, M = Mean Gain, GainSD = Gain Standard Deviation, ES = Effect Size of Gain, Ratio = Ratio of WERP ES to comparison ES. The pretest-posttest effect size is the mean gain divided by the standard deviation of the gain (Walberg, 2001).

Analysis of Covariance and Effect Size

Analysis of covariance (ANCOVA) was used to adjust the posttest means of the WERP and comparison groups for differences on the pretest. The adjusted posttest means were then compared with an F test as part of the ANCOVA, and effect sizes were calculated to determine whether they were small (.20), medium (.50), or large (.80) using Cohen's (1977) criteria.

Findings

WERP students with 1,100 minutes or more using the software had significantly higher (p < .001) adjusted posttest mean scores than the comparison group, except in Word Use Fluency. The effect sizes ranged from small to medium for all adjusted posttest mean comparisons except Word Use Fluency.

In the ITT group (all students), the WERP group had significantly higher (p < .001) posttest mean scores than the comparison group except in Word Use Fluency, where the posttest mean comparison was non-significant. The effect sizes ranged from small to medium for all adjusted posttest mean comparisons except Letter Naming Fluency, Word Use Fluency, and Nonsense Word Fluency.

In the 90-day attendance group, the WERP group posttest means were significantly (p < .001) higher than the comparison group's, with the exception of Word Use Fluency. The effect sizes that were small ranged from .18 to .31. These effect sizes, combined with the F tests, indicated that the WERP program was effective in all areas but Word Use Fluency.

Summary

The three separate analyses reported above were undertaken to determine the effectiveness of the WERP on student achievement when compared to the comparison group, with the posttest means adjusted for pretest differences. In all three analyses, the WERP posttest means were higher than those of the comparison group except in Word Use Fluency. Of the 21 ANCOVA F tests across the three analyses, 18 (86%) were significant, favoring the WERP group.
All of the comparisons of the WERP and comparison groups on the DIBELS Total Reading score, as well as on the CCSA reading assessment, resulted in statistically significant differences favoring the WERP group. In addition, effect sizes involving the Total Reading score ranged from .20 to .42, indicating an effect difference between the two groups.
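The ANCOVA procedure above amounts to fitting a linear model of the posttest on a group indicator plus the pretest covariate, comparing the model's predictions at the grand-mean pretest (the adjusted posttest means), and dividing the adjusted difference by the square root of the ANCOVA mean squared residual. A minimal NumPy sketch, using simulated data (not the study's records) and with the effect magnitudes chosen only for illustration:

```python
import numpy as np

# Simulated pretest/posttest data for illustration -- NOT the study's records.
rng = np.random.default_rng(1)
n_werp, n_comp = 300, 1200
pre = np.concatenate([rng.normal(5, 6, n_werp), rng.normal(6, 7, n_comp)])
group = np.concatenate([np.ones(n_werp), np.zeros(n_comp)])   # 1 = WERP
post = 12 + 1.0 * pre + 5.0 * group + rng.normal(0, 12, n_werp + n_comp)

# ANCOVA as a linear model: posttest ~ intercept + group + pretest covariate.
X = np.column_stack([np.ones_like(pre), group, pre])
beta, *_ = np.linalg.lstsq(X, post, rcond=None)

resid = post - X @ beta
ms_resid = resid @ resid / (len(post) - 3)     # mean squared residual

# Adjusted posttest means: model predictions at the grand-mean pretest.
gm = pre.mean()
adj_werp = beta[0] + beta[1] + beta[2] * gm
adj_comp = beta[0] + beta[2] * gm

# Effect size as defined in Tables 11-13:
# adjusted mean difference / sqrt(ANCOVA mean squared residual).
es = (adj_werp - adj_comp) / np.sqrt(ms_resid)
print(round(adj_werp - adj_comp, 2), round(es, 2))
```

Note that the adjusted mean difference equals the fitted group coefficient, so the F test on that coefficient is the test of the adjusted posttest means reported in the tables.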

Table 11. ANCOVA and Effect Sizes on All Outcome Measures (WERP Students With 1,100 or More Usage Minutes)

Measure / Group | N | Pretest (Covariate) M (SD) | Adjusted Posttest M (SD) | ES | F | p
DIBELS: ISF
  WERP | 334 | 4.87 (5.71) | 24.14 (13.88) | .56 | 81.57 | .000
  Comparison | 1218 | 6.42 (6.82) | 17.35 (12.24) | | |
DIBELS: LNF
  WERP | 334 | 4.44 (8.17) | 44.41 (16.32) | .25 | 16.33 | .000
  Comparison | 1155 | 6.30 (10.15) | 40.61 (16.36) | | |
DIBELS: WUF
  WERP | 325 | 3.57 (7.15) | 33.41 (20.53) | .06 | .89 | .345
  Comparison | 998 | 4.94 (10.40) | 32.18 (20.87) | | |
DIBELS: PSF
  WERP | 355 | 21.05 (15.93) | 44.58 (15.50) | .31 | 26.22 | .000
  Comparison | 1219 | 17.10 (15.87) | 39.88 (18.69) | | |
DIBELS: NWF
  WERP | 355 | 18.26 (14.61) | 36.19 (20.59) | .26 | 18.65 | .000
  Comparison | 1217 | 14.66 (15.20) | 32.18 (20.21) | | |
DIBELS: Total Reading
  WERP | 334 | 10.62 (6.98) | 33.22 (12.42) | .42 | 46.16 | .000
  Comparison | 1211 | 10.22 (8.29) | 29.43 (13.33) | | |
TUSD: CCSA Reading
  WERP | 311 | 1.09 (.49) | 2.67 (.64) | .28 | 20.04 | .000
  Comparison | 1263 | 1.07 (.59) | 2.41 (1.02) | | |
Note. ISF = Initial Sounds Fluency, LNF = Letter Naming Fluency, WUF = Word Use Fluency, PSF = Phoneme Segmentation Fluency, NWF = Nonsense Word Fluency. The effect size is the adjusted mean posttest difference divided by the square root of the ANCOVA mean squared residual.

Figure 4. ANCOVA and Effect Sizes on All Outcome Measures (effect sizes of .56, .25, .06, .31, .26, .42, and .28 for ISF, LNF, WUF, PSF, NWF, Total Reading, and CCSA; values as in Table 11).

Table 12. ANCOVA and Effect Sizes on All Outcome Measures (All Students)

Measure / Group | N | Pretest (Covariate) M (SD) | Adjusted Posttest M (SD) | ES | F | p
DIBELS: ISF
  WERP | 640 | 5.26 (5.99) | 21.20 (13.41) | .32 | 41.96 | .000
  Comparison | 1218 | 6.42 (6.82) | 17.32 (12.24) | | |
DIBELS: LNF
  WERP | 630 | 4.02 (7.39) | 43.00 (17.15) | .17 | 12.13 | .001
  Comparison | 1155 | 6.30 (10.15) | 40.33 (16.36) | | |
DIBELS: WUF
  WERP | 584 | 3.73 (8.18) | 31.65 (19.82) | .00 | .20 | .651
  Comparison | 998 | 4.94 (10.40) | 32.12 (20.87) | | |
DIBELS: PSF
  WERP | 670 | 20.44 (16.52) | 43.11 (16.78) | .20 | 17.41 | .000
  Comparison | 1219 | 17.10 (15.87) | 40.06 (18.69) | | |
DIBELS: NWF
  WERP | 670 | 15.66 (14.10) | 34.00 (20.10) | .15 | 9.44 | .002
  Comparison | 1217 | 14.66 (15.20) | 31.78 (20.21) | | |
DIBELS: Total Reading
  WERP | 636 | 10.08 (7.34) | 31.23 (12.58) | .22 | 20.23 | .000
  Comparison | 1211 | 10.22 (8.30) | 29.26 (13.33) | | |
TUSD: CCSA Reading
  WERP | 625 | 1.08 (.48) | 2.62 (.75) | .23 | 20.80 | .000
  Comparison | 1263 | 1.07 (.59) | 2.41 (1.02) | | |
Note. ISF = Initial Sounds Fluency, LNF = Letter Naming Fluency, WUF = Word Use Fluency, PSF = Phoneme Segmentation Fluency, NWF = Nonsense Word Fluency. The effect size is the adjusted mean posttest difference divided by the square root of the ANCOVA mean squared residual.

Table 13. ANCOVA and Effect Sizes on All Outcome Measures (Students With 90 or More Days Attendance)

Measure / Group | N | Pretest (Covariate) M (SD) | Adjusted Posttest M (SD) | ES | F | p
DIBELS: ISF
  WERP | 640 | 5.26 (5.99) | 21.17 (13.41) | .31 | 40.30 | .000
  Comparison | 1188 | 6.35 (6.79) | 17.35 (12.24) | | |
DIBELS: LNF
  WERP | 628 | 4.03 (7.40) | 43.11 (17.07) | .18 | 12.77 | .000
  Comparison | 1152 | 6.31 (10.16) | 40.37 (16.35) | | |
DIBELS: WUF
  WERP | 582 | 3.75 (8.19) | 31.68 (19.80) | -.02 | .20 | .652
  Comparison | 995 | 4.95 (10.41) | 32.15 (20.86) | | |
DIBELS: PSF
  WERP | 661 | 20.50 (16.54) | 43.08 (16.85) | .19 | 15.51 | .000
  Comparison | 1210 | 17.18 (15.87) | 40.18 (18.62) | | |
DIBELS: NWF
  WERP | 661 | 15.76 (14.13) | 34.09 (20.08) | .15 | 9.02 | .003
  Comparison | 1208 | 14.74 (15.30) | 31.90 (20.20) | | |
DIBELS: Total Reading
  WERP | 636 | 10.08 (7.34) | 31.20 (12.58) | .20 | 16.90 | .000
  Comparison | 1183 | 10.19 (8.26) | 29.46 (13.31) | | |
TUSD: CCSA Reading
  WERP | 613 | 1.08 (.48) | 2.65 (.70) | .15 | 9.07 | .003
  Comparison | 1197 | 1.06 (.58) | 2.53 (.90) | | |
Note. ISF = Initial Sounds Fluency, LNF = Letter Naming Fluency, WUF = Word Use Fluency, PSF = Phoneme Segmentation Fluency, NWF = Nonsense Word Fluency. The effect size is the adjusted mean posttest difference divided by the square root of the ANCOVA mean squared residual. The covariate was the reading pretest.

Intervention Effects on Subgroups

This section examines the outcome measures when the student population is disaggregated by program, gender, ethnicity, primary home language, and ELL status.

WERP and Reading First

The following pairs of schools were examined because these WERP schools had a high level of implementation of the program, were also Reading First schools, and were closely matched on the DIBELS Total Reading pretest score with a comparison school that had neither program. Thus School J (WERP) was matched with School X (comparison), School K (WERP) with School V (comparison), and School H (WERP) with School M (comparison).

The three schools with the WERP and the Reading First program outperformed the comparison schools with which they were matched, both together as a group and school by school. Analysis of the DIBELS subscales indicated that students receiving Reading First and the WERP performed significantly better than students in the comparison schools in Initial Sounds Fluency, Letter Naming Fluency, Nonsense Word Fluency, the Total Reading score, and the CCSA reading assessment. On the DIBELS Total Reading score the difference was 4.90 points, statistically significant at the .001 level. See Tables 14 and 15 and Figures 5 and 6.

Table 14. WERP + Reading First and Comparison School Means and Gains

Schools | N | Pretest | Posttest | Gain
WERP + Reading First: School J | 83 | 11.24 | 35.76 | 24.52
Comparison: School X | 69 | 11.33 | 32.63 | 21.30
WERP vs. Comparison | | -.09 | 3.13 | 3.22
WERP + Reading First: School K | 79 | 11.89 | 37.99 | 26.10
Comparison: School V | 97 | 11.90 | 31.97 | 20.07
WERP vs. Comparison | | -.01 | 6.02 | 6.03
WERP + Reading First: School H | 52 | 12.86 | 32.98 | 20.12
Comparison: School M | 92 | 13.06 | 29.56 | 16.50
WERP vs. Comparison | | -.20 | 3.42 | 3.62
Note. WERP and comparison schools were matched by DIBELS Total Reading Score pretest mean.

Figure 5. WERP + Reading First and Comparison Schools Matched on DIBELS Reading Pretest Mean (pretest and posttest means for School J vs. School X, School K vs. School V, and School H vs. School M; values as in Table 14).

Table 15. WERP + Reading First and Comparison Gains on All Outcome Measures

Measure / Group | N | Pretest M (SD) | Posttest M (SD) | Gain | t | p
DIBELS: ISF
  WERP | 214 | 5.10 (5.40) | 25.24 (13.81) | 20.14 | 21.62 | .000
  Comparison | 260 | 6.50 (7.24) | 15.00 (11.42) | 8.50 | 11.19 | .000
  WERP vs. Comparison gain difference: 11.64***
DIBELS: LNF
  WERP | 214 | 4.88 (8.99) | 45.95 (15.81) | 41.07 | 38.81 | .000
  Comparison | 239 | 5.04 (9.67) | 41.79 (16.55) | 36.75 | 36.34 | .000
  WERP vs. Comparison gain difference: 4.32**
DIBELS: WUF
  WERP | 211 | 3.41 (7.55) | 32.72 (20.63) | 29.31 | 20.90 | .000
  Comparison | 84 | 9.46 (13.16) | 39.51 (20.71) | 30.05 | 13.94 | .000
  WERP vs. Comparison gain difference: -.74
DIBELS: PSF
  WERP | 230 | 23.96 (16.16) | 50.53 (11.11) | 26.57 | 25.81 | .000
  Comparison | 255 | 19.37 (16.32) | 44.93 (18.58) | 25.56 | 24.12 | .000
  WERP vs. Comparison gain difference: 1.01
DIBELS: NWF
  WERP | 230 | 20.52 (14.20) | 43.94 (20.12) | 23.42 | 22.33 | .000
  Comparison | 253 | 17.17 (13.96) | 34.91 (19.10) | 17.74 | 18.85 | .000
  WERP vs. Comparison gain difference: 5.68***
DIBELS: Total Reading
  WERP | 214 | 11.88 (7.60) | 35.91 (11.46) | 24.03 | 44.11 | .000
  Comparison | 258 | 12.16 (9.31) | 31.29 (13.47) | 19.13 | 30.99 | .000
  WERP vs. Comparison gain difference: 4.90***
TUSD: CCSA Reading
  WERP | 189 | 1.06 (.50) | 2.72 (.58) | 1.66 | 31.75 | .000
  Comparison | 265 | 1.08 (.60) | 2.36 (1.07) | 1.28 | 19.84 | .000
  WERP vs. Comparison gain difference: .38***
Note. ISF = Initial Sounds Fluency, LNF = Letter Naming Fluency, WUF = Word Use Fluency, PSF = Phoneme Segmentation Fluency, NWF = Nonsense Word Fluency. WERP students selected with 1,100 minutes (6 months) or more usage of the WERP Reading Program. *p < .05. **p < .01. ***p < .001, from independent t tests comparing gains.

Figure 6. WERP + Reading First and Comparison Gains on Total Reading Score (pretest and posttest means for the WERP + Reading First and comparison groups; values as in Table 15).

Gender

Females significantly outperformed males on the DIBELS Total Reading score in both the WERP and comparison schools. See Table 16. Males in the WERP schools outperformed males in the comparison schools on the DIBELS Total Reading score, and WERP females outperformed comparison females. See Table 16 and Figure 7.

Table 16. Males and Females on DIBELS Total Reading Score

Group | N | Pretest M (SD) | Posttest M (SD) | Gain | t | p
WERP
  Female | 164 | 12.42 (7.17) | 36.79 (11.67) | 24.37 | 38.70 | .000
  Male | 170 | 8.89 (6.33) | 30.50 (12.38) | 21.61 | 32.90 | .000
  Female vs. Male gain difference: 2.76**
Comparison
  Female | 603 | 11.06 (8.35) | 31.09 (13.12) | 20.03 | 54.59 | .000
  Male | 608 | 9.40 (8.17) | 27.57 (13.32) | 18.17 | 47.18 | .000
  Female vs. Male gain difference: 1.86***
Male
  WERP vs. Comparison gain difference: 3.44***
Female
  WERP vs. Comparison gain difference: 4.34***
Note. WERP students selected with 1,100 minutes (6 months) or more usage of the WERP Reading Program. *p < .05. **p < .01. ***p < .001, from independent t tests comparing gains.

Figure 7. Males and Females on DIBELS Total Reading Score (pretest and posttest means for females and males in the Waterford and comparison groups; values as in Table 16).

Ethnicity

All ethnic groups, whether in the WERP schools or in the comparison schools, made important gains from pretest to posttest on the DIBELS Total Reading score. See Table 17. A comparison of gains shows that all ethnic groups receiving the WERP made greater gains than their counterparts in the comparison group. A surprising finding was that Hispanic (23.19), Asian (22.90), and African American (20.19) students in the WERP schools made greater pretest-to-posttest gains on the Total Reading score than did the White (19.82) students not receiving the WERP. The greatest pretest-to-posttest gain (26.23 points) was made by the White students in the WERP schools. These students also showed the greatest gain relative to their counterparts in the comparison schools, a statistically significant difference of 6.41 points. The White WERP students showed the greatest gain relative to their comparison counterparts, and the Native American students the lowest. An analysis of variance (ANOVA) showed that the differences among ethnicities within the WERP group were not statistically significant.

Table 17. Ethnic Groups on DIBELS Total Reading Score

Group | N | Pretest M (SD) | Posttest M (SD) | Gain | t | p
WERP
  White | 13 | 14.48 (7.61) | 40.71 (9.67) | 26.23 | 10.68 | .000
  African American | 16 | 6.16 (4.97) | 26.35 (9.75) | 20.19 | 10.15 | .000
  Hispanic | 279 | 10.65 (6.91) | 33.84 (12.48) | 23.19 | 45.45 | .000
  Native American | 18 | 9.63 (5.86) | 29.27 (9.86) | 19.63 | 10.69 | .000
  Asian | 8 | 14.55 (9.53) | 37.45 (16.60) | 22.90 | 5.32 | .001
Comparison
  White | 219 | 12.43 (9.17) | 32.25 (13.91) | 19.82 | 31.97 | .000
  African American | 81 | 10.24 (9.90) | 28.07 (14.95) | 17.83 | 15.67 | .000
  Hispanic | 850 | 9.59 (7.92) | 28.65 (12.95) | 19.06 | 60.33 | .000
  Native American | 46 | 10.69 (7.94) | 29.70 (12.02) | 19.01 | 13.93 | .000
  Asian | 15 | 12.31 (7.56) | 30.47 (15.95) | 18.16 | 6.18 | .000
Note. WERP students selected with 1,100 minutes (6 months) or more usage of the WERP Reading Program. *p < .05. **p < .01. ***p < .001, from independent t tests comparing gains.

Table 18. Ethnic Groups on DIBELS Total Reading Score, Grouped by Ethnicity

Group | N | Pretest M (SD) | Posttest M (SD) | Gain | t | p
White
  WERP | 13 | 14.48 (7.61) | 40.71 (9.67) | 26.23 | 10.68 | .000
  Comparison | 219 | 12.43 (9.17) | 32.25 (13.91) | 19.82 | 31.97 | .000
  WERP vs. Comparison gain difference: 6.41*
African American
  WERP | 16 | 6.16 (4.97) | 26.35 (9.75) | 20.19 | 10.15 | .000
  Comparison | 81 | 10.24 (9.90) | 28.07 (14.95) | 17.83 | 15.67 | .000
  WERP vs. Comparison gain difference: 2.36
Hispanic
  WERP | 279 | 10.65 (6.91) | 33.84 (12.48) | 23.19 | 45.45 | .000
  Comparison | 850 | 9.59 (7.92) | 28.65 (12.95) | 19.06 | 60.33 | .000
  WERP vs. Comparison gain difference: 4.13***
Native American
  WERP | 18 | 9.63 (5.86) | 29.27 (9.86) | 19.63 | 10.69 | .000
  Comparison | 46 | 10.69 (7.94) | 29.70 (12.02) | 19.01 | 13.93 | .000
  WERP vs. Comparison gain difference: .62
Asian
  WERP | 8 | 14.55 (9.53) | 37.45 (16.60) | 22.90 | 5.32 | .001
  Comparison | 15 | 12.31 (7.56) | 30.47 (15.95) | 18.16 | 6.18 | .000
  WERP vs. Comparison gain difference: 4.74
Note. WERP students selected with 1,100 minutes (6 months) or more usage of the WERP Reading Program. *p < .05. **p < .01. ***p < .001, from independent t tests comparing gains.

Figure 8. Ethnic Groups on DIBELS Total Reading Score (gains for African American students; values as in Table 17).

Figure 8. Ethnic Groups on DIBELS Total Reading Score (continued: gains for Asian, Hispanic, and Native American students; values as in Table 17).

Figure 8. Ethnic Groups on DIBELS Total Reading Score (continued: gains for White students; values as in Table 17).

Figure 9. Mean Gains by Ethnicity on DIBELS Total Reading Score (Waterford and comparison mean gains across ethnicities; values as in Table 17).

Primary Home Language

Whether their primary home language was English, Spanish, or another language, WERP students outperformed their counterparts in the comparison group on the DIBELS Total Reading score. This difference was statistically significant for the English and the Spanish home language groups. WERP students with Spanish (22.21) as their primary home language significantly outperformed, in gains, the comparison group students who spoke English as their primary home language (20.15). The greatest pretest-to-posttest gain was made by the English-speaking WERP students, who gained 24.18 points. The WERP group with the greatest gain (5.56 points) relative to the comparison group was that of students who spoke a primary home language other than English or Spanish. This diverse group includes refugee children, who often have a history of upheavals, trauma, and no prior school experience.

Table 19. Primary Home Languages on DIBELS Total Reading Score

Group | N | Pretest M (SD) | Posttest M (SD) | Gain | t | p
WERP
  English | 160 | 11.41 (6.69) | 35.58 (11.66) | 24.18 | 36.68 | .000
  Spanish | 163 | 10.08 (7.13) | 32.29 (12.84) | 22.21 | 32.52 | .000
  Other | 11 | 7.11 (7.48) | 23.82 (10.62) | 16.71 | 7.79 | .000
Comparison
  English | 823 | 11.42 (8.77) | 31.57 (13.17) | 20.15 | 62.79 | .000
  Spanish | 362 | 7.94 (6.53) | 25.23 (12.17) | 17.29 | 36.49 | .000
  Other | 26 | 4.15 (4.88) | 15.30 (12.31) | 11.15 | 6.43 | .000
English
  WERP vs. Comparison gain difference: 4.03***
Spanish
  WERP vs. Comparison gain difference: 4.92***
Other
  WERP vs. Comparison gain difference: 5.56
Note. Other languages are Af-Mayma, Amharic, Arabic, Cantonese, Persian, Filipino, French, Laotian, Marshallese, Portuguese, Russian, Somali, and Vietnamese. WERP students selected with 1,100 minutes (6 months) or more usage of the WERP Reading Program. *p < .05. **p < .01. ***p < .001, from independent t tests comparing gains.

Figure 10. Primary Home Languages on DIBELS Total Reading Score (pretest and posttest means for the English, Spanish, and other primary home language groups; values as in Table 19).