DISCOVERY EDUCATION ASSESSMENT PROGRESS MONITORING SCREENING TOOL


NATIONAL CENTER ON RESPONSE TO INTERVENTION EVALUATION CRITERIA

Table of Contents

  National Center for RTI Screening Tool
  Five Criteria for Screening Tool
  DEA Progress Monitoring Screening Tool Data
    Generalizability
    Reliability
    Validity
    Classification Analysis
  Appendices

National Center for RTI Screening Tool

Starting in 2009, Discovery Education Assessment (DEA) benchmark assessments have received high marks for use as screening tools from the National Center on Response to Intervention (NCRTI). Screening tools identify students who are at risk of not meeting grade-level proficiency standards; these at-risk students can then be placed into Response to Intervention (RTI) or similar programs. The National Center defines screening as follows:

  "Screening involves brief assessments that are valid, reliable, and evidence-based. They are conducted with all students or targeted groups of students to identify students who are at risk of academic failure and, therefore, likely to need additional or alternative forms of instruction to supplement the conventional general education approach."

The National Center's Technical Review Committee (TRC) on Screening independently established a set of criteria for evaluating the scientific rigor of screening tools and rated each submitted tool against them. Five types of scientifically rigorous criteria were evaluated: (1) Classification Accuracy; (2) Generalizability; (3) Reliability; (4) Validity; and (5) Disaggregated Reliability, Validity, and Classification Data for Diverse Populations. DEA received high marks on all five criteria. (The NCRTI Screening Tools chart, which rates each criterion as Convincing Evidence, Partially Convincing Evidence, or Unconvincing Evidence, can be found at http://www.rti4success.org/chart/screeningtools/screeningtoolschart.html#)

Five Criteria for Screening Tool

These five criteria were defined as follows:

Generalizability. Generalizability refers to the extent to which results generated from one population can be applied to another population. A tool is considered more generalizable if studies have been conducted on larger, more representative samples. A rating of Moderate High means the screening tool has a large representative national sample, multiple regional/state samples with no cross-validation, or one or more regional/state samples with cross-validation.

Reliability. Reliability refers to the consistency with which a tool classifies students from one administration to the next. A tool is considered reliable if it produces the same results when the test is administered under different conditions, at different times, or using different forms. A rating of Convincing Evidence means that split-half, coefficient alpha, test-retest, or inter-rater reliability is greater than .80.

Validity. Validity refers to the extent to which a tool accurately measures the underlying construct it is intended to measure. Validity is evidenced in three ways: content validity, construct validities above .70, and predictive validities above .70.

Classification Accuracy. Classification accuracy refers to the extent to which a screening tool is able to accurately classify students into "at risk for reading or mathematics disability" and "not at risk" categories. Classification accuracy was measured by a statistic known as the Area Under the Curve (AUC); AUC values have to be .85 or greater to receive a rating of Convincing Evidence. (The AUC statistic is an overall indication of the diagnostic accuracy of a Receiver Operating Characteristic (ROC) curve; ROC curves generalize the set of potential combinations of sensitivity and specificity possible for a predictor. AUC values closer to 1 indicate the screening measure reliably distinguishes between students with satisfactory and unsatisfactory performance, whereas a value of .50 indicates the predictor is no better than chance.)

Disaggregated Reliability, Validity, and Classification Data for Diverse Populations. Data are disaggregated when they are calculated and reported separately for specific subgroups. Evidence of disaggregated reliability, validity, and classification data receives the highest score in this category.

The following sections describe the specific studies DEA used to achieve high marks from the National Center.

DEA Progress Monitoring Screening Tool Data

DEA Screeners cover the subjects of Reading and Mathematics, spanning Grades 3 to 10. The evaluation criteria are presented in the following order: Generalizability, Reliability, Validity, and Classification Accuracy. Disaggregation by ethnic group is covered in the sections on validity and classification accuracy.

Generalizability (Moderate High)

Generalizability refers to the extent to which results generated from one population can be applied to another population; a tool is considered more generalizable if studies have been conducted on larger, more representative samples. DEA presented data drawn from two representative sources: (1) the state of Kentucky, comprising more than 6,000 students from five representative school districts; and (2) the District of Columbia Public School System, one of the nation's largest urban districts, comprising more than 20,000 students in Grades 3 to 10. Standardized test scores were obtained for each student from the Kentucky Core Content Test (KCCT) from Spring 2008 and the District of Columbia Comprehensive Assessment System (DC-CAS), also from Spring 2008. Each student also completed three DEA benchmarks during the 2007-2008 school year: Fall, Winter, and Spring. Additional DC-CAS data were also obtained for Spring 2009.

Reliability (Convincing Evidence)

Reliability refers to the consistency with which a tool classifies students from one administration to the next. A tool is considered reliable if it produces the same results when the test is administered under different conditions, at different times, or using different forms. Cronbach's alpha was used to measure each screener's reliability. The following tables present the range and median reliability coefficients for DEA Screeners for the three test periods (Fall, Winter, Spring), separately for Kentucky and the District of Columbia, and separately for Reading and Mathematics. The median reliabilities for all test periods exceed .80, the criterion established by the National Center for a Convincing rating.

Table 1: Reliabilities for DEA Reading Screeners

                          Period    Range         Median
  Kentucky                Fall      .73 to .85    .84
                          Winter    .82 to .86    .84
                          Spring    .84 to .86    .85
  District of Columbia    Fall      .82 to .87    .86
                          Winter    .77 to .89    .82
                          Spring    .84 to .87    .86

Table 2: Reliabilities for DEA Mathematics Screeners

                          Period    Range         Median
  Kentucky                Fall      .76 to .87    .83
                          Winter    .79 to .84    .83
                          Spring    .81 to .86    .84
  District of Columbia    Fall      .79 to .87    .82
                          Winter    .75 to .85    .81
                          Spring    .81 to .90    .85
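For reference, Cronbach's alpha can be computed directly from a students-by-items score matrix. A minimal Python sketch, using a small invented response matrix rather than DEA item data:

```python
from statistics import pvariance

def cronbach_alpha(scores):
    """Cronbach's alpha for a students-x-items score matrix.

    alpha = k/(k-1) * (1 - sum(item variances) / variance(total scores)),
    where k is the number of items.
    """
    k = len(scores[0])                        # number of items
    items = list(zip(*scores))                # one column per item
    item_var = sum(pvariance(col) for col in items)
    total_var = pvariance([sum(row) for row in scores])
    return k / (k - 1) * (1 - item_var / total_var)

# Hypothetical right/wrong (1/0) responses of four students to a 3-item test.
responses = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
]
print(round(cronbach_alpha(responses), 2))  # → 0.75
```

Higher alpha indicates that the items vary together, i.e., the test measures its construct consistently; the DEA medians above clear the National Center's .80 bar.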

Validity (Partially Convincing Evidence)

Validity refers to the extent to which a tool accurately measures the underlying construct it is intended to measure. Content validity represents how well a tool measures the skills and knowledge of a particular domain. Criterion validity measures how well scores on a tool correlate with scores on an external measure, such as a state test.

Content Validity

Content validity evidence shows that test content is appropriate for the particular constructs being measured. It is established by agreement among subject-matter experts about test material and alignment to state standards, by highly reliable training procedures for item writers, by thorough reviews of test material for accuracy and lack of bias, and by examination of the depth of knowledge of test questions. To ensure content validity of all tests, Discovery Education Assessment carefully aligns the content of its assessments to a given state's content standards and to the content sampled by the respective high-stakes test. Discovery Education Assessment employs one of the leading alignment research methodologies, the Webb Alignment Tool (WAT), which has continually supported the alignment of its tests to state-specific content standards in both breadth (i.e., the number of standards and objectives sampled) and depth (i.e., the cognitive complexity of standards and objectives). All Discovery Education Assessment tests are thus state specific and mirror the reporting categories of a given state's large-scale accountability assessment. DEA's screening tools in Reading and Mathematics are aligned to state-specific standards: the Kentucky screening tools match standards on the Kentucky Core Content Test (KCCT), and the District of Columbia screening tools match standards on the District of Columbia Comprehensive Assessment System (DC-CAS).

Criterion Validity: Predictive and Concurrent

Criterion validity evidence demonstrates that test scores predict scores on an important criterion variable, such as a state's standardized large-scale assessment. Predictive validity applies when the screening tool is administered at least three months before the state test: DEA screening tools given during the Fall and Winter periods are predictive of the state test given in the spring. Concurrent validity applies when the screening tool is administered less than three months before the state test: the DEA screening tool administered in the spring represents concurrent validity. The following tables present predictive and concurrent correlations between DEA Reading and Mathematics Screeners and the Kentucky and District of Columbia state tests from Spring 2008, with additional District of Columbia data for Spring 2009. The median correlations for Reading range from .61 to .75, and the median correlations for Mathematics

range from .61 to .79. Correlations are presented first for all students and then for the two disaggregated groups, African-American and Hispanic. The National Center established a criterion of .70 for predictive validities; many DEA validities exceeded this value, and many others were in the high .60s. DEA received a rating of Partially Convincing for validity.

Table 3: Predictive and Concurrent Validity for Reading Screeners for All Students

                          Period                 Range         Median
  Kentucky                Fall (Predictive)      .69 to .74    .71
                          Winter (Predictive)    .67 to .71    .68
                          Spring (Concurrent)    .64 to .73    .70
  District of Columbia    Fall (Predictive)      .65 to .69    .67
                          Winter (Predictive)    .65 to .71    .66
                          Spring (Concurrent)    .65 to .70    .69

Table 4: Predictive and Concurrent Validity for Reading Screeners for African-American Students

                          Period                 Range         Median
  Kentucky                Fall (Predictive)      .60 to .68    .67
                          Winter (Predictive)    .60 to .75    .67
                          Spring (Concurrent)    .48 to .70    .68
  District of Columbia    Fall (Predictive)      .57 to .65    .61
                          Winter (Predictive)    .61 to .70    .62
                          Spring (Concurrent)    .63 to .69    .64

Table 5: Predictive and Concurrent Validity for Reading Screeners for Hispanic Students

                          Period                 Range         Median
  District of Columbia    Fall (Predictive)      .59 to .70    .65
                          Winter (Predictive)    .62 to .69    .65
                          Spring (Concurrent)    .60 to .72    .63

Table 6: Predictive and Concurrent Validity for Reading Screeners, District of Columbia 2009

                          Period                 Range         Median
  District of Columbia    Fall (Predictive)      .66 to .73    .70
                          Winter (Predictive)    .68 to .78    .74
                          Spring (Concurrent)    .69 to .78    .76

Table 7: Predictive and Concurrent Validity for Mathematics Screeners for All Students

                          Period                 Range         Median
  Kentucky                Fall (Predictive)      .71 to .81    .76
                          Winter (Predictive)    .72 to .82    .76
                          Spring (Concurrent)    .73 to .78    .76
  District of Columbia    Fall (Predictive)      .56 to .70    .67
                          Winter (Predictive)    .67 to .75    .71
                          Spring (Concurrent)    .65 to .77    .74

Table 8: Predictive and Concurrent Validity for Mathematics Screeners for African-American Students

                          Period                 Range         Median
  Kentucky                Fall (Predictive)      .67 to .70    .69
                          Winter (Predictive)    .66 to .77    .72
                          Spring (Concurrent)    .63 to .76    .72
  District of Columbia    Fall (Predictive)      .52 to .64    .61
                          Winter (Predictive)    .63 to .71    .64
                          Spring (Concurrent)    .58 to .74    .69

Table 9: Predictive and Concurrent Validity for Mathematics Screeners for Hispanic Students

                          Period                 Range         Median
  District of Columbia    Fall (Predictive)      .47 to .70    .64
                          Winter (Predictive)    .68 to .78    .68
                          Spring (Concurrent)    .65 to .76    .72

Table 10: Predictive and Concurrent Validity for Mathematics Screeners, District of Columbia 2009

                          Period                 Range         Median
  District of Columbia    Fall (Predictive)      .65 to .75    .71
                          Winter (Predictive)    .72 to .80    .76
                          Spring (Concurrent)    .78 to .81    .79
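The validity coefficients in these tables are Pearson correlations between screener scores and state-test scores. A minimal sketch of that computation, using invented score pairs rather than DEA data:

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two paired score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical fall benchmark raw scores and spring state-test scale scores
# for ten students (illustration only, not DEA data).
fall   = [12, 15, 18, 20, 22, 25, 27, 30, 33, 35]
spring = [410, 430, 425, 450, 470, 460, 495, 500, 520, 515]
r = pearson_r(fall, spring)
```

For a predictive-validity study, `x` would be the Fall or Winter screener scores and `y` the Spring state-test scores; a coefficient above the National Center's .70 criterion indicates the screener tracks the state test closely.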

Classification Accuracy (Convincing Evidence)

A screening tool should classify students into one of two categories: At Risk or Not At Risk. The accuracy of this classification can be assessed by comparing the prediction to a student's status on a standardized outcome measure. DEA benchmarks for Reading and Mathematics classified students into the two categories of At Risk and Not At Risk based on the following Kentucky- and DC-specific proficiency levels:

  Kentucky:              Novice and Apprentice (At Risk); Proficient and Distinguished (Not At Risk)
  District of Columbia:  Below Basic and Basic (At Risk); Proficient and Advanced (Not At Risk)

The actual proficiency level of students was obtained from results on the Spring 2008 KCCT or the Spring 2008 DC-CAS; additional results were obtained for the Spring 2009 DC-CAS.

The accuracy and errors of predictions using a screening tool can be classified into one of four outcomes:

                                          State Test
                            At Risk               Not At Risk
  Screener   At Risk        True Positive (a)     False Positive (b)    Total Predicted At Risk (a+b)
             Not At Risk    False Negative (c)    True Negative (d)     Total Predicted Not At Risk (c+d)
                            Total True At Risk    Total True Not At     Total Students
                            (a+c)                 Risk (b+d)            (a+b+c+d)

  a) True Positive: students predicted as At Risk on the screener that are actually At Risk on the state test.
  b) False Positive: students predicted as At Risk on the screener that are actually Not At Risk on the state test; these students have been falsely identified as At Risk.
  c) False Negative: students predicted as Not At Risk on the screener that are actually At Risk on the state test; these students have been falsely identified as Not At Risk.
  d) True Negative: students predicted as Not At Risk on the screener that are actually Not At Risk on the state test.
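Tallying the four outcomes from paired screener and state-test labels can be sketched as follows (the six student records here are invented for illustration):

```python
def confusion_counts(screener, state_test, at_risk="At Risk"):
    """Tally (a, b, c, d) from paired predicted/actual labels."""
    a = b = c = d = 0
    for pred, actual in zip(screener, state_test):
        if pred == at_risk and actual == at_risk:
            a += 1          # true positive
        elif pred == at_risk:
            b += 1          # false positive
        elif actual == at_risk:
            c += 1          # false negative
        else:
            d += 1          # true negative
    return a, b, c, d

# Six hypothetical students: screener prediction vs. actual state-test status.
pred   = ["At Risk", "At Risk", "Not At Risk", "Not At Risk", "At Risk", "Not At Risk"]
actual = ["At Risk", "Not At Risk", "At Risk", "Not At Risk", "At Risk", "Not At Risk"]
print(confusion_counts(pred, actual))  # → (2, 1, 1, 2)
```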

Two desirable characteristics of screeners are Sensitivity and Specificity. Sensitivity is the percent of Total True At Risk students that are True Positives: Sensitivity = True Positive (a) divided by Total True At Risk (a+c). Specificity is the percent of Total True Not At Risk students that are True Negatives: Specificity = True Negative (d) divided by Total True Not At Risk (b+d).

Consider a specific example. The following table is for Kentucky Grade 3 Reading; the screener is the DEA Fall Reading Benchmark and the state test is the KCCT for Spring 2008.

                                          KCCT 2008
                              At Risk    Not At Risk    Total
  DEA Fall     At Risk          117          126         243
  Benchmark    Not At Risk       18          407         425
               Total            135          533         668

  Sensitivity = 117/135 = .87, or 87%
  Specificity = 407/533 = .76, or 76%
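The worked example can be reproduced in a few lines, with the counts taken from the Kentucky Grade 3 table above:

```python
# Kentucky Grade 3 Reading: DEA Fall benchmark vs. Spring 2008 KCCT.
tp, fp = 117, 126   # screener At Risk:     actually At Risk / Not At Risk
fn, tn = 18, 407    # screener Not At Risk: actually At Risk / Not At Risk

sensitivity = tp / (tp + fn)   # share of truly At Risk students caught
specificity = tn / (fp + tn)   # share of truly Not At Risk students cleared
print(round(sensitivity, 2), round(specificity, 2))  # → 0.87 0.76
```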

A screening tool must establish, in advance, a cut score that separates At Risk from Not At Risk; this cut score is used to predict performance on the state test. Each candidate cut score is associated with particular levels of Sensitivity and Specificity, and a different cut score would yield different levels. Good cut scores strive to balance Sensitivity and Specificity at high levels.

Table 11 shows the relationship between sensitivity and specificity for the same Kentucky Grade 3 Reading test, which had 36 questions. Each possible number-correct cut score has an associated sensitivity and specificity. If a cut score of 16 were used, Sensitivity would be .72 (72%) and Specificity would be .85 (85%); classification would thus be more accurate for Not At Risk students than for At Risk students. For this particular test, a cut score of 18 was used to differentiate At Risk from Not At Risk, giving a Sensitivity of .87 (87%) and a Specificity of .76 (76%), the same values obtained from the analysis above.

The relationship between Sensitivity and Specificity can be graphed in an analysis called a Receiver Operating Characteristic (ROC), with Sensitivity plotted on the y-axis and 1 minus Specificity on the x-axis. Figure 1 shows this ROC curve for the Kentucky Grade 3 Reading test. Classification accuracy was measured by the Area Under the Curve (AUC) statistic defined earlier; AUC values had to be .85 or greater to receive a rating of Convincing Evidence.

Tables 12 to 14 present Sensitivity and AUC values for Reading Screeners for all students and for the African-American and Hispanic student subgroups. Tables 15 to 17 present Sensitivity and AUC values for Mathematics Screeners for the same groups. The median AUC values are all .82 or above, with most at .85 and above. DEA received a rating of Convincing Evidence for classification accuracy.

Table 11: Levels of Sensitivity and Specificity by Cut Score

  Cut Score    Sensitivity    Specificity    1 - Specificity
   2           0.00           1.00           0.00
   4           0.00           1.00           0.00
   5           0.01           1.00           0.00
   6           0.03           0.99           0.01
   7           0.08           0.98           0.02
   8           0.10           0.98           0.02
   9           0.14           0.98           0.02
  10           0.24           0.97           0.03
  11           0.35           0.95           0.05
  12           0.44           0.94           0.06
  13           0.50           0.92           0.08
  14           0.56           0.90           0.10
  15           0.64           0.88           0.12
  16           0.72           0.85           0.15
  17           0.79           0.81           0.19
  18           0.87           0.76           0.24
  19           0.93           0.73           0.27
  20           0.97           0.68           0.32
  21           0.97           0.64           0.36
  22           0.98           0.57           0.43
  23           0.98           0.52           0.48
  24           0.99           0.46           0.54
  25           0.99           0.41           0.59
  26           0.99           0.33           0.67
  27           0.99           0.27           0.73
  28           0.99           0.21           0.79
  29           0.99           0.17           0.83
  30           0.99           0.11           0.89
  31           1.00           0.07           0.93
  32           1.00           0.04           0.96
  33           1.00           0.02           0.98
  34           1.00           0.01           0.99
  35           1.00           0.00           1.00
  36           1.00           0.00           1.00

Figure 1: ROC Curve for Kentucky Grade 3 Reading Test
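Given the (Sensitivity, 1 - Specificity) pairs in Table 11, the AUC can be approximated with the trapezoidal rule over the ROC points. A sketch using the Table 11 values:

```python
# (1 - specificity, sensitivity) ROC points from Table 11 (cut scores 2-36).
roc = [
    (0.00, 0.00), (0.00, 0.00), (0.00, 0.01), (0.01, 0.03), (0.02, 0.08),
    (0.02, 0.10), (0.02, 0.14), (0.03, 0.24), (0.05, 0.35), (0.06, 0.44),
    (0.08, 0.50), (0.10, 0.56), (0.12, 0.64), (0.15, 0.72), (0.19, 0.79),
    (0.24, 0.87), (0.27, 0.93), (0.32, 0.97), (0.36, 0.97), (0.43, 0.98),
    (0.48, 0.98), (0.54, 0.99), (0.59, 0.99), (0.67, 0.99), (0.73, 0.99),
    (0.79, 0.99), (0.83, 0.99), (0.89, 0.99), (0.93, 1.00), (0.96, 1.00),
    (0.98, 1.00), (0.99, 1.00), (1.00, 1.00), (1.00, 1.00),
]

def auc_trapezoid(points):
    """Area under the ROC curve by the trapezoidal rule (points sorted by x)."""
    return sum((x2 - x1) * (y1 + y2) / 2
               for (x1, y1), (x2, y2) in zip(points, points[1:]))

auc = auc_trapezoid(roc)
print(round(auc, 2))  # → 0.88
```

The result of roughly .88 for this Kentucky Grade 3 Reading test clears the .85 threshold for a Convincing Evidence rating.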

Table 12: Sensitivity and AUC for Reading Screeners for All Students

                                               Sensitivity              Area Under the Curve (AUC)
                          Period               Range         Median     Range         Median
  Kentucky                Fall (Predictive)    .80 to .89    .84        .85 to .90    .88
                          Winter (Predictive)  .78 to .88    .84        .78 to .88    .86
  District of Columbia    Fall (Predictive)    .79 to .92    .90        .84 to .86    .85
                          Winter (Predictive)  .75 to .92    .88        .84 to .87    .86

Table 13: Sensitivity and AUC for Reading Screeners for African-American Students

                                               Sensitivity              Area Under the Curve (AUC)
                          Period               Range         Median     Range         Median
  Kentucky                Fall (Predictive)    .79 to .93    .89        .78 to .86    .83
                          Winter (Predictive)  .85 to .91    .89        .81 to .91    .84
  District of Columbia    Fall (Predictive)    .79 to .93    .90        .81 to .83    .83
                          Winter (Predictive)  .76 to .92    .88        .82 to .86    .84

Table 14: Sensitivity and AUC for Reading Screeners for Hispanic Students

                                               Sensitivity              Area Under the Curve (AUC)
                          Period               Range         Median     Range         Median
  District of Columbia    Fall (Predictive)    .79 to .93    .87        .78 to .90    .84
                          Winter (Predictive)  .73 to .90    .87        .81 to .91    .83

Table 15: Sensitivity and AUC for Mathematics Screeners for All Students

                                               Sensitivity              Area Under the Curve (AUC)
                          Period               Range         Median     Range         Median
  Kentucky                Fall (Predictive)    .81 to .90    .85        .86 to .91    .90
                          Winter (Predictive)  .83 to .94    .88        .87 to .93    .89
  District of Columbia    Fall (Predictive)    .84 to .94    .91        .79 to .87    .85
                          Winter (Predictive)  .89 to .92    .91        .86 to .90    .87

Table 16: Sensitivity and AUC for Mathematics Screeners for African-American Students

                                               Sensitivity              Area Under the Curve (AUC)
                          Period               Range         Median     Range         Median
  Kentucky                Fall (Predictive)    .81 to .93    .89        .81 to .90    .83
                          Winter (Predictive)  .78 to 1.00   .89        .83 to 1.00   .86
  District of Columbia    Fall (Predictive)    .84 to .93    .91        .81 to .84    .82
                          Winter (Predictive)  .89 to .93    .91        .84 to .88    .85

Table 17: Sensitivity and AUC for Mathematics Screeners for Hispanic Students

                                               Sensitivity              Area Under the Curve (AUC)
                          Period               Range         Median     Range         Median
  District of Columbia    Fall (Predictive)    .83 to .92    .86        .75 to .88    .85
                          Winter (Predictive)  .86 to .93    .89        .82 to .91    .89

Appendices

Reliability Tables
  Table 17: Cronbach's Alpha for Kentucky Reading Tests
  Table 18: Cronbach's Alpha for Kentucky Mathematics Tests
  Table 19: Cronbach's Alpha for District of Columbia Reading Tests
  Table 20: Cronbach's Alpha for District of Columbia Mathematics Tests

Reading Validity Tables
  Table 21: Predictive Validity for Kentucky Reading Tests
  Table 22: Concurrent Validity for Kentucky Reading Tests
  Table 23: Predictive Validity for District of Columbia 2008 Reading Tests
  Table 24: Concurrent Validity for District of Columbia 2008 Reading Tests
  Table 25: Predictive Validity for District of Columbia 2009 Reading Tests
  Table 26: Concurrent Validity for District of Columbia 2009 Reading Tests

Math Validity Tables
  Table 27: Predictive Validity for Kentucky Mathematics Tests
  Table 28: Concurrent Validity for Kentucky Mathematics Tests
  Table 29: Predictive Validity for District of Columbia 2008 Mathematics Tests
  Table 30: Concurrent Validity for District of Columbia 2008 Mathematics Tests
  Table 31: Predictive Validity for District of Columbia 2009 Mathematics Tests
  Table 32: Concurrent Validity for District of Columbia 2009 Mathematics Tests

Reading Classification Tables
  Table 33: Classification Tables for Kentucky Reading Tests
  Table 34: Classification Tables for District of Columbia Reading Tests

Math Classification Tables
  Table 35: Classification Tables for Kentucky Mathematics Tests
  Table 36: Classification Tables for District of Columbia Mathematics Tests

Reliability

Table 17: Kentucky Cronbach's Alpha Reliability Coefficients for Reading, 2007-08

  Grade     Fall N    Coefficient    Winter N    Coefficient    Spring N    Coefficient
  3         10,405    0.85           9,477       0.84           6,878       0.85
  4         10,895    0.73           9,222       0.85           6,593       0.86
  5         10,695    0.84           9,097       0.82           6,416       0.84
  6         11,246    0.82           9,500       0.84           7,727       0.86
  7         10,914    0.84           9,977       0.86           8,560       0.86
  8         10,712    0.85           9,543       0.84           8,723       0.84
  10        5,040     0.85           3,975       0.85           3,313       0.84
  Median              0.84                       0.84                       0.85

Table 18: Kentucky Cronbach's Alpha Reliability Coefficients for Math, 2007-08

  Grade     Fall N    Coefficient    Winter N    Coefficient    Spring N    Coefficient
  3         10,624    0.76           9,716       0.84           6,775       0.86
  4         10,918    0.87           9,161       0.84           6,505       0.84
  5         10,840    0.86           9,136       0.79           6,486       0.84
  6         11,284    0.83           9,530       0.83           7,472       0.81
  7         10,694    0.82           9,970       0.79           8,537       0.86
  8         10,750    0.86           9,871       0.81           8,576       0.86
  10        4,579     0.81           3,444       0.84           2,796       0.82
  Median              0.83                       0.83                       0.84

Table 19: District of Columbia Cronbach's Alpha Reliability Coefficients for Reading, 2007-08

  Grade     Fall N    Coefficient    Winter N    Coefficient    Spring N    Coefficient
  3         3,010     0.87           3,225       0.89           3,956       0.87
  4         3,057     0.86           3,200       0.87           3,929       0.85
  5         2,970     0.86           2,953       0.81           3,634       0.86
  6         2,672     0.84           2,811       0.78           3,441       0.84
  7         2,317     0.84           2,339       0.77           3,102       0.87
  8         2,490     0.82           2,510       0.84           3,621       0.87
  10        2,493     0.86           2,355       0.82           2,717       0.84
  Median              0.86                       0.82                       0.86

Table 20: District of Columbia Cronbach's Alpha Reliability Coefficients for Math, 2007-08

  Grade     Fall N    Coefficient    Winter N    Coefficient    Spring N    Coefficient
  3         2,967     0.87           3,193       0.85           3,960       0.90
  4         3,004     0.84           3,173       0.84           3,923       0.89
  5         2,915     0.85           2,919       0.83           3,646       0.87
  6         2,630     0.82           2,759       0.78           3,442       0.81
  7         2,277     0.80           2,287       0.75           3,127       0.84
  8         2,496     0.79           2,475       0.77           3,648       0.83
  10        1,993     0.82           2,319       0.81           2,703       0.85
  Median              0.82                       0.81                       0.85

Reading Validity

Table 21: KY DEA Fall & Winter Assessments & 2008 KCCT Reading Results, Predictive Validity

  Full Sample
  Grade     Fall N    Coefficient    Winter N    Coefficient
  3         952       0.69           1314        0.69
  4         1060      0.74           1398        0.70
  5         1099      0.71           1407        0.67
  6         1668      0.69           1809        0.71
  7         1411      0.71           1615        0.67
  8         1440      0.71           1830        0.68
  10        354       0.76           352         0.64
  Median              0.71                       0.68

  Disaggregated for Ethnicity: Black
  Grade     Fall N    Coefficient    Winter N    Coefficient
  3         111       0.66           122         0.69
  4         157       0.67           176         0.73
  5         171       0.68           183         0.75
  6         398       0.60           418         0.64
  7         390       0.68           406         0.60
  8         353       0.61           389         0.63
  Median              0.67                       0.67

  All coefficients significant at p < .01 (2-tailed).

Table 22: KY DEA Spring Assessment & 2008 KCCT Reading Results, Concurrent Validity

  Full Sample
  Grade     Spring N    Coefficient
  3         1101        0.70
  4         1337        0.67
  5         1301        0.68
  6         1942        0.73
  7         1670        0.71
  8         1746        0.70
  10        309         0.64
  Median                0.70

  Disaggregated for Ethnicity: Black
  Grade     Spring N    Coefficient
  3         70          0.66
  4         151         0.48
  5         129         0.67
  6         393         0.70
  7         369         0.70
  8         360         0.68
  Median                0.68

  All coefficients significant at p < .01 (2-tailed).

Table 23: DC DEA Fall & Winter Assessments & 2008 DC-CAS Reading Results (Predictive Validity)
All coefficients are significant at p < .01 (2-tailed).

Full Sample
         Fall                Winter
Grade    N      Coefficient  N      Coefficient
3        3333   0.65         3388   0.65
4        3314   0.65         3353   0.69
5        3079   0.69         3124   0.66
6        2643   0.67         2730   0.65
7        2180   0.66         2253   0.67
8        2474   0.67         2549   0.66
10       1936   0.67         1829   0.71
Median          0.67                0.66

Disagg for Ethnicity: Black
Grade    N      Coefficient  N      Coefficient
3        2597   0.61         2651   0.61
4        2647   0.57         2676   0.62
5        2507   0.64         2544   0.62
6        2150   0.62         2227   0.61
7        1828   0.60         1889   0.62
8        2100   0.61         2155   0.63
10       1560   0.65         1485   0.70
Median          0.61                0.62

Disagg for Ethnicity: Hispanic
Grade    N      Coefficient  N      Coefficient
3        383    0.64         387    0.62
4        341    0.59         346    0.66
5        329    0.74         336    0.65
6        297    0.69         301    0.69
7        231    0.65         237    0.69
8        235    0.70         247    0.60
10       218    0.59         212    0.62
Median          0.65                0.65

Table 24: DC DEA Spring Assessment & 2008 DC-CAS Reading Results (Concurrent Validity)
All coefficients are significant at p < .01 (2-tailed).

Full Sample
Grade    N      Coefficient
3        3372   0.68
4        3372   0.69
5        3141   0.65
6        2796   0.69
7        2249   0.70
8        2594   0.66
10       1924   0.69
Median          0.69

Disagg for Ethnicity: Black (DC Test C Reading)
Grade    N      Coefficient
3        2627   0.65
4        2693   0.63
5        2558   0.61
6        2297   0.65
7        1878   0.64
8        2200   0.63
10       1598   0.69
Median          0.64

Disagg for Ethnicity: Hispanic (DC Test C Reading)
Grade    N      Coefficient
3        392    0.60
4        349    0.62
5        343    0.65
6        299    0.70
7        242    0.72
8        248    0.60
10       208    0.63
Median          0.63

Table 25: DC DEA Fall & Winter Assessments & 2009 DC-CAS Reading Results (Predictive Validity)
All coefficients are significant at p < .01 (2-tailed).

Full Sample
         Fall                Winter
Grade    N      Coefficient  N      Coefficient
3        3334   0.66         3388   0.68
4        3002   0.67         3020   0.71
5        2907   0.71         2970   0.71
6        2106   0.73         2183   0.76
7        2028   0.70         2039   0.76
8        1976   0.69         2006   0.78
10       1756   0.72         1708   0.74
Median          0.70                0.74

Table 26: DC DEA Spring Assessment & 2009 DC-CAS Reading Results (Concurrent Validity; DC Test C Reading)
All coefficients are significant at p < .01 (2-tailed).

Full Sample
Grade    N      Coefficient
3        3427   0.72
4        3077   0.69
5        3024   0.75
6        2224   0.77
7        2124   0.76
8        2068   0.77
10       1882   0.78
Median          0.76

Mathematics Validity

Table 27: KY DEA Fall & Winter Assessments & 2008 KCCT Mathematics Results (Predictive Validity)
All coefficients are significant at p < .01 (2-tailed).

Full Sample
         Fall                Winter
Grade    N      Coefficient  N      Coefficient
3        1098   0.71         1434   0.76
4        1075   0.79         1409   0.76
5        1116   0.76         1386   0.74
6        1743   0.73         1809   0.75
7        1479   0.76         1649   0.72
8        1489   0.81         1800   0.77
11       162    0.79         319    0.82
Median          0.76                0.76

Disagg for Ethnicity: Black
Grade    N      Coefficient  N      Coefficient
3        184    0.68         220    0.76
4        204    0.67         220    0.75
5        201    0.64         218    0.66
6        477    0.69         421    0.72
7        446    0.70         405    0.66
8        434    0.69         447    0.69
11       45     0.69         44     0.77
Median          0.69                0.72

Table 28: KY DEA Spring Assessment & 2008 KCCT Mathematics Results (Concurrent Validity)
All coefficients are significant at p < .01 (2-tailed).

Full Sample
Grade    N      Coefficient
3        1356   0.74
4        1308   0.78
5        1311   0.79
6        1992   0.76
7        1753   0.73
8        1820   0.78
11       289    0.73
Median          0.76

Disagg for Ethnicity: Black (KY Test B Math)
Grade    N      Coefficient
3        202    0.75
4        178    0.76
5        170    0.74
6        470    0.72
7        438    0.63
8        448    0.71
11       46     0.72
Median          0.72

Table 29: DC DEA Fall & Winter Assessments & 2008 DC-CAS Mathematics Results (Predictive Validity)
All coefficients are significant at p < .01 (2-tailed).

Full Sample
         Fall                Winter
Grade    N      Coefficient  N      Coefficient
3        3284   0.67         3364   0.71
4        3284   0.69         3328   0.71
5        3040   0.70         3089   0.75
6        2644   0.67         2728   0.69
7        2165   0.68         2238   0.70
8        2462   0.63         2543   0.67
10       1882   0.56         1863   0.73
Median          0.67                0.71

Disagg for Ethnicity: Black
Grade    N      Coefficient  N      Coefficient
3        2547   0.61         2629   0.67
4        2576   0.62         2648   0.64
5        2466   0.64         2510   0.71
6        2138   0.61         2227   0.63
7        1800   0.59         1863   0.61
8        2067   0.55         2131   0.61
10       1508   0.52         1504   0.69
Median          0.61                0.64

Disagg for Ethnicity: Hispanic
Grade    N      Coefficient  N      Coefficient
3        382    0.64         389    0.66
4        342    0.65         348    0.68
5        330    0.70         334    0.78
6        305    0.64         301    0.72
7        242    0.70         244    0.74
8        257    0.64         268    0.68
10       215    0.47         212    0.68
Median          0.64                0.68

Table 30: DC DEA Spring Assessment & 2008 DC-CAS Mathematics Results (Concurrent Validity)
All coefficients are significant at p < .01 (2-tailed).

Full Sample
Grade    N      Coefficient
3        3375   0.77
4        3369   0.77
5        3159   0.74
6        2804   0.68
7        2279   0.74
8        2612   0.65
10       1963   0.72
Median          0.74

Disagg for Ethnicity: Black
Grade    N      Coefficient
3        2625   0.73
4        2693   0.74
5        2575   0.70
6        2296   0.61
7        1900   0.67
8        2197   0.58
10       1596   0.69
Median          0.69

Disagg for Ethnicity: Hispanic
Grade    N      Coefficient
3        397    0.76
4        343    0.72
5        341    0.74
6        307    0.71
7        246    0.74
8        270    0.66
10       218    0.65
Median          0.72

Table 31: DC DEA Fall & Winter Assessments & 2009 DC-CAS Mathematics Results (Predictive Validity)
All coefficients are significant at p < .01 (2-tailed).

Full Sample
         Fall                Winter
Grade    N      Coefficient  N      Coefficient
3        3339   0.66         3368   0.72
4        2995   0.71         3028   0.72
5        2903   0.74         2938   0.80
6        2129   0.79         2141   0.78
7        1985   0.66         2056   0.78
8        1991   0.65         2035   0.73
10       1730   0.75         1735   0.76
Median          0.71                0.76

Disagg for Ethnicity: Black
Grade    N      Coefficient  N      Coefficient
3        2485   0.63         2507   0.69
4        2281   0.66         2314   0.67
5        2280   0.69         2298   0.76
6        1672   0.74         1683   0.74
7        1595   0.62         1661   0.72
8        1619   0.54         1659   0.65
10       1175   0.74         1359   0.73
Median          0.66                0.72

Table 32: DC DEA Spring Assessment & 2009 DC-CAS Mathematics Results (Concurrent Validity)
All coefficients are significant at p < .01 (2-tailed).

Full Sample
Grade    N      Coefficient
3        3452   0.78
4        3084   0.78
5        3026   0.79
6        2211   0.78
7        2124   0.79
8        2082   0.79
10       1792   0.81
Median          0.79

Disagg for Ethnicity: Black
Grade    N      Coefficient
3        2581   0.76
4        2357   0.75
5        2377   0.76
6        1740   0.74
7        1721   0.73
8        1703   0.70
10       1420   0.78
Median          0.75

Reading Classification Tables

Table 33: Kentucky Reading Sensitivity, Specificity & Area Under the Curve

Full Sample
         Fall                                Winter
Grade    Sensitivity  Specificity  AUC       Sensitivity  Specificity  AUC
3        0.87         0.76         0.88      0.88         0.74         0.88
4        0.89         0.78         0.90      0.84         0.72         0.86
5        0.84         0.78         0.89      0.84         0.72         0.87
6        0.82         0.76         0.85      0.82         0.77         0.87
7        0.85         0.76         0.87      0.82         0.72         0.85
8        0.80         0.80         0.87      0.84         0.71         0.85
10       0.83         0.79         0.88      0.78         0.68         0.78
Median   0.84         0.78         0.88      0.84         0.72         0.86

Disagg for Ethnicity: Black
Grade    Sensitivity  Specificity  AUC       Sensitivity  Specificity  AUC
3        0.89         0.58         0.82      0.91         0.59         0.84
4        0.89         0.68         0.83      0.88         0.59         0.83
5        0.93         0.63         0.88      0.91         0.62         0.91
6        0.82         0.63         0.78      0.89         0.62         0.86
7        0.89         0.70         0.86      0.85         0.60         0.82
8        0.79         0.78         0.83      0.86         0.61         0.81
Median   0.89         0.66         0.83      0.89         0.61         0.84
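In classification tables of this kind, sensitivity is the proportion of students who did not reach proficiency on the state test whom the screener flagged as at risk, specificity is the proportion of proficient students not flagged, and AUC is the probability that a randomly chosen at-risk student scores below a randomly chosen not-at-risk student on the screener. A minimal sketch with hypothetical scores and an assumed cut score (the report's actual cut scores and flagging direction are assumptions here):

```python
def sensitivity_specificity(pairs, cutoff):
    """pairs: (screener_score, proficient_on_state_test).
    Scores at or below the cutoff are flagged as at risk (assumed direction)."""
    tp = fn = fp = tn = 0
    for score, proficient in pairs:
        flagged = score <= cutoff
        if not proficient:  # truly at risk
            tp, fn = tp + flagged, fn + (not flagged)
        else:
            fp, tn = fp + flagged, tn + (not flagged)
    return tp / (tp + fn), tn / (tn + fp)

def auc(pairs):
    """Rank-based (Mann-Whitney) AUC: chance that an at-risk student scores
    below a not-at-risk student, counting ties as half a win."""
    at_risk = [s for s, ok in pairs if not ok]
    not_at_risk = [s for s, ok in pairs if ok]
    wins = sum((r < n) + 0.5 * (r == n) for r in at_risk for n in not_at_risk)
    return wins / (len(at_risk) * len(not_at_risk))

# Hypothetical (screener score, proficient?) pairs
pairs = [(10, False), (20, False), (32, False), (25, True), (35, True), (40, True)]
sens, spec = sensitivity_specificity(pairs, cutoff=30)
print(round(sens, 2), round(spec, 2), round(auc(pairs), 2))  # → 0.67 0.67 0.89
```

Unlike sensitivity and specificity, AUC does not depend on the choice of cut score, which is why it serves as the overall summary of screening accuracy in these tables.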

Table 34: DC Reading Sensitivity, Specificity & Area Under the Curve

Full Sample
         Fall                                Winter
Grade    Sensitivity  Specificity  AUC       Sensitivity  Specificity  AUC
3        0.90         0.64         0.86      0.87         0.69         0.85
4        0.88         0.63         0.84      0.89         0.67         0.86
5        0.90         0.63         0.86      0.86         0.68         0.85
6        0.79         0.75         0.85      0.75         0.74         0.84
7        0.91         0.62         0.86      0.88         0.65         0.86
8        0.92         0.58         0.85      0.92         0.59         0.87
10       0.89         0.64         0.85      0.92         0.66         0.87
Median   0.90         0.63         0.85      0.88         0.67         0.86

Disagg for Ethnicity: Black
Grade    Sensitivity  Specificity  AUC       Sensitivity  Specificity  AUC
3        0.90         0.58         0.83      0.87         0.64         0.84
4        0.90         0.55         0.81      0.89         0.59         0.83
5        0.90         0.57         0.83      0.87         0.64         0.83
6        0.79         0.72         0.83      0.76         0.71         0.82
7        0.92         0.57         0.85      0.88         0.59         0.84
8        0.93         0.51         0.83      0.92         0.52         0.84
10       0.89         0.61         0.83      0.92         0.63         0.86
Median   0.90         0.57         0.83      0.88         0.63         0.84

Disagg for Ethnicity: Hispanic
Grade    Sensitivity  Specificity  AUC       Sensitivity  Specificity  AUC
3        0.93         0.56         0.84      0.90         0.58         0.82
4        0.79         0.58         0.78      0.85         0.60         0.82
5        0.94         0.64         0.90      0.87         0.64         0.87
6        0.82         0.61         0.84      0.73         0.69         0.81
7        0.87         0.65         0.85      0.79         0.74         0.88
8        0.92         0.66         0.88      0.87         0.71         0.91
10       0.85         0.58         0.82      0.87         0.69         0.83
Median   0.87         0.61         0.84      0.87         0.69         0.83

Mathematics Classification Tables

Table 35: Kentucky Math Sensitivity, Specificity & Area Under the Curve

Full Sample
         Fall                                Winter
Grade    Sensitivity  Specificity  AUC       Sensitivity  Specificity  AUC
3        0.83         0.76         0.86      0.83         0.77         0.89
4        0.89         0.80         0.91      0.83         0.77         0.89
5        0.86         0.84         0.91      0.83         0.76         0.87
6        0.81         0.80         0.86      0.88         0.72         0.88
7        0.90         0.67         0.87      0.89         0.66         0.87
8        0.84         0.83         0.91      0.93         0.71         0.89
10       0.85         0.80         0.90      0.94         0.80         0.93
Median   0.85         0.80         0.90      0.88         0.76         0.89

Disagg for Ethnicity: Black
Grade    Sensitivity  Specificity  AUC       Sensitivity  Specificity  AUC
3        0.93         0.56         0.82      0.91         0.60         0.86
4        0.89         0.64         0.83      0.89         0.63         0.86
5        0.86         0.68         0.86      0.78         0.74         0.87
6        0.81         0.70         0.81      0.89         0.62         0.86
7        0.89         0.56         0.83      0.88         0.62         0.84
8        0.84         0.70         0.83      0.92         0.53         0.83
11       0.92         0.50         0.90      1.00         0.75         1.00
Median   0.89         0.64         0.83      0.89         0.62         0.86

Table 36: DC Math Sensitivity, Specificity & Area Under the Curve

Full Sample
         Fall                                Winter
Grade    Sensitivity  Specificity  AUC       Sensitivity  Specificity  AUC
3        0.91         0.57         0.84      0.92         0.60         0.86
4        0.90         0.63         0.85      0.91         0.66         0.87
5        0.94         0.61         0.87      0.93         0.66         0.90
6        0.93         0.59         0.86      0.89         0.65         0.87
7        0.91         0.61         0.84      0.91         0.65         0.87
8        0.88         0.64         0.85      0.91         0.66         0.88
10       0.84         0.61         0.79      0.90         0.66         0.86
Median   0.91         0.61         0.85      0.91         0.66         0.87

Disagg for Ethnicity: Black
Grade    Sensitivity  Specificity  AUC       Sensitivity  Specificity  AUC
3        0.93         0.46         0.82      0.92         0.52         0.85
4        0.91         0.58         0.83      0.91         0.61         0.85
5        0.94         0.52         0.84      0.93         0.59         0.88
6        0.93         0.51         0.83      0.89         0.60         0.84
7        0.91         0.52         0.81      0.90         0.56         0.84
8        0.89         0.56         0.82      0.91         0.62         0.86
10       0.84         0.58         0.78      0.92         0.64         0.86
Median   0.91         0.52         0.82      0.91         0.60         0.85

Disagg for Ethnicity: Hispanic
Grade    Sensitivity  Specificity  AUC       Sensitivity  Specificity  AUC
3        0.86         0.50         0.77      0.88         0.54         0.81
4        0.86         0.55         0.83      0.91         0.62         0.85
5        0.92         0.54         0.85      0.93         0.61         0.91
6        0.91         0.60         0.87      0.89         0.70         0.91
7        0.83         0.69         0.86      0.86         0.74         0.89
8        0.89         0.65         0.88      0.89         0.66         0.91
10       0.83         0.53         0.75      0.92         0.54         0.82
Median   0.86         0.55         0.85      0.89         0.62         0.89