2007-08 Dropout Prediction Model
Charlotte-Mecklenburg Schools Research Brief, August 2008
Office of Accountability

Executive Summary

A student's decision to drop out has consequences not only for the student but for society as a whole. Dropouts often face increased risks of unemployment, criminal activity, and health problems, and these risks carry well-documented public economic costs. Preventing dropout is therefore important both for students' futures and for the strength of society.

To assist School Social Work staff, the Center for Research & Evaluation (CRE) constructed a dropout prediction model based on 2007-08 school year data. Dropouts were readily identifiable in data files from this school year, allowing student information from previous school years to be associated with subsequent decisions to drop out. A number of demographic, behavioral, and academic measures were included in a logistic regression model to generate a predicted probability of dropout for each student. The predicted dropout indicator generated by the model can be compared with the known dropout indicator, yielding a measure of classification accuracy. The model correctly identified 81% of all dropouts while also correctly identifying 81% of all non-dropouts.

The predicted probability of dropout was used to create a Predictor Index, a numeric value designating a student's level of risk of dropping out. School rosters containing pertinent student information and the Predictor Index for each student were shared with the Office of School Social Work. Counselors will use these rosters to identify a smaller group of students to whom they can provide targeted assistance in hopes of preventing a decision to drop out.

Introduction

A facet of student achievement often overshadowed by standardized test scores is the student dropout rate. As the United States economy becomes increasingly dependent on a highly skilled and educated workforce, student dropout will have more drastic and farther-reaching effects on society (Dorn, 1993). Rumberger (1987) noted that dropouts experience higher levels of unemployment; are more likely to have health problems, engage in criminal activity, and become dependent on government programs; and can expect to earn less than high school graduates. A summary table on the National Dropout Prevention Center's (2004) website showed a 31% decrease in the inflation-adjusted average hourly wage of dropouts between 1973 and 1997. McDill, Natriello, and Pallas (1986) estimated the lifetime earnings losses for 516,000 sophomore dropouts in 1980 at approximately $55 billion, after halving the estimate to account for biases related to differences in ability.

The costs of dropout are not borne by dropouts alone. Catterall (1987) estimated the total loss in lifetime tax revenue associated with dropouts at around $70 billion for one cohort of 8th grade students in the U.S. LeCompte and Dworkin (1991) claimed that between 1986 and 1991, New York City spent close to $40 million annually on dropout prevention programs, and Rumberger (1995) cited increases in spending on dropout prevention, job training, and welfare programs. These programs and forms of assistance impose economic costs on other members of society. In addition, rising dropout rates do not bode well for the development of a highly educated workforce and reduce future labor productivity.

Given the long-term negative effects of dropping out for both the student and the community at large, Charlotte-Mecklenburg Schools (CMS) has instituted a case management model focused on dropout prevention. To maximize the efficiency of this model, the students most likely to drop out need to be identified before they do so. Some scholars hold that dropout is the culmination of a process of disengagement from school (Newmann et al., 1992; Wehlage et al., 1989) related to school, home, and societal factors. Although research has identified predictive factors such as family socioeconomic status (SES) and family structure (for example, single-parent families), the prediction models generated here rely only on information readily available to CMS staff. The model-building activities and the results of model-efficacy tests are presented in the Methods and Results sections that follow.

Methods

Data

To test the efficacy of a statistical prediction model, the true outcome being modeled must be known. Once the model has been generated, predicted outcomes can be compared with true outcomes to determine the model's prediction accuracy. A data file of all 2007-08 students was constructed at the end of the school year, containing student demographic and achievement information as well as attendance, retention, mobility, and withdrawal code information. Because students drop out throughout the school year, academic achievement and attendance measures were taken from prior years for use in predicting dropout during the 2007-08 school year.

Basic student demographic information included gender, race or ethnicity, Free or Reduced Lunch (FRL) status, Exceptional Child (EC) status, Limited English Proficiency (LEP) status, and whether or not the child was gifted. The various categories within FRL, LEP, and EC were recoded as dichotomous variables to ease parameter interpretation. For example, students with no EC status were coded as zero, while all students with any EC status (other than gifted) were coded as one. Total absences and the number of days served in out-of-school suspension (OSS) and in-school suspension (ISS) were calculated from 2006-07 school year data. Finally, the number of times previously dropped out, the number of times previously retained, and the number of schools enrolled in (mobility) were tracked back through each student's academic history.

For the high school students in the data file, performance on the Algebra I and English I End of Course (EOC) assessments was obtained. In addition, students were tracked back through time to identify the last year they were in 8th grade and had valid End of Grade (EOG) scores. In all instances, assessment achievement levels were used in the prediction models to facilitate interpretation of coefficients.

Model

The dropout prediction model takes the following form:

$$p = \frac{1}{1 + \exp(-\alpha - \beta_1 x_1 - \beta_2 x_2 - \dots - \beta_j x_j)}$$

In words, the probability of dropping out ($p$) equals one divided by one plus the exponential of the negative intercept ($\alpha$) minus the effect of each predictor ($\beta_1, \dots, \beta_j$) multiplied by the value of that predictor ($x_1, \dots, x_j$) for a given student. The result is a value between zero and one for each student indicating the likelihood of that student dropping out. This value can then be calculated for future students to target those most likely to drop out of school.
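The formula above can be sketched in Python as follows. This is a minimal illustration and not the code used for the original analysis: the function and variable names are hypothetical, only the intercept and three of the Table 1 estimates are included for brevity, and the example student is invented.

```python
import math

# Intercept and three estimates copied from Table 1. The remaining
# predictors are omitted here purely to keep the sketch short.
INTERCEPT = -5.9642
COEFFICIENTS = {
    "overage": 0.9256,
    "prior_dropout": 0.8746,
    "unexcused_absences": 0.0425,
}


def dropout_probability(student, intercept=INTERCEPT, coefficients=COEFFICIENTS):
    """Return p = 1 / (1 + exp(-(alpha + sum of beta_k * x_k)))."""
    # Predictors missing from the student record are treated as zero.
    linear_predictor = intercept + sum(
        beta * student.get(name, 0) for name, beta in coefficients.items()
    )
    return 1.0 / (1.0 + math.exp(-linear_predictor))


# Invented student: overage, no prior dropout, 20 unexcused absences.
example_student = {"overage": 1, "prior_dropout": 0, "unexcused_absences": 20}
example_probability = dropout_probability(example_student)
```

Because every predictor enters through a single linear sum, adding the remaining Table 1 estimates would only require extending the `COEFFICIENTS` dictionary.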
These probabilities were calculated for each student in the 2007-08 cohort file. The efficacy of the model was tested by randomly splitting the 2007-08 cohort file into two files: one for generating the prediction model and a second for determining how well the model identifies students likely to drop out. By comparing the values predicted by the model with the known outcomes, we can determine the percentage of cases for which the model predicted dropout correctly. Statistically significant parameter estimates were retained for use in identifying future potential dropouts. These estimates were entered into the logistic regression formula and applied to the second data file. A predicted probability of dropout was generated for each student in that file, and a receiver operating characteristic (ROC) curve was then plotted.

To maximize the predictive utility of the model, the probability with the smallest difference between sensitivity (identifying actual dropouts as dropouts) and specificity (identifying non-dropouts as non-dropouts) was identified as the optimal cut-point (Hosmer & Lemeshow, 2000). This cut-point allowed us to classify students into one of two categories: low-risk or high-risk. The prediction model and established cut-point will then be applied to a data file of incoming 2008-09 students to provide the School Social Work office with a list of high-risk students.

Results

Model Development

Initial models entering all available predictor variables were generated (see Appendix A for saturated model results). Table 1 below shows the parameter estimates and odds ratios for the statistically significant explanatory variables used in the dropout prediction model.

Table 1. Parameter estimates and odds ratios associated with dropout probability

Parameter               Estimate   Odds Ratio
Constant                 -5.9642       0.0026
Grade 9
Grade 10                  0.3945       1.4837
Grade 11                  0.7910       2.2055
Grade 12                  0.2915       1.3385
Exceptional Child        -0.4290       0.6512
Black                    -0.4430       0.6421
Overage                   0.9256       2.5233
Unexcused Absences 07     0.0425       1.0434
Excused Absences 07       0.0306       1.0311
ISS Days 07               0.0688       1.0713
OSS Days 07               0.0351       1.0357
Mobility                  0.1277       1.1362
Prior Dropout             0.8746       2.3980
Algebra I Level 1
Algebra I Level 2         0.2383       1.2691
Algebra I Level 3        -0.0147       0.9854
Algebra I Level 4        -0.5282       0.5897
English I Level 1
English I Level 2        -0.2675       0.7653
English I Level 3        -0.4747       0.6221
English I Level 4        -0.9021       0.4057
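Each odds ratio in Table 1 is the exponential of the corresponding estimate. The short Python sketch below recomputes three of them and converts each to a percent change in odds. It is an illustration only: the function names are hypothetical, and small discrepancies from the published values are expected because the estimates are rounded to four decimals.

```python
import math

# (estimate, published odds ratio) pairs copied from Table 1.
TABLE_1 = {
    "Overage": (0.9256, 2.5233),
    "Prior Dropout": (0.8746, 2.3980),
    "English I Level 4": (-0.9021, 0.4057),
}


def odds_ratio(estimate):
    """Odds ratio implied by a logistic regression estimate."""
    return math.exp(estimate)


def percent_change_in_odds(estimate):
    """Percent change in the odds of dropout; negative means lower odds."""
    return (odds_ratio(estimate) - 1.0) * 100.0


recomputed = {name: odds_ratio(est) for name, (est, _) in TABLE_1.items()}
```

For instance, `percent_change_in_odds(0.8746)` is roughly 140 and `percent_change_in_odds(-0.9021)` is roughly -59.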

The estimates presented in Table 1 indicate each explanatory variable's influence on the probability of dropout. Generally, a negative estimate means that an increase in the predictor is associated with a decline in the probability of dropout; a positive estimate means that an increase in the predictor is associated with an increase in the probability of dropout. Odds ratios are easier to interpret, as they express the odds of dropout for students with a particular characteristic relative to students without it, controlling for all other characteristics in the model. For example, Prior Dropout has an odds ratio of 2.398, meaning the odds of dropout for students who dropped out in prior years are 2.4 times the odds for students who never dropped out previously. In other words, the odds of dropping out for these students are 140% higher. Conversely, the English I Level 4 odds ratio of .4057 indicates that students attaining Level 4 on the English I EOC assessment are less likely to drop out than students at Level 1, the reference category: their odds are 59% (1 - .4057) lower.

We applied the significant coefficients, in logistic model form, to the second half of the file of 2007-08 student records. The resulting predicted probabilities of dropout were compared with the actual 2007-08 dropout indicator to generate the ROC curve. An ROC curve depicts the model's ability to correctly classify students into two groups as the discrimination threshold changes. Sensitivity is the model's ability to correctly predict dropouts as dropouts, while specificity is its ability to correctly predict non-dropouts as non-dropouts. When sensitivity and specificity are each plotted against the probability threshold, the intersection of the two curves identifies the optimal cut-point.
ROC analysis output allowed us to calculate the absolute difference between the two measures at each threshold and to choose the threshold with the minimal difference as the optimal cut-point. A cut-point of .038 correctly identified 81% of dropouts as dropouts and 81% of non-dropouts as non-dropouts.
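The cut-point rule described above (minimize the absolute difference between sensitivity and specificity) can be sketched in Python. This is not the ROC software used for the brief: the function names are hypothetical, the eight probabilities and outcomes are invented toy data, and the sketch assumes the data contain at least one dropout and one non-dropout.

```python
def sensitivity_specificity(probabilities, dropped_out, cut_point):
    """Return (sensitivity, specificity) when p >= cut_point means high-risk."""
    true_pos = false_neg = true_neg = false_pos = 0
    for p, actual in zip(probabilities, dropped_out):
        predicted = p >= cut_point
        if actual and predicted:
            true_pos += 1
        elif actual:
            false_neg += 1
        elif predicted:
            false_pos += 1
        else:
            true_neg += 1
    return true_pos / (true_pos + false_neg), true_neg / (true_neg + false_pos)


def balanced_cut_point(probabilities, dropped_out):
    """Pick the observed probability that minimizes |sensitivity - specificity|."""
    def gap(cut_point):
        sens, spec = sensitivity_specificity(probabilities, dropped_out, cut_point)
        return abs(sens - spec)

    return min(sorted(set(probabilities)), key=gap)


# Invented toy data: predicted probabilities and known outcomes (1 = dropout).
toy_probabilities = [0.01, 0.02, 0.03, 0.04, 0.05, 0.06, 0.10, 0.30]
toy_dropped_out = [0, 0, 0, 1, 0, 1, 1, 1]

toy_cut_point = balanced_cut_point(toy_probabilities, toy_dropped_out)
toy_rates = sensitivity_specificity(toy_probabilities, toy_dropped_out, toy_cut_point)
```

On the toy data the rule selects a cut-point of 0.05, where sensitivity and specificity are both 0.75. Using only observed probabilities as candidate thresholds is a simplification; any finer grid of thresholds could be substituted.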

Figure 1. Sensitivity-specificity plot for determining classification cut-point. [Plot of sensitivity and specificity (vertical axis, 0 to 1) against probability (horizontal axis, 0 to 1).]

Model Application

Applying the coefficients (labeled estimates in Table 1 above) to 2008-09 demographic and academic values with the formula presented earlier yielded a probability of dropout for each expected 2008-09 student. In previous years, a value labeled the Predictor Index was provided to schools as a numeric representation of each student's level of risk. To remain consistent, predicted probabilities were converted to logits, which behave similarly to z-scores in that they are centered on zero. The logit is the natural log of the odds, ln(p / (1 - p)). Figure 2 below displays the association between probability and logit. Note that a student with a probability of .5 has a corresponding logit of zero; thus, students more than 50% likely to drop out have a positive logit. Additionally, a student with a logit of 2.5 is at greater risk of dropout than a student with a logit of 1.5.

Figure 2. Probability by logit plot for high school dropout prediction. [Plot of probability (vertical axis, 0 to 1) against logit (horizontal axis, -8 to 8).]

Microsoft Excel rosters containing pertinent student information and each student's associated risk were created for each school. The School Social Work office will share these rosters with school counselors prior to the start of the 2008-09 school year. Counselors will use the rosters to focus their efforts on the students most in need, in the hope of keeping them engaged in school.

Conclusion

The brief review of the literature presented in the introduction outlines the consequences that dropouts eventually face. In addition, student decisions to drop out ultimately affect society both financially and socially. Accordingly, the No Child Left Behind (NCLB) legislation includes graduation and dropout rates among the measures used to determine the Adequate Yearly Progress (AYP) of schools. To prevent students from dropping out, CMS school counselors provide assistance to students deemed to be at risk. A dropout prediction model allows Prevention & Intervention staff to combine their intuitive knowledge of individual students with statistical prediction based on raw data.

The model presented here was generated from historical data for CMS high school students, as dropout occurs most often at the high school level. A number of indicators, including student demographic variables and behavioral and academic measures, were found to be associated with student decisions to drop out. The wide array of variables associated with dropout provides some insight into why students decide to drop out and also allows for a more accurate overall model. Though the model was also applied at the elementary and middle school levels, the accuracy with which it predicts dropout at those levels is suspect at best. Future research into the reasons and indicators associated with the small population of elementary and middle school dropouts is warranted. A better understanding of the early indicators of dropout would allow for intervention at an early age, inhibiting disengagement from school and an eventual decision to drop out.

Prepared by Jason Schoeneberger, Director of Research & Evaluation

References

Catterall, J. (1987). On the social costs of dropping out. High School Journal, 19-30.

Dorn, S. (1993). Origins of the dropout problem. History of Education Quarterly, 33, 353-373.

Hosmer, D., & Lemeshow, S. (2000). Applied Logistic Regression. New York, NY: John Wiley & Sons, Inc.

LeCompte, M., & Dworkin, G. (1991). Giving Up on School: Student Dropouts and Teacher Burnouts. Newbury Park, CA: Corwin Press.

McDill, E., Natriello, G., & Pallas, A. (1986). A population at risk: Potential consequences of tougher school standards for student dropouts. American Journal of Education, 94, 135-181.

Newmann, F., Wehlage, G., & Lamborn, S. (1992). The significance and sources of student engagement. In F. Newmann (Ed.), Student Engagement and Achievement in American Secondary Schools. New York, NY: Teachers College Press.

No Child Left Behind Act of 2001, Pub. L. No. 107-110, 2, 115 Stat. 1425 (2002).

Rumberger, R. (1987). High school dropouts: A review of issues and evidence. Review of Educational Research, 57, 101-121.

Rumberger, R. (1995). Dropping out of middle school: A multilevel analysis of students and schools. American Educational Research Journal, 32(3), 583-625.

Wehlage, G., Rutter, R., Smith, G., Lesko, N., & Fernandez, R. (1989). Reducing the Risk: Schools as Communities of Support. New York, NY: Falmer.

Appendix A

Appendix A. Saturated model parameter estimates and odds ratios associated with dropout probability

Parameter               Estimate   p-value   Odds Ratio
Constant                 -6.0198    0.0000       0.0024
Grade 9                             0.0000
Grade 10                  0.3789    0.0004       1.4607
Grade 11                  0.7654    0.0000       2.1499
Grade 12                  0.2678    0.0868       1.3071
FRL                       0.1400    0.1444       1.1502
LEP                      -0.1768    0.2702       0.8380
EC                       -0.4226    0.0006       0.6553
Gender                    0.0122    0.8855       1.0123
Asian                    -0.2260    0.4029       0.7977
Black                    -0.4378    0.0005       0.6454
Hispanic                  0.1801    0.2924       1.1973
Native American           0.8543    0.0193       2.3496
Multi-racial             -0.2033    0.5741       0.8160
Gifted                   -0.5136    0.0819       0.5983
CIS Served                0.0477    0.8219       1.0488
Overage                   0.9301    0.0000       2.5348
Prior Dropout             0.8839    0.0000       2.4202
Grade 8 Math Level 1                0.2440
Grade 8 Math Level 2      0.2464    0.0778       1.2794
Grade 8 Math Level 3      0.1949    0.2337       1.2151
Grade 8 Math Level 4      0.3669    0.1034       1.4432
Grade 8 Read Level 1                0.6975
Grade 8 Read Level 2     -0.1764    0.3980       0.8383
Grade 8 Read Level 3     -0.0937    0.6792       0.9105
Grade 8 Read Level 4     -0.0091    0.9729       0.9909
Algebra I Level 1                   0.0027
Algebra I Level 2         0.1720    0.1833       1.1877
Algebra I Level 3        -0.1017    0.5379       0.9033
Algebra I Level 4        -0.5760    0.0175       0.5622
English I Level 1                   0.0008
English I Level 2        -0.3011    0.0354       0.7400
English I Level 3        -0.5465    0.0016       0.5790
English I Level 4        -0.9896    0.0001       0.3717
Unexcused Absences        0.0423    0.0000       1.0432
Excused Absences          0.0288    0.0015       1.0292
ISS Days                  0.0652    0.0412       1.0674
OSS Days                  0.0343    0.0000       1.0349
Mobility                  0.1254    0.0000       1.1336