A Decision Tree Analysis of the Transfer Student
Emma Gunu, MS, Research Analyst
Robert M. Roe, PhD, Executive Director of Institutional Research and Planning
Overview
Motivation for Analyses
Analyses and Results
  Descriptive
  Logistic Regression
  Decision Tree Analysis
    1st Year Retention
    6-Year Graduation Rate
Impact of 2-Year Degree on Performance at CMU
Conclusion
Motivation
CMU typically enrolls between 1,400 and 1,500 transfer students each year.
Transfers constitute nearly 25% of all new students.
In the past, we treated all new transfers as a single cohort, ignoring the number of transfer credits.
Given the limited resources of the Office of Student Success, can we identify specific groups for outreach (intervention)?
Entry Credentials by Class

Class of entry          % of count (3-year mean, 2011-2013)   Mean transfer hours (2006-2014)   Mean transfer GPA
Freshman (<26 hrs)      21.2%                                 17.1                              2.90
Sophomore (27-56 hrs)   50.3%                                 41.1                              3.04
Junior (57-86 hrs)      25.9%                                 66.0                              3.24
First Year Performance
[Figure: First Term GPA by Entry Level (Fresh, Soph, Junior); data labels as extracted: 3.02, 2.76, 2.46]
[Figure: % Persisting into 2nd Year by Class of Entry: Fresh 70.5%, Soph 77.5%, Junior 84.3%]
LOESS of Persistence by Transfer Hours
[Figure: LOESS curve of persistence against transfer hours; x-axis from 0 to 125 hours]
Graduation by Level of Entry
[Figure: last 3-year mean % graduating in each year (1st through 6th) by class of entry (Freshman, Sophomore, Junior); y-axis from 0% to 90%]
Impact of 2-Year Degree at CC
Next, we looked at the impact of obtaining a 2-year degree on persistence, graduation, and GPA.
Impact of 2-Year Degree

Outcome   Degree   No Degree
Pers2     85.7%    74.6%
Grad4     67.2%    43.8%
Grad6     75.8%    58.8%
Graduation in 4 Years

Graduated within 4 years   No Associate Degree   Associate Degree
No                         1006 (56.2%)          190 (32.8%)
Yes                        784 (43.8%)           389 (67.2%)

χ² = 95.714, p < .001

However, persistence and graduation are confounded with the number of transfer hours.
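As a check on the statistic reported above, the following minimal sketch recomputes the Pearson chi-square from the four cell counts on this slide. The dictionary keys and the function name are illustrative, and no continuity correction is applied (an assumption that happens to reproduce the reported value).

```python
# Cell counts copied from the slide; the key names are illustrative labels.
observed = {
    ("no_degree", "not_graduated"): 1006,
    ("no_degree", "graduated"): 784,
    ("degree", "not_graduated"): 190,
    ("degree", "graduated"): 389,
}


def pearson_chi_square(table):
    """Pearson chi-square for a two-way table, without continuity correction."""
    rows = sorted({r for r, _ in table})
    cols = sorted({c for _, c in table})
    total = sum(table.values())
    row_tot = {r: sum(table[(r, c)] for c in cols) for r in rows}
    col_tot = {c: sum(table[(r, c)] for r in rows) for c in cols}
    chi2 = 0.0
    for r in rows:
        for c in cols:
            # Expected count under independence of degree and graduation.
            expected = row_tot[r] * col_tot[c] / total
            chi2 += (table[(r, c)] - expected) ** 2 / expected
    return chi2


chi2 = pearson_chi_square(observed)
print(round(chi2, 3))  # 95.714, matching the slide
```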
Logistic Regression: Graduation in 4 Years

Variable           B        S.E.    df   Sig.    Exp(B)
TRANHRS            0.028    0.003   1    0.000   1.029
Prior_2yr_Degree   0.28     0.122   1    0.021   1.323
Constant           -1.197   0.106   1    0.000   0.302

a. Variable(s) entered on step 1: TRANHRS, Prior_2yr_Degree.
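To make the coefficients concrete, here is a small sketch that turns them into an odds ratio and predicted probabilities. It assumes Prior_2yr_Degree is coded 1 for a prior 2-year degree and 0 otherwise (the slide does not state the coding), and the 60-hour student is a hypothetical example, not a case from the data.

```python
import math

# Coefficients copied from the regression table on this slide.
B_CONSTANT = -1.197
B_TRANHRS = 0.028
B_DEGREE = 0.28  # assumed coding: 1 = prior 2-year degree, 0 = none


def grad4_probability(transfer_hours, has_two_year_degree):
    """Predicted probability of graduating within 4 years."""
    logit = (B_CONSTANT
             + B_TRANHRS * transfer_hours
             + B_DEGREE * int(has_two_year_degree))
    return 1.0 / (1.0 + math.exp(-logit))


# Exp(B): multiplicative change in the odds associated with a 2-year degree.
odds_ratio_degree = math.exp(B_DEGREE)

# Hypothetical student with 60 transfer hours, with and without a degree.
p_with = grad4_probability(60, True)
p_without = grad4_probability(60, False)

print(f"odds ratio for degree: {odds_ratio_degree:.3f}")
print(f"P(grad in 4 yrs | 60 hrs, degree):    {p_with:.3f}")
print(f"P(grad in 4 yrs | 60 hrs, no degree): {p_without:.3f}")
```

Holding transfer hours fixed, the ratio of the two students' odds equals Exp(B) for the degree term, which is what "significant beyond transfer hours" refers to.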
Results
The impact of a 2-year degree on persistence to the second year and on graduation in 4 or 6 years was significant beyond the impact of the number of transfer hours.
Best Way to Classify Students Who Are at Risk?
Performance varies by transfer hours, but what about other factors?
We could use logistic regression to determine which factors are predictive.
However, this is not useful in determining which groups of students are at risk.
Factors That Likely Impact Persistence
Variables:
  Full- or part-time student
  Transfer GPA
  Total transfer hours
  Transfer college year (2-year vs 4-year)
  Low-income status
  First-generation status
Logistic Regression: Stepwise
Variables in the Equation

Step     Variable   B        S.E.    Wald      df   Sig.    Exp(B)
Step 1a  TRANGPA    0.796    0.054   217.894   1    0.000   2.216
         Constant   -1.123   0.161   48.490    1    0.000   0.325
Step 2b  TRANGPA    0.710    0.055   166.387   1    0.000   2.034
         TRANHRS    0.011    0.001   57.435    1    0.000   1.011
         Constant   -1.294   0.163   62.740    1    0.000   0.274
Step 3c  FULLPART   0.739    0.128   33.566    1    0.000   2.095
         TRANGPA    0.702    0.055   161.845   1    0.000   2.017
         TRANHRS    0.012    0.001   68.157    1    0.000   1.012
         Constant   -2.020   0.207   94.866    1    0.000   0.133
Step 4d  LOWINC     -0.324   0.067   23.659    1    0.000   0.723
         FULLPART   0.733    0.128   32.890    1    0.000   2.081
         TRANGPA    0.700    0.055   160.898   1    0.000   2.014
         TRANHRS    0.013    0.001   75.412    1    0.000   1.013
         Constant   -1.968   0.208   89.593    1    0.000   0.140

a. Variable(s) entered on step 1: TRANGPA.
b. Variable(s) entered on step 2: TRANHRS.
c. Variable(s) entered on step 3: FULLPART.
d. Variable(s) entered on step 4: LOWINC.
Decision Trees for Outreach
Decision Tree Models
There are several types of decision tree models.
Here we chose the Chi-square Automatic Interaction Detection (CHAID) model over Classification and Regression Trees (CRT).
With CRT, which makes only binary splits, GPA might have to split several times before it is refined enough to be predictive.
Decision Tree Models
These models can be used simply to classify a set of data (e.g., what is the best way to classify our transfer students in terms of retention factors?).
Or they can be used for prediction (e.g., can we flag new transfer students who are at risk, or not at risk, for persistence?).
CHAID
The procedure creates tree-based models that determine how variables best combine to explain the outcome in a given dependent variable.
The dependent variable is a binary response: retained vs. not retained.
Predictor variables can be any combination of variable types (continuous or categorical).
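The core idea behind a CHAID split can be sketched as follows: cross-tabulate each candidate predictor against the binary outcome, run a chi-square test, and split on the predictor with the smallest p-value. This is a deliberately simplified illustration, not the full procedure used in the analysis. It handles only binary predictors and omits the category merging and Bonferroni adjustment that CHAID normally includes. The records below are made-up toy data, and the field names (`high_gpa`, `full_time`, `persisted`) are hypothetical.

```python
import math
from collections import Counter


def chi_square_2x2_pvalue(xs, ys):
    """Pearson chi-square and p-value for two binary (0/1) variables.

    Assumes both values of each variable occur at least once.
    """
    n = len(xs)
    counts = Counter(zip(xs, ys))
    chi2 = 0.0
    for x in (0, 1):
        for y in (0, 1):
            row = sum(counts[(x, v)] for v in (0, 1))
            col = sum(counts[(u, y)] for u in (0, 1))
            expected = row * col / n
            chi2 += (counts[(x, y)] - expected) ** 2 / expected
    # With 1 degree of freedom, P(chi-square > c) = erfc(sqrt(c / 2)).
    return chi2, math.erfc(math.sqrt(chi2 / 2.0))


def best_split(rows, predictors, target):
    """Return the predictor with the smallest chi-square p-value."""
    ys = [r[target] for r in rows]
    results = {
        name: chi_square_2x2_pvalue([r[name] for r in rows], ys)
        for name in predictors
    }
    best = min(results, key=lambda name: results[name][1])
    return best, results


# Toy data (NOT from the study): (high_gpa, full_time, persisted, count).
toy_counts = [
    (1, 1, 1, 23), (1, 0, 1, 22), (1, 1, 0, 2), (1, 0, 0, 3),
    (0, 1, 1, 15), (0, 0, 1, 15), (0, 1, 0, 10), (0, 0, 0, 10),
]
rows = [
    {"high_gpa": g, "full_time": f, "persisted": p}
    for g, f, p, count in toy_counts
    for _ in range(count)
]

best, results = best_split(rows, ["high_gpa", "full_time"], "persisted")
print(best, {k: round(v[0], 3) for k, v in results.items()})
```

In the toy data, `high_gpa` is strongly associated with persistence and `full_time` is not, so the sketch selects `high_gpa` for the first split; a real tree would then repeat the search within each resulting node.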
Method
Start by selecting a subset of the data for training.
Use the fitted model to predict a new set of data.
Here we trained on a 70% subset of the data and tested on the remaining 30%.
Check the misclassification rate and its standard error to assess predictability.
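The holdout procedure described above can be sketched in a few lines. This is a generic illustration, not the software routine used for the analysis; the function names, the fixed seed, and the placeholder records are all assumptions.

```python
import random


def split_train_test(records, train_fraction=0.7, seed=42):
    """Shuffle a copy of the records and split it into training and test sets."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def misclassification_rate(actual, predicted):
    """Share of cases whose predicted class differs from the observed class."""
    wrong = sum(1 for a, p in zip(actual, predicted) if a != p)
    return wrong / len(actual)


# Placeholder student IDs stand in for real records.
student_ids = list(range(1000))
train, test = split_train_test(student_ids)
print(len(train), len(test))  # 700 300

# Tiny made-up example: two of four predictions are wrong.
print(misclassification_rate([1, 0, 1, 1], [1, 1, 1, 0]))  # 0.5
```

Fixing the seed keeps the split reproducible, so the same students land in the training and test sets each time the analysis is rerun.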
CHAID Analysis for First Year Retention
Start by classifying to determine whether classification is even possible.
If yes, build a prediction model.
Input variables:
  Full- or part-time student
  Transfer GPA
  Total transfer hours
  Transfer college year (2-year vs 4-year)
  Low-income status
  First-generation status
All were selected by the model.
Persist to 2nd Year
[Figure: CHAID tree for persistence to 2nd year; first-level Transfer GPA intervals: <=2.41, (2.41-3.06], (3.06-3.35], (3.35-3.73], >3.73]
Which Nodes Are Important?
Gains for Nodes (Training sample)

Node   Index
5      115.1%
25     114.5%
30     114.2%
28     110.8%
31     104.6%
23     103.9%
26     103.2%
22     102.5%
16     101.0%
27     98.6%
8      98.0%
29     95.5%
32     93.3%
17     89.1%
12     88.4%
24     86.7%
20     85.2%
21     85.2%
9      80.4%
18     72.5%
6      70.0%

Node 5: Transfer GPA > 3.73; 90.1% persist to 2nd year.
Node 6: Transfer GPA <= 2.41 and transfer hours <= 23; 54.8% persist to 2nd year.
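The index column can be read as a node's persistence rate expressed as a percentage of the overall persistence rate. The short sketch below checks that reading for the two highlighted nodes, taking the overall training-sample rate from the counts on the Model Evaluation slide (4469 persisters, 1242 non-persisters). The function name is illustrative.

```python
# Training-sample counts from the Model Evaluation slide.
persisted = 4469
not_persisted = 1242
overall_rate = persisted / (persisted + not_persisted)


def gain_index(node_rate, base_rate):
    """Node response rate as a percentage of the overall response rate."""
    return 100.0 * node_rate / base_rate


node5 = gain_index(0.901, overall_rate)  # Transfer GPA > 3.73
node6 = gain_index(0.548, overall_rate)  # Transfer GPA <= 2.41, hours <= 23
print(round(node5, 1), round(node6, 1))  # 115.1 70.0
```

An index above 100% marks a node that persists at a higher rate than transfers overall; an index well below 100%, such as node 6, marks a candidate group for outreach.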
Model Evaluation

Classification
                                 Predicted
Sample     Observed              Not Persist2   Persist2   Percent Correct
Training   Not Persist2          0              1242       0.0%
           Persist2              0              4469       100.0%
           Overall Percentage    0.0%           100.0%     78.3%
Test       Not Persist2          0              563        0.0%
           Persist2              0              1917       100.0%
           Overall Percentage    0.0%           100.0%     77.3%

Risk
Sample     Estimate   Std. Error
Training   .217       .005
Test       .227       .008

Growing Method: CHAID
Dependent Variable: Persisted to 2nd Year
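The risk estimates follow directly from the classification counts. In the table, every case is predicted to persist, so the misclassified cases are exactly the observed non-persisters. The sketch below assumes the standard error is the usual binomial one, sqrt(p(1 - p) / n); that formula is an assumption here, though it reproduces the reported values.

```python
import math


def risk_and_se(misclassified, total):
    """Misclassification risk and its binomial standard error."""
    risk = misclassified / total
    se = math.sqrt(risk * (1.0 - risk) / total)
    return risk, se


# All cases are predicted "Persist2", so the errors are the non-persisters.
train_risk, train_se = risk_and_se(1242, 1242 + 4469)
test_risk, test_se = risk_and_se(563, 563 + 1917)

print(f"training: {train_risk:.3f} ({train_se:.3f})")  # 0.217 (0.005)
print(f"test:     {test_risk:.3f} ({test_se:.3f})")    # 0.227 (0.008)
```

The similar training and test risks suggest the tree is not overfit, while the overall percent correct matches the share of students who persist.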
Summary of Persistence Decision Tree
Transfer GPA is most predictive.
Other predictive factors vary by transfer GPA.
For clarity and ease of understanding for those who use the information, we created the following tables:
Retention at Risk Table
[Figure: at-risk grid of Transfer GPA (4.0 down to 2.0 in steps of 0.1) by Transfer Credits (>60 down to 2 in steps of 2). Percentage labels as extracted: 6.2%, 10.5%, 3.3%, 17.4%, 18.3%, 4.5%, 1.5%, 5.0%, 3.5%. Annotations: "Low income in this range at higher risk." (two regions); "First Generation in this range at higher risk."]
Decision Tree Analysis for 6 Year Graduation Rate
Input variables:
  Full- or part-time student
  Transfer GPA
  Total transfer hours
  Transfer college year (2-year vs 4-year)
  Low-income status
  First-generation status
All variables except full- vs part-time were selected by the model.
6 Year Graduation
[Figure: CHAID tree for 6-year graduation; first-level Transfer GPA intervals: <=2.31, 2.30-2.69, 2.69-2.99, 2.99-3.29, >3.29]
Summary of 6 Year Graduation Decision Tree
For low transfer GPA (<=2.3), first-generation students are at greater risk.
For second-lowest GPA (2.3-2.69], transfer credit hours and 2- vs 4-year are important.
For middle GPA (2.69-3.29], 2- vs 4-year and transfer credit hours are important.
For highest GPA (>3.29), transfer hours, 2- vs 4-year, and low income are important.
Grad at Risk Table
[Figure: at-risk grid of Transfer GPA (4.0 down to 2.0 in steps of 0.1) by Transfer Credits (>60 down to 2 in steps of 2). Percentage labels as extracted: 15.0%, 7.9%, 10.0%. Annotations: "CC in this range at higher risk."; "CC transfer in this range at higher risk."; "First Generation in this range at higher risk."]
Summary
The more refined assessment of transfer students revealed some interesting findings.
Using decision tree analyses on transfer data, we are able to identify very specific groups at risk (and groups likely to succeed).
We have presented these results to the retention subcommittee, the strategic enrollment group, and the vice president of enrollment services.
Questions?