Evaluating Dropout Prevention and Recovery Models

Evaluating Dropout Prevention and Recovery Models
The University of California Educational Evaluation Center
Dr. John T. Yun, Director
NGA Center for Best Practices: State Strategies to Achieve Graduation for All
Seaport Hotel, Boston, Massachusetts
September 20, 2010

Presentation Outline
- Introduction
- Importance of evaluation in today's policy climate
- Evaluation types
- Goals versus objectives
- Theories of action
- Methods for the madness
- Some time to work
- Questions

Evaluation Types
- Process: an evaluation designed to assess implementation of a program. Audience?
- Formative: an evaluation designed to guide program development and improvement. Audience?
- Summative: an evaluation designed to assess programmatic impact. Audience?

What are the Differences?
- Process evaluations look at implementation (fidelity) and do not ask whether what is being done is effective.
- Formative evaluations are designed to provide information solely for the purpose of program improvement.
- Summative evaluations examine outcomes and the theory of action.

What are the Differences?
- There are many different understandings of these terms; here are mine.
- Formative and summative are largely differences in philosophy and purpose, not necessarily true differences in approach.
- Formative and summative evaluations can use similar methods; they generally differ in the rigor required.
- Process evaluations are more clearly defined, and a process evaluation can be either formative or summative.

Program Goals versus Objectives
Goals:
- The ultimate outcomes of the program (distal outcomes): a happier life, greater income, attending college, less cost to society
- Broad impact
- May be very difficult to measure
Program objectives:
- Measurable changes that should occur during the project/intervention
- Contain criteria for measuring success and failure

SMART Objectives
- S: Specific
- M: Measurable
- A: Appropriate
- R: Realistic
- T: Time-bound
By keeping to these general rules for objectives, you can be sure that what you say you want to do is measurable.

Importance of Theories of Action
- Tells you what you believe the outcomes of the program are
- Shows the causal links between program components, proximate outcomes, and distal outcomes
- Allows for testing of both theory and implementation
- The clearer the theory of action (TOA), the easier the evaluation
- Not always easy in practice

Sample Theory of Action (logic model)
Intervention: Provide more information to parents and students about college
→ Proximate outcome: Increase comfort with and interest in college-going
→ Distal outcome: Increase rates of college-going

Parent Education Program Logic Model
SITUATION: During a county needs assessment, a majority of parents reported that they were having difficulty parenting and felt stressed as a result.
(Logic model graphic. Copyright 2008 Board of Regents of the University of Wisconsin System, d/b/a Division of Cooperative Extension of the University of Wisconsin-Extension.)

Parent Education Program Logic Model (graphic). Copyright 2008 Board of Regents of the University of Wisconsin System, d/b/a Division of Cooperative Extension of the University of Wisconsin-Extension.

Implementation vs. Theory Failure
- Implementation failure: didn't implement well
- Theory failure: things don't work the way you think they will
- They both look the same based on outcomes alone.

Failures
- Both failures show no change in the outcome; you MUST be able to distinguish between them!
- Example chain: Provide more information to parents and students about college → Increase comfort with and interest in college-going → Increase rates of college-going
- Poor implementation leading to bad outcomes: implementation failure
- Good implementation, no change in outcomes: theory failure

Limitations of the Logic Model Approach
- To a hammer, everything looks like a nail: logic models can become an end and not a means, and they may keep you from seeing what's actually happening in an organization.
- Assuming that because the program fits into the box, the box fits: can program complexity be captured in a logic model/theory of action? Is there an example where it cannot?
- Assuming that the box will always stay the same: all logic models are time-bound.

Logic Model Takeaway
- Critical to have a clearly delineated logic model/theory of action
- Provides guideposts for evaluation design
- Creates a powerful test of key program assumptions
- By building in as much detail as possible, you can look at both process and outcomes

Methods for the Madness
I will focus on impact evaluation, the type most important for the purposes of policy.
- Experiments (first-best world): the strongest methodology; allows for causal attribution under certain circumstances
- Quasi-experiments (second-best world): regression discontinuity analysis, propensity score models, interrupted time series, student fixed effects models, difference-in-differences models

Definitions
- Experimentation: deliberate intrusion into an ongoing process to identify the effects of that intrusion
- Randomized experiments: assignment to treatment and comparison groups based on chance
- Quasi-experiments: assignment to treatment not based on chance

How to Approach Design
- The goal of many designs (but not all) is to establish causality; it is key to understand the power and limits of each approach.
- The central problem is establishing the counterfactual: what would have happened if the student had not participated. Most poor evaluations result from comparing non-identical students.
- Experiments are great at causal description but NOT at causal explanation: they can tell us what results from deliberately manipulating single experimental conditions, but they are not as good at determining why the condition led to the outcome.

Where Causality and Random Assignment Meet
Logic of causal relationships:
- Cause must precede effect
- Cause must covary with effect
- Must rule out alternative causes
Randomized experiments do all this:
- They give treatment, then measure the effect
- They can easily measure covariation
- Randomization makes most other causes less likely (this is related to threats to internal validity)
Quasi-experiments are problematic on the third criterion.

Advantages of Experiments
- Unbiased estimates of effects
- Relatively few, transparent, and testable assumptions
- More statistical power than alternatives
- Long history of implementation in health and in some areas of education
- Credibility in science and policy circles

Disadvantages of Experiments
- Not always feasible for reasons of ethics, politics, or logistics
- Experience is limited, especially with higher-order units like whole schools
- Need to have no differential attrition and no contamination across conditions
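As an illustration of why random assignment yields an unbiased impact estimate, here is a minimal sketch in Python (statsmodels) using simulated data; the data frame, column names, and effect size are hypothetical, not from the presentation.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 500

# Hypothetical student-level data: random assignment gives each student an
# equal chance of receiving the dropout-prevention program.
treated = rng.integers(0, 2, size=n)
# Simulated outcome (e.g., credits earned) with a true effect of +0.5 for illustration.
outcome = 10 + 0.5 * treated + rng.normal(0, 2, size=n)
df = pd.DataFrame({"treated": treated, "outcome": outcome})

# Because assignment is by chance, a simple comparison of group means
# (estimated here with OLS) is an unbiased estimate of the program's impact.
model = smf.ols("outcome ~ treated", data=df).fit()
print(model.params["treated"], model.conf_int().loc["treated"])
```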

What No Effect Looks Like

What a Main Effect Looks Like

Regression Discontinuity (RD)
- Assignment based only on a cutoff score
- Second-best design for causal inference
- Proofs exist that it provides unbiased inference, and there is empirical evidence that it produces results similar to an experiment
- It can be widely used in education
- Data analysis is quite tricky, but manageable

Assignment under RD
- Assignment can be by a merit score, a need score, first come first served, or date of birth
- RD can actually involve any assignment variable that is ordered, including made-up ones
- Key concepts: an assignment variable, a cutoff score, and an outcome
- Think of RD as a randomized experiment at the cutoff point, or as a design with a completely known assignment process

Upshot for RD
- A very powerful design, with lots of opportunity for use in education
- Depends on the ability to get a good cut score and on people sticking to it
- Can be combined with randomized designs
- Can be difficult to correctly specify the model
- Less power than a randomized experiment (needs larger samples, approximately 2.5x)
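A minimal sketch of the RD idea, using simulated data and illustrative choices (cut score, bandwidth, variable names are all hypothetical): near the cutoff, the jump in the outcome estimates the program effect.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
# Hypothetical data: `score` is the assignment variable (e.g., an 8th-grade reading
# score), and students scoring below the cutoff are assigned to the recovery program.
score = rng.uniform(50, 100, 1000)
cutoff = 75
treated = (score < cutoff).astype(int)            # assignment rule is fully known
outcome = 20 + 0.3 * score + 2.0 * treated + rng.normal(0, 3, 1000)  # true jump of 2.0
df = pd.DataFrame({"score": score, "treated": treated, "outcome": outcome})

# Local linear regression near the cutoff, with separate slopes on each side;
# the coefficient on `treated` estimates the jump (program effect) at the cutoff.
bandwidth = 10
df["centered"] = df["score"] - cutoff
local = df[df["centered"].abs() <= bandwidth]
rd_model = smf.ols("outcome ~ treated + centered + treated:centered", data=local).fit()
print(rd_model.params["treated"])
```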

Propensity Scores
- Propensity score analysis tries to model selection into treatment.
- A propensity score is the probability that, given your observables (measured variables), you will be assigned to treatment.
- The goal of propensity score analysis is to find people with IDENTICAL probabilities of being in treatment, some of whom were treated and some of whom were not; this gives you a comparison group that is equivalent.

Upshot for Propensity Scores
- Need as many observables as possible that are relevant to selection into the program, measured PRIOR to the intervention
- Ideally, these should be strongly correlated with assignment and less correlated with the outcome
- Works best when there is a clear selection theory that can be modeled using propensity scores; this allows you to select good variables to use
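A minimal sketch of one common propensity score approach (logistic regression for the score, then 1-to-1 nearest-neighbor matching); the covariates, data, and matching choice are illustrative assumptions, not the presenter's method.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(0)
n = 2000
# Hypothetical pre-intervention covariates that drive selection into the program:
# lower GPA and attendance make treatment more likely.
prior_gpa = rng.normal(2.5, 0.6, n)
attendance = rng.uniform(0.6, 1.0, n)
p_treat = 1 / (1 + np.exp(2 * (prior_gpa - 2.5) + 3 * (attendance - 0.8)))
treated = (rng.random(n) < p_treat).astype(int)
outcome = 5 + 2 * prior_gpa + 4 * attendance + 0.5 * treated + rng.normal(0, 1, n)
df = pd.DataFrame({"prior_gpa": prior_gpa, "attendance": attendance,
                   "treated": treated, "outcome": outcome})

# Step 1: model selection into treatment from the observables.
covariates = ["prior_gpa", "attendance"]
ps = LogisticRegression(max_iter=1000).fit(df[covariates], df["treated"])
df["pscore"] = ps.predict_proba(df[covariates])[:, 1]

# Step 2: for each treated student, find the untreated student with the closest
# propensity score, building an (approximately) equivalent comparison group.
trt = df[df["treated"] == 1]
ctl = df[df["treated"] == 0]
nn = NearestNeighbors(n_neighbors=1).fit(ctl[["pscore"]])
_, idx = nn.kneighbors(trt[["pscore"]])
matched = ctl.iloc[idx.ravel()]

# Step 3: compare mean outcomes for treated and matched comparison students.
print(trt["outcome"].mean() - matched["outcome"].mean())
```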

Interrupted Time Series (ITS)
- Represents a whole family of design types (short time series, difference-in-differences, fixed-effects models)
- A series of observations on a dependent variable over time, interrupted by the introduction of an intervention
- N ≈ 100 observations is the desirable standard; N < 100 is still helpful, even with very few observations (e.g., N = 7)
- The time series should show an effect at the time of the interruption

ITS
- A very powerful design
- Dependent on the availability of good archived outcome data and on the ability to gather time-series outcomes; note that much more archived data is available at the school and district level than at the individual level
- Design elements can do much to improve the ability to make causal inferences; these can be comparisons to untreated groups, or to outcomes that are unlikely to be affected by the treatment but likely to be affected by contextual variables
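A minimal sketch of a segmented-regression ITS analysis, assuming a hypothetical monthly series of an archived outcome (e.g., a dropout rate) with a known intervention start; the series length, variable names, and effect sizes are illustrative.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
# Hypothetical archived series: 60 monthly observations, intervention begins at month 40.
n_periods, start = 60, 40
time = np.arange(n_periods)
post = (time >= start).astype(int)
time_since = np.clip(time - start, 0, None)
# Simulated dropout rate: mild pre-existing trend, then a drop at the interruption.
outcome = 12 - 0.02 * time - 1.0 * post - 0.05 * time_since + rng.normal(0, 0.4, n_periods)
df = pd.DataFrame({"time": time, "post": post, "time_since": time_since, "outcome": outcome})

# Segmented regression: `post` captures the level change at the interruption and
# `time_since` the change in slope afterward; HAC errors allow for autocorrelation.
its = smf.ols("outcome ~ time + post + time_since", data=df).fit(
    cov_type="HAC", cov_kwds={"maxlags": 3})
print(its.params[["post", "time_since"]])
```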

Student Fixed Effects
- Can be considered a subset of ITS
- Compare students' growth on important outcomes (test scores, motivation, etc.); the key is to see whether growth is affected post-intervention
- In addition, using a fixed effect you subtract out the mean outcome for each student, so you are not comparing students to one another, only student changes relative to the intervention

(Figure: a student receives the after-school program in one year out of three; test for a break from trend growth. Y-axis: student test score; x-axis: year. The years in which the student is not in the after-school program act as the control for the treated year.)
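A minimal sketch of a student fixed-effects model on a hypothetical student-by-year panel; including a dummy for each student (via `C(student_id)`) absorbs each student's own average, so the program coefficient reflects within-student change. All names and magnitudes are illustrative.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
# Hypothetical panel: 300 students observed for 3 years; a random half of students
# receive the after-school program in year 2 only.
students = np.repeat(np.arange(300), 3)
year = np.tile([1, 2, 3], 300)
ability = np.repeat(rng.normal(0, 5, 300), 3)          # stable student differences
treated = ((year == 2) & (np.repeat(rng.random(300), 3) < 0.5)).astype(int)
score = 50 + ability + 2 * year + 1.5 * treated + rng.normal(0, 2, 900)
panel = pd.DataFrame({"student_id": students, "year": year,
                      "treated": treated, "score": score})

# Student dummies (fixed effects) absorb stable between-student differences, so
# `treated` compares each student's treated year to that same student's other years.
fe = smf.ols("score ~ treated + C(year) + C(student_id)", data=panel).fit()
print(fe.params["treated"])
```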

Difference-in-Differences (DID)
- A version of ITS and fixed-effects models for non-experimental situations
- Identify groups that underwent a policy change and compare their trends to those of groups that did not undergo the policy change
- Most obvious example: comparing trends in schools that did and did not receive the intervention
- Usually only a few data points, which distinguishes it from traditional ITS

(Figure: a school receives the after-school program in one year out of three; test for a break from trend growth. Y-axis: student test score; x-axis: year. A control school with similar trends in scores is compared to the school treated this year.)

Upshot for DID and Fixed Effects
- Need equivalent groups to compare growth; this can be difficult in practice
- Impacts must happen quickly
- Need good controls
- The more data, the more powerful the method
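A minimal sketch of a DID regression on a hypothetical school-by-year data set in which some schools adopt the program in a known year; the interaction term is the difference-in-differences estimate, and clustering by school is one common way to handle repeated observations. Variable names and numbers are illustrative.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
# Hypothetical data: 40 schools observed for 4 years; half adopt the program in year 3.
schools = np.repeat(np.arange(40), 4)
year = np.tile([1, 2, 3, 4], 40)
treated_school = np.repeat((np.arange(40) < 20).astype(int), 4)
post = (year >= 3).astype(int)
grad_rate = (70 + 3 * treated_school + 1.5 * year
             + 2.0 * treated_school * post + rng.normal(0, 2, 160))
df = pd.DataFrame({"school_id": schools, "treated_school": treated_school,
                   "post": post, "grad_rate": grad_rate})

# The interaction captures how treated schools' change over time differs from the
# change in comparison schools; standard errors are clustered at the school level.
did = smf.ols("grad_rate ~ treated_school + post + treated_school:post", data=df).fit(
    cov_type="cluster", cov_kwds={"groups": df["school_id"]})
print(did.params["treated_school:post"])
```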

Other Considerations: Statistical Power and Sample Size
- The statistical power of a study is the probability that you can see an effect if it exists.
- Statistical power increases with: large samples or MANY clusters of schools or teachers; outcome variables that have low natural variation; lots of baseline (pre-experiment) measures of the outcome variable (to account for random initial differences).
- Major funders request that applicants indicate the Minimum Detectable Effect Size (MDES) of a proposed study. For example, an MDES of 0.2 means the study could reject the hypothesis of zero effect with high probability if the true effect were 0.2 standard deviations or higher.
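A minimal sketch of the MDES/sample-size relationship, using statsmodels' power calculator for a simple two-group comparison (a simplification: clustered designs, such as whole schools, need multilevel power formulas); the numbers are illustrative.

```python
from statsmodels.stats.power import TTestIndPower

power = TTestIndPower()

# Students per group needed to detect an effect of 0.2 SD with 80% power at alpha = .05.
n_per_group = power.solve_power(effect_size=0.2, alpha=0.05, power=0.8, ratio=1.0)

# Conversely, the MDES for a study with 200 students per group.
mdes = power.solve_power(nobs1=200, alpha=0.05, power=0.8, ratio=1.0)

print(round(n_per_group), round(mdes, 2))
```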

Some Work Time: Pre-Questions for the Evaluation Presentation
The following questions would be useful to reflect upon prior to the evaluation presentation on Thursday, September 30th.
1. What is/are the main goal(s) of your program?
2. Specifically, how do the components of the program lead to these outcomes? What are the key intermediate steps? (Theory of Action; Causal Model)
3. How can you measure the outcomes of your program?
4. Who is the audience for the evaluation? What is the purpose of the evaluation: program improvement or justification of funding?
5. What evaluation technique(s) could you implement?
   a) First-best: Randomized Controlled Trial (RCT)
   b) Second-best: Regression Discontinuity (RD)
   c) Third-best: Propensity Score Models
   d) Third-best: Interrupted Time Series Models (ITS)
   e) Third-best: Difference-in-Differences Models (DID)
   f) Third-best: Student Fixed-Effects Models

Questions

Conclusions
- It is key to clearly delineate your logic model/theory of action
- You must choose methods that are appropriate to your goal
- There are many methods that can get you to your answer; each has critical tradeoffs associated with it
- Consider whether cost-benefit analysis is appropriate for your goals

Thank you for your time.
John T. Yun, Director, University of California Educational Evaluation Center
jyun@education.ucsb.edu | ucec@education.ucsb.edu | (805) 893-2342