Choosing Your Validity: Analyzing the Impacts of Charter Schools in a U.S. State Using Two Types of Matching Designs

Leesa M. Foreman, Kaitlin P. Anderson, Gary W. Ritter, Patrick J. Wolf
University of Arkansas

Presented at the Association for Education Finance and Policy, 41st Annual Conference, Denver, CO, March 19, 2016

Abstract

In this study, we consider situations in which imperfect lotteries are conducted for a subset of grades within schools. While these lotteries unfortunately do not create opportunities for full-scale randomized experimental evaluations, they do provide opportunities for improvements to quasi-experimental designs. The goal of the paper is to examine charter schools using two distinct matching methods, to check the congruence of those results, and to use the very small lottery subsample as a robustness check for both. We assess the effectiveness of the charter schools in a single mid-sized U.S. state using two quasi-experimental modeling methods that can be used in combination to address both internal and external validity concerns when lotteries do not turn out to be randomized experiments. The first method matches charter students to traditional public school students from the same district feeder schools/areas; in this method, the population from which the matches are drawn includes all students in the feeder schools. The second matches charter students from oversubscribed areas to waitlist students; in this method, the population from which the matches are drawn includes only those students who applied for admission to at least one charter school within the region. The first method (the TPS-matching method) has stronger external validity, as it compares students statewide, while the second method (the waitlist-matching method) has stronger internal validity, as it compares a set of students who have all applied to charter schools, thus addressing the issue of selection bias. Data from state math and literacy exams for 4th-8th grade students from 2012-14 are used for the analyses. Results are compared and found to be similar between methods for both math and reading. We find modest positive results for charter school students, which are strongest for the first year of analysis. Robustness checks are performed and also reported. In the end, we find similar results using multiple methods; this gives us greater confidence in the quasi-experimental strategies used to assess the effectiveness of charter schools throughout the state.

Keywords: school choice; charter schools; research methods; quasi-experimental design

1. Introduction: When a Lottery Isn't a Randomized Control Trial

In education research, we attempt to find causal relationships. We are concerned with whether estimates are biased and whether they can be generalized. What can we learn about these students, teachers, schools, programs, and interventions in this place and time that might be useful, and that we might be able to apply to other students, teachers, schools, and programs to improve them?

Randomized experiments are considered the gold standard for education research, as they address the issues of selection bias that many quasi-experimental methods face. However, randomized experiments are often neither feasible nor generalizable. They have strong internal validity but lack external validity.

When studying charter schools (or other oversubscribed programs), researchers look for oversubscribed schools that perform lotteries as a means of student selection. Lotteries can be used as randomized experiments because winners are chosen at random from a group of applicants. Though some may win and others may lose, the fact that all of them applied indicates there is something similar about these individuals that is different from those who did not apply. Due to the lotteries, we would be able to compare outcomes for students who were randomly admitted to the school to the outcomes for students who were randomly not admitted. With this method, we would have a Randomized Control Trial (RCT), which is the most rigorous research design for evaluating the causal impacts of a program (Rossi et al., 2004). This method is particularly strong because it allows for a comparison of students who are all invested in attending a charter school (by applying to the school). In the context of charter school evaluation research, the differences in students' performance should be attributed to the impact of attending a charter school and not to differences in parental motivation.

Experimental approaches have been lauded for their strong internal validity, as they have the ability to generate unbiased causal estimates of programmatic effects. Experiments also have been properly criticized for their limited external validity, due to the limited generalizability of those causal estimates to the entire sample of program participants and, ultimately, to potential participants in other places and circumstances. With experimental charter evaluations, we likely get an unbiased estimate of the impact of higher-quality charter schools on student achievement. Quasi-experimental design (QED) studies, such as those involving the student matching and longitudinal analysis employed in our primary analysis here, tend to be weaker than experiments in internal validity but much stronger than experiments in external validity.

While we would prefer experimental approaches that use oversubscription to conduct an RCT evaluation, having lotteries does not guarantee that an RCT analysis is possible. In the paragraphs that follow, we highlight four issues connected to random assignment designs that can limit their usefulness:

1. Oversubscribed schools are not representative of all charter schools.
2. Oversubscribed schools can have very few seats open for lottery admission after year one.
3. The initial year of operation, when lotteries can best be exploited for study, is not necessarily the best year to study the school.
4. Lottery analyses require better record keeping than is often done in regular schools.

First, the requirement for oversubscription indicates a particular desirability for these schools, and thus they are not necessarily representative of other charters. It can be argued that oversubscribed schools do not reflect the true charter effect precisely because they are oversubscribed while others are not. Perhaps there is something different about these schools that leads them to be oversubscribed: higher-achieving students, better teachers, more involved parents, a different curriculum, and so on. Working with oversubscribed schools may not be representative of all charters, but it provides the best opportunity for a randomized experiment and offers the ability to address selection bias issues.

Second, even if we could ignore the external validity challenges of an oversubscribed charter school, we must then deal with the fact that, in most cases, there are very few lottery seats available in an oversubscribed school in any year but the first. Often there are only a few grades where lotteries occur, and there may be only a few seats available as most children in each grade move up to the next. In the best cases, a school will add classes to a grade, which provides for a larger lottery.

Third, the opportunity to implement a randomized control trial via lotteries is greatest for new schools and programs just beginning. However, when looking to examine charter effects, we find that new schools often do not perform well in their first years of operation, and the estimates we obtain may not reflect the true effects of that program over time (CREDO, 2009). If we are looking at a charter school system district-wide or state-wide, relying on new schools with large oversubscription may not give accurate estimates of the charter effect, as first-year studies are not representative of schools at maturity.

Finally, careful RCTs require excellent data management. That is, charter school personnel need to be diligent in tracking initial applicants, initial admits, waitlist admits, and those who declined offers of enrollment. Many charters do not have the types of data management systems required to carefully organize such information. Additionally, some states do not require charter schools to report lottery and waitlist results, or to maintain them over time.

Thus, perhaps the best opportunity for careful lottery studies exists in charter middle schools, where there is a pipeline grade each year in which all seats in the initial grade are open for lottery admission.

For this reason, many of the best-known charter school lottery studies are conducted at middle schools, where there is a pipeline grade and the entering students have previously attended traditional public schools, so baseline data are available.

1.1 Study Origin

In this analysis, we use lottery and waitlist information from the 2012-13 school year. Only schools that reported their waitlists were included in the analysis. It is possible that some schools had a waitlist but did not report it.[1] It is also possible that a school used a lottery admission process but, upon enrolling students, had no waitlist because some parents who received admission offers declined to enroll and all waitlisted students eventually were admitted to the school.

[1] We believe that there is only one school that was oversubscribed but did not provide waitlist data to the evaluation team.

We received waitlisted student data for eight oversubscribed schools. Only waitlisted students with previous public school enrollment were included (out-of-state, private school, or home school applicants were not reported). In addition, it was not indicated whether any students were awarded automatic admission outside of the lottery (for sibling preference or previous mid-year transfers). We found very few lottery winners in very few grades, with very large waitlists in comparison; thus, estimates of the charter effect were not reliable. (Details and findings of our lottery analysis can be found in the Findings section and Appendix A.)

So what can be done? We have imperfect lotteries, a broken RCT if you will. However, we have waitlist information that could still be useful. Several of the charters in our state have large waitlists even in grades that are most likely filled with continuing students. Thus, we have experiments in which few students get the actual treatment, but the control population is very large. Can we somehow exploit this?

We decided to use the fact that both enrolled charter students and reported waitlist students were all applicants to charter schools. Instead of comparing outcomes for winners and losers of lotteries (because we have so few lottery winners), we compare outcomes of all enrolled charter students and all reported waitlist students in 2012-13. By making use of the waitlist students in this way, we could still address concerns about selection bias and increase internal validity by comparing applicants. We use this information as a robustness check on the larger TPS-matching evaluation.

In this study, we address two overarching questions, one practical and one methodological: (1) Are charter schools in this state effective at improving student achievement? (2) Should we believe these results? Does our strategy of using waitlist students as the comparison population yield results similar to those of a matching study comparing charter students to similar students in TPS schools?

1.2 Study Motivation

Initially, we hoped to evaluate numerous state charter schools using RCTs based on the student admission lotteries. At a minimum, we hoped to use the lotteries that occurred at a few oversubscribed schools as a robustness check on the larger TPS-matching study that was completed, adding internal validity to the external validity provided by the statewide matching analysis. Unfortunately, only a few schools were reported as oversubscribed and, within those schools, there were seats available in only a few grades. Thus, we were not confident that results from the lottery-based evaluation of charter schools, though a gold-standard model, would fairly represent the effectiveness of charter schools statewide.

As we cannot rely on the results of the lottery analysis, we turn to quasi-experimental models of estimating charter school effects that are used when randomized control trials are not possible. Matching methods attempt to address issues of selection bias by finding the closest student match based on observable characteristics. Because matching studies use large, often longitudinal, administrative data sets that represent the entire population of students in a district or state, both the statistical power of the evaluation and its external validity are improved. In addition, Bifulco (2012) found that propensity score matching methods used with Ordinary Least Squares regression, which include students from the same district and add variables for pre-treatment achievement measures, can provide effect estimates similar to those of randomized control trials for evaluating the impacts of school choice programs. We use this framework for our main evaluation methodology (the TPS-matching analysis). Moreover, to address reasonable concerns about selection bias threatening the validity of matching studies (Rossi et al., 2004; Cook & Campbell, 1979), we conduct an additional analysis focused specifically on the sample of charter applicants (the waitlist-matching analysis). We used the same matching method and the same OLS regression model for both matching analyses.

This paper is organized in the following way. A review of the relevant studies of charter schools that have used both randomized control trials and matching methods can be found in the Review of Literature section that follows. Details on the matching method and regression model used can be found in the Methods section. Results of our analyses are reported in the Findings section. We conclude with a discussion of the limitations, policy implications, and plans for future research.

2. Review of Literature - Charter School Studies

Educational choice as a school improvement strategy has been seriously contemplated since the 1960s. Providing choice to families and students who otherwise are often subject to monopolistic traditional public schools could, in theory, create competition that spurs innovation in traditional public schools. From these early days, Nobel laureate economist Milton Friedman encouraged policy makers to introduce competition and give customers alternatives in the education sector, saying that the injection of competition would do much to promote a healthy variety of schools (Friedman, 1983; 1962).

One prominent form of school choice is public charter schooling, developed in Minnesota in the early 1990s. Charter schools are distinctive public schools that are allowed the freedom to be more innovative while being held accountable for advancing student achievement. Because they are public schools, they are open to all children, do not charge tuition, and do not have special entrance requirements (National Alliance for Public Charter Schools). These schools provide parents with an alternative public school option to the traditional public schools in their neighborhoods. Currently, 42 states and the District of Columbia have charter school laws, and charter school support varies widely across states (Center for Education Reform).

To frame our findings and to provide context, we consider the relevant studies of charter schools that have used both randomized control trials and matching methods. The literature review included here is by no means exhaustive. For a more complete review of the literature on charter school research, see Betts & Tang (2014). They find that charter schools perform better in math than traditional public schools in most grades, with middle schools producing the largest gains. They find reading effects to be positive but insignificant, though this result appears to be driven by a few studies with large negative effect sizes. They note that impacts in the charter sector vary considerably, in particular across geographic areas.

2.1 Charter RCT Studies

Mathematica Policy Research (MPR) conducted an evaluation of 36 charter middle schools in 15 states (Clark et al., 2011). MPR limited its evaluation to charter schools that were oversubscribed and used random lotteries to determine which students were and were not admitted. Such lotteries provide the foundation for a Randomized Controlled Trial (RCT), or experiment. The MPR study found that the charter middle schools in the sample produced achievement gains that were, on average, similar to those of the control group. Urban charters tended to have statistically significant positive effects on student achievement, while rural charters tended to have statistically significant negative effects. Lower-income students realized more positive achievement gains from charters, while higher-income students experienced more negative achievement effects.

Fortson et al. (2012) conducted an RCT that examined a sample of middle school students from 15 charter schools in six states over a two-year period. They found that, in the full sample, students randomly selected to attend charter schools through the lottery had average math and reading test scores nearly identical to those of students in the control group. In a restricted subsample, they found that charter school students had average math test scores identical to those of the control group, but statistically significant negative effects on reading test scores.

Hoxby et al. (2009) conducted a random assignment evaluation of New York City charter schools that included 93% of NYC charter students in grades 3-12 over an eight-year period. They found that, on average, students who attended a charter school for all grades, kindergarten through eighth, could close about 86 percent of the black-white achievement gap in math and 66 percent of that achievement gap in English.

Students who attended fewer years would improve by commensurately smaller amounts. They also found that students who attended a charter high school were more likely to receive a Regents diploma and to score higher on the Regents exams than their control group counterparts.

In an RCT study by Abdulkadiroglu et al. (2009), which included students in Boston's charter middle and high schools in grades 6-12 over a seven-year period, the authors found large positive effects for charter school students in reading and math at both the middle and high school levels. They found particularly large effects in math at the middle school level.

Not surprisingly, in these studies of oversubscribed charters, which families intentionally select, the authors find generally positive results.

2.2 Charter Matching Studies

Many studies are conducted on charters that are not necessarily oversubscribed. The Center for Research on Educational Outcomes (CREDO) at Stanford University has performed three national evaluations of charter school performance. In all three studies, CREDO used a Virtual Twin Matching (VTM) method, which matches each charter school student with a composite of multiple traditional public school students that, collectively, reflect the charter student's observable characteristics that are assumed to affect achievement.

The first of these studies, reported in 2009, examined charter school populations in 15 states and the District of Columbia with data available from 2003-04 through 2007-08. That evaluation concluded that 17 percent of charter schools in the sample generated achievement outcomes that were significantly better than the outcomes for the comparison students, 37 percent of charters delivered achievement outcomes that were significantly worse, and for 46 percent of the charters there was no statistical difference between the outcomes of their students and the virtual twins. Charter performance was somewhat more positive for low-income students and at charters that had been open longer (commonly referred to as the maturity advantage).

The second study, reported in 2013, served as a follow-up to the 2009 CREDO study, evaluating the same states as before as well as 10 new states and the city of New York, with data that had been released since the 2009 report. The study concluded that charter schooling generated a very small but statistically significant benefit in reading gains, amounting to 7 extra days of achievement growth, but no difference in math. Low-income, English Language Learner (ELL), and special needs students appeared to benefit the most from charter schools in terms of achievement gains, according to the study. Evaluators noted that school closure rates had some impact on the findings overall.

In 2015, CREDO released a national study focused on 41 urban areas with substantial concentrations of public charter schools. The study concluded that urban charters deliver average achievement benefits in both reading and math that are statistically significant and substantially meaningful, amounting to 28 additional days of learning growth in reading and 40 days in math.

Summary

The results from these national studies of charter school effects on student test scores are remarkably similar. They all suggest that the average effect of charter schooling in general on student achievement is modest. The national studies with longitudinal components indicate that average charter effects tended to be somewhat negative in the earlier years of charter schooling but tend to be somewhat positive now. The studies all agree that urban charters demonstrate much larger and more consistent positive achievement benefits than do rural charters, especially with disadvantaged populations of students.

2.3 Contributions of this Paper

In this analysis, we present the results of a charter school evaluation of a medium-sized U.S. state using a student matching methodology similar to the one used in the CREDO studies, except that we match students 1:1 instead of 1:many to avoid the potential problem of bias raised in critiques of the CREDO approach.[2] We discuss the extent to which our primary findings from that analysis are consistent with the general patterns of charter school effects from the national studies reviewed above. Importantly, we conduct sensitivity and robustness checks of our primary results using alternative methodologies and analytic samples. A primary goal of the paper is to assess the congruence of two distinct matching methods (a general matching study and a matching study using only a sample of charter applicants as the comparison group) in assessing the effectiveness of charter schools.

[2] The most salient criticism of Virtual Twin Matching is that the use of a larger number of students in the comparison group in an estimation of achievement gains over time biases the charter effect estimates toward 0, because the estimations are more precise on the comparison group side of the analysis than on the charter group side (Hoxby, 2009).

Bifulco (2012) indicates that longitudinal student matching studies that include baseline test scores and geography as matching variables can produce effect estimates that are less than 4 percent different from those obtained through experimental analysis. We use Bifulco's approach for our main analysis and build upon it for our secondary analysis by applying the same matching method but matching charter students only with students on charter waitlists. Matching within the population of students who sought entrance to charters presumably controls for parental motivation to seek charter schooling and thereby eliminates the main threat to the internal validity of QED charter studies.

Furthermore, because we could not verify the conduct and validity of the charter school lotteries performed in the state that we study, but we can confirm which students were eligible applicants and which students were left on the waitlist, we classify the experimental analysis of the charter effects as a broken RCT that could contain selection bias.

The lottery analysis is not at all useful for telling us overall program impacts, because it is based on a small subsample of students in lottery grades in a subsample of non-representative schools. However, we can use the lottery analysis in these grades as a robustness check for our two matching strategies. A student matching approach within the pool of charter applicants can therefore address the issue of selection bias and increase internal validity. We compare our primary QED results with the results of the waitlist-matching analysis and the lottery evaluation, along with additional robustness checks.

We now turn to a description of the data used for the analyses.

3. Data

The research team was provided de-identified student-level data for a mid-sized U.S. state for the 2010-11 through 2013-14 school years. Each ID was paired with information for each school year, including the school attended, race/ethnicity, gender, free and reduced-price lunch status, English language learner status, special education status, and test scores for math and literacy. The student test scores came from the State Criterion-Referenced Exams in both math and literacy. In this state, assessments are taken by students in 3rd through 8th grade. We use the prior-year test scores as the baseline measure and compare outcomes of student achievement in math and literacy for 2012-13 and 2013-14. Scores are standardized within grade with mean equal to zero and standard deviation equal to one.

There are two types of charter schools in this state: start-up and conversion charter schools. Conversion charter schools are schools that were formerly traditional public schools and later converted to charter status. These conversion schools, while technically called charter schools in this state, are not always schools of choice, as they are often the only school available to a community. Start-up charter schools are created from scratch to be their own school district and are open to all students who apply, regardless of residence. These start-up charters are the types of schools that most researchers and observers are referring to when they study and debate the pros and cons of charter schools. As a result, our analyses here focus solely on the start-up charter schools. Throughout the rest of this paper, when we use the term charter schools, we refer to start-up charter schools.

Charter students represent about 1.6% to 1.8% of all K-12 students statewide, depending on the year (Tables 1 and 2 below). Charter students' share of total enrollment increased over the two years covered by this study. Seventeen start-up charter schools, their feeder schools, and approximately 5,500-7,000 students were included in each year of the TPS-matching evaluation. Ten charter schools and approximately 2,500-3,200 students were included in each year of the waitlist-matching evaluation.
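As a small illustration of the within-grade standardization described above, the sketch below shows one way to compute z-scores by grade and year. It is not the authors' code; the column names (year, grade, math_scale, lit_scale) are hypothetical placeholders for fields in a state administrative file.

```python
import pandas as pd

def standardize_within_grade(df: pd.DataFrame, score_col: str) -> pd.Series:
    """Convert raw scale scores to z-scores (mean 0, SD 1) within each grade-by-year cell."""
    grouped = df.groupby(["year", "grade"])[score_col]
    return (df[score_col] - grouped.transform("mean")) / grouped.transform("std")

# Hypothetical usage with a student-level file containing year, grade, and scale-score columns:
# students = pd.read_csv("state_scores.csv")
# students["math_z"] = standardize_within_grade(students, "math_scale")
# students["lit_z"] = standardize_within_grade(students, "lit_scale")
```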

Table 1. Student Demographics: Charter Students Compared to State, 2012-13

|                       | State (All Students) | Start-up Charter Students | Charter Students in TPS-Matching Analysis | Charter Students in Waitlist-Matching Analysis |
|-----------------------|----------------------|---------------------------|-------------------------------------------|------------------------------------------------|
| Enrollment            | 471,867              | 7,402                     | 3,624                                     | 3,313                                          |
| Charter as % of Total |                      | 1.6%                      | 0.7%                                      | 0.7%                                           |
| FRL %                 | 61%                  | 52%                       | 49%                                       | 82%                                            |
| Minority %            | 36%                  | 60%                       | 57%                                       | 85%                                            |
| N of Schools          | 930                  | 17                        | 17                                        | 10                                             |

Table 2. Student Demographics: Charter Students Compared to State, 2013-14

|                       | State (All Students) | Start-up Charter Students | Charter Students in TPS-Matching Analysis | Charter Students in Waitlist-Matching Analysis |
|-----------------------|----------------------|---------------------------|-------------------------------------------|------------------------------------------------|
| Enrollment            | 474,995              | 8,346                     | 4,748                                     | 3,439                                          |
| Charter as % of Total |                      | 1.8%                      | 1%                                        | 0.7%                                           |
| FRL %                 | 61%                  | 51%                       | 70%                                       | 76%                                            |
| Minority %            | 37%                  | 57%                       | 55%                                       | 99%                                            |
| N of Schools          | 929                  | 18                        | 17                                        | 10                                             |

The subpopulation of charter students differs in some observable ways from the state as a whole, in that it includes a smaller proportion of low-income students but a larger proportion of minority students. For the TPS-matching analysis, for which all start-up charters in the state were included, the numbers are much closer when comparing charter schools with their local traditional public school districts, which serve as their feeder districts: those districts to which the students would otherwise have been assigned had they not attended the public charter school. Charter schools included in the waitlist-matching analysis have greater proportions of low-income and minority students than the subpopulations of charters and students statewide. (Baseline equivalency comparisons are displayed in the following section.)

Appendix A shows some of the basic details for the charter schools, including the year each school opened, the grade levels served during the school years covered in this paper, the enrollment of each charter school, the percentage of minority students by race/ethnicity, the percentage of students who qualify for free or reduced-price lunch, and the number of feeder districts from which traditional public school students were drawn for comparison in the TPS-matching analysis.

We now turn to the details of the different methods and model used for the analyses.

4. Methods

4.1 Quasi-Experimental Design

As we review the methods used, we should keep in mind the main issues we are addressing: How well do charter students do relative to their peers, and do the results of multiple QED methods converge? Both of our primary analyses here rely on developing a set of matched students who are not in charter schools for each individual student attending a charter.

What does it mean to create a student match? The goal of this method is to create a set of students who are in traditional public schools but are essentially the same as the group of public charter school students on observable characteristics such as income and race/ethnicity. Any differences will not be based on observable student characteristics (such as race, income, gender, or prior test scores), as matched twins are intentionally selected to be nearly identical on these characteristics. Reasons for a specific student not being included in the analysis include not having test scores from both the baseline year and the outcome year, being in an untested grade in either year, not being enrolled in a public school during either year, or missing the test day, among other reasons. Thus, the results should be interpreted as the impacts for the matched student population, which may not generalize to the broader student population.[3] It should be noted that we report single-year estimates, and students were re-matched in the second year to account for movement between the groups.

[3] In the TPS-matching analysis, 66-74% were actually matched; in the waitlist-matching analysis, 38-41% were actually matched.

For the TPS-matching analysis, the sample available for matching includes all charter students and traditional public school students in the geographic feeder districts. Not all students are included in the analysis, however. For the TPS comparison group in particular, many students were not used in a one-to-one match. Charter students are matched to students from the same feeder district they would have attended had they not been admitted to a charter school. Seventeen of 20 charters in the state were included in this analysis.

Of course, a clear weakness in this method remains. Despite the similarity in observable characteristics between the charter students and the comparison students, these students differ on one fundamental unobservable attribute: the motivation (due to positive or negative traits or past experiences) to seek out a seat in a charter school. However, having access to the waitlist data from all oversubscribed charter schools in our sample allows us to attempt to address this shortcoming in our second analysis.

For the waitlist-matching analysis, the sample consists of all charter students in the geographic area of the oversubscribed schools and all waitlist students. Most oversubscribed schools fall within a central urban metropolitan area, and one is located in a rural area. All charter students in these areas are included in the analysis as treatment students, whether or not their schools allotted any open seats through a lottery in the previous year. Of the 17 total charters in the state, the 10 charters in regions with waitlists were included in this analysis.

The key difference (in addition to the fact that there are seven more schools in the broad matching analysis) is that, in the waitlist analysis, the population from which the matched twins are selected is drawn entirely from the charter school waitlists that were provided to us. In the TPS-matching analysis, matched twins were drawn from the full population of students.

When comparing the results of these two analyses, we are not comparing apples to apples, as there are 17 schools included in the TPS-matching analysis and 10 schools in the waitlist-matching analysis. For greater comparability of results, and to check whether we can replicate results with these two different matching methods, we also restrict the sample of the TPS-matching analysis to include the same 10 schools. Because restricting the full TPS-matching sample may exclude very different types of schools, we examine the baseline characteristics below.

We now turn to the means by which we created matched comparison groups of students.

4.2 Matching Method

To complete the matching process, charter students who have received the treatment of being in a charter school are matched on observable characteristics from the previous school year, so that the academic growth they experience in each year can be properly studied. For students who are not promoted from one grade to the next, accommodations are made to match properly, as described in step 1 below. Treatment students are matched with comparison students in traditional public schools using the following matching procedure (fully outlined in Appendix B). The identical process is used for both the TPS-matching and waitlist-matching analyses and is outlined as follows (a schematic sketch of the procedure appears after the list):

Matching Process (Conducted Separately for Math and Literacy)

1. Students are first matched with a student in the same grade in both the outcome year and the baseline or matching year (always the year before).
2. For the math and literacy analyses separately, all students are matched based on previous-year scores on the same subject test, rounded to the nearest 0.01 z-score unit. The other subject's test score is used as part of the propensity score in step 3 (below), as having a matched test score in the same subject is more relevant for controlling for prior performance. Therefore, the math analysis matches first on math examination scores and later factors in literacy scores, while the literacy analysis matches first on literacy examination scores and later factors in math scores.
3. A propensity score is then created using free and reduced-price lunch (FRL) status, race/ethnicity (African-American, Asian-American or Pacific Islander, Hispanic-American, Native American, White, or two or more races), gender, and the other test score (literacy for the math analysis and math for the literacy analysis).
4. Finally, all matches are based on guaranteeing exact matches from steps 1 and 2 and the closest available propensity score match from step 3.
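The sketch below is a schematic illustration of the four-step procedure, not the authors' implementation. It assumes a hypothetical student-level frame with columns student_id, charter (0/1), grade, prior_math_z, prior_lit_z, frl, race, and female; it estimates the propensity score with a logistic regression (the paper does not specify the estimator) and matches greedily without replacement.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression

def match_charter_students(df: pd.DataFrame, subject: str = "math") -> pd.DataFrame:
    """1:1 match of charter (treatment) students to comparison students, per the four steps above."""
    prior = f"prior_{subject}_z"
    other = "prior_lit_z" if subject == "math" else "prior_math_z"

    # Step 3: propensity score from FRL, race/ethnicity, gender, and the other subject's prior score.
    X = pd.get_dummies(df[["frl", "female", other, "race"]], columns=["race"], drop_first=True)
    pscore = LogisticRegression(max_iter=1000).fit(X, df["charter"]).predict_proba(X)[:, 1]
    df = df.assign(pscore=pscore)

    # Steps 1-2: exact-match cells defined by grade and the prior-year score in the analyzed subject,
    # rounded to the nearest 0.01 z-score unit.
    df = df.assign(cell=list(zip(df["grade"], df[prior].round(2))))

    matches = []
    for _, cell in df.groupby("cell"):
        treated = cell[cell["charter"] == 1]
        pool = cell[cell["charter"] == 0].copy()
        # Step 4: within each exact-match cell, take the closest available propensity score, without replacement.
        for _, t in treated.iterrows():
            if pool.empty:
                break
            j = (pool["pscore"] - t["pscore"]).abs().idxmin()
            matches.append({"treated_id": t["student_id"], "control_id": pool.loc[j, "student_id"]})
            pool = pool.drop(index=j)
    return pd.DataFrame(matches)
```

Requiring exact agreement on grade and the rounded same-subject prior score, and only then choosing the nearest propensity score, mirrors the priority the procedure gives to prior performance in the analyzed subject.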

In order to test whether this process worked for the purpose of generating an appropriate comparison group, we conducted baseline equivalency analyses to test how similar the two groups were to each other. The average of each observable variable is reported for both the charter treatment group and the matched student comparison group. Any difference between the two is reported, along with the p-value showing whether that difference is statistically significant. P-values below 0.05 indicate statistically significant differences that might raise concerns about the comparability of our samples. For our major comparisons, shown in Tables 3 through 10, in some instances we had to use broader matches[4] in order to capture a large enough sample size for the analysis. For this reason, in all cases, and especially in cases where there are significant differences at baseline, our regressions account for any of the small differences observed in these matching variables.

[4] Broader matches were accomplished by relaxing the degree of similarity required in the baseline test score for the two students.

4.3 Student Samples in Each Method

Tables 3 and 4 show the math and literacy baselines, respectively, for schools included in the full TPS-matching analysis, for each year. For the combined set of matches for all charter schools included in the TPS-matching analysis, there were some significant differences in the percent of FRL students in 2011-12, although these differences were slight in size. In the tables that follow, Charter refers to the treatment group and TPS refers to the comparison group.

Table 3. Baseline Equivalency for TPS-Matching Analysis in Math, 2011-13 (N of Schools = 17)

|                             | 2011-12 Charter | 2011-12 TPS | Difference | 2012-13 Charter | 2012-13 TPS | Difference |
|-----------------------------|-----------------|-------------|------------|-----------------|-------------|------------|
| Number of Observations      | 2822            | 2822        | -          | 3493            | 3493        | -          |
| Grades Served               | 4-8             | 4-8         | -          | 4-8             | 4-8         | -          |
| Prior Year Math Z-Score     | -0.14           | -0.14       | -          | -0.05           | -0.05       | -          |
| Prior Year Literacy Z-Score | -0.03           | -0.05       | 0.02       | -0.04           | -0.01       | (0.03)     |
| % FRL                       | 0.50            | 0.54        | (0.04)***  | 0.61            | 0.65        | (0.04)     |
| % Minority                  | 0.57            | 0.55        | 0.02       | 0.55            | 0.55        | -          |
| % Female                    | 0.51            | 0.51        | -          | 0.50            | 0.49        | 0.01       |

Note: *** p<0.01, ** p<0.05, * p<0.1

Table 4. Baseline Equivalency for TPS-Matching Analysis in Literacy, 2011-13 (N of Schools = 17)

|                             | 2011-12 Charter | 2011-12 TPS | Difference | 2012-13 Charter | 2012-13 TPS | Difference |
|-----------------------------|-----------------|-------------|------------|-----------------|-------------|------------|
| Number of Observations      | 2775            | 2775        | -          | 3360            | 3360        | -          |
| Grades Served               | 4-8             | 4-8         | -          | 4-8             | 4-8         | -          |
| Prior Year Math Z-Score     | -0.11           | -0.09       | (0.02)     | 0.00            | -0.03       | 0.03       |
| Prior Year Literacy Z-Score | 0.02            | 0.03        | (0.01)     | 0.03            | 0.03        | -          |
| % FRL                       | 0.48            | 0.56        | (0.08)***  | 0.78            | 0.79        | (0.01)     |
| % Minority                  | 0.57            | 0.57        | -          | 0.54            | 0.54        | -          |
| % Female                    | 0.51            | 0.50        | 0.01       | 0.51            | 0.50        | 0.01       |

Note: *** p<0.01, ** p<0.05, * p<0.1

Turning our attention to the ten schools included in the restricted TPS-matched sample, in Tables 5 and 6, we again see some slight but significant differences in the percent of FRL students in 2011-12. However, we find that the restricted TPS-matched sample has more minority students than the full TPS-matching sample in both years.

Table 5. Baseline Equivalency for Restricted TPS-Matching Analysis in Math, 2011-13 (N of Schools = 10)

|                             | 2011-12 Charter | 2011-12 TPS | Difference | 2012-13 Charter | 2012-13 TPS | Difference |
|-----------------------------|-----------------|-------------|------------|-----------------|-------------|------------|
| Number of Observations      | 2266            | 2266        | -          | 2378            | 2378        | -          |
| Grades Served               | 4-8             | 4-8         | -          | 4-8             | 4-8         | -          |
| Prior Year Math Z-Score     | -0.17           | -0.17       | -          | -0.07           | -0.07       | -          |
| Prior Year Literacy Z-Score | -0.05           | -0.06       | 0.02       | -0.06           | -0.01       | (0.05)*    |
| % FRL                       | 0.56            | 0.55        | 0.01       | 0.56            | 0.55        | 0.01       |
| % Minority                  | 0.65            | 0.64        | 0.01       | 0.66            | 0.66        | -          |
| % Female                    | 0.51            | 0.51        | -          | 0.51            | 0.50        | 0.01       |

Note: *** p<0.01, ** p<0.05, * p<0.1

Table 6. Baseline Equivalency for Restricted TPS-Matching Analysis in Literacy, 2011-13 (N of Schools = 10)

|                             | 2011-12 Charter | 2011-12 TPS | Difference | 2012-13 Charter | 2012-13 TPS | Difference |
|-----------------------------|-----------------|-------------|------------|-----------------|-------------|------------|
| Number of Observations      | 2235            | 2235        | -          | 2281            | 2281        | -          |
| Grades Served               | 4-8             | 4-8         | -          | 4-8             | 4-8         | -          |
| Prior Year Math Z-Score     | -0.13           | -0.12       | (0.01)     | -0.02           | -0.05       | 0.03       |
| Prior Year Literacy Z-Score | 0.01            | 0.01        | -          | 0.03            | 0.03        | -          |
| % FRL                       | 0.54            | 0.58        | (0.04)***  | 0.54            | 0.55        | (0.01)     |
| % Minority                  | 0.65            | 0.63        | 0.02*      | 0.66            | 0.65        | 0.01       |
| % Female                    | 0.52            | 0.51        | 0.01       | 0.51            | 0.51        | -          |

Note: *** p<0.01, ** p<0.05, * p<0.1

Because we may be excluding a very different set of schools, we next examine the baseline equivalencies of the seven omitted schools in Tables 7 and 8. Here we find significant and somewhat large differences in the percent of low-income students and modest but significant differences in minority students for 2011-12. Relative to both the full TPS-matched sample and the restricted TPS-matched sample, there are far fewer minority and low-income students in both years.

Table 7. Baseline Equivalency for Omitted TPS-Matching Analysis in Math, 2011-13 (N of Schools = 7)

|                             | 2011-12 Charter | 2011-12 TPS | Difference | 2012-13 Charter | 2012-13 TPS | Difference |
|-----------------------------|-----------------|-------------|------------|-----------------|-------------|------------|
| Number of Observations      | 552             | 552         | -          | 1115            | 1115        | -          |
| Grades Served               | 4-8             | 4-8         | -          | 4-8             | 4-8         | -          |
| Prior Year Math Z-Score     | -0.20           | -0.20       | 0.00       | 0.00            | 0.00        | -          |
| Prior Year Literacy Z-Score | 0.03            | 0.00        | 0.02       | -0.01           | -0.02       | 0.01       |
| % FRL                       | 0.27            | 0.49        | (0.22)***  | 0.52            | 0.53        | (0.01)     |
| % Minority                  | 0.23            | 0.22        | 0.01       | 0.31            | 0.31        | -          |
| % Female                    | 0.51            | 0.48        | 0.03       | 0.50            | 0.53        | (0.03)     |

Note: *** p<0.01, ** p<0.05, * p<0.1

Table 8. Baseline Equivalency for Omitted TPS-Matching Analysis in Literacy, 2011-13 (N of Schools = 7)

|                             | 2011-12 Charter | 2011-12 TPS | Difference | 2012-13 Charter | 2012-13 TPS | Difference |
|-----------------------------|-----------------|-------------|------------|-----------------|-------------|------------|
| Number of Observations      | 540             | 540         | -          | 1075            | 1075        | -          |
| Grades Served               | 4-8             | 4-8         | -          | 4-8             | 4-8         | -          |
| Prior Year Math Z-Score     | -0.02           | 0.07        | (0.08)     | 0.03            | 0.01        | 0.02       |
| Prior Year Literacy Z-Score | 0.07            | 0.07        | (0.00)     | 0.04            | 0.04        | -          |
| % FRL                       | 0.25            | 0.49        | (0.24)***  | 0.52            | 0.52        | -          |
| % Minority                  | 0.24            | 0.30        | (0.06)**   | 0.29            | 0.31        | (0.02)     |
| % Female                    | 0.50            | 0.49        | 0.01       | 0.51            | 0.49        | 0.02       |

Note: *** p<0.01, ** p<0.05, * p<0.1

Tables 9 and 10 show the math and literacy baselines for schools included in the waitlist-matching analysis. There were some significant differences in the percent of female students in 2011-12; otherwise, the two groups are well balanced. When we examine the similarities between the restricted TPS-matched sample and the waitlist-matched sample, we find that they are comparable.

Table 9. Baseline Equivalency for Waitlist-Matching Analysis in Math, 2011-13 (N of Schools = 10)

|                             | 2011-12 Charter | 2011-12 Waitlist | Difference | 2012-13 Charter | 2012-13 Waitlist | Difference |
|-----------------------------|-----------------|------------------|------------|-----------------|------------------|------------|
| Number of Observations      | 1257            | 1257             | -          | 1428            | 1428             | -          |
| Grades Served               | 4-8             | 4-8              | -          | 4-8             | 4-8              | -          |
| Prior Year Math Z-Score     | -0.16           | -0.16            | -          | -0.09           | -0.09            | (0.00)     |
| Prior Year Literacy Z-Score | -0.02           | 0.00             | (0.02)     | -0.07           | 0.01             | (0.08)***  |
| % FRL                       | 0.53            | 0.53             | -          | 0.57            | 0.56             | 0.01       |
| % Minority                  | 0.64            | 0.64             | -          | 0.67            | 0.66             | 0.01       |
| % Female                    | 0.49            | 0.55             | (0.06)***  | 0.50            | 0.52             | (0.02)     |

Note: *** p<0.01, ** p<0.05, * p<0.1

Table 10. Baseline Equivalency for Waitlist-Matching Analysis in Literacy, 2011-13 (N of Schools = 10)

|                             | 2011-12 Charter | 2011-12 Waitlist | Difference | 2012-13 Charter | 2012-13 Waitlist | Difference |
|-----------------------------|-----------------|------------------|------------|-----------------|------------------|------------|
| Number of Observations      | 1422            | 1422             | -          | 1569            | 1569             | -          |
| Grades Served               | 4-8             | 4-8              | -          | 4-8             | 4-8              | -          |
| Prior Year Math Z-Score     | -0.11           | -0.11            | -          | -0.02           | -0.04            | 0.02       |
| Prior Year Literacy Z-Score | 0.05            | 0.06             | (0.01)     | 0.07            | 0.07             | -          |
| % FRL                       | 0.52            | 0.51             | 0.01       | 0.54            | 0.54             | -          |
| % Minority                  | 0.62            | 0.61             | 0.01       | 0.65            | 0.65             | -          |
| % Female                    | 0.52            | 0.52             | -          | 0.52            | 0.51             | 0.01       |

Note: *** p<0.01, ** p<0.05, * p<0.1

These summary statistics show that we were able to match our samples on most characteristics, though the charter students and their matched comparisons differed slightly in the likelihood of being economically disadvantaged, despite identical prior test scores. This is because our primary matching indicator was prior-year academic ability. By comparing estimates based on the same schools, we can assess whether multiple methods generate similar conclusions. If they do, we will have greater confidence in our findings.
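The baseline comparisons in Tables 3 through 10 can be reproduced mechanically once a matched file exists. The sketch below is illustrative rather than the authors' code: it assumes a matched data frame with a 0/1 charter flag and hypothetical covariate names, and it uses a Welch two-sample t-test for the p-values, since the paper does not state which test was used.

```python
import pandas as pd
from scipy import stats

def baseline_equivalency(matched: pd.DataFrame, covariates: list[str]) -> pd.DataFrame:
    """Compare charter and comparison means on each matching covariate in a matched sample."""
    treat = matched[matched["charter"] == 1]
    comp = matched[matched["charter"] == 0]
    rows = []
    for var in covariates:
        t, p = stats.ttest_ind(treat[var], comp[var], equal_var=False)  # Welch t-test
        rows.append({
            "variable": var,
            "charter_mean": treat[var].mean(),
            "comparison_mean": comp[var].mean(),
            "difference": treat[var].mean() - comp[var].mean(),
            "p_value": p,
        })
    return pd.DataFrame(rows)

# Hypothetical usage:
# baseline_equivalency(matched_sample, ["prior_math_z", "prior_lit_z", "frl", "minority", "female"])
```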

4.4 Regression Model

Once baseline equivalency was examined, the resulting matches were run through statistical testing to see how much of students' academic growth can be attributed to attending charter schools. The method used is a straightforward Ordinary Least Squares regression analysis with robust standard errors. The same regression model is used for both matching analyses.

OLS Regression Model:

y_i = β_0 + β_1 charter_i + β_2 θ_i + β_3 X_i + β_4 δ_i + ε_i    (1)

where y_i represents a given outcome of interest (math or literacy achievement) for student i, charter_i is an indicator for charter school treatment, θ_i represents controls for prior-year test scores (both math and literacy achievement), X_i represents controls for student-level characteristics (gender, race, FRL status), δ_i represents a control for switching schools, and ε_i represents the error term. Our regression analyses statistically control for any minor differences in demographic characteristics.

4.5 Robustness Checks

As a robustness check, we use the same sample of schools used for the waitlist-matching analysis to make an additional comparison. We use the full sample prior to matching and run an OLS regression in which we control for prior achievement, student characteristics, and the schools to which students applied. Because this model includes all students who applied to these charter schools (those attending and those on the waitlist), we refer to it as the applicant model. Results of the TPS-matching model(s), the waitlist-matching model, and the applicant model are reported together in the following section. As a further check for robustness, we use the lottery analysis as a point of comparison for all models for 2012-13. Results of this robustness check can be found in Appendix A.
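To make equation (1) from Section 4.4 concrete, the sketch below shows one way such a specification could be estimated with heteroskedasticity-robust standard errors. It is illustrative only, not the authors' code: the column names are hypothetical, and HC1 is simply one common robust variance estimator (the paper does not specify which was used).

```python
import statsmodels.formula.api as smf

def estimate_charter_effect(matched, outcome: str = "math_z"):
    """OLS mirroring equation (1): outcome on a charter indicator, prior math and
    literacy z-scores, student characteristics, and a school-switcher indicator,
    with robust (HC1) standard errors."""
    formula = (
        f"{outcome} ~ charter + prior_math_z + prior_lit_z"
        " + female + C(race) + frl + switched_school"
    )
    return smf.ols(formula, data=matched).fit(cov_type="HC1")

# Hypothetical usage: the coefficient on `charter` is the estimated charter effect in SD units.
# result = estimate_charter_effect(matched_sample, outcome="lit_z")
# print(result.params["charter"], result.bse["charter"])
```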

5. Findings

The focus of our research was to answer the following questions: (1) Are charters effective in this state? (2) Should we believe these results? Does our strategy of using waitlist students as the comparison population yield results similar to those of a matching study comparing charter students to similar students in TPS schools?

The academic impacts represented in Table 11 indicate that, for all 17 start-up charter schools in the state, students in public charter schools demonstrated positive and statistically significant impacts of 0.09 standard deviations on math scores and 0.07 standard deviations on literacy scores in 2012-13. In 2013-14, we find no clear effect of attending a charter school on math scores, while there was a small negative and statistically significant impact of -0.03 standard deviations on literacy scores.

To assess the extent to which we should believe these results, we consider results for the limited sample of students attending charter schools in the same region as charters with waitlists. Thus, we restrict the sample to the same schools included in the waitlist-matching and applicant analyses for comparison. If the restricted sample provides estimates similar to those of the waitlist-matching analysis, this convergence enhances our confidence that the overall TPS-matching strategy is not significantly threatened by self-selection.

In the restricted TPS-matching analysis, we see the same results for 2012-13 as those found in the full TPS-matching sample: statistically significant impacts of 0.09 standard deviations in math and 0.07 standard deviations in literacy. For 2013-14, we find statistically significant impacts on both math and literacy scores of 0.03 standard deviations.

Comparing these results to the findings from the waitlist-matching analysis, we find similar results, with statistically significant impacts of 0.05 standard deviations for both math and literacy in 2012-13. For 2013-14, we find no significant effects for math or literacy. This gives us greater confidence in the results of the full TPS-matching analysis.

Looking to the applicant model, we see that this analysis appears to support the findings of both matching analyses. For 2012-13, we see positive and statistically significant impacts of 0.11 standard deviations in math and 0.06 standard deviations in literacy. For 2013-14, we find no clear effects in math, but positive and significant effects in literacy of 0.04 standard deviations. It appears that it may be possible to use applicant information to evaluate charter school effects.

We find that both the waitlist-matching and the applicant analyses provide estimates similar to the TPS-matching results in most instances. Taken together, these analyses suggest that a combination of quasi-experimental methods may provide robust estimates in the absence of randomized experiments that use data from oversubscribed schools. Furthermore, results from these analyses are consistent with the general patterns of modest charter school effects from the national studies reviewed in the literature.

Table 11. Charter Effect by Assessment, Model Comparisons, 2012-14

2012-13

|                          | TPS-Matching | TPS-Matching (Restricted) | Waitlist-Matching | Applicant |
|--------------------------|--------------|---------------------------|-------------------|-----------|
| N of Schools             | 17           | 10                        | 10                | 10        |
| MATH: Charter Effect     | 0.09***      | 0.09***                   | 0.05**            | 0.11***   |
| MATH: Treatment n        | 2822         | 2266                      | 1257              | 1838      |
| MATH: N                  | 5644         | 4532                      | 2514              | 3527      |
| LITERACY: Charter Effect | 0.07***      | 0.07***                   | 0.05**            | 0.06***   |
| LITERACY: Treatment n    | 2775         | 2235                      | 1422              | 1838      |
| LITERACY: N              | 5550         | 4470                      | 2844              | 3527      |

2013-14

|                          | TPS-Matching | TPS-Matching (Restricted) | Waitlist-Matching | Applicant |
|--------------------------|--------------|---------------------------|-------------------|-----------|
| N of Schools             | 17           | 10                        | 10                | 10        |
| MATH: Charter Effect     | -0.02        | 0.03*                     | 0.02              | 0.02      |
| MATH: Treatment n        | 3493         | 2378                      | 1428              | 2279      |
| MATH: N                  | 6986         | 4756                      | 2856              | 3748      |
| LITERACY: Charter Effect | -0.03**      | 0.03**                    | 0.01              | 0.04**    |
| LITERACY: Treatment n    | 3360         | 2281                      | 1569              | 2279      |
| LITERACY: N              | 6720         | 4562                      | 3138              | 3748      |

Model controls:
Control for prior reading/math tests: x x x x x x
Control for student characteristics: x x x x x x
Control for lottery (grade/school):
Control for school applied: x x
Control for school switchers: x x x x

*p<0.10, **p<0.05, ***p<0.01

In the following section, we conclude with a discussion of the limitations and policy implications.

6. Discussion

This evaluation sought to offer an overview of the academic impacts of charter schools in a medium-sized state for the 2012-13 and 2013-14 school years. Due to the limitations of data collection, we were not able to draw firm conclusions about oversubscribed start-up charter schools through a Randomized Control Trial (RCT) analysis. Using a careful student matching method, charter students in each school were matched with similar students in their feeder districts in each of these years in the TPS-matching analysis. The waitlist-matching analysis was intended to complement the TPS-matching analysis. The same student matching method, in which charter students in each school were matched with similar traditional public school students who applied to charter schools but were not admitted (waitlisted) in the 2012-13 school year, was used to create approximately equivalent comparison groups. The applicant analysis was added as a robustness check on the waitlist-matching estimates. Separate matches and analyses were conducted for math and literacy (outcomes in grades 4-8).

Given the data available, the quasi-experimental model is the best form of analysis available to us. Further, this study has the advantage of combining two matching methodologies, thereby improving both the external and internal validity of the findings. A reasonable conclusion that can be drawn from this study is that charters in vibrant charter markets with waitlists in this state have a modest positive effect on student test scores in math and literacy; however, this finding is not consistent across both years of analysis. The 2012-13 school year appeared to be the stronger year for charter school performance, compared with 2013-14.

There were certain limitations to this evaluation that can be improved upon in future studies. The key weakness of this study is the requirement that it be quasi-experimental, as not enough charter schools provided clear, reliable information about admissions lotteries for us to conduct a gold-standard experimental evaluation. A second limitation of this study was our small sample of oversubscribed schools and our relatively low student match rates. Most oversubscribed charters are found within the central urban metropolitan area. Several charter schools, by design or for other reasons, maintain low student populations and therefore have low numbers of students tested.

Researchers should continue to analyze the academic impacts of public charter schools. One of the most celebrated aspects of charter schools is that they are held accountable for their outcomes. This evaluation seeks to add to that accountability and to provide a means of checking the robustness of results found in the TPS-matching analysis. In doing so, we have increased confidence in our traditional matching results. While academic impacts do not encompass the entire mission of a charter school, or any school, these results help provide information on the performance of public charter schools.