Teacher Employment Patterns and Student Results in Charlotte Mecklenburg Schools:


CONSTRUCTING TEST SCORE-BASED MEASURES OF TEACHER EFFECTIVENESS

This appendix describes the calculation of the value-added teacher effect estimates used in many of our analyses. These teacher effects are one measure of a teacher's effectiveness in raising student achievement.

Intuitive Introduction

The effort to quantitatively measure a teacher's effect on student achievement growth often goes by the shorthand names "teacher effects" or "value-added." The objective is to measure student achievement growth and to isolate the part of that growth that is attributable to a particular teacher, teaching skill or practice, or teacher characteristic.

How do you estimate value-added to student achievement growth?

It is helpful to think of calculating teacher effects or value-added scores in two steps. In the first step, we estimate adjusted average test-score growth for a teacher's students. Each student's growth is defined as the difference between their actual test score and their predicted test score. The predicted test score (estimated using multivariate regression techniques) takes into account a number of factors largely beyond the influence of the teacher: a student's prior achievement, student demographic characteristics, and the composition of peers in the student's class. Thus, for example, a student's predicted test score would be lower if he had been designated limited English proficient, even if his prior test scores were similar to those of English-proficient students. This particular construction of adjusted average test-score growth helps account for the non-random assignment of students to teachers. We can calculate adjusted average test-score growth for just one class of students, all students taught by a teacher in a specific school year, or all students taught by a teacher across multiple years.
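As an illustration, this first step can be sketched in Python. The data below are simulated, the single-equation OLS prediction is a deliberately simplified stand-in for the richer multivariate regression described above, and all variable names are hypothetical.

```python
import numpy as np

# Simulated student-level data (illustrative only): prior-year score,
# a limited-English-proficiency indicator, and an assigned teacher.
rng = np.random.default_rng(0)
n = 200
prior = rng.normal(size=n)
lep = rng.integers(0, 2, size=n)
teacher = rng.integers(0, 10, size=n)
score = 0.8 * prior - 0.3 * lep + rng.normal(scale=0.5, size=n)

# Step 1: predict each student's score from factors largely beyond the
# teacher's control (here just prior achievement and LEP status).
X = np.column_stack([np.ones(n), prior, lep])
beta, *_ = np.linalg.lstsq(X, score, rcond=None)
growth = score - X @ beta  # actual minus predicted = student growth

# Adjusted average test-score growth for each teacher's students.
teacher_growth = {t: growth[teacher == t].mean() for t in np.unique(teacher)}
```

In practice the prediction equation also includes demographic and peer-composition controls, and class averages are combined with weights, as described below.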
(When an average includes multiple classes, we first calculate each class average, then calculate a weighted average of the class averages.) This average test-score growth is, for some statistical analyses, an appropriate estimate of a teacher's effectiveness in raising student achievement.

What are shrunken teacher effects?

Other statistical analyses, however, benefit from a second step. In this second step, we further adjust each teacher's estimate to account for the estimate's reliability (in the statistical sense) as a measure of the teacher's own effect on student achievement. Reliability itself is estimated by comparing multiple class averages for classes taught by the same teacher and the variation of test scores for students within each class. Less reliable estimates (i.e., those based on fewer classes or students) are adjusted down (or up) to be closer to the mean of all teachers' estimates. This process is sometimes called "shrinking" the estimates, and the estimates are sometimes called shrunken teacher effects. This second step has the benefit of reducing random measurement error associated with classes and students, which makes these shrunken teacher effects preferable in some statistical analyses.

What units are teacher effects reported in?

Teacher effects are reported in standard deviation units; more specifically, they are reported in standard deviations of student test scores. This metric is helpful both during the regression steps and when comparing the answers to different questions. To change each student's scaled test score to standard deviation units, we subtract the overall mean scaled score and divide by the overall standard deviation. Thus the new average student test score will be zero and the standard deviation will be one. If, for example, we report that National Board Certified Teachers have an effect of 0.02 compared to all other teachers, that means that, after accounting for other factors, students score 0.02 standard deviations higher when they are assigned to National Board Certified Teachers for a year. To provide a sense of magnitude, the black-white test score gap is generally estimated to be about 1.0 standard deviations, and three years of a national reading program, Success for All, added about 0.2 standard deviations to student reading comprehension achievement. Figure 2 provides a sense of how to translate from standard deviation units to more familiar test metrics: scaled scores, percentiles, and AYP cut scores in North Carolina.
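The second, shrinkage step can be sketched numerically. The raw teacher averages, class sizes, and variance components below are invented for illustration; in an actual analysis the variance components are estimated from the data.

```python
import numpy as np

# Hypothetical per-teacher adjusted-growth averages and student counts.
raw_effect = np.array([0.30, -0.10, 0.05, -0.40, 0.20])
n_students = np.array([8, 60, 25, 5, 40])

# Assumed variance components (estimated from the data in practice):
var_teacher = 0.04  # between-teacher ("signal") variance
var_noise = 0.25    # within-class student-level ("noise") variance

# Reliability of each raw estimate: the share of its variance that is
# signal. Estimates based on more students are more reliable.
reliability = var_teacher / (var_teacher + var_noise / n_students)

# Shrinkage: pull each estimate toward the mean of all teachers'
# estimates, with less reliable estimates pulled further.
grand_mean = np.average(raw_effect, weights=n_students)
shrunken = grand_mean + reliability * (raw_effect - grand_mean)
```

Note how the teacher with only 5 students is shrunk much closer to the overall mean than the teacher with 60 students, exactly as described above.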

[Figure 2. 8th grade math test scores in standard deviation units, shown against more familiar metrics: the 25th, 50th, and 75th percentiles (-0.67, 0, and 0.67 standard deviation units) and scaled-score level cut scores (Level 1: 337, Level 2: 351, Level 3: 362, Level 4: 380). Note: Level cut scores and scaled scores provided as examples for a hypothetical school system.]

Empirical Details

Model

We begin by estimating the following student-level equation:

(1)  a_{ikt} = A_{i,t-n} α + S_{it} β + P_{kt} δ + T_{kt} γ + E_{it} ρ + v_{ikt},  where v_{ikt} = μ_k + θ_{kt} + ε_{ikt}

where the outcome of interest a_{ikt} is the test score for student i taught by teacher k during school year t. In most cases a_{ikt} is a state standardized test administered at the end of the school year. That outcome test score is modeled as a function of the student's prior achievement A_{i,t-n}; other observable characteristics of the student, S_{it}; characteristics of the student's peers, P_{kt}, both those taught by teacher k and those at the school more generally; characteristics of the student's teacher k that are of analytic interest, T_{kt}; and a fixed effect E for each of the different tests (e.g., Cambridge Public School students who took the MCAS 4th grade math test in 2010) used to measure a_{ikt}. The vectors A, S, P, and T are composed as follows:

A, a vector of information regarding student i's prior achievement, includes:

o student i's test score in the same subject (e.g., math when predicting math) in the previous school year, a_{i,t-1}
o the square and cube of a_{i,t-1}
o student i's test score in a different subject (e.g., reading when predicting math) in the previous school year, a'_{i,t-1}.

Sometimes a small number of students in a class will not have taken one or more exams the previous year and thus do not have values for a_{i,t-1} or a'_{i,t-1}, or both. If a student is missing data for prior tests, we exclude this student. All test scores a_{i,t-1} and a'_{i,t-1}, along with a_{ikt}, are standardized with a mean of zero and a standard deviation of one. This standardization is calculated based on the student's score compared to other students who took the same test within the system (e.g., Cambridge Public School students who took the MCAS 4th grade math test in 2010).

S, a vector of other observable characteristics of student i during school year t, includes indicator variables for:

o gender
o each racial or ethnic subgroup
o English language learner status
o Exceptional Child status.

P, a vector of observable characteristics of student i's peers taught by teacher k and peers at the student's school, included separately for teacher k and the school, includes:

o the means of the elements of S
o the means of a_{i,t-1} and a'_{i,t-1}.

T, a vector of teacher k's characteristics or practices, is defined differently from analysis to analysis, and, unlike the other vectors, the estimated coefficients γ are often the focus of the analysis. Thus T might, as just a few examples, include:

o teacher k's scores on a classroom observation rubric, or
o indicator variables for various experience groupings (e.g., novice, 1 year, 2 years, 3 years, 4-9 years, 10+ years) and teacher fixed effects, to study the importance of experience on student achievement growth, or

o indicator variables for whether a teacher is National Board Certified or not.

For some analytic purposes we do not include T at all; in these cases we are interested in capturing a teacher's total effect on student achievement growth, compared to all other teachers, regardless of teacher characteristics. We do not generally include school fixed effects in Equation 1, but we do check our inferences for robustness to the inclusion or exclusion of school fixed effects. When appropriate, we highlight the differences and potential implications.

Sample

Our analysis sample is constrained by the availability of required data, by analytic choices to improve the estimation process, and by other choices to aid interpretation of the results. As implied above, sample selection rules will depend on data availability and the intended use of the results. In general, however, we include student-by-year observations i (and the associated teachers k) when:

o a_{ikt}, a_{i,t-1}, and all (or nearly all) of S are non-missing
o we can identify one specific teacher k from whom student i received instruction in the subject content (i.e., standards) measured on the outcome test a_{ikt}
o the teacher k taught at least five students for whom the first two bullet points are true
o the teacher k is not teaching a self-contained class providing instruction to IDEA students exclusively.

Note that students observed with more than one subject-specific teacher (e.g., two mathematics teachers) are excluded from the analysis. We believe this approach is appropriate for the inference objectives of the SDP Diagnostic but not necessarily for all efforts to estimate teacher effects (a statement that is true of most all of this document).

Estimation

We use Hierarchical Linear Modeling (HLM) (or Linear Mixed Models) to estimate Equation 1, with nested random effects μ_k and θ_{kt} for each set of students taught by teacher k in school year t.
HLM provides empirical Bayes estimates of the teacher random effects μ̂_k that are the best linear unbiased predictions. These empirical Bayes estimates are the shrunken estimates described as step two above; that is, they account for differences in the reliability of the estimates from teacher to teacher by shrinking less reliable estimates toward the mean (Raudenbush and Bryk, 2002). This shrinkage reduces random measurement error associated with the class and student levels. Besides interpreting the estimates of γ, as discussed above, we can study the relationship between teacher effect estimates μ̂_k and other variables directly.
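A minimal sketch of this estimation step is below, using the MixedLM implementation in statsmodels. The data are simulated, the model keeps only a prior-achievement control and a single random intercept per teacher (the full model also nests a teacher-by-year effect and includes the S, P, T, and E terms), and all names are illustrative.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulated student-level file: 20 teachers, 30 students each.
rng = np.random.default_rng(1)
n_teachers, per_class = 20, 30
teacher = np.repeat(np.arange(n_teachers), per_class)
mu = rng.normal(scale=0.2, size=n_teachers)  # true teacher effects
prior = rng.normal(size=n_teachers * per_class)
score = 0.7 * prior + mu[teacher] + rng.normal(scale=0.6, size=teacher.size)
df = pd.DataFrame({"score": score, "prior": prior, "teacher": teacher})

# Simplified version of Equation (1): outcome on prior achievement,
# with a random intercept mu_k for each teacher.
model = smf.mixedlm("score ~ prior", df, groups=df["teacher"])
result = model.fit()

# result.random_effects holds the empirical Bayes (shrunken) estimates
# of each teacher's effect -- the BLUPs described in the text.
effects = {k: v["Group"] for k, v in result.random_effects.items()}
```

Because the predictions come out of the mixed model directly, the shrinkage toward the mean happens automatically; teachers with fewer or noisier classes receive estimates pulled closer to zero.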