Stuart Yeh at the University of Minnesota analyzes the existing literature on value-added modeling (VAM) and its use in high-stakes decisions about teachers. He finds that VAM is neither reliable nor valid for high-stakes decisions such as hiring, firing, and assigning pay.
Here are some highlights:
“Results indicate that even the best feasible value-added models may be substantially biased” (Rothstein, 2009, p. 537).
This claim is based on Rothstein’s study of VAM in North Carolina in which he found:
…the estimated effect for fifth-grade teachers predicts their students’ prior performances. Since it is impossible for fifth-grade teachers to cause performance that occurred prior to the fifth grade, this result implies there is nonrandom selection of students into teacher classrooms that is not controlled through the inclusion of time-invariant student characteristics. Therefore, the central assumption underlying VAM appears to be invalid.
In other words, because students are not randomly assigned to classrooms, a central assumption of value-added models fails.
Rothstein’s results were supported by research in San Diego (Koedel and Betts, 2011).
Another study (Raudenbush, 2004) noted that VAM assigned to individual teachers can be contaminated by school effects:
…when VAM is used to estimate individual teacher effects and to rank teachers, these estimates are contaminated by effects that are properly attributed to schools, not teachers.
That is, teachers at schools with low or high VAM scores may suffer or benefit in individual rankings due to school-level rather than teacher-specific factors.
Yeh cites four different studies on the reliability of VAM teacher rankings over time:
In each study, VAM was used to rank teacher performance from high to low. In each study, a majority of teachers who ranked in the lowest quartile or lowest quintile shifted out of that quartile (or quintile) the following year (see Tables 1 and 2). Furthermore, a majority of teachers who ranked in the highest quartile or quintile shifted out of that quartile (or quintile) the following year (see Tables 1 and 2).
That is, using VAM to reward (or punish) teachers is inappropriate because the results are not reliable over time: a high-performing teacher according to VAM in one year may rank as a low performer the next. This could be due to factors outside of school, student assignment, or noise in the VAM models themselves. Whatever the cause, basing decisions like hiring, firing, and pay on VAM is problematic because its results are not stable from year to year.
Yeh notes that VAM-based decisions on teachers are “less reliable than flipping a coin.”
While some may argue that adding multiple years of data on teachers improves reliability, Yeh notes a long-term study (Lefgren and Sims, 2012) that:
found that more than half of all teachers who ranked in the bottom quintile shifted out of that quintile the following year, regardless of whether one, two, three, four or five years of data were used to predict future performance, regardless of the subject area (math or reading), and regardless of whether a simple or complex Bayes estimator was used to improve predictive accuracy.
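The instability described above can arise from measurement noise alone. As a rough illustration (a toy simulation, not from Yeh's paper or any of the studies he cites), suppose each teacher's measured VAM score is a true effect plus year-to-year noise of comparable magnitude; the assumed variances here are illustrative only:

```python
import random

random.seed(0)
n = 1000  # number of simulated teachers

# Measured VAM score = stable true effect + independent yearly noise.
# Assumption: noise variance comparable to true-effect variance.
true_effect = [random.gauss(0, 1) for _ in range(n)]
year1 = [t + random.gauss(0, 1) for t in true_effect]
year2 = [t + random.gauss(0, 1) for t in true_effect]

def bottom_quintile(scores):
    """Return the indices of the lowest-scoring 20% of teachers."""
    cutoff = sorted(scores)[n // 5]
    return {i for i, s in enumerate(scores) if s < cutoff}

b1, b2 = bottom_quintile(year1), bottom_quintile(year2)
shifted = len(b1 - b2) / len(b1)
print(f"{shifted:.0%} of bottom-quintile teachers shifted out the next year")
```

Under these assumed parameters, well over half of the bottom-quintile teachers leave that quintile the following year, matching the pattern the studies report: ranking noise alone can reshuffle the quintiles even when no teacher's true effectiveness changes at all.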
VAM, then, cannot be considered a reliable predictor of teacher performance, and decisions based on it are correspondingly unreliable: districts could end up rewarding a low-performing teacher or failing to reward a high-performing one. Further, as Yeh notes, teacher performance varies from year to year depending on a variety of factors, many of which are beyond the teacher's (or even the school's) control:
Much of a teacher’s performance varies over time due to unobservable factors such as effort, motivation, and class chemistry that are not easily captured through VAM (Goldhaber & Hansen, 2012).
Value-added modeling may yield useful information on student performance and on school-wide performance. It fails, however, to provide consistently reliable information on teacher performance. As such, it should not be used in high-stakes decisions like hiring, firing, and pay. Even the best VAM models show susceptibility to bias and error that makes them invalid as measures of teacher performance.
For more on education policy issues, follow @TheAndySpears