Over at Bluff City Ed, Jon Alfuth has a post from a teacher’s perspective on value-added modeling (VAM). What’s interesting is that Alfuth studied public policy and identified himself as a supporter of using VAM, at least before he became a practicing teacher. Now, it seems safe to say that while he still supports some uses of VAM, he’s a skeptic.
I’ve written before about the limitations of value-added modeling and the over-reliance on raw numbers — especially by policymakers and journalists. Alfuth offers three areas of concern for those proposing use of VAM for high-stakes decisions:
First, he cites an American Statistical Association statement on value-added models. He notes:
A highly respected professional organization, their statement highlights the fact that according to most studies, only 1-14 percent of variation in test scores can be attributed to individual teachers. The rest, they assert, belongs to factors outside the classroom such as “family background, poverty, curriculum and unmeasured influences.”
Next, he notes that different VAM models can yield different (and not consistent) results:
Notably, one study that I read from 2009 found considerable year-to-year changes in Florida teachers’ evaluation scores, where some teachers were marked as ineffective one year but effective the next. This same study found that up to 80 percent of the variation in test scores could be explained by “unstable or random components (i.e. not teachers).”
Finally, Alfuth notes that VAM scores are being used for high-stakes decisions, instead of for support for teachers and diagnostically for students:
Value-added results are being used in high-stakes decisions such as teacher surplussing and hiring and firing. We’ve also seen efforts to tie teacher pay and licensure directly to value-added data by the legislature and state board of education. This is concerning given that these systems may be less predictive and less objective than they first appear. If the model’s own creators admit that not all models are created equal (read: some are worse than others) and a large percent of the variance in scores is attributable to factors other than teachers, is it really appropriate to base these high-stakes decisions on their outcomes?
Alfuth’s concerns have implications for both teaching practice and for policymakers. First among them, in my view, is that policymakers shouldn’t over-simplify the value of VAM in an attempt to quantify teaching quality. Certainly, some element of student growth can be attributed to a teacher. And it seems VAM models are sometimes able to capture a portion of this impact. That information can be useful to teachers in improving their practice and to principals in providing teachers with more support. But, as noted above, 80% or more of what impacts student growth isn’t explained by teacher value-added scores.
Alfuth seems to be heading toward a policy recommendation that VAM not be used for hiring, firing, or salary decisions. Further, a likely conclusion policymakers might draw is that VAM should count for a smaller portion of a teacher’s overall evaluation (currently, it counts for 35% of a teacher’s evaluation score in Tennessee). Including it as a smaller percentage and using the data diagnostically rather than punitively would be an improvement over the current system.
Acknowledging that VAM has flaws does not mean the numbers it produces are useless. It does mean those numbers should be used with caution. I’ll be interested to see Alfuth’s Part 2 and hear what he, as a practicing teacher, recommends as appropriate use of VAM.
For more from Jon Alfuth, follow @BluffCityEd
For more on education policy, follow @TheAndySpears