Note: This article was updated Jan. 8, 2016, with recent research.

Value-added models (VAM) of teacher assessment use complex statistical techniques to try to estimate teachers’ effects on student achievement. Proponents view them as a primary tool for differentiating teachers based on effectiveness. However, researchers have found that VAM results are unstable over time, subject to bias and imprecision, and rely solely on results from standardized tests that were not designed for that purpose. For these reasons, researchers say VAM is an invalid and unfair means of teacher assessment.

The weight of the research does not support using VAM. Rather, researchers find that VAM and performance-based evaluations of individual teachers remain inaccurate and inappropriate for evaluation, pay, or retention.1

“Chances are high for misclassification of teacher performance: A teacher classified as ‘adding value’ has a 25 to 50 percent chance of being classified as ‘subtracting value’ the following year(s) and vice versa. This can make the chances that a teacher will be identified as effective the same as the chances of winning a coin toss.”2

Researchers have found that value-added measures:

  • Yield year-to-year fluctuations in teacher ratings that make them unreliable3 (see the illustrative sketch following this list)
  • Are often invalid because they are unreliable, and reliability is a necessary condition for validity4
  • Can be biased, because students are almost never randomly assigned to classrooms and teachers of certain students (such as English language learners) have more difficulty demonstrating value-added than their comparably effective peers
  • Are fraught with measurement errors caused by missing data, variables that cannot be statistically controlled, differential summer learning losses or gains, etc.
  • Rely on standardized assessments not designed for use in VAMs5
  • Are influenced by the quality of peer teachers and schools, just as student performance is affected by the performance of peers6
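To see how this year-to-year churn can arise from statistical noise alone, here is a minimal illustrative sketch in Python (the teacher counts, effect sizes, and noise levels are assumptions chosen only to demonstrate the mechanism, not any actual state or district VAM): teachers have fixed “true” effects, each year’s value-added estimate adds classroom-level noise, and teachers are classified as adding or subtracting value relative to the yearly median.

```python
# Minimal illustrative sketch (assumed parameters, not a real VAM):
# each teacher has a fixed "true" effect, but every year's estimate
# adds classroom-level noise; classification is relative to the median.
import random

random.seed(1)

N_TEACHERS = 1000
TRUE_EFFECT_SD = 0.10   # spread of true teacher effects (assumed)
NOISE_SD = 0.20         # year-to-year estimation noise (assumed)

true_effects = [random.gauss(0, TRUE_EFFECT_SD) for _ in range(N_TEACHERS)]

def yearly_ratings(effects):
    """Return one year's noisy value-added estimates."""
    return [e + random.gauss(0, NOISE_SD) for e in effects]

def classify(ratings):
    """Label each teacher as 'adding' or 'subtracting' value vs. the median."""
    median = sorted(ratings)[len(ratings) // 2]
    return ["adding" if r >= median else "subtracting" for r in ratings]

year1 = classify(yearly_ratings(true_effects))
year2 = classify(yearly_ratings(true_effects))

flips = sum(1 for a, b in zip(year1, year2) if a != b)
print(f"Teachers whose classification flipped between years: {flips / N_TEACHERS:.0%}")
```

With estimation noise of the same order as the true spread of teacher effects, well over a third of the simulated teachers switch between “adding” and “subtracting” value from one year to the next purely by chance, which is the kind of instability the misclassification figures above describe.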

VAM has outsized impact even if it’s just one of multiple measures.

“In a system that places differential weights, but assumes equal validity across measures, even if the student achievement growth component is only a minority share of the weight, it may easily become the primary tipping point in most high stakes personnel decisions.”7

  • Research hasn’t yet established the level of correlation needed between multiple measures to ensure validity.8
  • Even two measures of teacher quality cannot be trusted to determine whether a teacher is “effective” or “ineffective,” especially when one appears to influence or trump the value of the other.9
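The tipping-point effect described in the quote above can be seen in a small, purely hypothetical calculation (the weights, scores, and cutoff below are assumptions for illustration, not any district’s actual rubric): when observation scores cluster tightly while the value-added component varies widely, the minority-weight growth score ends up deciding who falls below a high-stakes cutoff.

```python
# Hypothetical illustration of the "tipping point" effect: the weights,
# scores, and cutoff are assumptions, not any district's actual rubric.
# Observation scores cluster tightly; value-added scores vary widely.
teachers = {
    "Teacher A": {"observation": 3.4, "value_added": 4.8},
    "Teacher B": {"observation": 3.5, "value_added": 1.2},
    "Teacher C": {"observation": 3.3, "value_added": 2.0},
}

WEIGHTS = {"observation": 0.70, "value_added": 0.30}  # growth is the minority share
CUTOFF = 3.0  # composite below this triggers a high-stakes consequence

for name, scores in teachers.items():
    composite = sum(WEIGHTS[m] * scores[m] for m in WEIGHTS)
    verdict = "meets standard" if composite >= CUTOFF else "below standard"
    print(f"{name}: composite {composite:.2f} -> {verdict}")
```

Teacher B has the best observation score of the three yet lands below the cutoff, because the 30-percent value-added component swamps the small differences in the 70-percent observation component; the minority-weight measure becomes the deciding factor.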

Even if validity is assumed, VAM has major limitations in its usefulness.

“We need to make sure that we have evaluation measures that give us the information we need to improve that group of teachers. A value-added score doesn’t provide much information to teachers or administrators on how to improve a teacher’s performance.”10

  • Difficult to integrate into existing evaluation systems in a reliable way11
  • Limited to identifying top and bottom 5 percent of teachers at best12
  • Unable to provide information about why or how teachers may improve their practice13
  • Applicable to only 20 to 35 percent of teachers (for whom student test data from at least two points in time is available)14
  • Statistical models cannot fully adjust for the fact that some teachers will have a disproportionate number of students who may be exceptionally difficult to teach (students with poorer attendance, who have become homeless, who have severe problems at home, who come into or leave the classroom during the year due to family moves, etc.)15

VAM is nontransparent and often used inappropriately.

“… the fact of the matter is that VAMs are so imperfect they should not be used for much of anything unless largely imperfect decisions are desired.”16

  • Uses complex statistical formulas that are not understood by the teachers they affect
  • Used inappropriately to make high-stakes decisions about teacher compensation, promotion and termination17

VAM-based policies are not cost-effective for the purpose of raising student achievement.18

Achievement differences among students are overwhelmingly attributable to factors outside of schools and classrooms (60 percent); teacher impact is only 7 to 10 percent.19

  • Broadly, high-stakes implementation of VAM is certainly premature, and likely a significant waste of time and money better spent on problems more pressing and clearly defined.20
  • Measuring value-added is expensive and labor-intensive because it requires testing at every grade level, tracking of data and students, and the capacity to run data through complex statistical formulas.21

Examples of state and school district difficulties with VAM

  • New York City’s VAM system had a 35 percent error rate in classifying math teachers’ performance and a 53 percent error rate for language arts teachers.22
  • In a June 25, 2012, Open Letter of Concern to Georgia’s governor, a coalition of researchers, educators and teacher evaluation reform advocates expressed alarm about Georgia’s new (VAM) teacher evaluation system. They cited research that supported their recommendation that the state return federal monies related to the project and opt out of Race to the Top (RT3) as Idaho, Indiana, Kansas, Minnesota, Oregon, South Dakota, Virginia, West Virginia and Wyoming had.23
  • In Colorado, many districts have come to recognize that they do not have the capacity to do this work on their own and are concerned about the potential legal challenges that would emerge if personnel decisions are made by an evaluation system that is not valid and reliable.24
  • The Council of Chief State School Officers reported that a survey of the states’ education staff revealed major concerns about state agencies’ lack of capacity to implement VAM evaluation reforms. South Carolina, for example, was identified as a state that simply has “no resources” to design and implement new teacher-evaluation systems.25
  • Two-thirds of respondents to a February 2012 survey of Indiana school superintendents expressed concern about having the capacity to conduct the number of classroom observations necessary to generate VAM evaluations of teachers and to secure training for the personnel responsible for completing them.26
  • The New York teachers’ union filed a lawsuit in 2012 over the details of the state teacher evaluation system. The case has since been resolved, but it nonetheless hampered implementation. Delaware delayed its new teacher-evaluation system by one year, giving the state more time to develop student growth measures. Some observers worry about the lack of capacity to execute complicated reform initiatives given tight deadlines. As one Florida reporter said: “Only a handful of districts feel like they’re prepared to do [new teacher evaluations]. Most feel like they’re rushing.”27
  • More than 10 percent of the Ohio districts that originally signed on to receive the federal Race to the Top grant (requiring teacher evaluation based on student test performance) — about 60 districts — bailed out once grants began funneling down to the local level. Some cited concerns that the grants were too small to get the work done. They said, for example, that simply hiring staff to lead the work would eat up most of the money.28

Resources

  1. "A Reanalysis of the Effects of Teacher Replacement Using Value-Added Modeling," Stuart Yeh, 2014, Teachers College Record, Volume 116, Number 1, 2014; “Evaluating Value-Added Models for Teacher Accountability” (RAND Corporation, 2003) by Daniel F. McCaffrey, Daniel Koretz, J. R. Lockwood, Laura S. Hamilton. 

  2. “Error Rates in Measuring Teacher and School Performance Based on Student Test Score Gains,” (NCEE 2010-4004), Peter Z. Schochet and Hanley S. Chiang, National Center for Education Evaluation and Regional Assistance, Institute of Education Sciences, U.S. Department of Education, 2010. See also “Bias of Public Sector Worker Performance Monitoring: Theory and Empirical Evidence from Middle School Teachers,” Douglas N. Harris, Tulane University; Andrew A. Anderson, University of Wisconsin-Madison, 2012. “Teacher Quality at the High-School Level: The Importance of Accounting for Tracks,” (NBER Working Paper No. 17722), C. Kirabo Jackson, 2012. “Heterogenous Match Quality and Teacher Value-Added: Theory and Empirics,” Scott Condie, Lars Lefgren, and David Sims, Association for Education Finance and Policy, March 17, 2011.

  3. "Is it just a bad class? Assessing the long-term stability of estimated teacher performance," Goldhaber, D., & Hansen, M. (2012). (Working Paper 73). Washington, DC: National Center for Analysis of Longitudinal Data in Education Research; "A practical guide to designing comprehensive teacher evaluation systems," Goe, L., Holdheide, L, & Miller, 2011, Washington D.C.: National Comprehensive Center for Teacher Quality.

  4. "A Reanalysis of the Effects of Teacher Replacement Using Value-Added Modeling," Stuart Yeh, 2014, Teachers College Record Volume 116 Number 1, 2014.

  5. “Rethinking teacher evaluation in Chicago: Lessons learned from classroom observations, principal-teacher conferences, and district implementation,” Sartain, L., Stoelinga, S.R., & Brown, E.R. (2011). Chicago: Consortium on Chicago School Research.

  6. “Value-Added Teacher Ratings: Should They Be Adjusted For Poverty?” Sarah Garland, The Hechinger Report. First posted: Nov. 22, 2011; “Teaching Students and Teaching Each Other: The Importance of Peer Learning for Teachers,” Clement (Kirabo) Jackson, Cornell University; Elias Bruegmann, Harvard University; Feb. 15, 2009. “Match Quality, Worker Productivity, and Worker Mobility: Direct Evidence from Teachers,” C. Kirabo Jackson, Northwestern University, IPR and NBER, Dec. 23, 2011. Cited in “What Value-Added Research Does And Does Not Show,” Matthew Di Carlo, Dec. 1, 2011.

  7. "The Legal Consequences of Mandating High Stakes Decisions Based on Low Quality Information: Teacher Evaluation in the Race-to-the-Top Era" Baker, B., Oluwole, J., & Green, P., III. (2013). Education Policy Analysis Archives, Volume 21, Number 5, January 28, 2013. ISSN 1068-2341
  8. "Houston, We Have a Problem: Teachers Find No Value in the SAS Education Value-Added Assessment System (EVAAS®)" Collins, Clarin. Education Policy Analysis Archives, Volume 22, Number 98, Oct. 27, 2014. ISSN 1068-2341.
  9. Id.
  10. “Evaluating Teacher Evaluation Systems,” Raven Hill, (quoting Thomas Toch, co-founder and co-director of Education Sector), Texas School Business, September 2012.

  11. “Revamping the teacher evaluation process,” Whiteman, R.S., Shi, D., & Plucker, J.A. (2011). Bloomington, IN: Center for Evaluation and Education Policy.

  12. “Evaluating Teacher Evaluation Systems,” Raven Hill, Texas School Business, September 2012.

  13. “Overhauling Indiana Teacher Evaluation Systems: Examining Planning and Implementation Issues of School Districts,” Cassandra M. Cole, James N. Robinson, Jim Ansaldo, Rodney S. Whiteman, and Terry E. Spradlin, Center for Evaluation and Education Policy; Education Policy Brief , Vol. 10, No. 4, Summer 2012.

  14. “False Performance Gains: A Critique of Successive Cohort Indicators,” Steven Glazerman and Liz Potamites, Mathematica Policy Research, December 2011, p. 13. “A Survey of Approaches Used to Evaluate Educators in Non-tested Grades and Subjects,” Katie Buckley, Harvard University; Scott Marion, National Center for the Improvement of Educational Assessment; June 2, 2011 (citing Goe, 2010).

  15. “Problems with the use of student test scores to evaluate teachers,” (Economic Policy Institute 2010), Eva L. Baker, Paul E. Barton, Linda Darling-Hammond, Edward Haertel, Helen F. Ladd, Robert L. Linn, Diane Ravitch, Richard Rothstein, Richard J. Shavelson, and Lorrie A. Shepard.

  16. “VAMunition,” posted Oct. 30, 2012, by Audrey Amrein-Beardsley, Associate Professor, Arizona State University.
  17. “A Reanalysis of the Effects of Teacher Replacement Using Value-Added Modeling,” Stuart Yeh, Teachers College Record, Volume 116, Number 1, 2014.

  18. Id.

  19. American Statistical Association Statement on Using Value-Added Models for Educational Assessment (2014); “Teacher effectiveness and student achievement: Investigating a multilevel cross-classified model,” Heck, R. H., Journal of Educational Administration, 47, 227-249 (2009); “What Large-Scale, Survey Research Tells Us About Teacher Effects on Student Achievement: Insights from the Prospects Study of Elementary Schools,” Brian Rowan, School of Education, University of Michigan, Consortium for Policy Research in Education, November 2002; Nye et al., 2004. Cited in “What Value-Added Research Does And Does Not Show,” posted by Matthew Di Carlo, Dec. 1, 2011; “Teacher Licensing and Student Achievement,” Goldhaber, Daniel, and Dominic Brewer, 1999. Cited in C. Finn and M. Kanstoroom, eds., Better Teachers, Better Schools, Washington, D.C.: Thomas B. Fordham Institute; “Teachers, Schools, and Academic Achievement,” (Working Paper 6691), Eric A. Hanushek, John F. Kain, Steven G. Rivkin, National Bureau of Economic Research, 1998.

  20. "What We Know Now (and How It Doesn't Matter)," Posted on August 19, 2013, by P.L. Thomas, Associate Professor of Education at Furman University

  21. “A Reanalysis of the Effects of Teacher Replacement Using Value-Added Modeling,” Stuart Yeh, Teachers College Record, Volume 116, Number 1, 2014; “False Performance Gains: A Critique of Successive Cohort Indicators,” Steven Glazerman and Liz Potamites, Mathematica Policy Research, December 2011, p. 13.

  22. “Back to school: How to measure a good teacher,” Amanda Paulson, The Christian Science Monitor, Aug. 14, 2012.
  23. Open Letter of Concern Regarding Georgia’s Implementation of its New Teacher/Leader Evaluation System from GREATER (Georgia Researchers, Educators, and Advocates for Teacher Evaluation Reform), June 25, 2012.

  24. “The State of Teacher Evaluation Reform: State Education Agency Capacity and the Implementation of New Teacher-Evaluation Systems,” Patrick McGuinn, Center for American Progress, Nov. 13, 2012.

  25. Id.

  26. “Overhauling Indiana Teacher Evaluation Systems: Examining Planning and Implementation Issues of School Districts,” Cassandra M. Cole, James N. Robinson, Jim Ansaldo, Rodney S. Whiteman and Terry E. Spradlin, Education Policy Brief, Vol. 10, No. 4, Summer 2012.

  27. “Race to the Top: What Have We Learned from the States So Far? A State-by-State Evaluation of Race to the Top Performance,” Ulrich Boser, Center for American Progress, March 2012.

  28. Id.
