Tuesday, August 31, 2010

New report critiquing value-added measures

The Economic Policy Institute has released a new report, "Problems with the use of student test scores to evaluate teachers". The report is authored by a number of big names in the education and assessment field, including Diane Ravitch, Robert Linn, and Linda Darling-Hammond. Click here for the Answer Sheet write-up about the report, which also includes the Executive Summary.


Sunday, August 22, 2010

Falsely identifying "bad" teachers

Here's a link to a recent study from the U.S. Department of Education on likely problems when using test scores to evaluate teacher (and student) performance (referred to in my previous post):

Error Rates in Measuring Teacher and School Performance Based on Student Test Score Gains

The study, by Peter Schochet and Hanley Chiang at Mathematica Policy Research (which develops value-added measurement schemes for school districts, including the District of Columbia Public Schools, where teacher evaluations figured in the recent firings), has some interesting findings:
  • "Type I and II error rates for comparing a teacher’s performance to the average are likely to be about 25 percent with three years of data and 35 percent with one year of data." [Type I errors are "false positives" -- concluding a teacher differs from the average when they really don't; Type II errors are "false negatives" -- concluding there is no difference when there really is.]
  • "These results strongly support the notion that policymakers must carefully consider system error rates in designing and implementing teacher performance measurement systems based on value- added models, especially when using these estimates to make high-stakes decisions regarding teachers (such as tenure and firing decisions)."
  • And this powerful statement: "Our results are largely driven by findings from the literature and new analyses that more than 90 percent of the variation in student gain scores is due to the variation in student-level factors that are not under the control of the teacher."
To reiterate: the first point means that even with three years' worth of student test data, the chances are about 1 in 4 that a teacher would be falsely identified as a "bad" teacher. Bad odds when a career and livelihood are at stake.
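To see how error rates like these arise, here is a toy simulation (my own illustrative sketch, not the study's actual model, which uses formal hypothesis tests and classroom-level shocks): if roughly 90 percent of the variance in gain scores comes from student-level factors, then a teacher's measured average gain is a noisy estimate of their true effect, and simply sorting teachers as above or below average misclassifies many of them.

```python
import numpy as np

rng = np.random.default_rng(0)

n_teachers = 10_000   # hypothetical district
n_students = 25       # hypothetical class size, per year

# Per the study's finding, ~90% of gain-score variance is student-level,
# so set teacher-effect variance : student-noise variance = 1 : 9
true_effect = rng.normal(0.0, 1.0, n_teachers)

def misclassification_rate(n_years):
    # Observed value-added = true effect + average of student-level noise
    noise = rng.normal(0.0, 3.0, (n_teachers, n_years * n_students)).mean(axis=1)
    estimate = true_effect + noise
    # Classify each teacher as above/below average and compare to the truth
    return np.mean(np.sign(estimate) != np.sign(true_effect))

err_1yr = misclassification_rate(1)
err_3yr = misclassification_rate(3)
print(f"1 year of data:  {err_1yr:.1%} of teachers misclassified")
print(f"3 years of data: {err_3yr:.1%} of teachers misclassified")
```

With these made-up assumptions the one-year rate comes out meaningfully higher than the three-year rate, matching the direction of the study's finding (more years of data, fewer errors), though the exact percentages depend on the variance split and the decision rule.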


Some related links on the report:

Study: Error rates high when student test scores used to evaluate teachers from The Answer Sheet blog (very good blog!)

Rolling Dice: If I roll a "6" you're fired! from the School Finance 101 blog.

Thursday, August 19, 2010

L.A. Times: Lies, lies and more lies

On Sunday, the L.A. Times began publishing an important series of articles on teacher evaluations (Who’s Teaching L.A.’s Kids?). Important because the Los Angeles Unified School District is the nation’s second largest (Chicago is third); important because it appears in a major newspaper; important because they printed teachers’ names with the data, upping the ante in attacks on teachers (Duncan came out in support of publishing teacher evaluations); important because it previews what I suspect will be the same kind of arguments that will be used by Huberman’s administration against Chicago teachers.

I am late to the party on this -- I first saw mention of it on the District 299 blog (required reading to keep up with CPS news), then on the Answer Sheet blog, and the L.A. Times story showed up on NPR yesterday. I’m slow (actually, ironically I suppose, or sadly, I have been mushing and sorting students by their ISAT scores for an upcoming area observation).

The Who’s Teaching L.A.’s Kids? article looks at local student standardized test scores on a by-teacher basis, using a "value-added" statistical model, and based on that model, identifies teachers as "effective" and "good" versus "ineffective" and "bad".

On reading the article, I was reminded of Mary McCarthy’s famous quip about Lillian Hellman (unfair, I think): "every word she writes is a lie, including 'and' and 'the'." Every word in the L.A. Times article is a lie, including "and" and "the". In this case, not unfair.

The article doesn’t have to be "true" of course -- we are in a propaganda war, after all. The tactic of the Times authors is to dress up some statistics bullshit in a pretty hat, and parade it around as science, ergo truth. Everyone is so wowed by the hat that they fail to recognize that underneath the hat, it’s just, well, just bullshit. But if enough scientistic magic power words are folded into the story, words like "Rand Corp.", "senior economist and researcher", "reliable data", "objective assessment", "effective", the narrative sweeps along and reaches its obvious, stinking conclusion.

Here is the essence of the Times’s method:

The Times used a statistical approach known as value-added analysis, which rates teachers based on their students' progress on standardized tests from year to year. Each student's performance is compared with his or her own in past years, which largely controls for outside influences often blamed for academic failure: poverty, prior learning and other factors. Though controversial among teachers and others, the method has been increasingly embraced by education leaders and policymakers across the country, including the Obama administration.
The approach, pioneered by economists in the 1970s, has only recently gained traction in education.
Value-added analysis offers a rigorous approach. In essence, a student's past performance on tests is used to project his or her future results. The difference between the prediction and the student's actual performance after a year is the "value" that the teacher added or subtracted.
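Stripped of the statistical machinery, the bookkeeping the Times describes can be sketched in a few lines (a deliberately simplified illustration with made-up numbers; real value-added models fit regressions with many controls and adjustments, not a single one-variable prediction):

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical district data: last year's and this year's scale scores
prior = rng.normal(220, 15, 500)
current = 0.8 * prior + 50 + rng.normal(0, 10, 500)

# Step 1: fit a district-wide prediction of this year's score from last year's
slope, intercept = np.polyfit(prior, current, 1)

# Step 2: for one teacher's class of 25, predict each student's score
class_prior = prior[:25]
class_actual = current[:25]
predicted = slope * class_prior + intercept

# Step 3: "value added" = average gap between actual and predicted scores
value_added = np.mean(class_actual - predicted)
print(f"estimated value added: {value_added:+.2f} scale-score points")
```

Note what the sketch makes plain: everything rests on the test scores themselves, and the "value" is just the residual left over after the prediction -- which is exactly where all the noise lives.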

There have been a number of challenges raised to value-added measures on methodological grounds (in particular, misidentification of "bad" teachers -- see Study: Error rates high when student test scores used to evaluate teachers from the Answer Sheet blog. Also from the same blog (which is really good, by the way), Willingham: Big questions about the LA Times teachers project. I have started a list of links here.)

But I think there is a more fundamental, worldview-type fault with the general approach demonstrated in the LA Times article, the Big Lie that makes everything in the article a lie, even "and" and "the". It’s not merely the concept of "value-added" as a metric, but the overall economic approach to education.

Through the lens of economics, teachers go to the education factory. They work on human widgets. At the end of the day, teachers have hopefully added value to the widgets. Value is added if the widgets score higher on multiple choice tests. The greater the change in test scores, the more value that has been added, and the more productive the teacher is.

Implicit in the economic argument is that the education factory must strive to be as productive as possible (i.e. raise test scores as much as possible). Teachers have a greater effect on students than any other single factor, so education reform should focus on identifying the most productive teachers. School districts must then devise incentives to keep the most productive teachers (hence merit pay). Or researchers need to determine what makes for a productive teacher (e.g., Building a Better Teacher, from the New York Times Magazine last March), and teach that in teacher education programs (hence let Teach for America certify its own teachers), and/or suss that out in teacher recruitment or on the firing line in the first couple of years of teaching (see Malcolm Gladwell’s piece "Most Likely to Succeed: How do we hire when we can’t tell who’s right for the job?").

Almost all of the research on this approach -- what gives this approach its academic patina of respectability -- points back to the work of Eric Hanushek, an economist at Stanford's Hoover Institution. He has been working on quantifying the effect of individual teachers, and trying to isolate the teacher effect in education, since the early 1970s, and continues to work on it today. His work, and the work of people around him, is the academic foundation, the theory on which most of the official education rhetoric, from Obama to Duncan to Huberman (from what I can tell anyway), is based.

This economic model is taken a step further with the "value-added" notion. I’m not sure where the concept arose, but it is an obvious extension of Hanushek’s work. CPS uses a version from the University of Wisconsin’s Value-Added Research Center, as do New York City, Milwaukee, and Dallas. The LA Times study referred to above was done by researchers at the Rand Corp. A company called Mathematica Policy Research, Inc. was noted in the Answer Sheet blog as the contractor for the Washington, D.C. teacher evaluation system, which uses a value-added component (and was used in the firing of teachers there recently).

As noted above, "value-added" can be calculated in different ways, but all approaches are based on standardized test scores, as is Hanushek’s work. Here is Hanushek's (and collaborator Steven Rivkin's) statistical justification for saying standardized test scores mean something:

One fundamental question -- do these tests measure skills that are important or valuable? -- appears well answered, as research demonstrates that standardized test scores relate closely to school attainment, earnings, and aggregate economic outcomes (Murnane, Willett, and Levy 1995; Hanushek and Woessmann 2008). The one caveat is that this body of research is based on low-stakes tests that do not affect teachers or schools. The link between test scores and high-stakes tests might be weaker if such tests lead to more narrow teaching, more cheating, and so on. (from Hanushek and Rivkin’s Using Value-Added Measures of Teacher Quality, p. 2; emphasis added)

The economic view of education, as the above indicates, assumes the goal in life is earnings and/or academic attainment. If that assumption and mindset are rejected, then the rationale for standardized tests having any meaning evaporates, and the whole argument collapses.


P.S. I skipped over their important caveat: the justifying research assumes that tests are low-stakes, which is not the case today with ISAT, ACT, Scantron, etc. The current cheating controversy in Atlanta speaks to the greater incentive to cheat as stakes get higher. In a more perfect world, tests would be part of a bigger assessment profile, and then they might mean something. In the words of Hanushek and Rivkin themselves, the standardized test score data is suspect.

Friday, August 13, 2010

Report on Ren 2010 and charters

The August, 2010 issue of Catalyst Chicago features a number of articles on Renaissance 2010 and charter schools in Chicago.

Also see the "Many Chicago Charter Schools Run Deficits, Data Shows" article by Sarah Karp, deputy editor of Catalyst Chicago, that appears in the New York Times. The article gives a peek at the finances of charters.


Wednesday, August 11, 2010

More testing coming

More standardized testing is in the works for my school next year. According to an area representative, Ron Huberman has directed the areas to produce assessments every five weeks (seven times over the school year) for grades 3 - 8 for "performance management" tracking. Areas are free to select an assessment from a list of nine vendors approved at a Board of Education meeting last June.

The Chicago Benchmark Assessment, done three times last year, is optional, and our area will not be requiring that.

In addition, because my school is on academic probation for low test scores, we are being assigned a "performance management" representative from central office to assist with the required monthly school-based performance management reviews and, ideally, to use the assessment data to identify problem areas and strategies.

For our new 5-week assessments, we will be using an online test from Riverside Publishing, publisher of the Iowa Tests of Basic Skills once used district-wide by CPS. Riverside was chosen because they have a lot of Spanish-language test materials, and my area mostly covers the heavily Hispanic Little Village neighborhood, plus a bit of North Lawndale assigned to it when Area 8 was dissolved last year.

For grades 3 - 8, we will do online Scantron in September (reading, math and science); Riverside (math and reading) in October, November, December, and January; Scantron again in January; ISAT in early March (math and reading, science in 4th and 7th only; writing is cancelled this year because of money); Riverside again in March, April and May; and Scantron again in May.

The Riverside tests will be done online. Students will need to do a reading and math test, and each test will last about 45 minutes. We will have a 3-day window (I think that is correct) to do the tests.

On the plus side, teachers will have immediate results. On the negative side, each test session will require computer cart or lab time, meaning a technology support person for each test, at least to get things started, who can then deal with the inevitable technical issues. Paper-based assessments can be done simultaneously in all classrooms; online testing is strung out depending on technology and people resources. Scantron has been its own headache, and it has a three-week window. Logistically, Riverside in three days is possible, but it will mean a lot of running around.

The Benchmark Assessment had the nice advantage of letting teachers keep the test booklet, or view the actual questions via CIM, CPS's online student test data tracking tool. This allowed teachers, if they had the time, to try to figure out why students got an answer wrong. (It turns out, from my observation, that, for math anyway, a lot of the problems stem from reading comprehension issues, from getting tripped up by misleading or ambiguous questions, or from familiar words used in unfamiliar math contexts -- e.g., "the net of a cube".)

My understanding of Riverside is that we can see and use other questions from their item pool, but it's not clear if we can see the actual questions and responses from a particular assessment.

Besides the logistical problems of administering the tests, other issues include the out-of-pocket cost to CPS of the tests, the opportunity cost of teacher and staff time spent administering them, the teacher time spent making sense of the data on top of all of the other regular (and more informative) classroom assessments, and the cost of lost instructional time for students.

An even bigger question is whether the tests will tell anyone anything useful. As I think I may have said before, testing students is not the same as taking their temperature. Students resist the test-heavy regime. They will resist Riverside too, spoiling the results.

Even if test scores accurately reflected student ability (which occurs only in some cases, skewed by demographics, prior academic success, breakfast, and many other factors), changes in student performance across tests given five weeks apart are statistically suspect. Scantron, e.g., will not produce gains reports for tests given less than 12 weeks apart (see their Performance Series White Paper, for example, page 9), because statistically speaking, any differences over a shorter period are meaningless -- differences can be attributed to many factors other than student learning.
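The 12-week cutoff makes sense with some back-of-the-envelope arithmetic (hypothetical numbers of my own, not Scantron's): each score carries measurement error, the difference of two scores carries even more, and a few weeks of true growth is easily swamped by it.

```python
import math

sem = 10.0          # hypothetical standard error of measurement, in scale-score points
weekly_gain = 1.0   # hypothetical true student growth per week, in points

# The difference of two independent test scores has standard error sem * sqrt(2)
se_diff = sem * math.sqrt(2)

for weeks in (5, 12, 36):
    true_gain = weekly_gain * weeks
    ratio = true_gain / se_diff
    print(f"{weeks:2d} weeks: expected gain {true_gain:4.1f} pts vs "
          f"noise SE {se_diff:.1f} pts (signal/noise = {ratio:.2f})")
```

With these assumptions, at 5 weeks the expected true gain is only a fraction of the noise in a single score difference; only somewhere past 12 weeks does the signal start to clear it. That is the logic, as I understand it, behind refusing to report short-interval gains at all.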

Unless a test is mapped to a pacing guide, so that questions adjust throughout the year according to what students "should be" learning (which raises its own problems, including that the pacing guide dictates from on high what teachers teach and when, regardless of their students, and that the tests would need to be aligned to the curriculum), it seems unlikely to me that test results will vary significantly from session to session, unless... Since the 5-week data will be used for the monthly performance management inquisition, the obvious thing for a teacher to do, to succeed in the current test-crazy CPS environment, would be to study the Riverside item pool, and... teach to the test!


Monday, August 9, 2010

College Inc.

The PBS series Frontline aired a deep investigation of for-profit higher education last May, called College Inc. The tagline from the website gives an accurate hint of the show: "Investigating how Wall Street and a new breed of for-profit universities are transforming the way we think about college in America..."

I meant to post something on this earlier but never got around to it. I also wrote some more on this (including the paragraph above) as it relates to the process of "commodification" -- turning things that used to be outside of the market, in the public sector or parts of our communal life at home or in the neighborhood, into pay-for services, and how that fits into the Big Processes going on now in the world economy, on another blog.


Sunday, August 1, 2010

Message from Rethinking Schools

The message below appeared in my mailbox, courtesy of the Chicago Teachers for Social Justice mailing list. It is from one of the editors of the magazine, Rethinking Schools, and is a good summary of the flurry of news about Race to the Top over the past week.

July 31, 2010

Dear friends,

In the past week there have been significant developments in the growth of anti-Race to the Top sentiment around the country. An impressive coalition of national civil rights groups issued a statement critical of the Obama/Duncan administration's educational policies: Framework for Providing All Students an Opportunity to Learn through Reauthorization of the Elementary and Secondary Education Act. It's worth the read.

Similarly, a coalition of 24 community groups organized by Communities for Excellent Public Schools issued a stinging critique of the federal government's "turnaround" strategies. "Our Communities Left Behind: An Analysis of the Administration's School Turnaround Policies" is a comprehensive critique that shows why those policies won't work and offers concrete suggestions as to what will turn around struggling schools. For a summary see an article in the Washington Post.

The lead segment on Democracy Now on July 30 was on Race to the Top and included interviews with Diane Ravitch and Leonie Haimson of Class Size Matters, responding to President Obama's recent speech at the National Urban League, where he defended the Race to the Top program. It's worth the listen.

These critiques confirm what those of us who are in the classroom have seen for the past several years. Many federal education policies are actually hurting students, making it more difficult for teachers to provide quality education to those who need it most. It's time for a change.

Bob Peterson, Rethinking Schools