When Academics Attack: CELS and the Death Penalty
posted by Dave Hoffman
I’m writing from Ithaca, NY, where I’m attending the third annual conference on empirical legal studies. Much talk of heteroscedastic error, t-statistics, sampling bias, Gary King’s most recent programming innovations, and the need for peer review. And also one of the more vigorously contested academic panels I’ve ever seen.
The session title was Law and Criminal Procedure. The paper, by Hashem Dezhbakhsh and Paul H. Rubin, was From the ‘Econometrics of Capital Punishment’ to the ‘Capital Punishment’ of Econometrics: On the Use and Abuse of Sensitivity Analysis. The commentator was Justin Wolfers. If you know anything about the debate, you can imagine what followed. Dezhbakhsh and Rubin complained about (purported) errors in a paper Wolfers wrote with John Donohue, which in turn had criticized earlier D-R work finding that the death penalty deterred. Wolfers responded vigorously, though, to his credit, with much more poise than I would have, had I been so personally attacked.
The bones of contention are many, but I think they boiled down to the following key points.
1. W-D critiqued D-R and others because their models were fragile, i.e., if you removed outlying data (like executions in Texas), or changed other seemingly crucial assumptions, the significance and even direction of the predicted effects would flip. D-R’s response was, basically: so what? We know that OLS is highly sensitive; the right response is not to drop observations (like Texas) but rather to find less radical ways to deal with outlying data. Moreover, D-R pointed out that only four states’ characteristics changed the model’s effect, two of which W-D relied on, and suggested that W-D data mined to find particularly bad examples for the model. To which W-D responded that if you look at the distribution of error terms, it was D-R, not W-D, who were guilty of mining.
Verdict: very hard to know without reading the D-R paper, but it struck me as significant that D-R were willing to admit that their model was fragile to manipulation and that Texas represented such a dominant cluster of data. This isn’t the right way to think about it, but what if it were true that the death penalty deterred, but only in cultures that looked like Texas? This openness to the fragility of their claim casts significant doubt on what I saw as the very aggressive rhetorical posture advanced by D-R in their earlier work, not to mention the ways such work has been enlisted politically. But maybe others had a different view of this concession, to the extent one was made.
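To make the fragility complaint concrete, here is a toy sketch (entirely invented numbers, not the D-R or W-D data) of how a single high-leverage observation, a "Texas" of the dataset, can flip the sign of an OLS slope:

```python
# Toy illustration (invented numbers, not the actual state data) of how one
# high-leverage observation can reverse the sign of an OLS slope estimate.
import numpy as np

def ols_slope(x, y):
    """Slope coefficient from a simple bivariate OLS fit."""
    return np.polyfit(np.asarray(x, float), np.asarray(y, float), 1)[0]

# Five hypothetical states showing a clean positive relationship.
x = [1, 2, 3, 4, 5]
y = [1.1, 2.0, 3.1, 3.9, 5.0]

# Add one "Texas-like" point that is extreme on both dimensions.
x_out = x + [20]
y_out = y + [0.0]

print(ols_slope(x, y))          # positive slope
print(ols_slope(x_out, y_out))  # negative slope: one point flips the estimate
```

Whether the right response to such a point is to drop it (W-D's robustness check) or to model it less radically (D-R's preference) is precisely what the panel was fighting about.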
2. D-R asserted that W-D had been able to replicate their findings, to which Wolfers conceded that under a very cramped definition of replication, i.e., using D-R’s data and D-R’s script on Wolfers’ computer, then indeed he had replicated the findings. But he had been unable to do so more broadly.
Verdict: I’m with Wolfers. Replication shouldn’t mean just re-running a .do file. On the other hand, later in the day Lee Epstein offered a nice speech about Exxon’s infamous footnote 17, in which she suggested that replication might mean nothing more than the ability to re-create work using the original author’s precise methods, and so perhaps D-R’s view dominates outside of the legal academy.
3. Wolfers asserted that D-R had used the same instrumental variables in multiple studies. Thus, for example, in one paper they assumed that Republican vote share influenced homicide rates only through its effect on gun carry laws; in another, they assumed that vote share influences homicide rates only through its effect on capital execution rates. (I think I have these relationships right; if I don’t, forgive me, it’s been a long day.)
Verdict: I don’t think that D-R had a complete response to this critique, apart from saying that it was common practice. At this point, if not earlier, the discussion became notably personal. D-R accused Wolfers of concealing W-D’s findings, of not submitting W-D’s work to peer review, of data mining, of manipulating findings, and a host of other sins. Wolfers rejoined that it was D-R who had not made data available (and produced some emails to that effect), and, much more significantly, that D-R had offered no response on the merits to the central critiques of the Stanford Law Review piece. In Q&A afterward, a criminologist said something like “this kind of dispute about methods makes me think that economists are full of nonsense, since it replicates a pattern we see often: strong claims, followed by methodological sniping, followed by animus and a retreat to theory.” D-R in response argued for more education of consumers of empirical work, so consumers could tell good from bad; Wolfers said that consensus did exist, if you asked a wide sample of econometricians.
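For readers unfamiliar with the instrumental-variables machinery at issue, here is a minimal two-stage least squares (2SLS) sketch on simulated data. All variable names and coefficients are hypothetical, invented for illustration: the estimator recovers the true effect only if vote share affects homicide solely through the instrumented channel, which is exactly the exclusion restriction Wolfers accused D-R of asserting inconsistently across papers.

```python
# Toy 2SLS sketch on simulated data; every variable and coefficient here is
# invented to illustrate the exclusion restriction, not to reproduce any
# result from the actual papers.
import numpy as np

rng = np.random.default_rng(0)
n = 5000

vote_share = rng.normal(size=n)     # the proposed instrument
confound = rng.normal(size=n)       # unobserved confounder

# Endogenous regressor: driven by the instrument and the confounder.
exec_rate = 0.8 * vote_share + confound + rng.normal(size=n)
# Outcome: true effect of exec_rate is -0.5; the confounder biases naive OLS.
homicide = -0.5 * exec_rate + 2.0 * confound + rng.normal(size=n)

def slope(x, y):
    """Simple bivariate OLS slope."""
    return np.polyfit(x, y, 1)[0]

def two_sls(z, x, y):
    """2SLS with one instrument: regress x on z, then y on the fitted x."""
    # Valid ONLY under the exclusion restriction: z affects y solely through x.
    x_hat = np.polyval(np.polyfit(z, x, 1), z)
    return slope(x_hat, y)

print(slope(exec_rate, homicide))                # naive OLS: sign comes out wrong here
print(two_sls(vote_share, exec_rate, homicide))  # IV estimate: close to -0.5
```

Using the same vote_share series as an instrument in a second model, while assuming it operates only through a different channel (say, gun carry laws), contradicts the restriction baked into this one; that, as I understood it, was the core of Wolfers’ objection.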
I know that the above sounds pretty technical and dry, but it wasn’t in person. It was like watching a very elegant car wreck, or your parents fighting over the taxes. Technical jargon, buried normative moves, and emotion, all knotted together.