Summary, Judgment

"Chocolate Accelerates Weight Loss!" Or, why to worry about randomization.

Legal Scholarship
Adam Chilton

In 2008, a program called “Secure Communities” was launched that sent information about anyone arrested by local police departments to the Federal Government so that their immigration status could be checked. The program had been billed as an effort to “secure” our communities by increasing immigration enforcement, and in turn, reducing crime.

In a 2014 paper, Adam Cox and Tom Miles examined whether the Secure Communities program actually reduced crime. The paper leveraged the fact that the program was rolled out gradually county by county to test its effects. The paper made a big splash because it found that increased focus on deportations wasn’t accomplishing much.

Cox and Miles’ paper wasn’t just substantively important — the research design has become influential too. Why? Their paper showed that the rollout was haphazard in a way that made it a quasi-random source of county-level variation in government policy. This kind of variation is what makes causal inference, and thus publication in peer-reviewed journals, possible.

So, naturally, other scholars started using the rollout of Secure Communities to study other topics. For instance, Ariel White wrote a paper looking at the effect of Secure Communities on Latino voter turnout; Marcella Alsan and Crystal Yang looked at the effect of Secure Communities on take-up of social insurance programs; East et al. explored the effect of Secure Communities on employment patterns for low-education native workers; and Dee and Murphy looked at the effect of Secure Communities on school enrollment. The research design was even used by Hines and Peri to study the effect of Secure Communities on crime (which, if you’re thinking that sounds an awful lot like the original Cox and Miles paper, you’d be right).

Why am I bringing this up? The Regulatory Review has been running a series of essays about the very sensible idea of trying to encourage the government to incorporate more randomization into its policy implementation. The hope is that by randomizing—like the way that the rollout of Secure Communities was staggered—it will be possible for scholars to evaluate the effect of programs in a rigorous way.

In general, I’m totally on board with this idea. Randomization makes it possible to do causal inference, and causal inference makes it possible to know if policies are working. But we do need to worry that the proliferation of studies that follows will start to produce bogus results. Here’s why.

As I explained in my essay for the Regulatory Review series, when researchers look for the effect of a policy in a lot of places, they run the risk of a problem called Multiple Hypothesis Testing (“MHT”). The concern with MHT is that when an intervention has no real effect, statistically significant results still show up by random chance 5% of the time, so if we test the effect of an intervention 20 times, we should expect to find 1 bogus result.
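
To make the arithmetic concrete, here’s a back-of-the-envelope calculation. (This is a minimal Python sketch of my own to illustrate the point, not anything from the essay.)

    # Chance of at least one statistically significant result when
    # running k independent tests at the 5% level, assuming the
    # intervention truly has no effect on anything.
    def chance_of_bogus_result(k, alpha=0.05):
        return 1 - (1 - alpha) ** k

    for k in (1, 18, 20):
        print(f"{k} tests: {chance_of_bogus_result(k):.0%}")

    # Prints:
    # 1 tests: 5%
    # 18 tests: 60%
    # 20 tests: 64%

With 20 tests, the expected number of false positives is 20 × 0.05 = 1, and the chance of finding at least one is about 64%. The 18-test case, which comes to roughly 60%, will be relevant in a moment.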

My favorite example of this is the chocolate weight-loss hoax. To prove that newspapers will publish anything scientific-sounding without thinking, a scientist/journalist conducted a study where people were randomly assigned to eat chocolate. He then measured 18 outcomes for the people in the study. The study, predictably, found that one of the 18 variables was statistically significant thanks to random chance. An academic paper based on the study was published in a “predatory” journal, and newspapers around the world published stories about the finding with headlines like “Chocolate Accelerates Weight Loss”.

What does this problem have to do with government randomizing policy? The worry is that researchers are drawn to randomized policy interventions like moths to a flame. So when policies are randomized, people study them from every possible angle. And a lot of people looking for outcomes from the same intervention means we are naturally going to start getting some results due to the multiple hypothesis testing problem.

For instance, if studies keep looking for the effect of Secure Communities in more and more places, some of the results are going to be bogus. Not because the researchers are being nefarious, but just because of random chance.

———

If you’re interested in the topic, check out my essay and the rest of the series being published by the Regulatory Review. And shout out to Colleen Chien for writing the essay that inspired the series and inviting me to contribute. Thanks Colleen!

Restatements Should Be Transparent

Legal Profession
Adam Chilton

In a 2015 case, Kansas v. Nebraska, Antonin Scalia wrote a separate opinion to offer a scathing assessment of the value of Restatements:

I write separately to note that modern Restatements—such as the Restatement (Third) of Restitution and Unjust Enrichment (2010), which both opinions address in their discussions of the disgorgement remedy—are of questionable value, and must be used with caution. The object of the original Restatements was “to present an orderly statement of the general common law.” Over time, the Restatements’ authors have abandoned the mission of describing the law, and have chosen instead to set forth their aspirations for what the law ought to be. Restatement sections such as that should be given no weight whatever as to the current state of the law, and no more weight regarding what the law ought to be than the recommendations of any respected lawyer or scholar. And it cannot safely be assumed, without further inquiry, that a Restatement provision describes rather than revises current law. [citations omitted]

Two years later, Will and I wrote a paper with our colleague Anup Malani arguing that anyone trying to make descriptive claims about the state of the law on a given topic should consider conducting a so-called “systematic review.”

The basic idea of a systematic review is to transparently lay out the process you used to reach your conclusion. For instance, if I wanted to know what Scalia’s opinion of Restatements was, I could either just say “Scalia hated Restatements, see Kansas v. Nebraska.” Or, I could conduct a systematic review by searching every opinion and academic article written by Scalia for mentions of the Restatement, and then document whether he discussed Restatements favorably or not. Obviously, that kind of comprehensive review is a lot more time-intensive, but it also reduces error. So if I really cared about Scalia’s opinion of Restatements (which, for the record, I don’t), it might be worth investing the effort.

In the article, we suggested that Restatements are exactly the kind of instance where the costs of conducting systematic reviews are worth it. When writing our paper, we were inspired in part by the methods that the reporters for the Restatement of Consumer Contracts — Oren Bar-Gill, Omri Ben-Shahar, and Florencia Marotta-Wurgler — were using to code cases as part of their efforts to accurately describe the law.

We thought that reviews of the law would be more accurate because of this kind of coding, but also that the added transparency would allow replication. Or, as we put it at the time: “being explicit about the method is almost as important as the method itself, because transparency allows others to replicate the review author’s work, ensuring that the review was not manipulated and increasing confidence in the review’s conclusions.”

Since writing that, it’s been exciting to see that kind of replication play out with the Restatement of Consumer Contracts. The reporters were transparent about their process of coding cases, and Gregory Klass and others then set out to reassess their claims through replication. In doing so, they raised a number of concerns with the original analysis, which in turn led to a response from Oren, Omri, and Florencia explaining how the replication effort still came to the same substantive conclusions.

And although there has been some controversy over the Restatement of Consumer Contracts, this work demonstrates the kind of concrete academic exchange that is possible when reporters take the time to be rigorous. So although I’m not sure of the answer to every question Will raised in his post, I know that whatever reporters do, they should be transparent.

How Should Restatements Restate the Law?

Legal Profession
William Baude

Two weeks ago I spent the day at a meeting for the project to create a Third Restatement of Conflict of Laws. For the reasons I discussed on another blog long ago, I remain unsure that another Restatement of Conflict of Laws is a good idea, but working on the project has made me think about how tricky the enterprise of Restatements is.

One much-discussed problem is the relationship between positive and normative analysis. Most of the time the Restatements describe the majority rule, but sometimes they instead recommend a minority rule or a new rule. As I understand it, the general practice is that this is fine, so long as it is explicitly disclosed and normatively justified. But even putting that kind of explicit normative change to the side, a number of confusing problems come up in deciding what a Restatement should be restating:

One issue is the denominator problem. Restatements are very long and have lots and lots of detail. So for any given rule, it’s possible that most states simply haven’t clearly stated what their rule is on that specific topic. If so, the “majority rule” might actually be only a handful of cases. But those cases may or may not be representative of the logic or likely outcomes in most states.

A related issue is the combination problem. Suppose a majority of jurisdictions follow rule X. And a majority of jurisdictions follow rule Y. That doesn’t mean that a majority of jurisdictions follow rule X+Y. Indeed, depending on the denominator problem, it’s possible that none do. And of course this issue scales up across the whole Restatement.

A final issue is the relevance of statutes. Restatements generally focus on common-law topics, but sometimes the state legislatures have adopted statutes changing the common-law rule. Should one simply restate the common-law rule that would have applied in the absence of the statute, or use only the non-statutory states as the denominator? Or should one try to restate a new common-law rule that matches the prevalent statutory rule? My instinct is that the former approach is better, because it treats the decision whether to pass a statute as meaningful, but I think the latter is common.

Even the purely descriptive parts of a Restatement project can subtly transform the common law.

Citation Rankings and the Human Touch

Legal Scholarship
William Baude

I was pondering Adam’s post last Friday about measurement error in law school rankings and then I thought about his posts earlier in the week about human v. computer judges and referees. I wonder if those latter posts provide the best approach to the citation/rankings problem.

Given the imperfections and transparency of citation rankings, they will be gamed in troubling ways. But they still provide important objective evidence that is missing from the current rankings system. Maybe the solution is this: Give the faculty citation counts to some humans, and ask them to use the citation counts to decide the scholarly rankings. We could do this with the current survey group for scholarly reputation at US News, or we could do it with a different group of people if we trusted them more for some reason.

The advantages are obvious. The human beings could average, generalize, or combine across multiple ranking systems. They could make some of the tradeoffs Adam describes between junior and senior faculty. And they’d make it harder to game the rankings, because they’d be able to adjust for apparently strategic behavior.

Of course, the problem is that the humans probably wouldn’t be objective enough, and that plenty of humans probably don’t agree that citation counts are all that relevant to scholarly quality, so they might refuse to cooperate in the project. Still, just like asking judges to use data to assign sentences, it might be the best we can do.

Citation Rankings and Measurement Error

Legal Scholarship
Adam Chilton

There’s been a lot of recent debate about ranking law schools based on their faculties’ citations. U.S. News and World Report has announced plans to incorporate citations into its overall ranking, and Paul Heald and Ted Sichelman have just released a new paper providing exactly that kind of ranking.

Both of these rankings rely on citation counts from HeinOnline. (Note: Heald and Sichelman also use SSRN downloads in their rankings.) As many have pointed out, relying on HeinOnline does not measure all of law professors’ citations. Instead, it measures citations to articles published in HeinOnline by other articles published in HeinOnline. If an article published by the Fancy Law Review is cited 100 times by articles published by the Prestigious Law Journal, this isn’t a problem. HeinOnline would pick up all 100 citations. And because most law professors publish most of their scholarship in law reviews carried by HeinOnline, this isn’t a problem most of the time.

But it is a problem some of the time. For instance, if a law professor publishes a book that receives 100 citations, HeinOnline would not pick up any of them. So law schools that have relatively more professors writing books are going to be lower ranked than they should be just because of how the citations are measured for the new rankings. In other words, the proposed new rankings have measurement error.

Of course, measurement error is a reality for anyone working with data, and researchers typically don’t get bent out of shape about it. This is because measurement error that is random might lead to distortions, but it’s not going to lead to systematic problems. And when the measurement error is non-random, researchers can just explain to readers the ways that the error is going to bias their results.
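
The difference is easy to see in a toy simulation. (A minimal Python sketch of my own, with made-up numbers: the hypothetical schools, citation counts, and 40% undercount are purely illustrative assumptions, not estimates from the actual rankings.)

    import random

    # 50 hypothetical schools with known "true" citation counts.
    true_counts = {f"School {i}": 1000 + 80 * i for i in range(50)}

    def ranking(counts):
        # Schools ordered from most-cited to least-cited.
        return sorted(counts, key=counts.get, reverse=True)

    # Random error: unbiased noise added to every school's count.
    noisy = {s: c + random.gauss(0, 100) for s, c in true_counts.items()}

    # Non-random error: ten "book-heavy" schools undercounted by 40%.
    book_heavy = {f"School {i}" for i in range(0, 50, 5)}
    biased = {s: c * (0.6 if s in book_heavy else 1.0)
              for s, c in true_counts.items()}

    # Random noise jostles neighboring schools a little; the
    # systematic undercount pushes the same schools down every time.
    print("True rank:  ", ranking(true_counts).index("School 45") + 1)
    print("Noisy rank: ", ranking(noisy).index("School 45") + 1)
    print("Biased rank:", ranking(biased).index("School 45") + 1)

Rerun it a few times: the random noise bounces School 45 around its true position, while the systematic undercount drops it to the same artificially low spot every single time. That repeatability is what makes non-random error gameable.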

But there are a lot of researchers getting bent out of shape about the measurement error in the potential U.S. News and World Report rankings. And I’m one of them. This is because non-random measurement error in rankings creates the potential for gamesmanship. If rankings systematically undercount the work of people who publish in books or in journals that are not indexed by HeinOnline, there will be less of a market to hire these scholars.

This problem is exacerbated by the fact that so many aspects of U.S. News and World Report rankings are extremely sticky. Law school deans can’t snap their fingers and change the median LSAT scores and GPAs of the students that attend their schools. These things move very slowly over time. But they can try to hire scholars with more HeinOnline citations at the margins. The result is that non-random measurement errors in rankings will translate into distortions of the academic labor market. This will in turn distort our core mission: the production and dissemination of knowledge.

If you care about the ranking debates, Jonathan Masur and I recently posted a short paper on SSRN where we explain this concern and lay out a few more. You should also check out Paul and Ted’s own paper where they explain the numerous steps they’ve already taken to reduce measurement error, and lay out their plans to reduce it even further in the near future. And, although I’ve got concerns about current measurement error in citation rankings, I want to end by saying Paul and Ted are being extremely thoughtful about how to produce rankings as transparently and accurately as possible.

The Market for FedSoc

Legal Profession
Adam Chilton

Will’s post arguing that the Federalist Society is a network reminds me of a long-running debate about why there isn’t a liberal network of lawyers of similar stature. Sure, most law schools have American Constitution Society chapters; but it’s common to hear liberal students lament that joining the ACS isn’t even a path to getting a good fed courts outline, let alone a good clerkship. FedSoc, on the other hand, has always had the well-earned reputation of being extremely effective at opening doors for its members. (At least, that’s the impression from the outside—I’m not a member.)

What explains the difference?

In an article published in Politico earlier this year, Evan Mandery presented the standard explanation of why the FedSoc is more influential than the ACS. Mandery explains that the FedSoc has three advantages over the ACS: (1) it is older, (2) it advances an agenda more appealing to rich donors, and (3) it has a unifying ideological commitment (originalism) that brings conservatives together.

These arguments all miss the mark. Simply put, there is a market for FedSoc; there isn’t a market for a liberal equivalent.

The reason for this discrepancy is that the legal profession is overwhelmingly liberal. In The Political Ideologies of American Lawyers, my collaborators and I find that over 60 percent of lawyers are liberal. And our subsequent research on law clerks and law professors suggests that more like 75 to 85 percent of elite lawyers are liberal.

The result of this ideological skew in the profession is that anyone hiring lawyers without respect to ideology is going to hire liberals most of the time. This means that people trying to hire for ideological reasons are going to make a lot of “mistakes” if they hire without knowing the candidates’ ideology. Additionally, the people that are trying to hire for ideological reasons—e.g., the people in charge of selecting political appointees or picking new judges—know perfectly well that lawyers are smart enough to be coy about their ideological commitments when good opportunities depend on coming off as conservative for a short period.

So how can people trying to hire conservatives make sure they don’t accidentally give good jobs to liberal lawyers? Make conservative lawyers send a costly signal that demonstrates their ideological bona fides. Joining FedSoc and attending talks, debates, and social events for years is that costly signal. There was a demand for providing that signal, and the FedSoc met it.

Liberal decision-makers don’t need to rely on the same kind of costly signals when hiring; they can just assume lawyers are liberal and be right most of the time. And they especially don’t need to bother with costly signals once conservatives are taking on the heavy lifting of doing the ideological sorting. Want to hire a liberal law student as your law clerk? Just make sure you don’t hire someone in FedSoc.

So any explanation for why ACS is less influential than FedSoc that focuses on why it is tough for liberals to organize misses the point. The difference in influence isn’t because liberals can’t get their act together; it exists because they don’t need to.

FedSoc is a They, not an It

Legal Profession
William Baude

The Federalist Society has been in the news (and in my Twitter feed) a lot lately, as people criticize both things that happened at a national convention last week, and things that have been said and done by a couple of its officials, especially Leonard Leo and Steven Calabresi.

This has led to claims that the Society is in fact a partisan organization because of its supposed role in picking judges, to calls that the organization disavow or denounce various things, and to arguments that members of the society have some moral culpability for what other members of the society do.

I am a member of the Federalist Society, but I don’t see things this way and thought I’d try to explain why. As I see it, the Federalist Society is essentially a network that connects thousands of scholars, students, and lawyers. There is obviously some intellectual valence to that network — it is not a random network — but it’s usually a mistake to discuss the network as a collective noun.

Thus, I don’t think it’s right to say that the Federalist Society picks judges. Some judges have been members of the Federalist Society, and so have some people who participated in the selection process. And sharing a network may well make some of those judges more likely to be picked by others in the network. (This is not going to happen to me, to be clear.) But the Society doesn’t do anything. Individuals like Leonard Leo and Don McGahn do.

Similarly, I think it’s a mistake to expect the Federalist Society to take official positions beyond, perhaps, its relatively open-ended mission statement. Because the Society is not a legislative, adjudicative, or deliberative body, it doesn’t really have a mechanism for taking positions. The positions are held by members of the network. And for the same reason, the fact that one member of the Society, or even an official of the Society, has taken a position doesn’t attribute it to the others or to the group.

Finally, there is the question of collective responsibility. Unlike the previous two points, I don’t think we can dismiss that out of hand. Maybe there is some kind of collective responsibility to abandon a network or group if you disagree with enough people in the group over enough sufficiently profound issues. Or maybe there is at least a duty to publicly comment on the behavior of other members of the group. But I find thinking of the group as a network helpful in framing these questions. By being part of a network, the main thing one is offering is not political power or official endorsement, but one’s own willingness to freely associate.

Agreed, Some Law Professors Are Trying to Ruin Sports

Adam Chilton

I agree with Will that consistency in refereeing is a good thing. It’s infuriating to see a home team get nonsense calls when you’re rooting for the road team.

But consistency is just one goal that sports leagues are trying to maximize; they are also, and should be, trying to maximize entertainment value. If getting every call right requires stopping games to view every play in slow motion from angles that the refs couldn’t see, at some point it’s just not worth it. If the games are too long and boring, they aren’t worth watching. Leagues know full well that they have no choice but to balance these competing goals.

The exact same thing is true in adjudication. It’s important to try to be consistent; we should be outraged when there is overt bias for one group at the expense of another; and we shouldn’t ignore clear evidence of violations (at least, most of the time).

But the judicial system is designed to promote values other than just consistency. We do, and should, care about administrative costs and social consequences when making tradeoffs about how to manage cases. The Stevenson and Doleac paper I blogged about yesterday suggests that judges understand this the same way that the NBA does.

How Being a Law Professor Ruined Watching Professional Sports for Me

Legal Scholarship
William Baude

Adam’s post about the differences between umpires and referees reminds me of a provocative article by Mitch Berman, “Let ‘em Play”: A Study in the Jurisprudence of Sport, as well as these two recent blog posts by Dave Pozen, What Are The Rules of Soccer?, and The Rulification of Penalty Kicks. These pieces all explore the gap between the “law in the books” and the “law in action” in certain professional sports. Though the rules don’t say so, many of us expect the officials to systematically deviate from or underenforce the rules under certain conditions. The analogy to law is natural.

I hate to be a spoilsport, but reading these articles helped me understand what I always found so frustrating about watching, say, the NBA finals. I love basketball, and I live in Chicago, but I still think Michael Jordan should have been held to the same number of steps as everybody else on the court. And I don’t like watching the rules get suspended during the tense final quarter. I think we all agree that the players, not the refs, should be the center of the action during the climax of a game. But in my view, by deviating from the duly promulgated rules, even to avoid enforcing them, the refs make their own judgment all too central.

Of course, these views probably will not surprise people who know me, since I am a formalist when it comes to judicial interpretation too. And it seems like I’m in a decided minority. But we shouldn’t take the system of referee discretion for granted.

Judges as Referees

Legal Scholarship
Adam Chilton

Several recent papers have found that algorithms are better than judges at predicting human behavior. In one high profile example, Kleinberg et al. used an algorithm to re-evaluate decisions made by judges in New York City from 2008 to 2013 about whether to grant defendants pre-trial release. They showed that an algorithm given variables about “characteristics of the defendant’s current case, their prior criminal record, and age (but not other demographic features like race, ethnicity, or gender)” could dramatically out-perform the actual decisions made by judges. As the authors put it, relying on their algorithm could have produced “crime reductions up to 24.7% with no change in jailing rates, or jailing rate reductions up to 41.9% with no increase in crime rates.”

Their finding is sufficiently thought-provoking that Malcolm Gladwell used it as a motivating example for his new book, Talking to Strangers. The relevance to Gladwell’s argument is that the judges have access to all the information the researchers input into the algorithm, but they are also able to look the defendant in the eye when assessing their character. But even though the judges have access to more information, the humans are just systematically worse at decisionmaking than computers.

This pessimistic evidence about the quality of judicial decisionmaking reminds me of John Roberts’s analogy of judges as umpires. If what we are after is calling balls and strikes, it can now be done more accurately, quickly, and cheaply by computers than by umpires.

But a paper released yesterday suggests that maybe judges are better thought of as referees than umpires. Megan Stevenson and Jennifer Doleac’s paper examines how judges in Virginia who were given algorithmic risk assessment scores changed the way they made sentencing decisions. They found that judges’ decisions were influenced by the information: judges gave defendants with higher risk scores longer sentences and defendants with lower risk scores shorter sentences.

However, the judges deviated from the risk scores in an important way. Despite young defendants’ high predicted risk of recidivism, the judges systematically gave them more lenient sentences. This deviation leads Stevenson and Doleac to persuasively conclude that the judges have goals other than just predicting recidivism in mind when making sentencing decisions.

This makes judges seem more like basketball referees than baseball umpires. When refs are calling a basketball game, most fans are open to the idea that the refs might call the game differently depending on the circumstances. When the stakes are high—at the end of the game, in the playoffs—we’re often cool with the refs giving the players more leeway. If this is the right analogy, maybe it’s a little unfair to say that judges aren’t doing a great job of calling balls and strikes when they are actually playing a different game.

What is this?

Adam and Will

Once upon a time, the internet was saturated with law blogs. A lot of these dried up for one reason or another, and much legal commentary has moved to Twitter and other social media sites. But we think law blogging remains useful and that the internet could use more of it.

We intend to use this blog to comment on three major topics: legal developments, legal scholarship, and the legal profession. That will include some of our own scholarly interests, as well as law school and lots of other things.

We are friends and colleagues at the University of Chicago, so we have plenty in common, but we also have very different scholarly specialties and methodologies, so we will likely have some things we disagree about as well.

Watch this space for new posts every week.