KSJ Tracker March 18, 2014

Nate Silver's new FiveThirtyEight dishes out statistical nonsense on health coverage.


Nate Silver's fivethirtyeight.com relaunched yesterday at its new home--ESPN--vowing to focus its coverage on five areas: politics, economics, life, sports--and science.

The inclusion of science was a surprise to me. And possibly a mistake, unless FiveThirtyEight can quickly improve the quality of the "science" it's publishing. The lead story on the relaunched site's first day--"Finally, a Formula for Decoding Health News"--was abysmal.

Silver's most famous achievement was calling 50 states correctly in the 2012 presidential election. But in a manifesto entitled What the Fox Knows, Silver says some others did nearly as well, and that his election forecasts "didn’t represent the totality, or even the most important part, of our journalism at FiveThirtyEight. We also covered topics ranging from the increasing acceptance of gay marriage to the election of the new pope, along with subjects in sports, science, lifestyle and economics." He continued:

Relatively little of this coverage entailed making predictions. Instead, it usually involved more preliminary steps in the data journalism process: collecting data, organizing data, exploring data for meaningful relationships, and so forth. Data journalists have the potential to add value in each of these ways, just as other types of journalists can add value by gathering evidence and writing stories.

Silver reports that he has expanded his reporting staff from two full-time journalists to "20 and counting," and that the coverage will span the five areas mentioned above. Interestingly, two of the three reporters identified on FiveThirtyEight's science page are academics: Emily Oster, an associate professor of economics at the University of Chicago, and Roger Pielke Jr., a professor of environmental studies at the University of Colorado, and a familiar name to those who cover the environment. The third reporter is Ritchie King, identified as "a visual journalist and science reporter," whose Twitter page says he is "writing a book about JavaScript's D3 library" (and if you don't know what that is, go to your room).

Among the initial offerings are tepid stories on the protection afforded by toilet seat covers (not much); whether British teeth are worse than American choppers (they're not); and others on peer review, exercise, and how many calories you burn having sex, which must have strayed here en route to the website of Woman's Day, which scooped Silver on this important topic.

One short piece by Mona Chalabi, the lead writer for FiveThirtyEight's DataLab, notes that recent headlines claiming that eating meat could be as deadly as smoking cigarettes vastly overstated the actual research findings. Her story doesn't rely on data analysis, and the criticism has been made by many others.

But it's when we turn to the lead science story that we find real trouble. The story, on evaluating health news, offers the usual cautions about overstated headlines. But the author, Jeff Leek, an associate professor of biostatistics and oncology at Johns Hopkins and a blogger at Simply Statistics, fell off a ledge trying to support his cautions with data analysis.

"As a statistician," he writes, "I use a simple computation based on Bayes’ rule to combine my gut feeling about a piece of health news with information about the study it comes from." Impressive--an equation to tell us when a headline is overstated. How simple. How elegant.

And as we'll see, how silly.

Here's Leek's formula:

Final opinion on headline = (initial gut feeling) * (study support for headline)

Problem No. 1: How, Prof. Leek, do I measure my initial gut feeling? This is data analysis. This is FiveThirtyEight. We expect precision and statistical rigor here. Leek's answer: "If you think the odds the study is true based on your gut are 4 to 1, then your initial gut feeling will be 4. If you think the odds are 1 to 10 against the study being true, then your initial gut feeling will be 1/10."

My gut instinct is that Leek is up to no good here. But what number would I put on that? The odds that Leek's equation is flaky are, what, 100 to 1? 1,000 to 1? What if I'm certain that his equation is flaky; what number do I put on that?

Leek's six questions for determining the second factor--whether the study supports the headline--are fine. "Was the study a clinical study in humans?" ("Clinical study" generally means humans, but OK; it's a good question.) Was the study randomized and controlled? Did it have a lot of patients? And so forth. All good criteria, but as far as I can tell, the answers are not something that should be put into an equation. Why these six questions, and not six others? Did Leek test them for their statistical validity? No. And he admits that the questions are "not the only important characteristics of a study." So why does Leek make them the defining quantities in his equation?

And how do we turn the answers to those questions into a number for Leek's formula? Each "yes" multiplies the score by 2, and each "no" multiplies it by 1/2. He doesn't say why those factors. Then he multiplies all six of those numbers together--six yeses give 2 to the sixth power, or 64--and multiplies that by his gut feeling.
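Spelled out as code, the procedure described above amounts to the following minimal sketch (the function name and example inputs are my own, reconstructed from the post's description, not Leek's actual implementation):

```python
def leek_score(gut_odds, answers):
    """Multiply gut-feeling odds by 2 for every 'yes' answer
    and by 1/2 for every 'no', per the procedure as described."""
    score = gut_odds
    for yes in answers:
        score *= 2 if yes else 0.5
    return score

# Example: gut odds of 4 to 1, with four yeses and two noes:
# 4 * 2^4 * (1/2)^2 = 16.0
print(leek_score(4, [True, True, True, True, False, False]))
```

Note how sensitive the output is to the arbitrary choice of 2 and 1/2 as multipliers: flipping a single answer changes the final number by a factor of four, regardless of which question it was.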

This is as good an example of garbage in, garbage out as I've seen in a long time.

What I can't figure out is how Leek, a statistician at Johns Hopkins, came to dish out so much quasi-statistical nonsense--nonsense that not only couldn't predict the outcome of a presidential election, it couldn't predict the winner of a second-grade talent show.

I applaud Silver for bringing science under his statistical umbrella, but if he wants to show the same rigor in his science coverage that he has shown in his electoral predictions, he has a very, very long way to go.

-Paul Raeburn
