The Future is Yesterday

Saturday, March 21, 2009
By Sharad Goel

Independent teams from Yahoo and Google recently demonstrated that search volume for flu-related terms strongly correlates with CDC-reported influenza levels, suggesting that search logs could be used to monitor public health. It is certainly tantalizing to think that the same technology that shows Isaiah Thomas is “on fire” today (search-wise, that is) could also help detect the onset of a global pandemic. That possibility, in fact, prompted Steven Levitt to exclaim, “Google will save the world.” The problem is, you don’t really need a big, bad search engine to make flu predictions. Roughly speaking, as a predictor of flu, last week’s flu levels outperform today’s search queries.

So what’s going on? Well, first, neither study is actually predicting (or claiming to predict) future flu levels. Rather, they are predicting what will be reported in future CDC announcements. Each week the CDC collects epidemiological statistics from health care providers, and collecting, processing, analyzing, and posting these data usually takes about one to two weeks. So the actual flu level for the week of March 15, for example, may not be widely reported until the end of the month. What the flu papers show, then, is that search queries lead CDC reports by about one to two weeks, but still lag actual flu cases by about a day.

In predicting CDC reports, using search queries appears to do quite well. So well, in fact, that no one seems to have asked how the search-engine approach performs relative to other, more conventional methods. It turns out, perhaps surprisingly, that predicting flu levels is not as hard as it might seem: this week’s and last week’s flu levels together yield a better prediction of next week’s flu level than do next week’s search queries. OK, this is a little confusing, but hopefully it’ll become clear with an example. Suppose I want to know the flu levels for the week of March 15-21. The CDC won’t report statistics for that week until late March or early April. On the other hand, by looking at query logs, on March 22nd I would have a pretty good sense of flu activity for that week. But on March 22nd (or perhaps a few days later), the CDC will have already released statistics for the two weeks of March 1-7 and March 8-14. And data from these two weeks are a comparable (in fact, a bit better) predictor of flu levels during the week of March 15-21 than are search queries from that same week. The plot below shows how remarkably well the simple autoregressive model (i.e., the model based on the latest available CDC reports) predicts the real, retrospectively reported CDC data.
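The two-lag autoregressive idea can be sketched in a few lines. This is an illustrative sketch, not the model from either paper: the flu series below is synthetic, and the coefficients simply come from an ordinary least-squares fit.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic weekly "flu level" series with a seasonal bump (illustrative only)
weeks = np.arange(120)
flu = 2 + 1.5 * np.sin(2 * np.pi * weeks / 52) ** 8 + rng.normal(0, 0.05, weeks.size)

# AR(2) design matrix: predict y[t] from the two latest reported values,
# y[t-1] and y[t-2], plus an intercept
y = flu[2:]
X = np.column_stack([np.ones(y.size), flu[1:-1], flu[:-2]])

# Ordinary least-squares fit of the autoregressive coefficients
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

# One-step-ahead predictions and their root-mean-squared error
pred = X @ beta
rmse = np.sqrt(np.mean((pred - y) ** 2))
print("AR(2) coefficients:", beta.round(3))
print("In-sample RMSE:", rmse.round(4))
```

Because weekly flu levels change slowly relative to the noise, even this bare-bones model tracks the series closely, which is the heart of the point above.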

[Figure: Predictions from an autoregressive model based on two weeks of retrospective CDC data outperform a predictive model based on search query volume. Flu levels are for the South Atlantic Region, one of the nine CDC-designated geographical regions.]

To be clear, search data are still potentially useful for disease surveillance, particularly in countries where public health infrastructure lags behind Internet access. And a predictive model based on search and historical data does (slightly) outperform a model based on historical data alone. But in light of how well one does by simply looking to the past, predictions based on querying behavior are somewhat less impressive.


  • Robin Hanson

    The sad thing is that the CDC takes so long to process their data. Seems pretty inexcusable to me.

  • dreeves

    Robin, true, which kind of drives home Sharad’s point: even such lagged CDC reports beat current search engine volume as a predictor of flu levels.

  • J DeLong

    I really like this type of analysis. It was pounded into my head in grad school to always compare a novel idea to the next best novel idea. You really do that well here. Good job. I’m enjoying your blog. Keep up the good work.

  • IanS

    Very cool. I saw this a while back but wasn’t sure if they were comparing it to predictions the CDC/WHO already makes. Quick question: do you know if those models are cross-validated? And good looking blog, btw!

  • Sharad Goel

    Ian, yes, training and testing are on separate data sets (the Google study trains on flu data from September 2003 – March 2007, and tests on March 2007 – May 2008 data).

  • Bart

    What about long-term trends? At some point, people querying the net for health-related terms at t=0 will have become knowledgeable about the epidemic, especially with a recurring disease like influenza. This could affect the number of queries, but not the incidence of the disease.

  • ZBicyclist

    I like this example, showing that sometimes older, simpler methods still work remarkably well.

    It seems to me that a hybrid system could be useful in a lot of health contexts — use the search engine results to suggest online polling that could be done on a more scientific basis. [and, yes, I did use “online polling” and “scientific” in the same sentence — I’m amazed as well].

  • Rajan

    I like this example too, because it shows how simple models can do really well.

    However, in my experience, you sometimes have to watch out with AR models, because the best least-squares estimator (given an AR model) may simply forecast that the next value is basically the same as the last. The error of such a model can be very good in numerical terms, but as a forecast it might not be so good, because it can “miss” (i.e., the forecast lags) every turning point in the time series. So the forecast curve can look like the actuals shifted to the right by some time period. It’s hard to tell if that’s what is happening in the figure.
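    This worry is easy to see with a toy persistence forecast (purely illustrative numbers), where each prediction is just the previous observed value:

```python
# Persistence forecast: predict y[t] = y[t-1], applied to a series with a peak
series = [1, 2, 4, 7, 9, 8, 5, 3, 2]
forecast = series[:-1]   # each forecast is just the previous actual
actuals = series[1:]
print(actuals)           # [2, 4, 7, 9, 8, 5, 3, 2]
print(forecast)          # [1, 2, 4, 7, 9, 8, 5, 3], the peak appears one step late
```

    The forecast curve is exactly the actuals shifted right by one step, so it misses the turning point even though its average error looks respectable.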

  • Sharad Goel

    Rajan, this is a great comment. Implicitly I’m assuming (as did the Yahoo! and Google studies) that squared error is the pertinent loss function, and by this metric, the AR and search models perform comparably. As you say, however, you may in fact only care about predicting turning points.

    The autoregressive model uses the last two reported flu levels. It essentially predicts that this week’s flu level is the same as last week’s level adjusted by either an upward or downward trend as inferred from the earlier week.
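    A stylized version of that rule, using hypothetical weights of 2 and -1 rather than the fitted coefficients, looks like this:

```python
def trend_adjusted_forecast(last_week, week_before):
    """Predict this week's flu level as last week's level plus the most
    recent week-over-week trend (a stylized AR(2) with weights 2 and -1)."""
    return last_week + (last_week - week_before)

print(trend_adjusted_forecast(3.0, 2.5))  # rising trend: predicts 3.5
print(trend_adjusted_forecast(2.0, 2.0))  # flat trend: predicts 2.0
```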

  • Rajan

    Thanks for the response Sharad.

  • Abhishek Tiwari

    Nice article, and I am really enjoying this blog. I was curious whether similar kinds of predictions are possible for the next emerging pandemics, like what Nathan Wolfe is trying to do through the Global Viral Forecasting Initiative.
