<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: What Can Search Predict?</title>
	<atom:link href="http://messymatters.com/2009/11/30/what-can-search-predict/feed/" rel="self" type="application/rss+xml" />
	<link>http://messymatters.com/2009/11/30/what-can-search-predict/</link>
	<description>Bring Your Own Data</description>
	<lastBuildDate>Sat, 17 Jul 2010 16:50:31 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9.1</generator>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
		<item>
		<title>By: Revisiting predictions: Google and Avatar: Oddhead Blog: Prediction Markets, Gambling, Electronic Commerce, Artificial Intelligence: David Pennock: Yahoo! Research</title>
		<link>http://messymatters.com/2009/11/30/what-can-search-predict/comment-page-1/#comment-2195</link>
		<dc:creator>Revisiting predictions: Google and Avatar: Oddhead Blog: Prediction Markets, Gambling, Electronic Commerce, Artificial Intelligence: David Pennock: Yahoo! Research</dc:creator>
		<pubDate>Sun, 14 Mar 2010 04:06:28 +0000</pubDate>
		<guid isPermaLink="false">http://messymatters.com/?p=520#comment-2195</guid>
		<description>[...] swim, Slate journalist Josh Levin asked us to predict its opening weekend box office earnings using our models. We projected between $65 and $84 million. The actual number? $77 [...]</description>
		<content:encoded><![CDATA[<p>[...] swim, Slate journalist Josh Levin asked us to predict its opening weekend box office earnings using our models. We projected between $65 and $84 million. The actual number? $77 [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: How much is a song play worth? &#171; Music Machinery</title>
		<link>http://messymatters.com/2009/11/30/what-can-search-predict/comment-page-1/#comment-2043</link>
		<dc:creator>How much is a song play worth? &#171; Music Machinery</dc:creator>
		<pubDate>Mon, 01 Mar 2010 11:48:24 +0000</pubDate>
		<guid isPermaLink="false">http://messymatters.com/?p=520#comment-2043</guid>
		<description>[...] and millions of queries for music) along with deep expertise in analyzing and understanding what search can predict while The Echo Nest brings our understanding of Internet music activity such as playcount data, [...]</description>
		<content:encoded><![CDATA[<p>[...] and millions of queries for music) along with deep expertise in analyzing and understanding what search can predict while The Echo Nest brings our understanding of Internet music activity such as playcount data, [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Bob Carpenter</title>
		<link>http://messymatters.com/2009/11/30/what-can-search-predict/comment-page-1/#comment-1105</link>
		<dc:creator>Bob Carpenter</dc:creator>
		<pubDate>Wed, 02 Dec 2009 19:51:15 +0000</pubDate>
		<guid isPermaLink="false">http://messymatters.com/?p=520#comment-1105</guid>
		<description>There are companies like &lt;a href=&quot;http://www.hmsinc.com/&quot; rel=&quot;nofollow&quot;&gt;Health Monitoring Systems&lt;/a&gt; that use natural language classifiers over emergency room chief complaints (short text descriptions of symptoms) to add predictors for bio-surveillance (e.g., tracking flu outbreaks or localizing botulism outbreaks).

I wonder if search load would be a useful predictor given chief complaint data, or given the richer data feed that the CDC is now getting.

The problem I&#039;ve seen discussed relative to using search for bio-surveillance is that when a celebrity gets sick, searches for whatever illness they have spike.  That may not matter for forward prediction, or may itself be predictable given co-searches for the celebrity.</description>
		<content:encoded><![CDATA[<p>There are companies like <a href="http://www.hmsinc.com/" rel="nofollow">Health Monitoring Systems</a> that use natural language classifiers over emergency room chief complaints (short text descriptions of symptoms) to add predictors for bio-surveillance (e.g., tracking flu outbreaks or localizing botulism outbreaks).</p>
<p>I wonder if search load would be a useful predictor given chief complaint data, or given the richer data feed that the CDC is now getting.</p>
<p>The problem I&#8217;ve seen discussed relative to using search for bio-surveillance is that when a celebrity gets sick, searches for whatever illness they have spike.  That may not matter for forward prediction, or may itself be predictable given co-searches for the celebrity.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Sharad Goel</title>
		<link>http://messymatters.com/2009/11/30/what-can-search-predict/comment-page-1/#comment-1094</link>
		<dc:creator>Sharad Goel</dc:creator>
		<pubDate>Tue, 01 Dec 2009 13:15:04 +0000</pubDate>
		<guid isPermaLink="false">http://messymatters.com/?p=520#comment-1094</guid>
		<description>Even though the CDC may have a 1-2 week delay in reporting ground-truth flu caseloads, there is still a lot of (non-search) information available for estimating current flu levels (e.g., CDC reports from the last few weeks). So the relevant question is, &lt;em&gt;At any instant in time, how much does search boost performance over a baseline tracking model?&lt;/em&gt; As it turns out, assuming a 1-week reporting delay, the boost is negligible; and with a 2-week delay, there is a real, but still relatively small, improvement. In other words, the real-time information (i.e., search volume) does not add much to the stale information (i.e., actual flu caseloads from a week ago). See &lt;a href=&quot;http://messymatters.com/2009/03/21/the-future-is-yesterday/&quot; rel=&quot;nofollow&quot;&gt;The Future is Yesterday&lt;/a&gt; for more discussion.

In any case, flu reporting lags may be a thing of the past, as the CDC has recently adopted a &lt;a href=&quot;http://science.slashdot.org/story/09/11/06/1217203/CDC-Adopts-Near-Real-Time-Flu-Tracking-System&quot; rel=&quot;nofollow&quot;&gt;near real-time flu tracking system&lt;/a&gt;.</description>
		<content:encoded><![CDATA[<p>Even though the CDC may have a 1-2 week delay in reporting ground-truth flu caseloads, there is still a lot of (non-search) information available for estimating current flu levels (e.g., CDC reports from the last few weeks). So the relevant question is, <em>At any instant in time, how much does search boost performance over a baseline tracking model?</em> As it turns out, assuming a 1-week reporting delay, the boost is negligible; and with a 2-week delay, there is a real, but still relatively small, improvement. In other words, the real-time information (i.e., search volume) does not add much to the stale information (i.e., actual flu caseloads from a week ago). See <a href="http://messymatters.com/2009/03/21/the-future-is-yesterday/" rel="nofollow">The Future is Yesterday</a> for more discussion.</p>
<p>In any case, flu reporting lags may be a thing of the past, as the CDC has recently adopted a <a href="http://science.slashdot.org/story/09/11/06/1217203/CDC-Adopts-Near-Real-Time-Flu-Tracking-System" rel="nofollow">near real-time flu tracking system</a>.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Bheema V</title>
		<link>http://messymatters.com/2009/11/30/what-can-search-predict/comment-page-1/#comment-1091</link>
		<dc:creator>Bheema V</dc:creator>
		<pubDate>Tue, 01 Dec 2009 10:43:28 +0000</pubDate>
		<guid isPermaLink="false">http://messymatters.com/?p=520#comment-1091</guid>
		<description>I thought the whole deal about the search-volume-based predictor was the &#039;real-time&#039; part.

The &lt;a href=&quot;http://research.google.com/archive/papers/detecting-influenza-epidemics.pdf&quot; rel=&quot;nofollow&quot;&gt;paper&lt;/a&gt; in Nature on Google&#039;s effort to track flu status makes this point a number of times (one-day lag when using search trends, vs. 1-2 week reporting lag when waiting for the CDC to respond).</description>
		<content:encoded><![CDATA[<p>I thought the whole deal about the search-volume-based predictor was the &#8216;real-time&#8217; part.</p>
<p>The <a href="http://research.google.com/archive/papers/detecting-influenza-epidemics.pdf" rel="nofollow">paper</a> in Nature on Google&#8217;s effort to track flu status makes this point a number of times (one-day lag when using search trends, vs. 1-2 week reporting lag when waiting for the CDC to respond).</p>
]]></content:encoded>
	</item>
</channel>
</rss>
