This Post Won’t Go Viral

Sunday, July 31, 2011
By Sharad Goel

This picture is definitely going viral

Sometime during the late 19th or early 20th century, a simian immunodeficiency virus that infects wild chimpanzees made the jump to humans who hunted the animals. The mutated human strains spread from one individual to the next through intimate contact (usually unprotected sex or needle sharing), often leaving carriers free of symptoms for extended periods while they continued to transmit the virus. By the early 1980s, large numbers of injection-drug users and gay men exhibited signs of compromised immune systems. These first clinically recognized cases of AIDS, later traced back to HIV, were the start of a global pandemic that has claimed the lives of more than 25 million people to date.

HIV/AIDS, like many other contagious diseases, exemplifies the common view of so-called viral propagation, growing from a few initial cases to millions through close person-to-person interactions. (Ironically, not all viruses in fact exhibit “viral” transmission patterns. For example, Hepatitis A often spreads through contaminated drinking water.[1]) By analogy to such biological epidemics, the diffusion of products and ideas is conventionally assumed to occur “virally” as well, as evidenced by prevailing theoretical frameworks (e.g., the cascade and threshold models) and the marketing world's obsession with all things social. The view of adoption as a contagious process is quite appealing. We have all, for example, solicited our friends for book and music recommendations, affirming the role of social ties in product purchases. For every book or album purchased because of a personal recommendation, however, how many were bought after simply browsing the stacks, reading a review, or seeing an advertisement? Despite hundreds of papers written about diffusion, there is surprisingly little work addressing this fundamental empirical question.

In a recent study, Duncan Watts, Dan Goldstein, and I examined the adoption patterns of several different types of products diffusing over various online platforms (including Twitter, Facebook, and the Yahoo! IM network), comprising millions of individual adopters.[2] The figure below shows the structure and frequency of the five most commonly seen diffusion trees in each case. In all six domains the dominant diffusion event, accounting for between 70% and 95% of cascades, is the trivial one: an individual adopts the product in question and doesn't convert any of their contacts. The next most common event, again in all six domains, is an independent adopter who attracts a single additional adopter. In fact, across domains only 1%-4% of diffusion trees extend beyond one degree.
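
As a rough illustration of how a tree-level statistic like "extends beyond one degree" could be computed, here is a minimal Python sketch. It assumes each cascade is stored as a hypothetical child-to-parent map in which the independent adopter (the seed) maps to None; the toy trees are invented, not data from the study.

```python
def node_depths(parent):
    """Map each adopter in one diffusion tree to its hop distance from the seed.

    `parent` maps child -> parent; the independent adopter (seed) maps to None.
    """
    depth = {}
    for node in parent:
        path, cur = [], node
        # Climb until we pass the seed (None) or reach an already-labeled node.
        while cur is not None and cur not in depth:
            path.append(cur)
            cur = parent[cur]
        d = -1 if cur is None else depth[cur]
        for n in reversed(path):  # label from the top of the path downward
            d += 1
            depth[n] = d
    return depth


def share_beyond_one_degree(trees):
    """Fraction of trees in which someone adopted more than one hop from the seed."""
    deep = sum(1 for tree in trees if max(node_depths(tree).values()) > 1)
    return deep / len(trees)


# Invented toy cascades.
trees = [
    {"a": None},                      # trivial tree: a lone adopter
    {"b": None},
    {"c": None, "d": "c"},            # seed plus one converted contact
    {"e": None, "f": "e", "g": "f"},  # a two-hop chain
]
print(share_beyond_one_degree(trees))  # 0.25
```

Storing only parent pointers keeps each record small; depths are recovered on demand, and each node is labeled once.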

Most frequently occurring diffusion trees in each domain

At this point you might wonder about the relatively rare trees not depicted above. What if, for example, one out of every thousand independent adopters spawned ginormous viral cascades? In that case, while it would still be true that most trees are duds, most adoptions would be part of a viral component. In such a world, the usual theoretical models of diffusion would be reasonably accurate. Alas, the world is not so. We find that across the six domains only 1%-6% of adoptions take place more than one degree from a seed node, meaning that the vast majority of adoptions occur either without peer-to-peer influence or within one step of such an independent adopter. Put another way, the cascade structures above account not only for most trees, but also for most adoptions.
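
To see why counting trees and counting adoptions can disagree, here is a minimal Python sketch of the hypothetical world just described. The numbers are invented for illustration: 999 lone adopters, plus one seed whose 10 direct converts each recruit 1,000 more. Each tree is stored simply as a list of per-adopter hop counts from its seed.

```python
def tree_vs_adoption_shares(trees):
    """Compare two ways of asking "how viral is this?".

    trees: list of diffusion trees, each given as a list of per-adopter hop
    counts from the seed (0 = independent adopter, 1 = one hop away, ...).
    Returns (share of trees reaching beyond one hop,
             share of all adoptions occurring beyond one hop).
    """
    deep_trees = sum(1 for hops in trees if max(hops) > 1)
    total_adoptions = sum(len(hops) for hops in trees)
    deep_adoptions = sum(1 for hops in trees for h in hops if h > 1)
    return deep_trees / len(trees), deep_adoptions / total_adoptions


# Invented numbers: 999 lone adopters, plus one seed whose 10 direct
# converts each recruit 1,000 further adopters.
duds = [[0] for _ in range(999)]
big = [0] + [1] * 10 + [2] * 10_000
tree_share, adoption_share = tree_vs_adoption_shares(duds + [big])
print(f"trees beyond one hop:     {tree_share:.3f}")      # 0.001
print(f"adoptions beyond one hop: {adoption_share:.3f}")  # 0.908
```

In this invented world almost every tree is a dud, yet about 91% of adoptions sit more than one hop from a seed. The study's observed adoption-level share of 1%-6% is what rules that world out.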

In all the examples we study, diffusion seems remarkably un-viral, rarely spreading far from an independent adopter. Our results thus call into question the dominant, epidemic-like models of diffusion, and also the value of viral marketing campaigns. On a positive note, this observation makes life a lot easier. Instead of needing to describe, predict, or trigger a complicated viral process, one can focus on the much easier case of adoptions that spread at most one hop before terminating. It turns out that diffusion is not nearly as messy as you might think.

For more details, see our paper.

Bonus Puzzle

You have 27 vats of your new prototype, X-treme Water, exactly one of which is contaminated with the rare Hepatitis Q virus that kills you within a day. Fortunately, you also have 3 expendable marketing executives who managed your last viral advertising campaign. Find the contaminated vat in 2 days. (With m marketing executives and d days, how many vats can you handle?)

(We’ve also posted the official answers to “pawns on a chessboard” and “crashing Italian cars”.)

Footnotes

[1] Even HIV has a non-“viral” transmission route via contaminated blood transfusions, though it’s relatively uncommon.

[2] We study six examples. In the first three, we directly observe interpersonal diffusion, whereas in the remainder we infer diffusion from the underlying network of interpersonal connections and the temporal sequence of adoptions.

  1. Yahoo! Kindness was a website created by Yahoo!’s philanthropic arm that asked users to create status updates describing acts of kindness they had performed, after which these updates were propagated via Yahoo!, Facebook, Twitter, and other means in order to attract new users to visit the site and post updates of their own. We tracked diffusion of the website by associating each user with a unique site URL.

  2. Zync is a plug-in for Yahoo! Messenger built by Ayman Shamma that allows pairs of users to watch videos synchronously while sending instant messages to one another. We define adoption in this case as having initiated a video sharing session, not simply having participated in one, a choice that eliminates spurious dyads.

  3. The Secretary Game, built by Dan Goldstein, is a variant of the classic “secretary problem”. As with Yahoo! Kindness, user-specific URLs tracked player-to-player diffusion.

  4. Twitter. We collected all 36 million tweets containing bit.ly links that were first introduced during the month of September 2009, and then traced the diffusion of each of these links over the Twitter follower graph.

  5. Friend Sense was a third-party Facebook app that queried respondents about their political views as well as their beliefs about their friends’ political views.

  6. Yahoo! Voice is a paid service that allows users to make voice-over-IP calls to phones through Yahoo! Messenger. Diffusion in this case is considered to occur over the Yahoo! IM network.
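
For the last three domains, footnote [2] says diffusion is inferred from the network of interpersonal connections plus the temporal sequence of adoptions. The exact attribution rule used in the study is not described here, so the Python sketch below adopts one plausible rule purely as an assumption: credit each adopter to the contact who adopted most recently before them, and treat adopters with no earlier-adopting contact as seeds. The function name and data are hypothetical.

```python
def infer_parents(adoption_time, contacts):
    """Attach each adopter to a hypothetical 'parent' using timing plus the graph.

    adoption_time: {user: time they adopted}
    contacts: {user: set of users whose activity they can see (e.g., followees)}
    Assumed rule: parent = the contact who adopted most recently before the user;
    a user with no earlier-adopting contact is an independent seed (None).
    """
    parent = {}
    for user, t in adoption_time.items():
        earlier = [
            c for c in contacts.get(user, ())
            if c in adoption_time and adoption_time[c] < t
        ]
        parent[user] = max(earlier, key=adoption_time.get) if earlier else None
    return parent


times = {"ann": 1, "bob": 2, "cat": 3, "dan": 4}
contacts = {"bob": {"ann"}, "cat": {"ann", "bob"}, "dan": {"eve"}}
print(infer_parents(times, contacts))
# {'ann': None, 'bob': 'ann', 'cat': 'bob', 'dan': None}
```

A time cutoff, such as the one Mark Maunder asks about in the comments, could be added by also requiring `t - adoption_time[c]` to fall under some window.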

Illustration by Kelly Savage.

  • William

    Solution to the vats problem: With d days and m marketing executives, you can test (d+1)^m vats. For each k from 0 to m inclusive, select C(m,k) groups of d^(m-k) vats, associate each group with a combination of k executives, and give water from each vat in the group to each executive in the combination. At the end of the first day, you will know that the contaminated vat is in the group associated with the set of executives who died. Recursively apply the formula to that group of d^(m-k) vats, using the m-k surviving executives and the d-1 remaining days.
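
The (d+1)^m count can also be realized by a direct schedule, sketched here in Python. This base-(d+1) labeling is a standard construction swapped in for illustration, not necessarily William's exact procedure: write each vat's index as an m-digit number in base d+1, and have executive i drink from a vat on day j exactly when the vat's i-th digit is j, where a 0 digit means he never tastes it. The days on which the executives die (0 for survivors) then spell out the contaminated vat's digits. The function names are hypothetical.

```python
from itertools import product


def schedule(m, d):
    """Label the (d+1)**m vats with m-digit base-(d+1) tuples, in index order.

    Executive i drinks from vat v on day j (1..d) iff schedule[v][i] == j;
    a digit of 0 means executive i never tastes that vat.
    """
    return list(product(range(d + 1), repeat=m))


def find_contaminated(m, d, poisoned):
    """Simulate the d-day test and return the vat index it identifies."""
    labels = schedule(m, d)
    alive = [True] * m
    death_day = [0] * m  # stays 0 if the executive survives all d days
    for day in range(1, d + 1):
        for i in range(m):
            if not alive[i]:
                continue  # dead executives cannot test any more vats
            # Vats executive i samples today; he dies by nightfall if one is bad.
            todays_vats = [v for v, lab in enumerate(labels) if lab[i] == day]
            if poisoned in todays_vats:
                alive[i] = False
                death_day[i] = day
    # The pattern of death days is exactly the contaminated vat's label.
    return labels.index(tuple(death_day))
```

For example, `find_contaminated(3, 2, 13)` returns 13: vat 13 is 111 in base 3, so all three executives drink it on day one. On that first day, the vats tasted by exactly a given set of k executives number d^(m-k), matching William's group sizes.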

  • http://markmaunder.com/ Mark Maunder

    Interesting research. I wonder if you could elaborate on the following:

    “whereas in the remainder we infer diffusion from the underlying network of interpersonal connections and the temporal sequence of adoptions.”

    Is it possible you’re ignoring the brand/product awareness that, say, 10 Twitter invites from friends create, resulting in a delayed sign-up some time after the invites? Could you share what cutoff time you used in your calculations, after which you no longer considered the connection the effective cause of a sign-up?

    Thanks.

  • john

    With 3 executives and 27 vats you will have to drink one of the last 2 remaining vats yourself…

  • http://friggeri.net Adrien Friggeri

    Have you taken a look at something we did 3 years ago? http://www.happyflu.com/
    We created a widget and tracked its diffusion on ~500 blogs.
    Cheers.

  • Ben Little

    Interesting findings. Do you have any way to account for diffusion outside the tools themselves (e.g., discussion offline or through other channels)? Also, is there any look at these findings compared with content only?

    For example, my conversion to Twitter came after a discussion with a colleague, but I never received an invitation. Similarly, I set my parents up on Gmail, which is very different from how I came to Gmail in its invite-only youth.

    I’m also curious if these networks studied are the real success stories of viral spread. Other than Twitter, which brought me here, I can’t say I’m too familiar with the other products.

    I’m no evangelist for viral marketing, but I do want to understand your work in a bit more detail before I accept and digest the findings.

  • Dan

    I think perhaps that the viral model is still valid.

    In the biological realm, there are probably new strains of viruses being created continuously through random mutation. However, if these new strains don’t have all the right traits, they go nowhere and do not “spread virally”. We never hear about them because they’re dead in the water inside some individual that never gets sick and never spreads it to anyone else.

    I suspect the only difference between biology and marketing in this regard is that we can observe all the marketing failures more easily than we can the failed biological viruses.

  • http://bactra.org/weblog/ Cosma Shalizi

    Did I miss the link to your paper?

  • https://twitter.com/#!/ChetanChawla Chetan Chawla

    Super interesting, have 2 points to add:
    1) Long tail/ thin demand side: This ties into the hypothesis that the internet as a distribution medium supports the long tail of demand. More people can tap into their own idiosyncratic demands for thin slivers of products that don’t reach mass market acceptance. Hence the prevalence of individual adoption over all others.

    2) Viral/ heavy demand side: This also supports Nassim Taleb’s notion of Black Swan events, as “winner takes all” markets become more prevalent due to low-cost net distribution. The top revenue generators lead to herding effects that dominate the wider market. Hence, it’s not surprising that when measured relative to overall diffusion patterns, the broad diffusion ones are rare. After all, we have only one Harry Potter franchise and millions of books that never come anywhere close to that level of herding/viral effects on the demand side.

  • Frank

    Hmm. I can only figure out how to test 20 = 4 (tested by none on the first day) + 3*3 (tested by one) + 3*2 (tested by 2) + 1 (tested by all) given two days and three executives. I guess what john said above must be right, though I don’t consider myself expendable…

  • http://dreev.es dreeves

    Regarding the puzzle, john is incorrect, as William proved (rather abstractly, if not nonconstructively)!

  • http://www.mostlymaths.net Ruben Berenguel @mostlymaths.net

    Interesting research; I found this via R-bloggers and the title caught my eye. Any link for the paper?

    Ruben

  • Mehmet

    I guess I would start this problem by thinking about what each death tells us.
    The set of marketing deaths would be x,y,z,(x,y),(x,z),(z,y),null,(x,y,z), so we have six arrangements that can give us information. However, if all 3 go, we must have the answer. The Chief is not expendable :-)
    So if we go one branch down the tree, all three must drink from vat 27, and each set would need a unique group? x 5,y 5,z 5,xy 2, yz 2,zx 2, null 7 to bear the 26 remaining vats equally leaving optimal choices based on the results.(x, y or z deaths leave choose 4, combo leaves choose 2, null leaves choose 6, all leaves choose zero).
    From here 5 unique with choose 4 leaves 1
    2 choose 1 leaves 1
    7 choose 6 leaves 1
    Seems like in the end you must combine 2 vats and have someone drink, or I have done something wrong? Who cares, we lose one vat of product.

  • http://blog.robwhelan.com Rob Whelan

    I see how the puzzle works. Expressing in long form, since my math is very rusty… though I imagine this is what William said above, rather more tersely.

    Each combination of executives is useful for a test, so if our execs are named Al, Bill and Chuck, on the morning of day 1 you have 7 possible groups to test (ABC, AB, AC, BC, A, B, C) — then let’s make it 8 groups, to cover the case where no one dies.

    Then we need to adjust the number of vats in each batch, because once an exec is dead, he can’t test on day 2.

    So ABC can only test 1 vat. If they all die, we have to know “it was that vat”.
    The pairs can each test 2 vats (a single remaining exec can find out which was the tainted vat, on day 2).
    The individuals can each test 4 vats, since a surviving pair of execs can make 4 groups (AB, A, B, none) the next day.
    And we can leave 8 vats to the case where none die.

    So for day one, we have
    ABC tests 1 vat
    (AB,BC,AC) test 2 vats, each
    (A,B,C) test 4 vats each
    8 vats untested for the “no one dies” case
    … that makes 27!
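
Rob's day-one tally generalizes directly to William's formula. A day-one group tasted by exactly k of the m executives can be left to the m-k survivors for the remaining d-1 days, so it can hold d^(m-k) vats, and there are C(m,k) such groups. Summing over k and applying the binomial theorem:

```latex
\sum_{k=0}^{m} \binom{m}{k}\, d^{\,m-k}
  = \sum_{k=0}^{m} \binom{m}{k}\, d^{\,m-k}\, 1^{k}
  = (d+1)^{m},
\qquad
m=3,\ d=2:\quad 1\cdot 8 + 3\cdot 4 + 3\cdot 2 + 1\cdot 1 = 27 = 3^{3}.
```

The four terms are Rob's "no one dies" batch (8), the three single executives (4 each), the three pairs (2 each), and ABC (1).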

  • mehmet

    I miscounted, lol

  • Ewan

    Perhaps there isn’t a paper yet, but any brief description of the methodology? What does the data look like? What does the graphical model look like?

  • Ben

    Some questions that stand out, having not read any paper and therefore obnoxiously nitpicking something sight unseen:

    * Facebook: is it reasonable to use one app’s content diffusion on Facebook as a proxy for content diffusion on Facebook as a whole? I’d argue that it isn’t: there are different distribution channels on Facebook, different ways to implement an app, and differently compelling app content, all of which should result in differing levels of “virality”.

    Twitter:
    * Is it possible there was a large amount of spam, diluting the # of diffusion trees (because no one chose to diffuse the spam links)? What does a sampling of the studied Tweets look like?
    * Ditto, but for content that just wasn’t successful – there was only one adopter of the bit.ly link (in other words – that no diffusion at all occurred).
    * What is the definition of an “adoption”? By “the vast majority of adoptions occur either without peer-to-peer influence or within one step”, do you mean that one person posts a link, someone who is not following that person reposts the link, and that constitutes an independent adoption? Or is the first posting of the bit.ly link itself an adoption, meaning that the case where two people independently post a bit.ly link = two adoptions? If so the pool of successful adoptions would be diluted by garbage content, or just un-viral content – content that had no distribution tree at all, viral or otherwise.

  • David Phillips

    As someone who has been living with HIV for just shy of 29 years, I find the thoughtless and ignorant retelling of the origin of the HIV pandemic very offensive. Specifically, you write “These first clinically observed cases of AIDS, later traced back to HIV, were the start of a global pandemic that has claimed the lives of more than 25 million people to date.” Actually, that sentence is both ignorant and in conflict with earlier sentences.

    The cases of advanced HIV disease observed in 1981 were, in fact, the first to be clustered and presented to Western doctors who would “put the pieces together,” not those that began the pandemic. The wasting effects of HIV, commonly known as “slim,” had already been recounted in the oral histories of sub-Saharan Africa for over two generations, particularly among female sex workers in urban brothels. Economic conditions in Africa, driven by the hunger of colonial powers for timber, minerals, and other natural resources, drew people into urban centers early in the 20th Century as HIV was making its way from first human contact to the cities. There, HIV found ideal hosts among local sex workers and their transient partners, men who would often carry the virus to other cities and back to smaller villages. Unwittingly, these non-drug-using heterosexuals became the first cases of HIV disease that would spark the pandemic.

  • http://missinghumanmanual.com Rob Paterson

    I wonder if you may have overlooked the TRUST or closeness element? To get HIV/AIDS I have to have sex with another person or be born from an infected mother. To get cholera I have to drink infected water or get another’s feces on me and ingest it.

    Much of the connection on Twitter etc is very superficial. BUT if a real friend of mine was excited about an idea and told me I would listen.

    For instance – I have been on a Paleo diet for 9 months now. I blog a lot about it but don’t push it. Many of my real friends have seen how well I am – see that they share my original risk – and now are on it too. But I have had very little luck in persuading any who are more distant. They on the other hand are having some success in persuading their other real friends…

  • Alberto

    Very interesting! Is it possible to have the paper reference?

  • http://adamgurri.com AdamGurri

    Yesterday, I tweeted a link, which was retweeted by my friend, and because of that, was retweeted by his professor who follows him. His professor is Tyler Cowen, so many people who followed HIM retweeted or shared the link. That’s 2-3 degrees already, right? Or more?

    Today, a friend of mine shared a post with me, which I shared with the friend who retweeted it yesterday. He in turn shared the link on Twitter, which Tyler Cowen saw and then linked to on Marginal Revolution.

    It seems to me that either I live an extraordinarily improbable life, your study doesn’t correctly capture the odds that something will jump more than one or two degrees, or I’m completely misinterpreting your results.