Birds of a Feather Shop Together

Wednesday, September 1, 2010
By Sharad Goel

(Cross posted at Decision Science News.)

Do you know what the #$*! your social media strategy is? Perhaps it’s “to facilitate audience conversations and drive engagement with social currency”? Or maybe “to amplify word of mouth by motivating influencers”? Well, given all the lies and damned lies being told about social, fellow yahoo Dan Goldstein and I decided to enter the fray with statistics. We measured the extent to which your friends’ behavior predicts your own, and found that in several consumer domains the effect is substantial, complementing traditional demographic and behavioral predictors.

That friends are similar along a variety of dimensions is a long-observed empirical regularity—a pattern sociologists call homophily. As McPherson et al. write in their canonical review on the subject, “homophily limits people’s social worlds in a way that has powerful implications for the information they receive, the attitudes they form, and the interactions they experience.” Turning this statement around, where there is homophily, one can in principle predict an individual’s behavior based on the attributes and actions of his or her associates.

To assess the quality of such network-based predictions, we merged a large social network (based on email and IM exchanges) with offline sales data at an upscale, national department store chain. Thus, for each of over one million users, we had their past purchase amounts in dollars, and had the same information for each of their network contacts. Think about this for a minute: we not only know how much these individuals themselves spent at an offline retailer, but also how much their social contacts spent, a testament to how profoundly the Internet is changing the way we study human behavior. (Despite bolstering social science research, these newfound tools raise serious privacy issues. We left the matching to a third party that specializes in doing this securely, so neither we nor the department store had access to the other’s complete customer database.)

The plot below summarizes our findings. First, as indicated by the top line, consumers whose friends spent a lot also spent a lot themselves, consistent with the hypothesis that homophily extends to consumer behavior. When friends (alters) on average spent $400 during the six-month observation period, the consumer herself (ego) spent nearly $600, more than twice what the typical consumer spent (indicated by the dotted line). As our aim is prediction, however, the relevant question is not just whether friends are similar in their purchasing behavior, but rather how much information is conveyed by social ties relative to other attributes. One might conjecture that ties simply indicate demographic (i.e., age and sex) similarity, that those who spend a lot are more likely to be middle-aged women (the primary market segment for this department store), and that friends of middle-aged women tend also to be middle-aged women. To test this hypothesis, we first paired each individual with a randomly chosen consumer of identical age and sex. The bottom line shows that this demographically matched group is, perhaps surprisingly, pretty ordinary. In other words, looking only at age and sex, you can’t identify consumers whose friends spend a lot (and who we know spend a lot themselves).
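For readers who want to see the demographic matching step spelled out, here is a minimal sketch in Python. It is not the code used in the study: the record layout (`id`, `age`, `sex`, `spend`), the function name `match_on_demographics`, and the toy numbers are all assumptions made for illustration.

```python
import random
from collections import defaultdict
from statistics import mean

def match_on_demographics(targets, population, rng):
    """For each target, draw one random *other* consumer with the same (age, sex)."""
    by_demo = defaultdict(list)
    for person in population:
        by_demo[(person["age"], person["sex"])].append(person)

    matches = []
    for t in targets:
        candidates = [p for p in by_demo[(t["age"], t["sex"])] if p["id"] != t["id"]]
        if candidates:  # targets with no demographic twin are skipped
            matches.append(rng.choice(candidates))
    return matches

# Hypothetical toy data, not the study's data.
population = [
    {"id": 1, "age": 45, "sex": "F", "spend": 600.0},
    {"id": 2, "age": 45, "sex": "F", "spend": 150.0},
    {"id": 3, "age": 45, "sex": "F", "spend": 250.0},
    {"id": 4, "age": 30, "sex": "M", "spend": 90.0},
    {"id": 5, "age": 30, "sex": "M", "spend": 110.0},
]
targets = [population[0], population[3]]  # pretend these two have big-spending friends
matched = match_on_demographics(targets, population, random.Random(0))

# Compare average spend of the targets with that of their demographic matches.
print(mean(t["spend"] for t in targets), mean(m["spend"] for m in matched))
```

If the matched group's average sits near the population average while the targets' average is well above it, then age and sex alone are not picking out the same consumers that the social signal does.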

Though it’s standard marketing practice to target consumers based on their demographics, it’s an admittedly noisy profiling technique. So, to put social through the wringer, we next took the “socially select” group—consumers whose friends spent a lot—and matched them to random consumers with identical age, sex, and past purchases. Each social candidate, that is, was matched to a consumer not only of the same age and sex, but one who spent approximately the same amount as the social candidate during the previous six months. Even relative to this formidable baseline, social cues still provide considerable information. As the middle line indicates, knowing a consumer’s age, sex, and past purchases, but not that their friends are shopaholics, one would still underestimate their future sales.[1]
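The stricter baseline can be sketched the same way. The snippet below is an illustration under stated assumptions, not our actual procedure: the field names, the function `match_with_history`, and in particular the 10% tolerance used to stand in for "approximately the same amount" are hypothetical choices.

```python
import random

def match_with_history(target, population, rng, tolerance=0.10):
    """Pick a random other consumer with the same age and sex whose past
    spend is within `tolerance` (as a fraction) of the target's past spend.
    Returns None when no such consumer exists."""
    candidates = [
        p for p in population
        if p["id"] != target["id"]
        and p["age"] == target["age"]
        and p["sex"] == target["sex"]
        and abs(p["past_spend"] - target["past_spend"]) <= tolerance * target["past_spend"]
    ]
    return rng.choice(candidates) if candidates else None

# Hypothetical toy data, not the study's data.
people = [
    {"id": 1, "age": 45, "sex": "F", "past_spend": 500.0, "future_spend": 600.0},
    {"id": 2, "age": 45, "sex": "F", "past_spend": 480.0, "future_spend": 450.0},
    {"id": 3, "age": 45, "sex": "F", "past_spend": 100.0, "future_spend": 120.0},
    {"id": 4, "age": 45, "sex": "M", "past_spend": 500.0, "future_spend": 500.0},
]
social_candidate = people[0]  # pretend this consumer's friends spent a lot
control = match_with_history(social_candidate, people, random.Random(0))

# How much the matched baseline underestimates the social candidate's spending.
gap = social_candidate["future_spend"] - control["future_spend"]
print(control["id"], gap)  # 2 150.0
```

A positive gap, averaged over many such pairs, is what it means for social cues to carry information beyond age, sex, and past purchases.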

We repeated this analysis for two other domains—examining signups for Yahoo! Fantasy Football, and clicks on ten online banner ads for movies, apparel, government programs, and beyond—again finding that the predictive power of social persists even after adjusting for age, sex, and past behavior. Lest you run off to rejigger your social strategy, I should mention a couple of caveats. First, we have shown that consumers with big-spending friends tend to spend a lot themselves—more, in fact, than their demographics or past purchases alone would suggest. But since most people, even premium customers, don’t have shopaholic friends, social cues do not substantially boost average predictive performance. Second, though social signals help predict how much consumers spend, they don’t always help identify which consumers will spend the most. Those who recently spent fifty grand on sartorial elegance are likely to be habitual top spenders, regardless of what you know about their friends.

Assessing the value of social, as with most things, is a messy affair. On the one hand, network ties convey information not captured by the usual egocentric metrics, a conclusion that at the very least I find scientifically interesting. On the other hand, it’s not immediately obvious how to use that knowledge to take over the world. Well, rest assured that an army of social strategy gurus is waiting in the wings with a game-changing, technology-disrupting way to, you know, leverage the social graph to deliver personalized experiences or something.

N.B. Thanks to Randall Lewis and David Reiley for acquiring the sales data, Jake Hofman for assembling the email data, and Duncan Watts and Dan Reeves for comments. The plot above was generated with ggplot2. For related work in the telecom domain, check out the paper, “Network-Based Marketing: Identifying Likely Adopters via Consumer Networks,” by Shawndra Hill, Foster Provost, and Chris Volinsky.

Illustration by Kelly Savage

Footnotes

[1] It’s perhaps tempting to conclude from these results that shopping is contagious (i.e., to assert causation where only correlation has been shown). Though there is probably some truth to that claim, establishing it is neither our objective nor justified by our analysis.


  • http://had.co.nz Hadley Wickham

    It’s neat that you’re using ggplot2 – but would you mind citing it in your papers? Citations really help me to show to my colleagues that producing software is useful and does have an impact on the practice of statistics. Thanks!

  • http://www.cam.cornell.edu/~sharad/ Sharad Goel

    @Hadley: Sure, I’m happy to cite ggplot2. It’s a shame that the utility of such tools isn’t self-evident.

  • Cong Yu

    Is it true that you analyze only the subset of friends who are in the offline retailer registry, and not those who are not? Do you know the percentage of each, and will the result still be significant among _all_ friends?

  • http://www.cam.cornell.edu/~sharad/ Sharad Goel

    @Cong: Yeah, that’s right, we only have data on friends who are in the registry. But the predictions can only get better by including data on more friends — if they didn’t then we could always exclude the extra data and we’d be left with our same old predictions.

  • ebrosh

    Hi Sharad,
    Thanks for the interesting (and fun) article.
    I’m not clear on something though: you write referring to the top line: “When friends (alters) on average spent $400 during the six-month observation period, the consumer herself (ego) spent nearly $600”.
    Yet if I’m reading the plot correctly, when the alters spent on average $200 (i.e. less than the typical spender) the Ego still spent nearly $500. For prediction, if there’s a correlation for high spending, shouldn’t there be one for low spending? Shouldn’t the leftmost plot point on the top line be lower than the dotted line?
    I guess I’m not reading the graph correctly? Perhaps you could explain the dot-size and that would shed some light on it?
    Thanks.
    PS – excuse my late comment :) I only happened to come across this post now.

  • http://socio-shop.com Guy Azar

    This is brilliant research in terms of data collection.
    Well done & thanks for the share.

    However, we KNOW that social attributes cannot give a vendor the entire picture of which products a person is likely to purchase, when, and for how much.

    It is fairly intuitive to conclude that social is one part of a multi-variable algorithm that brings into the calculation the usual demographics as well as the social shopping / engagement variables people manage with one another and the brand.
    You have the data to take a step in that direction.
