How Small the World

Friday, September 30, 2011
By Sharad Goel


In the late 1960s, Stanley Milgram conducted one of the most famous — and perhaps one of the most misinterpreted — experiments in the social sciences. He enlisted volunteers from far off lands (Kansas and Nebraska, in his case) to route a package to one of two target individuals in Massachusetts: a stockbroker in Boston, and the wife of a divinity school student in Cambridge. The catch was that participants could only forward the package to individuals that they personally knew (i.e., those with whom they were on a first-name basis), with the hope that this chain of personal contacts would eventually connect the source with the intended destination. Milgram’s stated aim was to study social interconnectedness, to establish whether two essentially random individuals inhabiting vastly different social spheres were in fact connected by short chains of intermediaries, or whether they existed in isolated communities separated by unbridgeable gaps. His elegantly designed experiment provided surprising support for the former hypothesis: Among the 44 chains that successfully made the trip from Nebraska to the Boston stockbroker, the journey was on average six steps long, a counterintuitively short stretch and one that suggests a certain social egalitarianism.

The distribution of completed chain lengths, from Milgram's original 1967 paper

The view that six steps is “small” is the first persistent misconception, despite Milgram’s efforts to overthrow it. As Milgram points out, the completed chains are the “end product of a radical screening procedure” that filters out not only the hundreds of people any given individual is acquainted with, but that also excludes the friends of those unchosen individuals, and their friends of friends, and so on. Such exponentially growing sums are famously counterintuitive, as illustrated by the fable of the rice and the chessboard. If a chessboard were to have rice placed upon each square such that one grain were placed on the first square, two on the second, four on the third, and so on (doubling the number of grains on each subsequent square), how many grains of rice would be on the chessboard at the finish? A heap of rice larger than Mount Everest. The “small” six degrees that separate two random individuals, or the often only three steps separating one from the President, may shed light on how quickly a contagion can spread through a population, but it does not reflect the substantial social and psychological distance between them.
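The chessboard arithmetic is easy to check directly. The sketch below sums the grains square by square and then applies the same kind of growth to chains of acquaintances. The figure of 100 contacts per person, with no overlap between circles of friends, is purely an illustrative assumption and is not a number taken from Milgram.

```python
# Square k (counting from 0) holds 2**k grains; a chessboard has 64 squares.
grains_per_square = [2 ** k for k in range(64)]
total_grains = sum(grains_per_square)

# The geometric sum 1 + 2 + 4 + ... + 2**63 collapses to 2**64 - 1.
assert total_grains == 2 ** 64 - 1
print(f"{total_grains:,} grains")

# ASSUMPTION for illustration only: everyone knows 100 people and no two
# people share a contact, so each extra step multiplies the reach by 100.
ASSUMED_CONTACTS = 100
reach_by_step = [ASSUMED_CONTACTS ** step for step in range(1, 7)]
print(f"{reach_by_step[-1]:,} people within six steps under that assumption")
```

Real acquaintance networks overlap heavily, so the second figure is an upper bound on the idealized model rather than an estimate of anything measured.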


The second, and perhaps more subtle, misinterpretation of the small world phenomenon is conflating the statement that short paths exist between individuals, with Milgram’s more provocative suggestion that people can effectively navigate these paths. In the age of ubiquitous network data, describing whom we email, IM, or are connected to on Facebook, it has become possible to definitively assess the so-called topological question, with multiple studies confirming that typical individuals are in fact connected by paths of around six steps. In Milgram’s original experiment, the majority of started chains never reached their target, leaving open the possibility that those individuals who successfully completed the challenge were precisely those connected to the target by short paths, while those who didn’t were topologically distant. While I think it’s likely these chains died due to uninterested participants rather than a more fundamental roadblock, rigorously establishing the navigability of these paths has proven difficult. In a clever online replication of Milgram’s small-world experiment by Peter Dodds, Roby Muhamad, and Duncan Watts, tens of thousands of participants attempted to locate more than a dozen targets around the world. While the completed chains again exhibited short path lengths, attrition — and thus the potential selection bias problem — was even worse, with only a tiny fraction of started chains reaching their destinations.

In a new project, Lars Backstrom and Cameron Marlow at Facebook, and Duncan Watts and I at Yahoo, are again replicating Milgram’s experiment, this time on top of Facebook’s network of over 750 million people. By comparing the paths that participants ultimately select with their topological distances to the targets, we aim to mitigate the tricky selection bias problem. Hopefully, we’ll learn not only whether the world is small, but how well we can navigate our way around it.

To participate in our small world experiment, please visit sixdeg.net.

Bonus Puzzle

Say a pair of siblings each get married — to two people who are also siblings. Now if both couples have kids (if it sounds like anything incestuous is going on, reread the previous sentence!) let’s call those kids übercousins. So we have a simple genetics question this time: How related are übercousins? (Here’s a geneticist failing to come up with the answer; let’s see how Messy Matters readers do…)

Illustration by Kelly Savage


  • David

    I didn’t have the biological knowledge to answer the question so I built a quick and dirty simulator. According to my simulations, the ubercousins are about a quarter related based on chromosomal match probability.

    In the interest of scientific open disclosure, you can check out my simulator here:

    http://pastie.org/2621140

  • http://www.alandix.com/ Alan Dix

    About the ubercousins … hoping I have got my sums right :-/

    For any child, think of a single gene pair on an ordinary chromosome (not on the X or Y chromosomes): one copy came from the father (who carries, say, F1 and F2) and one from the mother (M1 and M2). Siblings have a 1/2 chance of sharing the identical father’s gene and a 1/2 chance of sharing the identical mother’s gene, so they have a 1/2 chance of sharing exactly one identical gene, a 1/4 chance of sharing two identical genes, and a 1/4 chance of sharing neither. For the non-identical genes, the likelihood that they are the same depends on the relative proportions of genes in the population, but is no better than for unrelated individuals.

    Of course for identical twins all genes are identical (with occasional mutations).

    Now looking down two generations for cousins and ubercousins.

    Any child’s father’s copy of a gene is equally likely to have come from either of the paternal grandparents (call their copies GF1, GF2, GM1, GM2) and the mother’s copy from the maternal grandparents (GF1′, GF2′, GM1′, GM2′).

    For ordinary cousins one or other of these is completely different (say the maternal, sharing a common grandfather), so there is a 1/4 chance of having exactly one identical gene.

    For ubercousins, there is a 1/4 chance of sharing a maternal grandparent gene and, independently, a 1/4 chance of sharing a paternal grandparent gene. So there is a 1/16 chance of having both genes identical, a 2 × 1/4 × 3/4 = 3/8 chance of sharing exactly one, and a 3/4 × 3/4 = 9/16 chance of sharing neither.

    As with siblings, the non-identical ones could still be the same, with the same chances as for unrelated individuals.

    In summary:
                       both identical   one identical   neither identical
    identical twins    1                0               0
    siblings           1/4              1/2             1/4
    ubercousins        1/16             3/8             9/16
    cousins            0                1/4             3/4

    That is, ubercousins are more related than cousins and less related than siblings, as one would guess.
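The sharing probabilities for a single ordinary gene pair can be checked by brute force. The sketch below is not any commenter's code: it gives each founder two uniquely labelled gene copies, enumerates every possible pattern of inheritance, and counts how many copies the two relatives end up sharing by descent. The pedigree layout and all names in it (`GA1`, `A1`, `U1`, and so on) are hypothetical labels chosen for illustration, and the method assumes nobody in the pedigree is inbred.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def ibd_distribution(pedigree, x, y):
    """Exact chance that x and y share 2, 1 or 0 gene copies by descent.

    pedigree maps each person to a (parent, parent) tuple, or None for a
    founder, and must list parents before their children.
    """
    founders = [p for p, parents in pedigree.items() if parents is None]
    children = [p for p, parents in pedigree.items() if parents is not None]
    counts = Counter()
    # Each child gets copy 0 or copy 1 from each parent: two coin flips per child.
    for flips in product((0, 1), repeat=2 * len(children)):
        genes = {f: (f + "/1", f + "/2") for f in founders}
        for k, child in enumerate(children):
            first, second = pedigree[child]
            genes[child] = (genes[first][flips[2 * k]],
                            genes[second][flips[2 * k + 1]])
        shared = sum((Counter(genes[x]) & Counter(genes[y])).values())
        counts[shared] += 1
    total = 2 ** (2 * len(children))
    return {n: Fraction(counts[n], total) for n in (2, 1, 0)}


def relatedness(dist):
    """Expected fraction of the two gene copies shared by descent."""
    return (2 * dist[2] + dist[1]) / 2


siblings = {"Dad": None, "Mum": None,
            "S1": ("Dad", "Mum"), "S2": ("Dad", "Mum")}
cousins = {"G1": None, "G2": None, "Spouse1": None, "Spouse2": None,
           "P1": ("G1", "G2"), "P2": ("G1", "G2"),
           "C1": ("P1", "Spouse1"), "C2": ("P2", "Spouse2")}
ubercousins = {"GA1": None, "GA2": None, "GB1": None, "GB2": None,
               "A1": ("GA1", "GA2"), "A2": ("GA1", "GA2"),
               "B1": ("GB1", "GB2"), "B2": ("GB1", "GB2"),
               "U1": ("A1", "B1"), "U2": ("A2", "B2")}

for label, ped, x, y in [("siblings", siblings, "S1", "S2"),
                         ("ubercousins", ubercousins, "U1", "U2"),
                         ("cousins", cousins, "C1", "C2")]:
    dist = ibd_distribution(ped, x, y)
    print(label, dist[2], dist[1], dist[0], "relatedness", relatedness(dist))
```

For the ubercousin pedigree the enumeration covers 4,096 equally likely inheritance patterns, so the fractions it reports are exact rather than simulated estimates.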

    The picture is slightly different for XY chromosome related genes.

    The boy’s Y chromosome comes entirely from the father, which in turn comes directly from the grandfather. So, as long as there is no incest and the boys’ fathers are not brothers, the Y chromosomes of boy cousins and ubercousins are no more likely to be the same than those of any unrelated boys. In contrast, boy siblings always have identical Y chromosomes.

    Similarly, for boy cousins and ubercousins, their X chromosomes come entirely from their respective mothers and hence ultimately from their maternal grandparents, so (again with no shared maternal grandparents, i.e., as long as the mothers are not sisters) they are as unrelated as anyone in the population.

    In summary:
    Genes on boys’ Y or X chromosomes
                         identical   not identical
    identical twins      1           0
    brothers (Y only)    1           0
    ubercousins          0           1
    cousins              0           1

    For girls the situation is a little more complex, as their X chromosomes do come from both mother and father, and while the sums are a little different, it turns out these share in the same way as ordinary genes.

    As noted, in all cases the non-identical genes may be the same randomly, depending on the proportions of different genes in the population.

    In addition, mutations, as Barry Star discusses, will slightly reduce the level of sharing of identical genes.

    Of course many features come not from individual genes, but gene combinations, others are to do with upbringing, so the non-genetic commonality or differences may be greater or less.

  • http://reader.differentialist.info/ Mark Adams

    This is actually a pretty well-solved problem in the general case, even if there are things like inbreeding going on.

    Sewall Wright developed path models through a pedigree to calculate the relatedness between arbitrary individuals. A quicker way for a whole set of individuals involves a recursive tabular method, where the relatedness between two individuals is a function of the relatedness amongst their 4 parents.

    See the section on the numerator relationship matrix in http://www-personal.une.edu.au/~jvanderw/Genetic_properties_of_the_animal_model.pdf

    Here is some R code that constructs a pedigree matching that described and relies on a function that will quickly calculate this relationship matrix:

    https://gist.github.com/1256404
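For readers who prefer Python to R, here is a minimal sketch of the tabular method described above. It is not the code from the gist and does not use any pedigree library: the function `relationship_matrix` and the grandparent labels `G1a`, `G1b`, `G2a`, `G2b` are hypothetical names introduced for illustration. It assumes the pedigree lists parents before their offspring and uses `None` for unknown parents.

```python
from fractions import Fraction


def relationship_matrix(pedigree):
    """Additive (numerator) relationship matrix built row by row.

    pedigree maps each individual to a (parent, parent) tuple, with None for
    an unknown parent, and must list parents before their offspring.
    """
    ids = list(pedigree)
    pos = {name: i for i, name in enumerate(ids)}
    a = [[Fraction(0)] * len(ids) for _ in ids]
    for i, name in enumerate(ids):
        parents = [pos[p] for p in pedigree[name] if p is not None]
        # Relationship with an earlier individual j is the average of
        # j's relationships with this individual's two parents.
        for j in range(i):
            a[i][j] = a[j][i] = sum((a[j][p] for p in parents), Fraction(0)) / 2
        # The diagonal is 1 plus half the relationship between the parents,
        # which is zero unless the individual is inbred.
        between_parents = a[parents[0]][parents[1]] if len(parents) == 2 else Fraction(0)
        a[i][i] = 1 + between_parents / 2
    return ids, a


# Two sibling pairs (M1 & F1, M2 & F2) whose members marry across the pairs.
pedigree = {
    "G1a": (None, None), "G1b": (None, None),
    "G2a": (None, None), "G2b": (None, None),
    "M1": ("G1a", "G1b"), "F1": ("G1a", "G1b"),
    "M2": ("G2a", "G2b"), "F2": ("G2a", "G2b"),
    "C12": ("M1", "F2"), "C21": ("M2", "F1"),
}

ids, a = relationship_matrix(pedigree)
pos = {name: i for i, name in enumerate(ids)}
print("siblings M1, F1:", a[pos["M1"]][pos["F1"]])
print("ubercousins C12, C21:", a[pos["C12"]][pos["C21"]])
```

Because each row depends only on rows already filled in, the whole matrix for a pedigree of n individuals takes on the order of n² steps, which is why this approach scales to large pedigrees better than tracing paths by hand.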

  • http://www.levreyzin.com Lev Reyzin

    Ubersiblings share 1/4 of the same genes. This makes them somewhere between cousins (1/8) and siblings (1/2).

    Say ubercousin C12 has parents M1 and F2 and ubercousin C21 has parents M2 and F1. M1 & F1 are siblings, as are M2 & F2. C12 and C21 share all 4 grandparents and get 1/4 of their genes from each. W.l.o.g. take a grandparent, say the father of M1 & F1. On the portion of genes from him, M1 and F1 overlap on 1/2. It’s easy to see that C12 & C21 will overlap on 1/4 on their respective portions from him. So 1/4*1/4*4 = 1/4. Geneticists should be able to do this…

    I think the math doesn’t get too fun without incest.

  • http://www.levreyzin.com Lev Reyzin

    Oops I meant ubercousins, not ubersiblings…

  • Arthur B.

    Draw the family tree. There are four paths between the two ubercousins of length 4 each. A path of length four represents a 1/2^4 chance of sharing a gene. They are thus 4/16 = 1/4 related.

    If their parents were two sets of twins they would be as related as siblings.
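The path-counting argument can be sketched in a few lines. The code below walks up the family tree from each ubercousin, pairs up the walks that end at the same ancestor without passing through any other shared person, and adds (1/2)^length for each such path. The names `PARENTS`, `upward_chains` and `path_relatedness`, and the grandparent labels, are hypothetical and chosen for illustration. The sketch assumes no common ancestor is inbred, which is the simple case discussed in this thread.

```python
from fractions import Fraction

# Child -> (parent, parent). People who do not appear as keys are founders.
PARENTS = {
    "M1": ("G1a", "G1b"), "F1": ("G1a", "G1b"),
    "M2": ("G2a", "G2b"), "F2": ("G2a", "G2b"),
    "C12": ("M1", "F2"), "C21": ("M2", "F1"),
}


def upward_chains(parents, person):
    """Return every chain [person, parent, grandparent, ...] up the tree."""
    chains, todo = [], [[person]]
    while todo:
        chain = todo.pop()
        chains.append(chain)
        todo.extend(chain + [p] for p in parents.get(chain[-1], ()))
    return chains


def path_relatedness(parents, x, y):
    """Sum (1/2)**length over paths joining x and y through a common ancestor."""
    total, n_paths = Fraction(0), 0
    for up in upward_chains(parents, x):
        for down in upward_chains(parents, y):
            same_ancestor = up[-1] == down[-1]
            # A valid path may not visit anyone twice apart from the ancestor.
            overlap = set(up[:-1]) & set(down[:-1])
            if same_ancestor and not overlap:
                total += Fraction(1, 2 ** (len(up) + len(down) - 2))
                n_paths += 1
    return total, n_paths


r, n_paths = path_relatedness(PARENTS, "C12", "C21")
print(f"{n_paths} paths, relatedness {r}")
```

Applied to the two siblings `M1` and `F1`, the same function finds two paths of length 2, one through each parent, and returns 1/2.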

  • http://mikekr.blogspot.com zbicyclist

    Not a geneticist, but here goes: From father to child is 1/2. From father to father’s brother is 1/2. From father’s brother to father’s brother’s child is 1/2. So ordinarily cousins would be (1/2)^3 = 1/8, assuming the mothers are completely unrelated. But here we have a similar path through the mothers, so 1/8 + 1/8 = 1/4.

    Answer: 1/4.