Tuesday, August 12, 2008

How well does your genome predict your postcode?

Well, it's far from GPS precision, but the concordance between this genetic map of Europe (below left) and the physical sampling locations of populations throughout Europe (below right) is pretty good for a first draft:

The genetic map was constructed using data from over 300,000 genetic markers in 2,514 individuals from 23 European subpopulations, making it the most comprehensive analysis of European genetic variation performed to date. The map was constructed using purely genetic data without information on spatial location, so the concordance between the two maps indicates the degree to which genetic ancestry correlates with physical location - in other words, how well your genes predict your address.
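To give a feel for how a "map" can emerge from genotypes alone, here is a minimal sketch using principal component analysis, one common way of summarising this kind of data. It is an illustration only, not a reproduction of the pipeline in Lao et al.: the two populations, their allele frequencies (0.3 and 0.7), the marker count and the function names are all made up for the example, and only the first axis is computed.

```python
import random

def simulate_population(freqs, n_individuals, rng):
    """Draw toy genotypes: one allele count (0, 1 or 2) per marker."""
    return [
        [sum(rng.random() < p for _ in range(2)) for p in freqs]
        for _ in range(n_individuals)
    ]

def first_pc_scores(genotypes, iterations=200):
    """Return a unit-length vector of PC1 scores, one per individual."""
    n, m = len(genotypes), len(genotypes[0])
    # Centre each marker (column) on its mean allele count.
    means = [sum(row[j] for row in genotypes) / n for j in range(m)]
    centred = [[row[j] - means[j] for j in range(m)] for row in genotypes]
    # Gram matrix: genetic similarity between every pair of individuals.
    gram = [[sum(a * b for a, b in zip(r1, r2)) for r2 in centred]
            for r1 in centred]
    # Power iteration converges on the leading eigenvector of the Gram matrix.
    vec = [1.0] + [0.0] * (n - 1)
    for _ in range(iterations):
        nxt = [sum(g * v for g, v in zip(row, vec)) for row in gram]
        norm = sum(x * x for x in nxt) ** 0.5
        vec = [x / norm for x in nxt]
    return vec

# Hypothetical data: two populations of 10 people, 200 markers each.
rng = random.Random(0)
n_markers = 200
freqs_a = [0.3] * n_markers  # assumed allele frequencies, population A
freqs_b = [0.7] * n_markers  # assumed allele frequencies, population B
genotypes = (simulate_population(freqs_a, 10, rng)
             + simulate_population(freqs_b, 10, rng))

scores = first_pc_scores(genotypes)
for label, score in zip(["A"] * 10 + ["B"] * 10, scores):
    print(label, round(score, 3))
```

No location labels go into the calculation, yet the two groups end up at opposite ends of the axis. A real analysis would keep at least two axes and would involve far subtler frequency differences than this toy example.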

Dienekes has an excellent discussion of the technical details, while Razib has labelled a plot showing all of the individuals in the study to make it easier to assess the degree of scatter and overlap.

The take-home message: rather than being one homogeneous mass, Europeans in fact show considerable population substructure, such that genetic information can be used to roughly predict geographical ancestry. An analysis of just a few hundred thousand genetic markers (i.e. fewer than are currently offered by the personal genomics companies 23andMe and deCODEme) would be more than adequate in most cases to distinguish a Pole from a Parisian, or a Swede from a Spaniard. (To be more precise, it would be sufficient to discriminate between individuals most of whose ancestors were natives of these regions; recent migrants will obviously be misclassified.)
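As a rough illustration of how such a prediction could work, the sketch below assigns a genotype to whichever reference population makes it most probable. It assumes Hardy-Weinberg proportions and independent markers, and the population names, allele frequencies and function names are all hypothetical; it is not the method used in the paper.

```python
import math

def log_likelihood(genotype, freqs):
    """Log-probability of a genotype (allele counts 0/1/2 per marker),
    assuming Hardy-Weinberg proportions and independent markers."""
    total = 0.0
    for g, p in zip(genotype, freqs):
        probs = ((1 - p) ** 2, 2 * p * (1 - p), p ** 2)
        total += math.log(probs[g])
    return total

def assign(genotype, population_freqs):
    """Return the name of the population under which the genotype is most likely."""
    return max(population_freqs,
               key=lambda name: log_likelihood(genotype, population_freqs[name]))

# Hypothetical allele frequencies at four markers (not real data).
population_freqs = {
    "north": [0.8, 0.7, 0.6, 0.9],
    "south": [0.3, 0.4, 0.5, 0.2],
}

print(assign([2, 2, 1, 2], population_freqs))
print(assign([0, 0, 1, 0], population_freqs))
```

With only four markers the frequencies have to differ wildly for this to work. With hundreds of thousands of markers, many tiny differences add up to the same effect.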

What drove these genetic differences? Mostly it will have been chance - random increases or decreases in the frequency of markers throughout the genome, accumulated over a few millennia of genetic isolation. But at least some of these differences have been driven by natural selection. For instance, the lactase gene LCT, which has been subject to strong selection to allow lactose digestion in adults in populations reliant on dairy agriculture, accounts for 9 of the 20 most differentiated markers; a marker in the gene HERC2, which is associated with eye colour variation and has been under selection in Europeans and Asians, comes in at number 19.
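To make "most differentiated" concrete, here is a minimal sketch that ranks markers by a simple F_ST-style statistic: the share of total heterozygosity explained by frequency differences between populations. I am assuming equal-sized populations, and the marker names and frequencies are invented; the paper may well have used a different measure.

```python
def fst(freqs):
    """Simple F_ST for one marker from per-population allele frequencies,
    assuming equally sized populations: (H_T - H_S) / H_T."""
    p_bar = sum(freqs) / len(freqs)
    h_total = 2 * p_bar * (1 - p_bar)      # heterozygosity in the pooled sample
    if h_total == 0:
        return 0.0                         # marker is not variable at all
    h_within = sum(2 * p * (1 - p) for p in freqs) / len(freqs)
    return (h_total - h_within) / h_total

# Hypothetical allele frequencies in three populations (not real data).
markers = {
    "marker_1": [0.50, 0.52, 0.49],   # nearly identical everywhere
    "marker_2": [0.90, 0.50, 0.20],   # strongly differentiated
    "marker_3": [0.30, 0.35, 0.40],   # mildly differentiated
}

ranked = sorted(markers, key=lambda name: fst(markers[name]), reverse=True)
for name in ranked:
    print(name, round(fst(markers[name]), 4))
```

Markers shaped mainly by drift would sit somewhere in the bulk of such a ranking; a marker under local selection, as suggested for LCT, would be expected near the top.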

This indicates that at least some of the genetic - and thus physical and possibly behavioural - differences between the various European populations stem from evolutionary adaptation to their local environments.

I'll leave the technical commentary to Dienekes, but I do want to make one important point: the accuracy of the map will have been limited by the fact that the markers used in this study represent sites of common variation; data from large-scale genome sequencing will generate far, far better maps. The major reason for this is that sequencing will provide information on rare, highly spatially-restricted variants - many of which will be limited to single families and thus be extremely informative about geographical ancestry.

Basically, if you had complete genome sequences from enough Europeans you could reconstruct the genetic map of Europe with exquisite precision. In addition to empowering genetic genealogists, researchers could use deviations between the genetic and physical maps to make powerful inferences about historical migration events and recent episodes of natural selection. With any luck, this is the sort of data that will simply fall out from large-scale population genomic studies being conducted over the next decade or so.

Update: Kambiz at Anthropology.net puts these results in a broader scientific context.

Lao et al. (2008). Correlation between Genetic and Geographic Structure in Europe. Current Biology DOI: 10.1016/j.cub.2008.07.049

Image source: Figure 1 from Lao et al.


1 comment:

Razib said...

just an FYI, he prefers the appellation sandman, or, more informally, just sandy.