Wednesday, February 27, 2008

Why do we have common risk variants for metabolic diseases?

I've had a half-finished post on this article sitting in my "to blog" pile for some time now, until finally a post by Yann prodded me into actually finishing it off.

The hypothesis underlying this study is straightforward: common variants in human genes associated with metabolic diseases arose due to recent adaptation to new climates.

This hypothesis rests on a chain of logic that it's worth spelling out in full. Basically, as modern humans migrated out of our warm African homeland into novel environments outside of Africa, natural selection favoured genetic variants that allowed them to adapt to these environments. This much is certainly true; such variants are responsible for many of the more visible characteristics that differentiate human populations, most notably skin colour (PDF) and body shape. What the authors of this study go on to hypothesise (quite reasonably) is that selection for adaptation to novel non-African climates would also have acted on a set of other, less visible characteristics: metabolic traits related to things like energy balance and nutrient retention.

Gathering the data
To test this hypothesis the authors needed a set of genes that were likely to play a role in influencing these traits. This list contained three types of genes: (1) a set of 39 "seed genes" that emerged from a quantitative literature review for genes associated with metabolic diseases (type 2 diabetes, obesity, hypertension and lipid abnormalities) - the assumption being that genes and genetic variants that are involved in metabolic diseases may also have played a role in metabolic adaptations to climate; (2) a set of 35 other genes that may be functionally related to the seed genes, identified by an algorithm that looked at known gene-gene or protein-protein interactions; and (3) a further 8 "wild card" genes "with strong evidence for involvement in metabolic syndrome phenotypes".

The authors then used data from the HapMap project to identify 873 "tag SNPs" - genetic variants that capture most of the common genetic variation within these genes. In addition, they chose 210 "control SNPs" from non-protein-coding regions of the genome that were considered a priori unlikely to be targets of selection. All of these variants were then analysed in 964 individuals from 52 populations from the Human Genome Diversity Panel (yes, the same panel that was analysed in those two massive studies published last week), as well as two other populations from Africa.

That's a huge amount of genotyping work, but to test their hypothesis the authors needed one more set of data: information about the climate that each of their 54 populations evolved in. Unfortunately it's very hard to know the exact values of different climate variables over the last 50-100,000 years, so the authors substituted in modern values (more on this later) for six major variables: rainfall, humidity, minimum, maximum and average temperature, and short wave radiation flux. Using the magic of statistics, they could reduce these six variables down to just four summary parameters, called summer PC1 and PC2, and winter PC1 and PC2.
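For the curious, the "magic of statistics" here is presumably principal component analysis (hence "PC1" and "PC2"). The sketch below shows the general idea for a single season, and is not the authors' actual pipeline: the climate values are invented for eight hypothetical populations, and the function names are my own. It standardises the six variables, then pulls out the two leading components with plain power iteration so that nothing beyond the standard library is needed.

```python
import math
import random

# Invented climate values for eight hypothetical populations (one season only).
# Columns: rainfall, humidity, min temp, max temp, mean temp, short wave radiation.
CLIMATE = [
    [210.0, 80.0, 22.0, 31.0, 26.5, 240.0],
    [150.0, 70.0, 18.0, 29.0, 23.5, 250.0],
    [20.0, 25.0, 12.0, 30.0, 21.0, 270.0],
    [90.0, 65.0, 5.0, 15.0, 10.0, 180.0],
    [60.0, 75.0, -2.0, 8.0, 3.0, 120.0],
    [40.0, 60.0, -10.0, 0.0, -5.0, 100.0],
    [30.0, 70.0, -25.0, -12.0, -18.5, 60.0],
    [120.0, 85.0, 8.0, 14.0, 11.0, 110.0],
]


def standardise(rows):
    """Centre each column on 0 and scale it to unit variance."""
    n = len(rows)
    cols = list(zip(*rows))
    means = [sum(c) / n for c in cols]
    sds = [math.sqrt(sum((x - m) ** 2 for x in c) / (n - 1))
           for c, m in zip(cols, means)]
    return [[(x - m) / s for x, m, s in zip(row, means, sds)] for row in rows]


def covariance(rows):
    """Covariance matrix of data whose columns are already centred."""
    n, k = len(rows), len(rows[0])
    return [[sum(r[i] * r[j] for r in rows) / (n - 1) for j in range(k)]
            for i in range(k)]


def leading_component(matrix, iterations=1000, seed=0):
    """Power iteration: largest eigenvalue and its unit eigenvector."""
    rng = random.Random(seed)
    k = len(matrix)
    vec = [rng.uniform(-1.0, 1.0) for _ in range(k)]
    for _ in range(iterations):
        vec = [sum(matrix[i][j] * vec[j] for j in range(k)) for i in range(k)]
        norm = math.sqrt(sum(x * x for x in vec))
        vec = [x / norm for x in vec]
    value = sum(vec[i] * matrix[i][j] * vec[j]
                for i in range(k) for j in range(k))
    return value, vec


def deflate(matrix, value, vec):
    """Remove a component already found so the next one can be extracted."""
    k = len(matrix)
    return [[matrix[i][j] - value * vec[i] * vec[j] for j in range(k)]
            for i in range(k)]


scaled = standardise(CLIMATE)
cov = covariance(scaled)
value1, pc1 = leading_component(cov)
value2, pc2 = leading_component(deflate(cov, value1, pc1))

# Share of the total climate variance captured by each component.
total_variance = sum(cov[i][i] for i in range(len(cov)))
explained = [value1 / total_variance, value2 / total_variance]

# One (PC1, PC2) pair per population: the summary parameters for this season.
scores = [(sum(a * b for a, b in zip(row, pc1)),
           sum(a * b for a, b in zip(row, pc2))) for row in scaled]

print("variance explained by PC1, PC2:", [round(e, 3) for e in explained])
```

Running the same procedure once on summer values and once on winter values would give four summary parameters per population, matching the summer and winter PC1 and PC2 described above.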

Mining the data
The underlying principle of the analysis is quite intuitive: for each SNP, compare the allele frequency in each population with the climate variables experienced by that population, and see if there is any correlation - for example, a genetic variant that has been selected for cold tolerance should be present at a higher frequency in populations living in colder climates.
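To make that principle concrete, here is a minimal sketch using a rank (Spearman) correlation between allele frequency and winter temperature. The choice of Spearman's statistic is my assumption for illustration, and both the temperatures and the frequencies are invented numbers for eight hypothetical populations, not data from the study.

```python
import math


def ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average
        i = j + 1
    return result


def pearson(xs, ys):
    """Ordinary product-moment correlation."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def spearman(xs, ys):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    return pearson(ranks(xs), ranks(ys))


# Invented values: winter temperature (degrees C) and the frequency of a
# hypothetical cold-tolerance allele in eight hypothetical populations.
winter_temp = [25, 22, 15, 8, 2, -5, -12, -20]
allele_freq = [0.05, 0.10, 0.08, 0.30, 0.45, 0.40, 0.70, 0.85]

rho = spearman(winter_temp, allele_freq)
print(f"Spearman rho = {rho:.3f}")  # strongly negative: commoner where it's colder
```

A strong correlation on its own proves little, since neighbouring populations share both climate and ancestry; that is why the comparison against control SNPs matters.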

The authors apply this basic principle in several different ways, and basically show that their metabolism genes as a whole show stronger correlations with climate variables than do the 210 control SNPs. That means that some of the variation in these genes can't be explained simply by the historical movement of modern humans, but is likely to be partially the product of natural selection for climate adaptation.
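One simple way to frame the comparison with control SNPs is as an empirical tail test: ask how often a control SNP shows a correlation at least as strong as a given candidate SNP, then check whether candidates land in the extreme tail more often than expected. The sketch below illustrates that idea only; the function names and every number in it are placeholders of mine, and the authors' actual procedure is more sophisticated.

```python
def empirical_p(stat, control_stats):
    """Share of control SNPs at least as extreme as `stat`.

    The +1 terms keep the estimate from ever being exactly zero.
    """
    exceed = sum(1 for c in control_stats if c >= stat)
    return (exceed + 1) / (len(control_stats) + 1)


def tail_excess(candidate_stats, control_stats, tail=0.05):
    """Observed share of candidates in the control tail over the expected share.

    A value near 1 means candidates look like controls; well above 1 suggests
    an excess of strong climate correlations among the candidates.
    """
    hits = sum(1 for s in candidate_stats
               if empirical_p(s, control_stats) <= tail)
    return (hits / len(candidate_stats)) / tail


# Placeholder absolute correlation values, not real data.
controls = [i / 100 for i in range(100)]          # 0.00, 0.01, ..., 0.99
candidates = [0.955, 0.985, 0.505, 0.205, 0.975]  # five hypothetical SNPs

for stat in candidates:
    print(stat, round(empirical_p(stat, controls), 4))
print("tail excess:", round(tail_excess(candidates, controls), 2))
```

Because the control SNPs have been through the same population history as the candidates, using them as the yardstick absorbs much of the correlation that migration and shared ancestry would produce on their own.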

The authors then point out some interesting examples of correlation within their dataset: for instance, a protein-altering variant within the leptin gene, which is known to play a major role in regulating appetite, energy balance and (crucially) the generation of heat by muscle, is associated with winter climate variables, while a non-coding variant within the TCF7L2 gene, the best-replicated gene associated with type 2 diabetes, is more weakly associated with summer climate.

However, the strongest associations were seen for the RAPTOR gene, which plays a role in "nutrient signalling, mitochondrial oxygen consumption and oxidative capacity". Variation in the RAPTOR gene is strongly correlated with both latitude and winter climate variables; the pie charts on the map below show the frequency of one particular variant in this gene in the 54 surveyed populations, laid over a colour-coded map of winter maximum temperature. You can see immediately that the variant shown in black in the pie charts has become more common in the colder parts of the world.



Altogether, the authors make a convincing case for variation in metabolism-related genes being correlated with climate variables to a greater degree than would be expected by chance. That suggests that the guiding hand of natural selection has played a role in shaping the pattern of variation in these genes as modern humans moved from continent to continent over the last 100,000 years.

Some caveats
In any study of this complexity there will be something to complain about, and I intend to seize the opportunity to do so! One of my criticisms is fairly trivial, and the other is potentially more serious.

The first potential problem - one that the authors acknowledge early on - is the use of modern climate data as a proxy for the climates experienced by our ancestors, which have changed considerably over the period that modern humans have been moving around outside Africa (there was that whole Ice Age thing, for example!). This probably has relatively little impact on the results reported here: what matters most is the relative climate in different parts of the world, which I suspect hasn't changed that much - whether it's 18,000 BC or 2008 AD, Africa is still hotter than northern China. However, it would be interesting to know if estimates of historical climate variables are high-resolution enough to use in this type of study - any historical climate experts out there care to comment?

A second, more serious problem is alluded to by the authors in the final paragraph of their discussion: "it is unclear whether all the signals of spatially varying selection reported here are the result of adaptations to climate rather than other environmental variables". In other words, climate variables are known to be closely associated with other features of the environment, and these other features could in fact be the underlying drivers of the effects on the metabolic genes. The authors mention two such possible confounding factors: the diversity of parasitic and infectious disease species, and resource availability, both of which obviously vary with latitude and with climate variables such as temperature and rainfall. For instance, the map below (from this paper, via GNXP) shows the global distribution of vector-borne pathogens (infectious diseases carried by non-human animals, e.g. malaria) - there's a non-coincidental concordance with the climate map shown above.



What's my point? Simply that a substantial proportion of the selection observed in this study may be due to some of these confounding variables, rather than to climate per se. Given that we're talking about metabolic genes, differential resource availability (and thus altered diet and food-growing practices) is a particularly powerful alternative explanation for the observed correlations, but effects of pathogens can't be completely ruled out.

This criticism is by no means disastrous for the study - selection is still interesting, whatever its cause - and the authors raise this criticism themselves, but I think the point could have been driven home more clearly. The title of the paper, for example, implies to me that adaptation to climate is the major selective factor here, and that's simply not a conclusion that can be drawn from these data.

What next?
This study has done an admirable job of opening up this topic for discussion and analysis, but there's a lot more to be done in this area. The first step will be to examine in detail the evolutionary history of these genes in a worldwide panel of populations, which will require someone to perform genome-wide genotyping of a whole bunch of humans - oh wait, somebody already did that! I think we can safely assume that the massive genetic surveys published last week are already being pored over for evidence of local positive selection in any number of genes. A careful analysis of signatures of selection around the genes identified in this study would be a great help in untangling the issues here.

Secondly, we need more genome-wide association studies looking specifically at the traits that are most relevant to our recent evolutionary history. This study looked at some carefully picked candidate genes. A much better study further down the line would look at a more objective set: all of the genetic variants associated with obesity, type 2 diabetes and other metabolic diseases based on massive genome-wide studies in multiple populations. We'll have a pretty good early version of this set within the next few years.

Citation: Hancock, A.M., Witonsky, D.B., Gordon, A.S., Eshel, G., Pritchard, J.K., Coop, G., Di Rienzo, A. (2008). Adaptations to Climate in Candidate Genes for Common Metabolic Disorders. PLoS Genetics, 4(2), e32. DOI: 10.1371/journal.pgen.0040032
