Thursday, August 21, 2008

The gene for Jamaican sprinting success? No, not really.

Anyone who has walked past a TV set over the last few days will have seen footage of the remarkable Jamaican sprinter Usain Bolt, who comfortably cruised to victory (and a world record) in the Olympic 100 metre sprint, and as I write this has just done precisely the same thing in the 200 metre sprint. The interest in Bolt stems not from the fact that he wins his races, but rather from the contemptuous ease with which he does so.

And Bolt is not the only Jamaican to impress in short distance events in Beijing: the country's women's sprint team took all three medals in their 100 metre dash.

Naturally, these performances have provoked widespread speculation about the basis of Jamaica's sprinting success, and the short-distance prowess of other populations of West African ancestry. One controversial suggestion has drawn the most headlines: that sprinting is in their genes, or rather in one gene in particular - variously referred to as "Actinen A" or "ACTN3".

This gene has been the subject of a recent rash of news stories sparked by Bolt's victories, all of which refer to comments by Jamaican academic Errol Morrison in the Jamaica Gleaner over a month ago. The Gleaner article summarised the (unpublished) results of a collaboration between Morrison and a group at the University of Glasgow:
"At the base of sprint speed are the fast-twitch muscle fibres stocked with the speed protein Actinen A. And early data indicate that 70 per cent of Jamaican athletes have the gene for Actinen A. Only 30 per cent of Australian athletes studied had the gene."
(The Gleaner reporter, Martin Henry, astonishingly went on to speculate that this gene may help to explain why Jamaicans are "also disproportionately aggressive and violent".)

The Daily Mail followed up on the story two weeks later with a marginally more coherent account:
"What they have found - and Morrison emphasises the findings are preliminary - is that fast men have a special component called Actinen A in their fast-twitch muscles, which determine whether humans are sprinters or plodders. It is found in 70 per cent of Jamaicans. In a control study of Australians, only 30 per cent were found with it."
The "preliminary" nature of the findings didn't stop the Daily Mail reporter from following this paragraph with the conclusion that this result "would seem to explain why Jamaicans punch above their weight among sprinters". Similarly definitive statements were made by other reporters continuing the story after Bolt's 100 metre victory; one rare exception was a fairly well-balanced piece in Slate.

The stories take advantage of a widespread perception - by no means totally unjustified, but nonetheless controversial - that Jamaicans and other groups of West African ancestry have a genetic advantage when it comes to raw muscle power. Having apparent scientific evidence to support this perception is a reporter's dream; the headlines write themselves.

So, how good is this scientific evidence? Does the "Actinen A" gene (whatever that is) actually influence sprinting performance? And if so, does it explain the difference in explosive power between Jamaicans and the rest of the world? The answers, as it turns out, are "probably" and "not really".

The ACTN3 gene and muscle performance
At this point I probably should confess to having a more than casual interest in this story: I was one of the authors on the first study showing an association between this gene and elite athlete status back in 2003, and this gene has been the central focus of my research for a good part of the last six years. (The opinions I express here are purely my own, by the way, and in no way are meant to represent the views of my research institute.)

The ACTN3 gene encodes a protein called α-actinin-3 ("Actinen A" is a misnomer of uncertain origin propagated by lazy reporters), which is found within the fast fibres of muscle - the cells that are required for generating rapid, forceful contraction in activities such as sprinting and weightlifting. Interestingly, the human ACTN3 gene comes in two forms in the general population: there's a normal, functional version called 577R, and a "defective" version called 577X, which contains a single base change that prevents the production of α-actinin-3. People who have two copies of the 577X version (I'll refer to them as X/X) produce absolutely no α-actinin-3 in their fast muscle fibres.

These people don't suffer from muscle disease as a result of this deficiency - in fact, there's a pretty good chance that you're one of them. The frequency of the 577X variant differs around the world, but overall somewhere between one-sixth and one-quarter of the world's population (at least a billion people worldwide) are X/X, and therefore completely deficient in α-actinin-3.

So lack of α-actinin-3 clearly doesn't destroy your muscle; however, over the last five years we and other groups have assembled evidence suggesting that it does influence how good your muscle is at generating explosive power. We first showed in 2003 that X/X individuals are significantly under-represented among elite Australian sprint/power athletes, suggesting that the absence of α-actinin-3 in X/X individuals is detrimental to optimal muscle power generation. This association has since been replicated in four separate athlete studies by groups in Europe and the US; there is also weaker but reasonably consistent evidence that α-actinin-3 deficiency results in slightly higher endurance capacity, both in human athletes and in a mouse model generated by our group. In addition, several groups have reported that X/X individuals in the general population display lower muscle strength and reduced sprint performance.

Importantly, the latter two studies suggest that the proportion of the variance in strength and sprint performance in the general population explained by the ACTN3 variant is around 2-3%. So for most of us lazy slobs this gene has a pretty trivial effect - almost completely drowned out by noise from the effects of diet, exercise levels and other genes. (Certainly there are dozens or even hundreds of other genes influencing physical performance, some of which - like the ACE gene - have been fairly well-studied, but most of which are completely unknown and uncharacterised; and environmental factors play about as large a role as genes do in traits like muscle strength and cardiorespiratory performance.)

However, even 2-3% can make a striking difference at the very elite level: of the 51 Olympic-level sprint/power athletes analysed in our original study and a follow-up analysis in Greek athletes, not a single individual was X/X (compared to about 10 expected). In fact, X/X Olympian sprint athletes are unusual enough that identifying a single Spanish Olympic short-distance hurdler with α-actinin-3 deficiency was enough to warrant its own publication.
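
The "about 10 expected" figure makes it possible to put a rough number on how surprising a complete absence of X/X athletes would be. What follows is a minimal sketch, assuming each athlete is an independent draw from a population whose X/X frequency is simply 10/51 (the expected count quoted above divided by the sample size). The independence assumption and the function name are mine, not part of the original studies.

```python
def prob_no_xx(n_athletes: int, expected_xx: float) -> float:
    """Chance of seeing zero X/X individuals among n_athletes by luck alone."""
    xx_freq = expected_xx / n_athletes  # implied population X/X frequency
    return (1.0 - xx_freq) ** n_athletes


# 51 Olympic-level sprint/power athletes, about 10 X/X expected (from the text)
p_zero = prob_no_xx(51, 10)
print(f"P(no X/X among 51 athletes) = {p_zero:.1e}")
```

Under these assumptions the chance of finding no X/X athletes at all is on the order of one in tens of thousands, which helps explain why a single X/X Olympic hurdler was considered worth reporting.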

So the absence of α-actinin-3 means very little to most of us, but to a young athlete craving 100 metre Olympic superstardom it could make all the difference in the world. The same could be said of many other genetic variants, of course; Olympic sprinters, essentially, are those unlikely individuals at the vanishing edge of the probability distribution for whom nearly every genetic coin has come up heads.

Does the ACTN3 gene explain Jamaican sprinting prowess?
The underlying argument here is intuitively simple: (1) variation in the ACTN3 gene is strongly associated with elite sprint athlete status; (2) the "sprint" version of ACTN3 is more common in Jamaicans than in individuals of European ancestry; therefore (3) this variant may well play a role in the increased sprinting prowess of Jamaicans relative to Europeans. At first blush this sounds pretty convincing; however, while ACTN3 may play some role in the disproportionate success of Jamaican sprinters, I'd argue that it's likely to be a pretty small one. Here's why:
  1. The difference in frequency between Jamaicans and Europeans is not as great as it would appear. The articles quoted above describe the proportion of individuals who have two copies of the 577R ("sprint") version of the gene; a more appropriate comparison is the proportion of individuals who have at least one copy of 577R (that is, including both R/R and R/X individuals), since it's only the complete absence of α-actinin-3 that is reliably associated with reduced sprint performance. This starts to look less impressive: it's 98% in Jamaicans compared to about 82% in Europeans. In other words, in both populations a sizeable majority of individuals have an ACTN3 status compatible with elite sprint performance.

  2. The ACTN3 frequency reported for the Jamaicans by Morrison is not unique to Jamaicans, nor is it particularly surprising - our group has previously reported virtually identical frequencies in individuals from both West Africa (the ancestral source of the bulk of the Jamaican gene pool) and East Africa, in a collaboration with the same group at the University of Glasgow that Morrison has been working with on the Jamaican study. In fact, that study showed that an even higher frequency of α-actinin-3 expression (99%) is found in Kenya - in tribes whose members dominate international long-distance events, but have a notable dearth of representatives in track sprinting; we have more recently found similarly low frequencies of α-actinin-3 deficiency in populations across sub-Saharan Africa. There's simply no clear relationship between the frequency of this variant in a population and its capacity to produce sprinting superstars.

  3. Finally, when Usain Bolt was pacing restlessly at the starting line of the 100 metre sprint - even in the very first round of Olympic heats - the very low frequency of X/X individuals among Olympic sprinters means he was lined up against a group of athletes who almost certainly all express α-actinin-3! In other words, while the ACTN3 variant may have played a small role in getting Bolt to the Olympics, it can't possibly explain the astonishing advantage he has over his competitors.
I'll concede that the small difference in the frequency of α-actinin-3 expression between Jamaicans and Europeans may result in a slightly larger fraction of the Jamaican population being suitable for elite-level sprinting (all else being equal), but it's a tiny piece of the overall explanation at best - and it can't possibly explain why Bolt is so much better than his fellow West Africans and other Olympic-level sprinters. Clearly there are other factors at work.
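
The link between the two ways of counting in point 1 (two copies of 577R versus at least one copy) can be sketched numerically. This is only an illustration: it assumes Hardy-Weinberg proportions, which the studies discussed here do not necessarily satisfy, and the function name is hypothetical.

```python
import math


def rr_fraction_from_carriers(carrier_fraction: float) -> float:
    """Convert 'at least one 577R copy' into 'two 577R copies' under Hardy-Weinberg."""
    xx_fraction = 1.0 - carrier_fraction    # X/X individuals lack alpha-actinin-3
    x_allele_freq = math.sqrt(xx_fraction)  # Hardy-Weinberg: freq(X/X) = q^2
    r_allele_freq = 1.0 - x_allele_freq
    return r_allele_freq ** 2               # freq(R/R) = p^2


for label, carriers in [("Jamaicans", 0.98), ("Europeans", 0.82)]:
    rr = rr_fraction_from_carriers(carriers)
    print(f"{label}: {carriers:.0%} carry 577R, about {rr:.0%} are R/R")
```

With the 98% and 82% carrier figures given above, this approximation yields R/R fractions in the same range as the 70 and 30 per cent quoted in the news stories, which is consistent with those stories counting only people with two copies of 577R.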

Beyond "the gene for speed"
I'm certainly not arguing here that genetics doesn't play any role in Bolt's success - or in the remarkable over-representation of West African descendants in Olympic short-distance track events, or the similarly impressive skew towards East Africans among marathon runners. In fact I think most geneticists would be staggered if this were the case, even though direct evidence for underlying genes is currently very thin on the ground.

Rather, my point is that an excessive emphasis on ACTN3 as a major explanation for Jamaican success does a grave disservice to the complex interplay of genetic and environmental factors required for top-level athletic performance. This suggestion goes against everything we've learnt about the genetics of complex traits from recent genome-wide association studies, which have revealed that quantitative traits (like height and body weight) are frequently influenced by dozens to hundreds of genes, each of small effect; if anything, it's likely that athletic performance will be even more genetically complex than these traits. The ACTN3-centred argument also dismisses the importance of Jamaica's impressive investment in the infrastructure and training system required to identify and nurture elite track athletes, the effects of a culture that idolises local track heroes, and the powerful desire of young Jamaicans to use athletic success to lift themselves and their families out of poverty.

It is almost certainly true that Usain Bolt carries at least one of the "sprint" variants of the ACTN3 gene, but then so do I (along with around five billion other humans worldwide). Indeed, I'm fortunate enough to be lugging around two "sprint" copies - but that doesn't mean you'll see me in the 100 metre final in London in 2012. Unfortunately for me, it takes a lot more than one lucky gene to create an Olympian.


(Image: Phil McElhinney.)

Subscribe to Genetic Future.

Tuesday, August 19, 2008

Misha Angrist reviews personal genomics

Blogger, Personal Genome Project participant and Assistant Professor Misha Angrist has a concise and extraordinarily readable article on the current state of personal genomics at Technology Review. Here's the penultimate paragraph:
"This is where we are in the era of personal genomics: some modest amusement, a few interesting tidbits, a bit of useful information, but mostly the promise of much better things to come. The more people are allowed--encouraged, even--to experiment, the sooner that promise can be realized."
I find myself in complete agreement. Anyone interested in the field should go read the rest.


Saturday, August 16, 2008

Venter's exome, and the challenge of rare variants for personal genomics

A team led by J. Craig Venter from the J. Craig Venter Institute has just published another paper on J. Craig Venter's favourite topic: J. Craig Venter.

This study follows up on last year's publication of the complete sequence of Venter's genome, this time reporting a detailed analysis of a small but quite informative fraction of the genome: the exome, which consists of all of the pieces of DNA (called exons) that directly code for protein molecules.

The exome is a favoured target of geneticists. There are two major reasons for this: firstly, the exome is enriched for functional sequence, whereas non-coding DNA has a much higher fraction of non-functional junk; and secondly, we understand protein-coding DNA much better than we do non-coding DNA. If a novel mutation alters a protein sequence, we have algorithms that can predict (with moderate accuracy) how likely it is to alter the function of the cell. In contrast, for most mutations in non-coding DNA we have almost no way to predict whether they are functional or not. So, like the drunkard looking for his keys under the lamp-post because the light is better there, geneticists are inclined to look hardest at the regions where they actually have some chance of finding something they can understand.

Venter's mutations
The article (which is open access, so you can read it yourself) has a number of interesting factoids about Venter's protein-coding genome that are highly relevant to personal genomics:
  1. The authors identified 10,389 variants predicted to alter protein sequences;

  2. Of these, most are common (they estimate that 80-85% are present at a frequency of over 5% in the general population);

  3. About 1,500 of these variants are likely to actually significantly alter protein function, based on the SIFT prediction algorithm - these are the variants most likely to play a role in shaping human variation and common disease risk;

  4. A variant is twice as likely to be functionally damaging if it is rare (frequency less than 5%) than if it is common (frequency over 5%);

  5. Several quite unambiguously protein-damaging mutations were also found (74 would introduce an abnormal "stop" signal, while others create "frame-shifts" that alter large regions of an encoded protein), but many of these fall in genes with poor annotation that may well be non-functional;

  6. Venter carries seven known disease-associated variants, all present in only one copy (i.e. heterozygous);

  7. The interpretation of all of these data in terms of making actual health predictions is remarkably problematic, an ominous sign for the ~20 wealthy folks getting their genome sequenced by Knome this year.
The authors raise some interesting discussion points about the implications of their results for personal genomics; this paragraph is particularly sobering:
"Even if a gene is known to be involved in disease, it is difficult to understand if a variant in the gene will have a phenotypic effect. We found that 99% of the [protein-altering variants] in disease genes could not be characterized by current literature. Different mutations in the same gene can cause different phenotypic effects [49], thus making it difficult to interpret possible phenotypes. Furthermore, some variants have phenotypic effects only under certain environments (see SOD2 and BDNF in Table 2 and [48]). Also, when looking at complex phenotypes, multiple variants in coding and non-coding regions are likely to be involved [63][66]. This genetic complexity, as well as exposure to various environmental factors, will need to be taken into account in assessing risk for various diseases."
In other words, it will be quite some time before we can use a genome sequence to make realistic predictions about overall health (except for the unlucky few who carry mutations unambiguously associated with disease, such as a CAG repeat expansion in the HTT gene - in which case the predictions will tend to be dire). The next few years will be interesting times indeed for personal genomics companies, as their ability to generate oodles of genetic data with cheap sequencing increases exponentially faster than their capacity to explain what the data actually mean.

The challenge of rare variants
I want to draw particular attention to the implications of point 4 above (the fact that rare mutations are the most likely to alter protein function, and thus to have an effect on disease risk). The evolutionary basis for this association is trivially clear: if a variant has a serious negative effect on health then in most cases natural selection will keep it at a low frequency in the population, since really sick people tend to have fewer kids. Disease-causing variants can reach high frequencies under certain conditions (if they also provide benefits under certain situations, or if the disease only hits its victims after they've already reproduced, for instance) but all else being equal, evolution's scythe means that you're far more likely to find disease-causing variants at the rare rather than the common end of the spectrum.

The reason this is so problematic is that rare disease-causing variants are also the hardest to find and characterise. I've mentioned a few times that the current crop of genome-wide association studies (GWAS), while reasonably well-powered to detect common disease-causing variants, have virtually no ability to find rare causal variants - even if these variants explain the majority of disease risk. This probably goes some way to explaining why even massive GWAS are capturing only a small proportion of the overall genetic risk for most common diseases.

This arises primarily because the chips used in current GWAS only efficiently "tag" common variants. However, even once this technological barrier is lifted it will still be fiendishly difficult to assign function to rare variants: because there will be many millions of these variants, each at a low frequency, the sample sizes required to find those few associated with disease risk will be mind-bogglingly large - we're talking cohorts of millions of people, all with large-scale sequence data and well-collected information on environment and health. I have no doubt such studies will eventually be done, but it will take many years before we see the results.
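
To get a feel for why the cohorts need to be so large, here is a rough sketch of how many carriers of a rare variant a study would be expected to contain. The allele frequency is a hypothetical value picked for illustration, the calculation assumes random mating, and the function name is mine.

```python
def expected_carriers(cohort_size: int, allele_freq: float) -> float:
    """Expected number of people carrying at least one copy of a variant."""
    carrier_prob = 1.0 - (1.0 - allele_freq) ** 2  # two chromosomes per person
    return cohort_size * carrier_prob


rare_freq = 0.0001  # hypothetical variant present on 1 in 10,000 chromosomes
for cohort in (10_000, 100_000, 1_000_000):
    print(f"{cohort:>9,} people: ~{expected_carriers(cohort, rare_freq):.0f} carriers")
```

For a variant this rare, a cohort of ten thousand would contain only a couple of carriers, far too few to compare disease rates between carriers and non-carriers; only at around a million participants do the numbers start to become workable.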

And of course, even with such massive cohorts, the rarest variants (those restricted to single families, or even just a few isolated individuals) will still slip through the statistical cracks - but such variants may well be the most important features in the genome sequence of any given individual, the ones disrupting that crucial tumour-suppressor gene or messing with neurotransmitter expression levels. If you have one of these nasty variants, you'll want to know about it, and you'll want to know what it does.

Beyond genetics
Ultimately, geneticists will have to deal with such variants using non-genetic methods. For instance, for many genes it may eventually be possible to create experimental assays that allow researchers to rapidly test whether a novel variant disrupts protein function; the mouse embryonic stem cell assays that can be used to test novel variants in the breast cancer gene BRCA2 are a proof of principle, as well as a demonstration of just how challenging this process will be.

More broadly and ambitiously, we need to build and refine models of how human beings operate at a molecular level, integrating data from many fields of biology. If we understand which proteins interact within which cells, how these interactions influence protein dynamics, and where the binding sites for each interaction lie, we will have a much better chance of inferring the effect of an isolated change in protein sequence on overall cellular function and thus human health. Moving beyond the exome into non-coding DNA will require even more subtle and complex models including protein-DNA binding, the regulation of DNA modification and conformation, and the effects of non-coding RNA.

In other words, ultimate personal genomics - the extraction of every byte of useful predictive information out of an individual's genome sequence - will require nothing less than an atomic-level understanding of the operation of the human machine. Now that is an effort I'd like to see Google throw its weight behind...


(Venter image from Wikimedia Commons.)

Ng, P.C., Levy, S., Huang, J., Stockwell, T.B., Walenz, B.P., Li, K., Axelrod, N., Busam, D.A., Strausberg, R.L., Venter, J.C., Schork, N.J. (2008). Genetic Variation in an Individual Human Exome. PLoS Genetics, 4(8), e1000160. DOI: 10.1371/journal.pgen.1000160


Tuesday, August 12, 2008

How well does your genome predict your postcode?

Well, it's far from GPS precision, but the concordance between this genetic map of Europe (below left) and the physical sampling locations of populations throughout Europe (below right) is pretty good for a first draft:

The genetic map was constructed using data from over 300,000 genetic markers in 2,514 individuals from 23 European subpopulations, making it the most comprehensive analysis of European genetic variation performed to date. The map was constructed using purely genetic data without information on spatial location, so the concordance between the two maps indicates the degree to which genetic ancestry correlates with physical location - in other words, how well your genes predict your address.

Dienekes has an excellent discussion of the technical details, while Razib has labelled a plot showing all of the individuals in the study to make it easier to assess the degree of scatter and overlap.

The take-home message: rather than being one homogeneous mass, Europeans in fact show considerable population substructure, such that genetic information can be used to roughly predict geographical ancestry. An analysis of just a few hundred thousand genetic markers (i.e. fewer than are currently offered by personal genomics companies 23andMe or deCODEme) would be more than adequate in most cases to distinguish a Pole from a Parisian, or a Swede from a Spaniard. (To be more precise, it would be sufficient to discriminate between individuals for whom most ancestors were natives of these regions; recent migrants will obviously be misclassified.)

What drove these genetic differences? Mostly it will have been chance - random increases or decreases in the frequency of markers throughout the genome accumulated over a few millennia of genetic isolation. But at least some of these differences have been driven by natural selection: for instance, the lactase gene LCT, which has been subject to strong selection to allow lactose digestion in adults in populations reliant on dairy agriculture, represents 9 out of the top 20 most differentiated markers; a marker in the gene HERC2, which is associated with eye colour variation and has been under selection in Europeans and Asians, comes in at number 19.

This indicates that at least some of the genetic - and thus physical and possibly behavioural - differences between the various European populations stem from evolutionary adaptation to their local environments.
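
One common way to quantify how "differentiated" a marker is between two populations is the fixation index Fst. The sketch below uses that measure as an assumption (the study may well have ranked markers differently), and the allele frequencies are hypothetical values chosen only to contrast a marker that has drifted slightly with one that has been pushed apart, as selection might do.

```python
def fst_two_populations(p1: float, p2: float) -> float:
    """Fst for one biallelic marker in two equally weighted populations."""
    p_bar = (p1 + p2) / 2
    h_total = 2 * p_bar * (1 - p_bar)                       # heterozygosity if pooled
    h_within = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2  # mean within-population
    if h_total == 0:
        return 0.0  # marker is fixed for the same allele in both populations
    return (h_total - h_within) / h_total


# Hypothetical allele frequencies, chosen only to contrast the two situations
print(f"slightly drifted marker: Fst = {fst_two_populations(0.50, 0.55):.4f}")
print(f"strongly diverged marker: Fst = {fst_two_populations(0.90, 0.10):.4f}")
```

Markers whose frequencies differ only a little between populations score close to zero, while a marker whose frequency has been driven in opposite directions scores far higher and would rise to the top of a "most differentiated" list.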

I'll leave the technical commentary to Dienekes, but I do want to make one important point: the accuracy of the map will have been limited by the fact that the markers used in this study represent sites of common variation; data from large-scale genome sequencing will generate far, far better maps. The major reason for this is that sequencing will provide information on rare, highly spatially-restricted variants - many of which will be limited to single families and thus be extremely informative about geographical ancestry.

Basically, if you had complete genome sequences from enough Europeans you could reconstruct the genetic map of Europe with exquisite precision. In addition to empowering genetic genealogists, researchers could use deviations between the genetic and physical maps to make powerful inferences about historical migration events and recent episodes of natural selection. With any luck, this is the sort of data that will simply fall out from large-scale population genomic studies being conducted over the next decade or so.

Update: Kambiz at Anthropology.net puts these results in a broader scientific context.

Lao et al. (2008). Correlation between Genetic and Geographic Structure in Europe. Current Biology DOI: 10.1016/j.cub.2008.07.049

Image source: Figure 1 from Lao et al.


Thursday, August 7, 2008

BREAKING NEWS

Hopefully I now have the attention of at least a small proportion of my RSS subscribers; here's a friendly reminder:

Genetic Future has moved and you need to update your RSS feed by clicking HERE.

This feed will be inactivated shortly, and this domain will become an archive site.


Daniel.


The challenges of psychiatric genetics

Back in April I posted on the elusive genetics of bipolar disorder, a crippling psychiatric condition affecting over 2% of the population in any given year.

The major message from that article is that although bipolar disorder is massively influenced by genetic factors (around 85% of the variation in risk is thought to be due to genetics) we still don't really have the faintest idea exactly which genes are involved. This is despite three reasonably large genome-wide association studies involving over 4,000 bipolar patients in total, which generated weak and contradictory results and failed to provide a single compelling candidate for genetic variation underlying this disease.

This disappointing result has also held largely true for other psychiatric conditions with strong genetic components, such as schizophrenia, major depression and autism. Genetic studies of these conditions have had some success identifying rare mutations that underlie severe cases, but the vast majority of the genetic variants contributing to risk remain undiscovered.

There are several reasons why genome-wide association studies can fail to yield significant harvests of disease-associated genes. I summed these up with respect to bipolar disorder as follows:
"The researchers are surely hoping that small effect sizes are the major problem, since this is the easiest problem to remedy (simply increase sample sizes). Disease heterogeneity - in other words, multiple diseases with distinct causes that all converge on a bipolar end-point - also seems like a particularly plausible explanation given the complexities of mental illness. It's also likely that various types of genetic variants that are largely invisible to existing SNP chips, like rare variants and copy-number variation, are important."
The same story probably holds largely true for other psychiatric conditions. In this week's issue of Nature, a news article and an editorial both tackle the challenges of psychiatric genetics, and lay out the ambitious strategies currently being pursued by researchers around the world to overcome them.

Small effect sizes
The first hurdle that I describe above is the fact that most of the variants underlying these conditions probably have very small effect sizes (only increasing risk by less than 20%). Such variants will only be identified by cranking up sample sizes immensely, an approach that has yielded some limited success for other genetically complex traits such as height and obesity. The Nature news feature has a table listing some of the major collaborative efforts currently collecting genetic information from the very large cohorts required to dissect out the basis of these conditions:



In most cases, these samples are being built up by pooling results from multiple different studies, often gathered by groups from around the world. As sample sizes increase the power of studies to detect small-effect variants grows. The effect of sample size on the power of genome-wide association studies is illustrated in the graph below from a recent review by Peter Visscher*:

Take a single genetic variant that explains just 0.5% of the variance in the risk of a psychiatric disorder. With a sample size of 5,000 individuals with that disorder you still have a mere 50% chance of detecting that variant. Double your sample size, and that probability jumps to a near-certainty of detection - and your power of detecting even smaller-effect variants (explaining, say, 0.2% of the risk) starts to climb to respectable levels.

By staring at those curves for a while, and bearing in mind that many of the variants found by recent genome-wide association studies explain well under 0.2% of the risk variance, you will quickly start to appreciate why researchers are pushing for ever-larger disease populations to work with. With truly enormous samples on the order of 50 to 100 thousand patients - not out of the question for international consortiums studying reasonably common diseases such as bipolar - the power to detect even very weak risk variants becomes reasonable.
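
The shape of those power curves can be approximated with a short calculation. This is a sketch under stated assumptions, not the model behind the figure: it treats the disorder as a quantitative trait, uses a two-sided one-degree-of-freedom test, and applies a significance threshold of 5e-8 (a conventional genome-wide value that I have chosen; the source does not give one). The exact numbers will therefore differ from the curves described above, but the pattern is the same.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()


def gwas_power(n_samples: int, variance_explained: float, alpha: float = 5e-8) -> float:
    """Approximate power of a two-sided 1-df association test for one variant."""
    # Non-centrality parameter for a quantitative trait (assumed model)
    ncp = n_samples * variance_explained / (1.0 - variance_explained)
    shift = math.sqrt(ncp)
    z_crit = -STD_NORMAL.inv_cdf(alpha / 2)  # significance threshold on the z scale
    # Probability that the shifted test statistic lands beyond either threshold
    return STD_NORMAL.cdf(shift - z_crit) + STD_NORMAL.cdf(-shift - z_crit)


for n in (5_000, 10_000, 50_000):
    for q2 in (0.005, 0.002):
        print(f"N = {n:>6,}, variance explained = {q2:.1%}: power = {gwas_power(n, q2):.2f}")
```

With this stricter threshold the power at 5,000 samples comes out lower than the 50% read off the figure, but doubling the sample still takes a 0.5% variant from a minority chance of detection to well over 90%, and at 50,000 samples even a 0.2% variant is detected almost every time.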

If there are common genetic variants contributing to the risk of these diseases, such large collaborative studies will eventually find them; so long, of course, as they can tackle the next (and potentially far more serious) problem of disease heterogeneity.

Complex, heterogeneous diseases
The second major problem I mentioned with analysing the genetic basis of these diseases is that they are complex, multifactorial, and extremely difficult to diagnose and classify. Psychiatry is probably the area of medicine in which it is most difficult to draw hard boundaries: many symptoms are shared by multiple conditions, and many patients display a diffuse constellation of clinical signs that makes a clean diagnosis impossible.

This complexity and heterogeneity is the basis of considerable tension between geneticists and neuroscientists, which is explored in the Nature editorial. Basically, to build up those massive sample sizes shown above geneticists are forced to lump together patients with a variety of clinical symptoms, thus essentially ignoring the complexity inherent in these conditions - a failure that neuroscientists find inexcusable. In turn, geneticists (like myself) get seriously annoyed by the tendency of neuroscientists to make big, bold claims about disease mechanisms based on studies with tiny sample sizes.

Both sides make reasonable criticisms. As I said in the quote from my previous article above, it seems likely that disease heterogeneity - that is, multiple disease states with the same broad end point being simplistically lumped together - plays a major role in the failure of genome-wide association studies of psychiatric conditions; at the same time, the scientific value of much of the "sexy" neurobiology currently being published (e.g. functional MRI finds that conservatives have lower activity in "compassion" centres of the brain, or whatever) is sometimes highly questionable. Both sides of this scuffle have something to learn from their opponents.

The editorial argues, sensibly, that geneticists and neuroscientists just need to start getting along. The ideal situation is one in which rigorous clinical assessments are used to generate patient cohorts that are as homogeneous as possible, which can then be subjected to large-scale genetic analysis. One especially promising avenue is the use of "endophenotypes" - that is, simple and easily quantifiable traits that are sometimes but not always associated with a particular disease. Cleanly defined endophenotypes, such as very specific dysfunctions of brain activity, may prove much more amenable to genetic dissection than the larger, more complex diseases they are associated with.

Comprehensively tackling the genetics of psychiatric conditions will require a forceful and combined approach drawing on the clinical expertise of neuropsychiatrists and the experience of geneticists in unravelling the genetic mechanisms of complex traits. To some extent this is happening already (no large genetics consortium would be naive enough to embark on a multi-million dollar project without consulting clinical experts) - but obviously there is considerable room for improvement.

Moving beyond common SNPs
Current genome-wide association studies rely largely on the use of single-letter variations in DNA called single nucleotide polymorphisms (SNPs), mainly because these are easy to assay and can be analysed simultaneously in their hundreds of thousands using chip-based assays. For various reasons almost all of the SNPs on current genome-wide association chips are common sites of variation, present at a frequency of 5% or more in the population. However, recent studies have made it look increasingly likely that a large proportion of the genetic risk of common diseases lies in types of genetic variation that cannot be detected using common SNPs: rare variants, and large-scale rearrangements of DNA known as structural variation.

The approaches required to capture these variants are already pretty well-known, although they remain expensive and technically challenging. In an ideal world, genome-wide association studies would be truly genome-wide - in other words, they would utilise the entire DNA sequence of all of the patients and controls in the sample to find every possible genetic variant that might contribute to disease. Unfortunately, such an approach is currently out of reach, for several reasons:
  1. The cost of DNA sequencing is still too high;
  2. The computing power required to analyse the unbelievable volumes of data generated by such a project would be astronomical;
  3. Statistical issues associated with examining so many data-points from each patient and control would greatly increase the required sample sizes, driving costs and computational requirements up even higher; and
  4. Our ability to predict the effects of most genetic variants on human biology - which would be important for understanding which of the millions of rare variants found in such a study are actually harmful - is still far too weak.
Each of those challenges (particularly the costs of sequencing) will ultimately be overcome, but this will take time. In the meantime, there are some less powerful but technically feasible approaches that are likely to yield some useful results over the next few years. One of these is carefully targeted sequencing of a small fraction of the genome, consisting of candidate genes considered a priori likely to have some role in the disease in question (an approach strongly advocated by Walter Bodmer in a recent perspective article in Nature Genetics), which will identify at least some of the rare variants that underlie disease risk. To nail the structural variation, new chips that can pick up even small regions of variable copy number (that is, duplications and deletions) have already had some success in identifying potential genetic changes underlying sporadic cases of schizophrenia.

Both approaches have their limitations. The success of the candidate gene approach will be constrained by researchers' ability to identify the genes most likely to be involved in a particular disease - but in fact our currently severely limited understanding of disease genetics is precisely why we need to study this issue in the first place! (In the Nature news piece, Harvard's Steven Hyman memorably describes this approach as "like packing your own lunch box and then looking in the box to see what's in it.") And while chip-based detection of structural variation is rapidly increasing in resolution, it's extremely difficult to determine which of the variants identified in a study are disease-causing and which are harmless polymorphisms - this is currently done probabilistically, by showing that there is an enrichment of new variants in disease cases compared to controls, but this approach cannot tell you which of the identified variants are actually causative.
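
To make that enrichment argument concrete, here is a minimal sketch in Python of one way such a comparison could be run: a one-sided Fisher's exact test on the number of people carrying a new structural variant among cases versus controls. This is an illustration, not the method used in any particular study; the function name enrichment_p_value and the counts in the example are hypothetical.

```python
from math import comb


def enrichment_p_value(case_carriers, n_cases, control_carriers, n_controls):
    """One-sided Fisher's exact test for enrichment of carriers in cases.

    Returns the probability of seeing at least `case_carriers` carriers
    among the cases if carrier status were unrelated to disease status.
    """
    total_carriers = case_carriers + control_carriers
    total_people = n_cases + n_controls
    # Number of ways to distribute the carriers across everyone.
    all_arrangements = comb(total_people, total_carriers)
    most_in_cases = min(total_carriers, n_cases)
    # Sum the hypergeometric tail: k carriers in cases, the rest in controls.
    tail = sum(
        comb(n_cases, k) * comb(n_controls, total_carriers - k)
        for k in range(case_carriers, most_in_cases + 1)
    )
    return tail / all_arrangements


if __name__ == "__main__":
    # Hypothetical counts: 12 carriers in 200 cases, 4 in 200 controls.
    p = enrichment_p_value(12, 200, 4, 200)
    print(f"one-sided p-value for enrichment: {p:.4f}")
```

Note what the result does and does not say: a small p-value suggests that carriers as a group are over-represented among cases, but it gives no information about which individual variants are responsible.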

From psychiatric genetics to genetic psychiatry?
There are several important reasons researchers are interested in the genetics of mental illness: identifying causal genes helps to dissect out the molecular pathways involved in disease, and may help to pull out otherwise invisible sub-types of a disease; studying "extreme" mental phenotypes may illuminate the genetic basis of variation in cognition and personality traits in "normal" people; and, perhaps most importantly, by identifying the genes underlying psychiatric diseases we may be able to target at-risk individuals for monitoring and intervention, potentially heading off severe disease before it takes hold.

In the headlong pursuit of these goals the field of psychiatric genetics has developed an unfortunate reputation built on bold claims made with limited evidence, and literally hundreds of reported associations that have completely failed to stand up to replication. Just a couple of years ago the shiny new tools of large-scale genomics promised an end to this ignoble period in the history of the field; unfortunately, the introduction of larger samples, higher genomic coverage and increased statistical rigour has not brought the desired clarity to the field, but rather seems to have increased the levels of confusion and uncertainty.

If anything, that crucial third goal - using genetics to predict the risk of mental illness - now appears further away than it did just a couple of years ago. Back in early 2007 we didn't have many convincing genetic predictors of mental illness, but at least it was possible to imagine that emerging genomic technologies might identify a small core set of large-effect variants that would help clinicians to predict disease risk. Right now we still don't have many useful genetic predictors, and that illusion of hope is gone.

In summary: while there's no doubt that these conditions do have a strong genetic basis, it's now abundantly clear that this basis is frighteningly complex, with common variants of moderate-to-large effect - the types of variants that would be most useful for risk prediction - being essentially absent. It's going to take many years, massive cohorts, the clever application of new genomic technologies, and a willingness from both neuroscientists and geneticists to listen to one another to move this field forward.

(Brain scan image from Science Photo Library.)

* Thanks to reader Chris for providing me with the citation, which I had carelessly misplaced!

Saturday, August 2, 2008

Genetic Future is moving, and so am I

Genetic Future is moving to a shiny new home at ScienceBlogs. This domain will remain as an archive site, but for fresh content you will need to update your links as follows:

New URL: http://scienceblogs.com/geneticfuture/

New RSS feed: http://feeds.feedburner.com/scienceblogs/geneticfuture

Some of you familiar with the ScienceBlogs network might be wondering if this move heralds a transition into left-wing political blogging, but don't worry: my articles will continue to be focused on reporting advances in human genomics and critiquing the genetic testing industry.

Just a few weeks after the transition I'll also be physically moving from Sydney to a new life in Cambridge, UK. Posting on the new site will be light during this move and regulars will notice a few recycled posts to fill in the awkward silences, but bear with me - in a couple of weeks there will be plenty of fresh human genetics goodness.

Hope to see you all at the new domain,


Daniel.