Sunday, April 27, 2008

The human genome is old news. Next stop: the human proteome

A Nature News article describes the initial plans for an ambitious effort to begin mapping the complete human proteome: the set of all human proteins expressed in all of our cells at all points during our development and adult life.

This is a project of vastly greater magnitude and complexity than the sequencing of the human genome. Unlike the genome, which remains essentially static between cell types and over time, the proteome is tremendously dynamic, changing constantly in response to cell-cell signalling and environmental stimuli. Thus even though - with some small exceptions - every cell in your body carries the same genome, the proteome can be wildly different between different tissues and can change rapidly over time (the image on the left is the result of proteomic analysis of a single tissue, the human kidney; each spot represents one protein). In addition, the function of proteins can change depending on where they localise within the cell, and which other proteins are around for them to interact with.

The complete mapping of the human proteome would require analysing the expression, localisation and interactions of all proteins in human tissue samples from all tissues at all stages of development, and following exposure to all possible forms of environmental stimulus. That's completely impossible with current technology, so the architects of the human proteome project have drawn up a more realistic wish-list:

The plan is to tackle this with three different experimental approaches. One would use mass spectrometry to identify proteins and their quantities in tissue samples; another would generate antibodies to each protein and use these to show its location in tissues and cells; and the third would systematically identify, for each protein, which others it interacts with in protein complexes. The project would also involve a massive bioinformatics effort to ensure that the data could be pooled and accessed, and the production of shared reagents.

It's unclear exactly which tissue samples will be used for the first phase of the project, but it appears that this stage will rely heavily on pooling data from pre-existing studies. After that, the project may move on to a detailed analysis of the expression levels, cellular localisation and interaction partners of proteins encoded by genes on chromosome 21 (the smallest human chromosome); alternative suggestions include a comprehensive analysis of all of the proteins found in a specific cellular location such as the mitochondria or the cell membrane.

There are some daunting technical obstacles to overcome for this project to be successful. Given that the project will be carried out by multiple laboratories around the world, there needs to be a serious attempt at standardising the protocols used to extract and characterise proteins. The article notes that "results from the Human Plasma Proteome project and other proteomics efforts showed that different laboratories — and even the same lab — often identify very different sets of proteins from exactly the same sample".

The project will be complicated by the fact that many genes encode multiple different proteins, differing from one another in various regions, through a process known as alternative splicing. The proposed solution to that problem is to ignore it altogether:

[...] the group plans to focus on only a single protein produced from each gene, rather than its many forms. “We got rid of all this complexity,” Bergeron says.

That may simplify the analysis, but it will also significantly reduce the power of the project. The single protein isoform selected by the project will not necessarily be the most important isoform produced by that gene (this is likely to differ substantially between different tissues). That means that the project will miss crucial information about the function of many of the proteins it analyses.

Actually, there are caveats of varying severity for nearly all of the currently available technologies for separating, identifying and characterising proteins. It's extremely difficult to develop methods that can accurately examine both low- and high-abundance proteins in a single run. Generating antibodies that reliably and specifically bind to each protein in the proteome will be a mammoth undertaking, and will be confounded by the alternative splicing issues mentioned above. High-throughput methods for detecting protein-protein interactions, while they have been used extensively (for instance in characterising the yeast protein interaction network), still suffer from a range of problems that can result in both false-positive and false-negative findings.

However, these are largely technology-driven constraints. Similar negative arguments were thrown at the human genome project, and look how that turned out! If anything, it seems likely that a proteome project of this magnitude would provide strong incentives to overcome the technical hurdles and standardisation problems that currently plague proteomics in general.

As a useful side-effect, this project (or its successors) will provide information that will help in interpreting the results of whole-genome sequencing. As I've noted before, we still know so little about our own genome that it's likely that most of us will have complete genome sequences well before we really have the tools and understanding to decipher what that sequence actually means. In order to have any chance of figuring out what effects a rare variant in an unannotated gene might have on our health we will need to call on data from many different fields of biology.

At the very least, large-scale analysis of the human proteome should allow researchers to tentatively place many of our currently anonymous genes into functional pathways. That's a step forward for personal genomics: knowing that you have a loss-of-function mutation in a gene that may be involved in cholesterol biosynthesis is a lot more useful (in terms of guiding further clinical testing) than simply knowing that you have a mutation in hypothetical gene C11orf68.



 Subscribe to Genetic Future.

Wednesday, April 23, 2008

David Altshuler on personal genomics

The Boston Globe has a fairly well-balanced article on the current state of personal genomics: a field with tremendous promise that is yet to really deliver. I particularly liked these two contrasting quotes from David Altshuler of the Broad Institute:

"From a clinical point of view, [current genome scans are] just noise," he said. "No one knows how to use such information to improve health."

and

In coming years, Altshuler says, he believes genomics will be as transformative as the Internet.

The first quote is only slightly exaggerated; the second is spot-on.



My genes made me do it

An article in the Washington Post discusses new uses of genetic testing in the courtroom that go far beyond standard forensic DNA profiling:

[...] defense attorneys are asking judges to admit test results suggesting that their clients have a genetic predisposition for violent or impulsive behavior, adding a potential "DNA defense" to a legal system that until now has held virtually everyone accountable for their actions except the insane or mentally retarded.

Some gene tests are even being touted for their capacity to help judges predict the likelihood that a convict, if released, will break the law again -- a measure of "future dangerousness" that raises questions about how far courts can go to abort crimes that have not yet been committed.

The article correctly notes that these tests are still very much at the fringes of science - behavioural genetics is a complex field, and the current associations are generally pretty weak. However, there's little doubt that many of the traits underlying a predisposition to criminal behaviour, such as a fondness for risk-taking or susceptibility to addiction, are substantially influenced by genetic factors, and it's only a matter of time before the major genes responsible are identified and characterised.

Although genetic testing will only ever allow for a probabilistic prediction of susceptibility to criminal behaviour (unlike the tortured psychics in Minority Report), society needs to prepare itself for the consequences of these findings. For instance, do "criminal genes" excuse someone from criminality, or do they simply provide an even better reason to lock such people away for the good of society? Should expensive family monitoring and support programs be targeted towards individuals who are genetically susceptible to antisocial behaviour in the presence of abuse or neglect? Given a limited budget, should rehabilitation programs focus on criminals who lack these susceptibility genes and may thus be less inclined to re-offend?

Update: The Genetic Genealogist has a great tangentially-related post on forensic genetics.


Sunday, April 20, 2008

A new model for genetic privacy: you don't have any

In a perspective piece in Nature Reviews Genetics (subscription required, I think), Personal Genome Project leader George Church and colleagues advocate a revolutionary new approach to research subject privacy. Essentially, they argue that "the reality of the new genetics and genomics urges us to abandon the traditional concept of medical confidentiality". In other words, research participants must learn to accept the fact that the privacy of their genetic and health information cannot be guaranteed.

When I first heard of this concept in the context of the Personal Genome Project it struck me as pure insanity - who would volunteer for a project if there is a significant risk of your genetic and health information being accessed by (say) insurance companies? Having thought it over, though, the need for such an approach is becoming more and more clear to me. The basic argument goes something like this:
  1. Your DNA sequence (or any sufficiently large set of genetic markers, like those used in modern genome-wide association studies) is enough by itself to unambiguously identify you.

  2. Thus even "anonymous" participants in large-scale genetic studies are vulnerable to having their identity revealed - all it would take is someone to have a sample of your DNA, and access to the individual data-points from the study, and they would then have access to any health or life-style information recorded about you as part of that study.

  3. As such, there simply cannot be guarantees of anonymity given to participants in such studies, fundamentally undermining the traditional model of confidentiality.

  4. The best solution to this problem is to abandon the illusion of research subject privacy, and instead recruit participants with the explicit condition that all of the data collected about them as part of the study may in fact be revealed to the public.
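To get a feel for point 1, here is a minimal sketch of why a modest number of markers is enough to single someone out. It makes strong simplifying assumptions that are mine rather than the authors': every marker is an independent biallelic SNP in Hardy-Weinberg equilibrium with an allele frequency of 0.5, relatives are ignored, and the population figure is a rough 2008 world estimate. Real markers are less informative than this, so the result is an illustrative lower bound only.

```python
import math


def genotype_match_probability(allele_freq: float) -> float:
    """Chance that two unrelated people share a genotype at one biallelic SNP
    (assumes Hardy-Weinberg equilibrium)."""
    p, q = allele_freq, 1.0 - allele_freq
    genotype_freqs = (p * p, 2 * p * q, q * q)
    return sum(f * f for f in genotype_freqs)


def markers_needed(population: int, allele_freq: float = 0.5) -> int:
    """Smallest number of independent SNPs for which the expected number of
    chance matches to one person in the population falls below one."""
    per_snp = genotype_match_probability(allele_freq)
    return math.floor(math.log(population) / -math.log(per_snp)) + 1


if __name__ == "__main__":
    world_population = 6_700_000_000  # assumption: approximate 2008 figure
    print(markers_needed(world_population))
```

Under these idealised assumptions a few dozen markers suffice, which is tiny compared with the hundreds of thousands of markers typed in a modern genome-wide association study.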

The authors aren't advocating a complete dump of participant genetic and health records on a publicly accessible website - although volunteers in the Personal Genome Project have the option of doing just that, should they choose to. Rather, they argue for a strategy of "maximizing data protection while informing people about its limits". In other words, doing your best to limit disclosure of individual health data, while clearly informing participants of the fact that their privacy can't be guaranteed.

It certainly is an audacious paradigm shift, and I'm having trouble predicting its consequences. For instance, will such a policy discourage people with a clear family history of genetic disease from participating in large-scale cohort studies (for insurance reasons), thus reducing the power of such studies to detect disease-associated variants? Will it create a generation gap in research participation, with conservative older people shunning studies while the children of the Facebook era - who engage in public disclosure of information with a wilfulness that seems shocking to their elders - embrace participation? I don't know, but I guess we'll all find out sooner rather than later...

Anyone interested in the Personal Genome Project (which is calling for volunteers for whole-genome sequencing, by the way) should check out their informative web-site. Misha Angrist, one of the "First Ten" participants who will have their genomes sequenced by the PGP, also has a blog that's well worth adding to your RSS reader.


DNA Perspectives

A while back I discussed a Nature editorial calling for a public registry of disease-gene associations. This would provide potential consumers with objective information about the scientific evidence underlying commercial gene tests, helping them to make an informed decision amidst the hype, overstated claims (and occasionally sheer lunacy) that unfortunately characterise a large swathe of the genetic testing industry at the moment.

I think a broadly-backed international genetic association registry would be a fantastic resource, whether it is built on the foundations of an existing model such as SNPedia or (more likely) assembled from scratch. However, it's unlikely that a single monolithic registry will emerge: rather, we'll probably see an array of competing databases, some with official backing, some Wikipedia-like community-based annotation projects, and more than a few set up by genetic testing companies themselves. Consumers will certainly have more access to information on genetic associations, but it will unfortunately be hosted by a plethora of organisations with different goals and target audiences.

At Eye on DNA, Hsien-Hsien Lei points to a new initiative called DNA Perspectives. DNA Perspectives is funded by DNA Direct, Hsien's employer (as Hsien is commendably scrupulous in pointing out whenever the topic arises, I should add). The aim is to develop "a collaborative site developed by a wide range of industry experts to objectively evaluate the clinical validity and utility of genetic markers as well as commercially available genetic tests" - an admirable goal.

DNA Perspectives will be based on annotation by invited experts in genetics, with all information freely available to the public, and a forum for consumers to add their comments and personal ratings. I think this model is a good one, treading the line between the semi-structured anarchy of free-for-all community resources like Wikipedia and the slow-moving, cumbersome centralised bureaucracy of many official databases (there are plenty of other viable models, of course). However, it will be interesting to see if it can overcome two potentially major obstacles.

The first is community apathy, which I think will be familiar to anyone working in an expert-curated database who has tried to recruit researchers to annotate material. Return rates tend to be low, and most experts who do visit the site will make perfunctory corrections at best. The problem is basically that most experts are busy people - writing grants and papers will always be a higher priority than annotating a database, which is a considerable effort that typically has minimal (or zero) pay-off. (I write this rather guiltily, looking back on my seriously mediocre track record of participating in such efforts.)

The second problem is the overt link to a genetic testing company. No matter how hands-off DNA Direct attempts to be, there will always be a conflict of interest when the body running a genetic association registry is simultaneously relying on sales of genetic tests to pay the rent. DNA Direct certainly appears to be one of the most evidence-based genetic testing companies out there, so it hopefully doesn't have much to hide - but nonetheless, if an expert reviewer offers a scathing critique of a test that DNA Direct offers, how will the company feel about hosting that review on its own server space? Even if the company fastidiously avoids censoring the reviews, it will always be very hard to overcome consumer perceptions of bias.

Ultimately, potential genetic testing customers will probably feel much more comfortable sourcing their information from a registry with no financial ties to the testing industry. I also suspect that expert reviewers will be easier to attract to a database backed by major funding bodies and research institutions, who can offer both the small carrot of officially-sanctioned kudos for their efforts, and (potentially) the more effective stick of making funding partly conditional upon participation in the review effort.

Of course, at this stage no such official and comprehensive registry exists - and while I don't think it's the ultimate solution, DNA Perspectives is at least a step in the right direction at a time when consumers are desperate for guidance through the murky waters of the DTC genetic testing field. I look forward to seeing how it progresses.


Thursday, April 17, 2008

Watson's sequence: gloomy news for personal genomics?

It's hard to imagine how the publication of the complete genome sequence of Jim Watson - assembled with unprecedented speed and cheapness using next-generation sequencing technology - could possibly be bad news for the field of personal genomics. But in an opinion piece accompanying the publication in Nature, genome evolution guru Maynard V. Olson makes that argument:

If Watson took his sequence to a genetic counsellor, there would be little to discuss. The sequence seems to show that he is a carrier for a handful of mutations that might catch a counsellor's interest. But these mutations have no known effects on Watson himself, and would confer risk on offspring only in the highly unlikely event of a marriage between two carriers. None of these mutations is ever likely to be considered an appropriate candidate for screening in the general population — of which, for these purposes, Watson is a representative member.

Recognition of the thin clinical value of this sequence may cause some investors in the new sequencing methods to take pause, given that the major capital investments required to commercialize these technologies have been motivated more by their perceived medical potential than by research applications.

Well, our current ignorance of the functional significance of most genetic variants makes a good argument for not getting your genome sequenced right now. But it's not like you need that argument - the best reason for not getting your genome sequenced now is that it's ludicrously expensive (unless you have $350,000 to burn, like Dan Stoicescu).

By the time whole-genome sequencing becomes affordable - in perhaps five years - our understanding of the functional effects of human genetic variation will be dramatically better than today. With each genome that gets sequenced that understanding will grow. And best of all, a genome sequence never becomes obsolete (unlike the SNP chips currently used by personal genomics companies like 23andMe and Navigenics, which will really start to lose their usefulness over the next year or two).

In any case, while we can't predict the functional impact of every single variant in Watson's genome, even our limited current knowledge is enough to reveal some potentially important sites. For instance, Watson carries at least 10 mutations that have previously been associated with severe diseases in humans (in most cases he only carries one copy of a mutation, where two would be required to cause disease). Given that known mutations are only a small fraction of the total sequence changes that could result in severe disease, this suggests that each of us may carry quite a large number of mutations that could potentially result in serious disease in our children, should we be unlucky enough to mate with someone carrying mutations in the same gene.

In addition, the researchers predict that almost 300 of the protein-altering variants in Watson's genome are "probably damaging" to the function of the protein. These types of variants may potentially play a role in susceptibility to disease, although we don't yet know enough to be able to pick them out with any real confidence.

Anyway, it's a start. Olson is certainly correct that we still know far too little about the function of our genome for large-scale sequencing to be used as a population screening tool, but Watson's sequence illustrates that - once our knowledge has improved - there will be plenty of potential functional information to explore in a typical genome.

Update: MassGenomics has a great break-down of the Watson data.



Monday, April 14, 2008

Genome-wide association studies taken to the next level

The Wellcome Trust Case-Control Consortium, the group responsible for a massive study of genome-wide associations (GWAS) in seven different common diseases published last year as well as a wide range of other projects in disease genetics, has just announced plans for a mind-bogglingly large expansion of their GWAS efforts.

The numbers are truly impressive: 120,000 participants, 25 different diseases, and a total cost of £30 million (nearly US$60 million). Patients and controls will be screened for up to 1 million genetic variants, as well as being subjected to analysis of genome-wide copy-number variation (insertions and deletions of DNA).

The diseases aren't all listed in the press release, but I've managed to get a breakdown from the Wellcome Trust:

Visceral leishmaniasis
Bacteraemia susceptibility
Human prion disease
Ankylosing spondylitis
Multiple sclerosis
Ulcerative colitis
Psoriasis
Coeliac disease
Asthma
Glaucoma
Schizophrenia
Psychosis endophenotypes
Parkinson’s disease
Partial epilepsies
Ischaemic stroke
Abdominal aortic aneurysms
Myocardial infarction
Coronary artery disease
Extreme and early onset obesity
Response to statin treatment
Barrett’s oesophagus and oesophageal adenocarcinoma
Breast Cancer
Adult glioma
Pre-eclampsia
Endometriosis

There's something in that for nearly everyone, as well as an interesting addition that isn't a disease at all: reading and mathematics abilities in 12-year-old children enrolled in the UK Twins Early Development Study. This marks an interesting (and potentially extremely controversial) foray into the world of cognitive genetics. Watch this space - the media coverage of this aspect of the project is unlikely to be universally positive.

I haven't yet been able to find out the sample sizes for each disease, but it's clear from the total number quoted in the press release that at least some of these cohorts will be quite well-powered - assuming they use a large, shared group of controls, the average sample size is likely to be more than 4,000 patients.
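As a rough check on that estimate, here is a back-of-the-envelope sketch in Python. The participant and disease totals come from the press release; the size of the shared control group is not stated, so the values tried below are purely hypothetical.

```python
TOTAL_PARTICIPANTS = 120_000  # from the press release
N_DISEASES = 25               # from the press release


def average_cases_per_disease(shared_controls: int) -> float:
    """Average number of patients per disease if all studies share one control group."""
    if not 0 <= shared_controls < TOTAL_PARTICIPANTS:
        raise ValueError("shared_controls must be between 0 and the total")
    return (TOTAL_PARTICIPANTS - shared_controls) / N_DISEASES


if __name__ == "__main__":
    # Hypothetical control-group sizes; the real figure is not in the press release.
    for controls in (5_000, 10_000, 20_000):
        print(controls, average_cases_per_disease(controls))
```

On these numbers, the average stays above 4,000 patients per disease as long as the shared control group is smaller than 20,000 people.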

I've recently spent quite a bit of my time talking down the power of genome-wide association studies. Nonetheless, a study of this magnitude - combining SNP data with copy number variation - is likely to capture a sizeable (albeit by no means complete) chunk of the genetic risk variants for many of these diseases. Having comparable data-sets from so many different diseases will also facilitate the identification of common variants that influence risk for multiple diseases, as has already been demonstrated for IL23R in several auto-immune conditions.

In addition, the WTCCC and its partners are assembling an enviable collection of well-characterised DNA samples from patients and controls that can be rapidly deployed for large-scale sequencing approaches once the cost of sequencing drops far enough.

Exciting times...



Sunday, April 13, 2008

Navigenics vs 23andMe: drawing the battle-lines

Well, the debut of Navigenics has certainly been a lot more interesting than I anticipated. Far from being just another genome-scan product limping along in the wake of 23andMe (like, say, SeqWright's rather depressing effort), Navigenics is brazenly attempting to re-define the entire industry in a way that suits them.

At the very least, the company is staking a solid claim over the lucrative well-paid over-30 non-geek market niche, which has been surprisingly poorly tapped by the current players. But Navigenics seems to want to go further than this: in fact, they appear to be trying to reshape the personal genomics industry as being first and foremost about the sober provision of evidence-based health information, and simultaneously position themselves as the most respectable provider of this information. If in the process they can create a perception of their competitors (particularly 23andMe) as frivolous and over-hyped, so much the better.

Over at Genetics and Health, Elaine Warburton has a long interview with Navigenics' Medical Director Michael Nierenberg. This is by no means a probing critique - in fact, it reads suspiciously like an extended advertisement for the company - but there are some interesting snippets from Nierenberg about the image Navigenics wishes to present:

Navigenics is no way a ‘recreational’ genomics company and does not wish to contemplate entering any ‘recreational’ field. It is a company focusing on the wellness and prevention aspects of health. Our service focuses on actionable entities and things of substance such as cardiac disease, not eye colour or such like. We welcome regulation and make heavy use of genetic counseling.

The sub-text is abundantly clear: we'll give you accurate information about the really important stuff like cancer and heart disease, whereas our competitors (they know who they are!) mess about with trivial information about athletic performance and ear-wax consistency.

Navigenics' well-orchestrated marketing campaign revolves around this central theme of seriousness and competence, and I'm sure the message is sinking in with their apparent target audience (well-paid, highly-educated, time-poor executive types old enough to start fretting about their long-term health); having the reliably earnest Al Gore spruik the company certainly didn't hurt. To emphasise their trustworthy seriousness, Navigenics has launched a joint study with the Mayo Clinic into the effects on patients of receiving genetic information, is partnering with Medscape to provide physician education, and proposed a set of standards for personal genomics companies (a clear attempt to re-define the industry in their own image, while simultaneously seizing the moral high ground).

Through these activities, as well as their use of CLIA-certified genotyping facilities and provision of 24-hour access to genetic counselling, the company no doubt hopes to avoid many of the criticisms thrown at other personal genomics companies.

This all seems quite admirable, on the whole. However, the Navigenics model is also deeply regressive: they are taking the currently exciting, somewhat anarchic but intrinsically empowering field of personal genomics (in which individuals are free to explore their own genetic data however they wish) and cramming it back into the tightly-regulated, paternalistic environment of the standard medical framework. Where 23andMe talks about guiding customers through their own journey of genetic discovery, Navigenics appears to be more about giving clients the information that Navigenics thinks is medically relevant, and protecting them from all the non-essential details that might overwhelm or confuse them.

Nowhere is this regressive paradigm more evident than in Navigenics' reluctance to give their customers access to more than a tiny fraction of their own genotyping results. Whereas 23andMe and deCODEme both freely provide clients with access to their complete, raw genotyping data, Navigenics customers must sign a waiver to receive their results on an encrypted disk (presumably without an easy-to-navigate interface); Navigenics ominously warns that "without our blessing, the potential for misinformation is extremely high" (updated thanks to Hsien). Elaine puts a positive spin on this reluctance:

Imagine the confusion and furore if Navigenics were to provide its members with their full 1 million marker analysis! Navigenics’ (and others) sensible, if somewhat patriarchal approach of ‘drip feeding’ results to members as and when the research is robust enough to bring the SNP into the public domain, is one that should be applauded not derided.

In other words, customers shouldn't need to worry their pretty little heads over all these confusing As, Cs, Gs and Ts - they can just let Navigenics decide what they need to know. Ouch.

I can only assume that Navigenics' focus group research suggests that their target audience finds this attitude reassuring rather than profoundly insulting; either way, it's both patronising and unnecessary. After all, it's not like 23andMe simply punt any old genetic association out there for their customers to sift through - they carefully code the associations to indicate how reliable they are (based on a pretty reasonable set of criteria [PDF], I might add). Customers are allowed to analyse their own data for both gold-standard and lower-reliability associations, but are given information to help them decide how much weight they should place on each. In my opinion this sort of informed freedom is a far more enlightening (and vastly less insulting) model than the constrained "need-to-know" approach of Navigenics.

Anyway, it will be interesting to see how Navigenics alters the long-term tone of the personal genomics market. Perhaps the early pioneering feel of personal genomics was just a temporary aberration, and we are now seeing the beginning of a general regressive shift towards the standard medical model. More optimistically, I suspect these early battle-lines mark the beginning of a diversification of the industry, with some products targeting the individualistic and curious spirit of a younger, information-savvy generation, and others appealing to the more serious health-centered focus of individuals moving towards middle age.

Either way, it will be fascinating to watch 23andMe, Navigenics and their current and upcoming competitors struggling to define an entire industry as they battle for market share.



Thursday, April 10, 2008

Ready or not, personal genomics is here

A new editorial in Nature comments on the rapidly expanding field of personal genomics. The appearance of this industry has taken many observers by surprise; indeed, the authors note, "Rarely have basic discoveries morphed into a commercial product quite so swiftly."

The speed of the industry's growth has led to many calls for heavy regulation, which (I think) would be a disastrous approach for consumers. Nature agrees, and offers a positive alternative solution:

If consumers are to reap the benefits that genetic testing can offer, they need understandable information about the basis, validity and limitations of the tests. One proposed structure for providing this information is a publicly accessible registry into which test-makers would be required to upload data about their tests and the studies that back them. This information should be updated as genetic risks are changed or refined, as inevitably they will be.

Some similar databases already exist (such as the Wikipedia-like SNPedia) or are being planned (e.g. GEN2PHEN [PDF]), although none of them are yet comprehensive or rigorous enough to fulfil the needs of genetic test consumers. It would be great to see these and similar efforts promoted and funded, or perhaps even combined in a central registry that supplements slow, careful expert annotation with the faster but looser community-driven SNPedia approach. It would almost certainly be more cost-effective to build on existing projects rather than developing a new registry from scratch.

However the registry develops, Nature's point is that the solution to shonky genetic test vendors isn't just legislation (which, if too heavy-handed, will also negatively affect legitimate companies and limit consumer choice), it's also information. Providing potential customers with reliable data about the efficacy of genetic tests and allowing them to make their own decisions protects consumers without sacrificing their autonomy. This is certainly my philosophy, and the motivation behind Genetic Future - it's very reassuring to see that this sentiment is shared in the lofty reaches of the Nature editorial board.

The article finishes with pertinent advice to consumers:

In the meantime, online shoppers who buy genetic tests would do well to keep asking themselves whether the science is, indeed, ready.

Before buying any genetic test, research its pros and cons widely, and think hard about whether the information you receive will really be worth the money you spend, or whether you'd be better off saving your money until better tests are available.


Subscribe to Genetic Future.

Personal genomics: getting your money's worth

Over at Eye on DNA, Hsien-Hsien Lei has an entertaining list of the variety of personal genomics services that could be purchased for the $2,500 cost of a full Navigenics scan. There are some tough decisions in there: would I prefer two paternity tests, or sixteen genetic tests predicting my risk for baldness?

Hsien points out that at the current going rates of the sole company offering commercial whole-genome sequencing, $2,500 would buy you only 0.71% of a whole genome. That sounds small, but it's still more than 20 million base pairs - twenty times the paltry one million sites interrogated by the Navigenics chip for the same price!

Of course, the Navigenics SNPs have been carefully selected to provide as much information as possible about common genetic variation, so they're still a better purchase right now than a random 0.71% of a genome sequence. Nonetheless, this comparison provides some insight into just how cheap sequencing technology is becoming; it certainly won't be long before it's commercially competitive.
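For anyone who wants to check the arithmetic behind that comparison, here is a rough sketch. The genome size of about 3 billion base pairs is my assumption (it isn't quoted above), and the "implied price" is simply what the 0.71% figure works out to, not a quoted rate from any company.

```python
# Back-of-envelope check of the $2,500 comparison.
GENOME_BP = 3.0e9        # assumption: ~3 billion base pairs in a human genome
BUDGET_USD = 2500        # cost of a Navigenics scan (from the post)
FRACTION = 0.0071        # share of a genome that $2,500 buys (from the post)
CHIP_SITES = 1_000_000   # sites interrogated by the Navigenics chip (from the post)

# Bases of sequence the budget would buy at the current going rate.
bases_bought = FRACTION * GENOME_BP

# Whole-genome price implied by "$2,500 buys 0.71%".
implied_genome_price = BUDGET_USD / FRACTION

# How many sequenced bases you get per chip site for the same money.
ratio_vs_chip = bases_bought / CHIP_SITES

print(f"Bases sequenced for ${BUDGET_USD:,}: {bases_bought / 1e6:.1f} million")
print(f"Implied whole-genome price: ${implied_genome_price:,.0f}")
print(f"Sequenced bases per chip site: {ratio_vs_chip:.1f}x")
```

Under that assumed genome size the budget buys a little over 21 million bases, which is where the "twenty times" figure comes from.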

As I've emphasised in recent posts, the chip technology currently used to analyse genetic variation by researchers and personal genomics companies (23andMe, deCODEme, SeqWright and now Navigenics) will only ever capture a fraction of your total genetic risk for common disease: the fraction that consists of common small-scale variants.

In contrast, whole-genome sequencing will give you information about the types of genetic variation - such as rare variants and large-scale structural variation - that are completely invisible to current chips. Since these variants probably constitute a substantial fraction of genetic risk for common diseases, sequencing (when it becomes affordable) is likely to give you a lot more useful information than current genome scans. And best of all, since whole-genome sequencing gives you information on every variation in your genome, it won't ever become obsolete - whereas chips will be periodically replaced by new, higher-resolution models that capture a larger (but still incomplete) snapshot of your genetic variation.

In other words, while genome scans are the best affordable technology we have right now, they have profound limitations and will become rapidly outdated as researchers begin to focus on the rare variants and structural variation that contribute to variation in complex traits and common disease risk. For those who care about value for money, my suggestion is that you put your $2,500 in a bank account with a good interest rate and don't take it out until whole-genome sequencing becomes cheap enough to buy that instead.

Of course, that's advice for those who are mainly interested in health prediction. For genetic genealogists current genome scans provide some powerful information about genetic ancestry; if that's your interest, you'd probably be best off investing in a deCODEme scan, which gives you the same number of SNPs as Navigenics for 40% of the price, and (unlike Navigenics) allows you to download your complete raw data.



Tuesday, April 8, 2008

Some early thoughts on Navigenics

It's been a long, long wait, but Navigenics has finally officially entered the personal genomics arena. Like the Me Two (23andMe and deCODEme), Navigenics will be offering to determine the sequence at hundreds of thousands of commonly variable positions throughout your genome, which it will use to make predictions about your risk for a variety of common diseases such as heart disease and type 2 diabetes. The service will cost US$2,500, compared to US$1,000 for the two major competing offers.

The Wired blog has the facts. A few early thoughts:
  1. Navigenics would probably struggle to compete head-to-head with 23andMe, which has a much stronger public profile, a funkier website, and offers a significantly cheaper service (albeit with fewer markers); BUT

  2. It's pretty clear they're aiming at a different market niche altogether. Just compare the two websites (23andMe, Navigenics): 23andMe will appeal more to the hip young web-savvy childless yuppie who wants to know more about him/herself and build up some cool conversation topics (their website is all about "Your personal journey of genetic discovery"); Navigenics is aiming for the sober, older executive with kids who watches their weight and cholesterol and heads to the gym three times a week (website quote: "I want to be part of all the big moments in my son's life, so I'm doing everything I can to stay healthy.")

  3. Providing access to genetic counselling and promoting physician education, while certainly praise-worthy, is all part of this market positioning. Navigenics is trying to say that they're above all the hype and frivolity of 23andMe; all they care about is your future health, and they care about that in a deeply earnest, professional yet compassionate manner. (The genetic counselling video sums up the mood nicely.)

  4. Navigenics has started with a genotyping service that is CLIA-certified - unlike 23andMe, who had to change labs to a CLIA-certified facility a few weeks back (causing disruptions to their service).

  5. Navigenics offers a long-term DNA storage service, which is a clever business move. It will be that much easier to convince customers to purchase an extra DNA test in a year's time - or whole-genome sequencing in five years' time - if they don't have to go through the hassle of re-submitting a mouth swab. Never underestimate the role of convenience in shaping consumer decisions.

  6. The "relative lifetime risk" analysis offered by Navigenics seems as though it will provide more impressive-sounding numbers than the absolute risk estimates offered by 23andMe and deCODEme, but I'm not sure how statistically sound it is to extrapolate odds ratios from genetic association studies to total lifetime risk (especially given that the strength of genetic associations is known to vary with age). Would any statisticians out there care to dissect Navigenics' white paper on their methods?

  7. Finally - and this is completely a personal thing - there's absolutely no way I'd buy into a service that didn't provide me with complete and unfettered access to my raw SNP data. Both 23andMe and deCODEme offer customers the ability to download and analyse their own data; this isn't the case for Navigenics, according to the Wired article. That's a complete deal-breaker for me, but admittedly it's unlikely to have much of an impact on the chiselled, athletic executive types featured on Navigenics' front page, who (understandably) have no particular interest in the raw data but simply want to know their risk of stroke and heart disease (presumably so they can stay alive and healthy long enough to play football with their chiselled, athletic kids).

Cynicism aside, I've got to hand it to Navigenics: they've managed to neatly differentiate themselves from the competition, and they're now poised to capture a lucrative high-income section of the market that has been surprisingly poorly targeted by existing personal genomics companies.

Anyway, you'll no doubt read a lot more about this over the next day or two. Genetics and Health has an ongoing (and thus far relentlessly positive) series of posts on Navigenics. I'm particularly interested to see what Steve Murphy has to say - it seems to me that Navigenics has managed to avoid most of the problems that he's been slamming 23andMe for over the last few months.



Monday, April 7, 2008

Height and hypertension genes in Nature Genetics

The advance online edition of Nature Genetics is stuffed with juicy complex human genetics goodness.

Firstly, there are three massive genome-wide scans for genes involved in regulating human height, each of which analysed more than ten thousand individuals. As I've mentioned before, height appears to be one of those traits (like bipolar disease) that thumbs its nose at genome-wide association studies (GWAS). That's abundantly clear from these studies, each of which - despite its unprecedented size (one of them scanned more than 25,000 individuals!) - managed to capture variants explaining less than 5% of variation in height.

I note that a few previously identified height genes, like HMGA2 and GDF5, pop up in more than one of the three studies, while a new gene (ZBTB38) appears as the top candidate in all three of the studies. However, there doesn't seem to be a huge amount of overlap in the lower-ranked genes (although I need to read the articles more carefully to be sure).

ScienceDaily puts a positive spin on the story ("Scientists are beginning to develop a clearer picture of what makes some people stand head and shoulders above the rest"), but the real story is this: despite the massive scale of these studies, they're still only capturing less than 5% of the total variance in a trait that is almost entirely (~80%) genetic. This is a powerful demonstration of the inability of current GWAS technology to access the genetic variants responsible for the vast majority of heritable variation in at least some complex traits, for reasons I've discussed recently.
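To put a number on that gap, here is a minimal sketch using only the two figures quoted above (less than 5% of variance explained, roughly 80% of variance genetic). It treats both as point values, which is a simplification on my part, so the result is an upper bound rather than a precise estimate.

```python
# Share of the heritable variation in height captured by the three scans.
HERITABILITY = 0.80    # ~80% of variance in height is genetic (from the post)
EXPLAINED_MAX = 0.05   # each study explains less than 5% of total variance

# Upper bound on the fraction of *genetic* variance that has been found.
share_of_heritable = EXPLAINED_MAX / HERITABILITY

# Everything else is still unaccounted for.
share_missing = 1 - share_of_heritable

print(f"Heritable variance explained: under {share_of_heritable:.2%}")
print(f"Heritable variance still missing: over {share_missing:.2%}")
```

In other words, at least fifteen-sixteenths of the genetic contribution to height is still unaccounted for.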

Researchers interested in the genetics of common diseases are no doubt experiencing a sinking feeling as they read these studies, since there's every reason to expect that what holds true for height will also apply in at least some of these conditions. If so, the number of patients required to characterise even a trivial proportion of the total genetic risk using GWAS will be astronomical. However, there is a light at the end of the tunnel: large-scale sequencing, once it drops in price, will provide researchers with access to the rare variants and structural variation currently missed by chip-based GWAS technologies, and should help to capture a substantial proportion of the missing variation.

This leads into another Nature Genetics article, which used an interesting candidate gene resequencing strategy to detect variants linked with variation in blood pressure. Readers persistent enough to slog through yesterday's post on the genetics of bipolar disease might recall that hypertension is another disease in which the GWAS approach has yielded little success; in the comments to that post, G from Popgen ramblings notes that admixture mapping (an approach to gene identification using populations with mixed ancestry) has also failed to produce consistent signals, despite a profound difference in hypertension risk between populations.

The Nature Genetics study took a different approach: in more than 3,000 individuals from the Framingham Heart Study cohort, the researchers sequenced the full coding regions of three genes that previous family studies had associated with rare, serious hypertension conditions. They found a scattering of rare variants - all present in a single copy in any given individual - with either inferred or biochemically verified effects on protein function. When the individuals carrying these rare mutations were analysed as a group they showed significantly lower blood pressure than non-carriers.

This combination of targeted resequencing and functional analysis is a difficult road, but it's one that researchers will have to follow increasingly often as they attempt to characterise the rare variants that likely comprise a significant fraction of common disease risk. I'll have more to say about this in future posts.



Sunday, April 6, 2008

The elusive genetics of bipolar disorder

Bipolar disease is a common and profoundly debilitating mood disorder, with a remarkably strong genetic component. These features have made bipolar an appealing target for geneticists; yet despite three large genome-wide association studies, the genetic basis of this disease remains as elusive as ever.

This post serves as an introduction to the complex genetics of bipolar disease. In later posts, I'll be discussing the scientific basis of commercial tests for bipolar disease offered by the company Psynomics, and the implications of the genetic architecture of bipolar and other mental illnesses to normal variation in human personality traits.

The goal
Identifying individuals at serious risk of developing mental health problems later in life raises the possibility of early interventions, which might save at-risk individuals - and society as a whole - from the worst effects of mental illness. However, in order to develop predictive tests we first need to characterise the underlying risk factors, including causative genes.

Bipolar disease is a serious mood disorder that affects somewhere between 1 and 2% of individuals of European descent. Surprisingly, around 85% of the variation in risk for this disease is determined by heritable factors (i.e. genes), meaning that genetic approaches seem likely to be a fruitful way to develop predictive tests. Accordingly, bipolar has now been a target for three large genome-wide association studies (GWAS) involving a total of 4,684 patients and 6,447 healthy controls.

And the results of these studies have been - well, almost nothing. Thus far, not a single bipolar marker has been convincingly replicated in more than one of these studies. Bipolar disease thus serves as an unfortunate but illuminating poster-child for the limitations of genome-wide association studies that I discussed last week.

The WTCCC analysis

The largest genome-wide association study of bipolar conducted to date was the Wellcome Trust Case Control Consortium (WTCCC) analysis of 3,000 healthy individuals and 2,000 bipolar patients.

The WTCCC study is a truly remarkable piece of science: simultaneous genome-wide analysis in seven different common diseases (bipolar, coronary artery disease, Crohn's disease, hypertension, rheumatoid arthritis, type 1 diabetes and type 2 diabetes).

The diagram below illustrates the results of the genome scans performed by the WTCCC in all seven common diseases (each with ~2,000 patients compared to a shared set of ~3,000 controls). Each dot represents a different genetic variant found in one of our 23 chromosomes (numbered below the line), with its height above the horizontal line indicating how closely it was associated with that disease. Green dots indicate variants that were associated with each disease with a reasonable degree of statistical confidence; blue dots represent variants that couldn't be distinguished from random noise.


Hopefully you can see that while most of the other diseases show a series of nice green peaks indicating regions of the genome that are associated with disease, bipolar disease - and, curiously, hypertension - are relatively flat and featureless wastelands. If you peer closely at the bipolar dots you'll see a few green markers here and there that peek above the statistical noise of the rest of the genome, but nothing like the green towers that furnish the other diseases (for the curious, that massive signal on chromosome 6 in both the rheumatoid arthritis and type 1 diabetes samples is the MHC cluster of immune genes).

In terms of risk genes identified by the study, bipolar disease performed worse than five of the other six diseases, with only one of those green dots reaching convincing genome-wide significance (as opposed to, say, nine in Crohn's disease and seven in type 1 diabetes). And it gets worse: that single significant risk variant disappeared when the researchers used an expanded reference group approach, essentially comparing bipolar samples with a pool of the controls plus the other six disease groups (which can, fairly reasonably, be considered "controls" for this comparison). The authors identified a few other regions with a weak association with the disease, but nothing further that satisfied their stringent criteria for a convincing signal.

Other genome-wide studies
The two other bipolar GWAS performed within the last twelve months haven't made the picture any less murky.

The first ever genome-wide association study of bipolar (published online in May 2007) used a DNA-pooling strategy that is more cost-effective but significantly less powerful than the traditional GWAS approach, followed by replication studies of interesting-looking variants. The authors of this study reported (with surprising confidence) an association between bipolar and a variant in the DGKH gene - the senior author even remarked to the media that "DGKH is a promising target for new treatments that might be more effective and better tolerated" than the existing therapy, lithium. It's not looking so promising now: there's absolutely no trace of the DGKH association in either the WTCCC study or the other genome scan described below. The other, weaker signals seen in this study (with the possible exception of DFNB31, described below) haven't fared much better, receiving no convincing validation from either of the later GWAS.

The third and most recent genome-wide study used a similar approach to the WTCCC analysis, examining 1461 bipolar patients and 2008 controls. It tells a now-familiar story: while the authors identified a number of variants that were somewhat more common in bipolar patients than controls, not one of their top 20 regions overlaps with any of the suggestive signals in either of the other two GWAS.

The best the authors could find is an overlap between their results and the WTCCC for a variant found in the CACNA1C gene, but this is a bit of a stretch; this region is not strongly associated in either of the studies and it's entirely possible that the overlap is down to chance. The same is true for the DFNB31 gene: although markers in this gene are weakly associated with bipolar in all three of the GWAS conducted to date, the variants flagged by the WTCCC study are physically distant from those found in the other two studies. These two genes certainly warrant detailed follow-up studies, but they're not convincing risk genes yet.

To add insult to injury, the authors' attempts to replicate their own findings in independent samples bore little fruit: although a few of the findings were marginally statistically significant, the number was no higher than would be expected by chance alone.

A bitter harvest
In other words, despite valiant attempts, these three large genome-wide association studies have yielded very little new useful information about the specific genes underlying bipolar risk. They have certainly provided leads to be followed up in targeted studies, and it's worth bearing in mind that a failure to replicate doesn't necessarily mean that all of the variants identified in these studies are false leads - rather, the inconsistency could simply be the result of insufficient power in each individual study, leading each group to identify a random and non-overlapping set of risk genes. But this must seem a pretty distant consolation for the investigators in these studies, given that there's still no way to determine precisely which of the possible associations this applies to.

The near-complete failure of GWAS in this disease does tell us something about the genetic architecture of bipolar: it is not composed to any significant degree of the common, moderate-effect single-base variants that can be readily detected by current chip-based GWAS technologies.

So where, then, is that heritable 85% of bipolar risk hiding in the genome? In their discussion section, the authors of the most recent genome-wide study put forward several explanations, all of which I've discussed in my recent post on the reasons for failure in genome-wide scans: variants with modest effect sizes, population-specific variants, disease heterogeneity, epistatic interactions, copy number variation, and rare variants. Population-specific variants are unlikely to have played a role in the discrepancy between these three studies (all of which were conducted on subjects of predominantly western European ancestry), but there's a pretty good chance that all of the other factors play a role.

The researchers are surely hoping that small effect sizes are the major problem, since this is the easiest problem to remedy (simply increase sample sizes). Disease heterogeneity - in other words, multiple diseases with distinct causes that all converge on a bipolar end-point - also seems like a particularly plausible explanation given the complexities of mental illness. It's also likely that various types of genetic variants that are largely invisible to existing SNP chips, like rare variants and copy-number variation, are important. I'll be discussing these in more detail soon when I review a recent paper on rare copy-number variations in schizophrenia patients.

The next steps
So long as at least some of that heritable bipolar risk stems from common variants with weak effects on disease risk, it will eventually be captured by further genome-wide scans with much larger numbers of bipolar patients and controls. A relatively cheap way to start this will be to combine the results from existing studies: in the discussion section of the most recent paper discussed above, the authors mention plans to perform a combined analysis with the WTCCC investigators. This will be made easier by the fact that both studies were performed using the same (Affymetrix 500K) genotyping platform. Unfortunately, because the remaining scan (the earliest of the three) was performed using a different platform and a sub-optimal DNA pooling strategy it will be more difficult to incorporate its results into a three-way combined analysis.

Some time back, the National Institute of Mental Health announced that it was pledging $5 million for genomic approaches to bipolar disorder and schizophrenia, which will help to pay for the recruitment and genotyping of new patients and controls. With ten thousand or so patients and controls, the power of genome scans to detect low-risk variants will be dramatically higher than in the studies discussed here - so if there are in fact common bipolar variants out there with an effect worth caring about, we should know about them within the next few years.
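To give a feel for how much difference sample size makes, here is a rough power sketch. It uses a simple normal approximation for comparing allele frequencies between patients and controls, which is my simplification and not the method used in any of the studies above. The risk-allele frequency (0.30), odds ratio (1.15) and significance threshold (5e-7) are hypothetical values chosen for illustration; only the sample sizes come from the post.

```python
from statistics import NormalDist


def allelic_power(n_cases, n_controls, control_freq, odds_ratio, alpha):
    """Approximate power of a two-sided test comparing allele frequencies."""
    # Risk-allele frequency in cases implied by the allelic odds ratio.
    case_freq = odds_ratio * control_freq / (
        1 - control_freq + odds_ratio * control_freq
    )
    # Each person carries two alleles, so the allele counts are 2n.
    std_err = (
        case_freq * (1 - case_freq) / (2 * n_cases)
        + control_freq * (1 - control_freq) / (2 * n_controls)
    ) ** 0.5
    expected_z = abs(case_freq - control_freq) / std_err
    critical_z = -NormalDist().inv_cdf(alpha / 2)
    # Normal approximation; ignores the negligible chance of a
    # significant result in the wrong direction.
    return NormalDist().cdf(expected_z - critical_z)


# Hypothetical variant: 30% frequency in controls, odds ratio 1.15,
# tested at an assumed genome-wide threshold of 5e-7.
wtccc_sized = allelic_power(2000, 3000, 0.30, 1.15, 5e-7)
ten_thousand = allelic_power(10000, 10000, 0.30, 1.15, 5e-7)

print(f"Power with 2,000 cases / 3,000 controls: {wtccc_sized:.2f}")
print(f"Power with 10,000 cases / 10,000 controls: {ten_thousand:.2f}")
```

With these made-up inputs the WTCCC-sized study would almost always miss the variant, while the larger study would usually find it. The exact numbers shift a lot with the assumed frequency and effect size, so treat them as illustrative only.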

Of course, it seems increasingly unlikely that common variants constitute more than a small proportion of the total genetic risk for bipolar, so future studies will need to dig deeply into the less well-mapped regions of human genetic variation. In the immediate future we will see more studies of copy-number variation using high-resolution arrays, but ultimately (once costs drop low enough) the real answers are likely to come from large-scale sequencing studies. Sequencing will detect both rare variants and copy-number variation; however, it will take large sample sizes and some clever analysis to make sense of the huge volumes of data that it generates. Improved diagnostic approaches - perhaps brain-imaging technologies - that allow bipolar patients to be divided into distinct clinical sub-groups (potentially with separate genetic etiologies) may also prove useful.

A final note: while bipolar serves as a rather extreme example of the failure of genome-wide association with common markers, it's also a reminder of the vast swaths of genetic risk that remain unexplained in nearly all other common diseases as well as complex non-disease traits. Bipolar researchers are thus not alone in their frustrations, and the lessons learned in identifying the genes underlying this condition will be highly relevant to other common diseases.



Burton, P.R., et al. (2007). Genome-wide association study of 14,000 cases of seven common diseases and 3,000 shared controls. Nature, 447(7145), 661-678. DOI: 10.1038/nature05911

Baum, A.E., et al. (2008). A genome-wide association study implicates diacylglycerol kinase eta (DGKH) and several other genes in the etiology of bipolar disorder. Molecular Psychiatry, 13(2), 197-207. DOI: 10.1038/sj.mp.4002012

Sklar, P., et al. (2008). Whole-genome association study of bipolar disorder. Molecular Psychiatry. DOI: 10.1038/sj.mp.4002151

Thursday, April 3, 2008

More on the 23andMe lab change-over

I pointed yesterday to 23andMe's apology and explanation for significant delays in sending out genotyping results to customers - apparently a change of lab was to blame, driven by the need to shift their facility to a CLIA-certified venue.

Our Gene Sherpa Steve Murphy has a lot more to say about this, including a challenge to 23andMe to repeat the assays for all clients that have already been genotyped and compare the two data-sets. I personally don't think that's necessary, although it would be a powerful symbol of openness for the company to release some more detailed results of the comparisons between the old and new facilities (at least to its customers, if not to the rest of us riff-raff).

Disclaimers or no disclaimers, people are taking the results of these tests seriously. Failing to act quickly and strongly to reassure the community that their personal genomic data are sound would further undermine an industry that is already being subjected to considerable media scepticism.

That said, if I were in the first batch of customers I'd still be feeling pretty confident about my data. Steve suggests that the stated <0.1% discrepancy between the old and new labs represents "a whole ton of SNPs"; I'd respond that it means that for any given SNP of interest, there's at most a 1 in 1000 chance that your data are incorrect. That strikes me as a thoroughly acceptable error rate given how small an emphasis I'm likely to place on any one SNP when considering making changes to my lifestyle.

Update: In the comments, Ann Turner points to her recent comparison of 23andMe and deCODEme data from the same individual on RootsWeb. For the 560,128 SNPs for which both companies called a genotype, only 35 were different between the two platforms - that's an astoundingly low discrepancy rate of just 6.25 differences per one hundred thousand SNPs! Slightly more worrying is 23andMe's missing data rate of 3.5 per 1000 SNPs (for SNPs that were called successfully by deCODEme), which is uncomfortably high for my liking, and compares unfavourably to deCODEme's missing rate of less than 1 per 1000 SNPs.
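The rates above are easier to compare on a common scale. This small sketch converts each figure quoted in this post to "per 100,000 SNPs"; all of the inputs come from the text, and the variable names are just mine.

```python
# Put the quoted error and missing-data figures on one scale.
PER = 100_000

shared_calls = 560_128   # SNPs called by both 23andMe and deCODEme
discordant = 35          # calls that differed between the two platforms

# Observed cross-company discrepancy rate.
discordance_per_100k = discordant / shared_calls * PER

# 23andMe's stated ceiling for old-lab vs new-lab discrepancies (<0.1%).
lab_ceiling_per_100k = 0.001 * PER

# Missing-data rates, quoted per 1,000 SNPs in the text.
missing_23andme_per_100k = 3.5 / 1000 * PER
missing_decodeme_ceiling_per_100k = 1 / 1000 * PER

print(f"Cross-company discordance: {discordance_per_100k:.2f} per 100,000")
print(f"Lab change-over ceiling:   {lab_ceiling_per_100k:.0f} per 100,000")
print(f"23andMe missing calls:     {missing_23andme_per_100k:.0f} per 100,000")
print(f"deCODEme missing calls:    under {missing_decodeme_ceiling_per_100k:.0f} per 100,000")
```

On this scale the observed disagreement between the two companies sits well below the 0.1% ceiling quoted for the lab change-over, while missing calls are far more common than wrong ones.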



The power of positive thinking

From a widely-circulated AP piece:

Reykjavik, Iceland-based deCODE Genetics Inc. is publicly traded and has a market capitalization of about $100 million (€63 million). It recently began offering its own personal genomic scanning service, deCODEme, which analyzes about 1 million genetic variations for just under $1,000 (€632).

"I am convinced that within five years every college-educated person in America is going to have a profile like this," said deCODE chief executive Kari Stefansson. "You cannot afford not having this."

Given deCODE's rapidly dwindling financial reserves, I suspect it's more that he can't afford you not having it...



P3G: the future of population genomics

The Public Population Project in Genomics (cutely abbreviated as P3G) has published a brief open-access mission statement in the European Journal of Human Genetics.

P3G is a truly massive collaborative enterprise that aims to bring together large population cohorts from around the world - eventually including more than 11,000,000 subjects, if all goes as planned - and facilitate the sharing of samples and data for population genomic studies. Several of the problems with genome-wide association studies that I highlighted in my recent post (particularly the existence of rare risk variants or variants with small effects) can only really be addressed with enormous sample sizes. Gathering sufficient numbers of patients will require large-scale international collaborations, and it looks as though P3G is a big step in that direction.

The lofty principles espoused in the statement ("free exchange of ideas, data-sharing and openness for the benefit of all") might sound rather naïve if it weren't for the surprising triumph of such ideals in human genetics over the last decade: as the article notes, the Human Genome Project, the SNP Consortium and the HapMap Project all serve as hugely successful models of large-scale collaboration across international borders and the free public release of data.

Collaborations of this scale are essential for human genetics to continue moving forward at its current exhausting pace. I look forward to seeing how this project evolves over the next few years.



Wednesday, April 2, 2008

23andMe delays explained

A couple of weeks ago I pointed to an anonymous LiveJournal entry in which "fdmts" complained about the delays in receiving his genome scan results from 23andMe. The LiveJournal entry quoted an email from the company explaining that they were "experiencing a backlog that is resulting in longer than predicted processing times".

23andMe founders Anne Wojcicki and Linda Avey (in their traditional, somewhat eye-hurting fluorescent pink attire) have now used the 23andMe blog The Spittoon to apologise and explain. It seems that the company has changed the lab used to perform their genotyping, and the transfer has resulted in an inevitable lag time.

Bearing in mind that 23andMe are at the forefront of a brand new industry involving complex technical and legal issues, such delays are not unexpected. It appears in this case that the change of lab was driven by regulatory requirements; the new lab is certified under the Clinical Laboratory Improvement Amendments of 1988 (CLIA).

I don't know enough about the regulatory environment in the U.S. to guess at whether this move is simply in order to stay ahead of the shifting demands of regulatory agencies, or whether it heralds a move towards a more clinical focus (something the company has explicitly steered away from so far). Any thoughts?

