Tuesday, February 12, 2008

23andMe looks towards a sequencing future

Right now, personal genomics companies like the Me Two (23andMe and deCODEme) and their less well-advertised competitor SeqWright offer to give you your DNA sequence at up to one million positions throughout your genome - less than 0.05% of the total. While this approach is actually surprisingly informative about patterns of common genetic variation throughout the genome, it still provides a limited window into your genome as a whole.
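For a rough sense of where the "less than 0.05%" figure comes from, here is a back-of-the-envelope sketch. The genome size of about 3.1 billion base pairs is my own assumption (the post doesn't state it), and the variable names are purely illustrative.

```python
# Assumption (not stated in the post): a haploid human genome of ~3.1 billion base pairs.
GENOME_SIZE_BP = 3_100_000_000

# From the post: chips read "up to one million positions".
POSITIONS_GENOTYPED = 1_000_000

# Fraction of the genome directly read by a one-million-position chip.
fraction = POSITIONS_GENOTYPED / GENOME_SIZE_BP

print(f"Fraction of genome genotyped: {fraction:.4%}")
```

Under that assumption the chip reads roughly one position in every three thousand, comfortably below the 0.05% ceiling quoted above.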

Precisely how limited this window is has become clear from the recent results of large genome-wide association studies for common diseases like lupus or diabetes. While the successes of these studies have been well-publicised - dozens of new genetic variants that can be used to predict future risk of disease - the publicity has glossed over a slightly dirty little secret: the common genetic variation surveyed by chip-based approaches captures a relatively small proportion of the total genetic risk for most common diseases.

Where is the rest of the disease risk hiding? A large proportion of this risk is likely conferred by a large number of rare variants, each of which may be restricted to just a few families, but which add up to a huge amount of total risk. Such variants will be completely invisible to chip-based genotyping methods since they are not "tagged" by any of the common variations detected by the chips. The only realistic way to detect such variants will be through large-scale sequencing - determining the sequence at every position in the genome (or at least a substantial fraction of it).

So how long will it be before sequencing technologies can be brought down to the costs that personal genomics customers are willing to bear, as opposed to the $350,000 genome sequence currently offered by Knome? This is a difficult question to answer, as David Hamilton from VentureBeat explains in a great recent analysis centred around an article in the NY Times. But my best guess: we will see the first sequencing-based forays into personal genomics (possibly sequencing just a few dozen important genes) within the next twelve months, and I would be very surprised if whole-genome sequencing doesn't reach the broad personal genomics market (i.e. at a cost of less than $5000) well within the next three years. Given the competition in this area, and the money being pumped into development by both governments and private consortia, it's a fair bet that the technology will move fast.

Existing personal genomics companies are also well aware of the need to move fast to stay on top of the shifting technology and keep their grip on the market. In a recent blog entry, 23andMe's DarrenP spells out how cheap sequencing will change personal genomics, and explicitly foretells the entry of 23andMe into the sequencing market:
"By some estimates, the cost of sequencing a human genome could be a few thousand dollars by 2014.

"23andMe is already riding this wave. A dozen years ago it would have cost about $600,000 to examine the 580,000 points, known as SNPs, that we include in our $999 service. Eventually we’ll be able to give you your complete sequence for that price."

That may be somewhat disappointing for 23andMe's existing customers, who will watch their $1000 genetic data become rapidly obsolete over the next few years - but this is an experience familiar to anyone who buys a new computer or other high-tech device only to watch it superseded by cheaper, more powerful alternatives within a few months. In addition, I'd guess that 23andMe will offer a sequencing discount to current customers to help hold onto their share of the market.
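To put the numbers in the 23andMe quote above in per-SNP terms, here is a small sketch using only the three figures quoted there. Note the assumption baked in: $999 is the retail price of the service rather than 23andMe's actual genotyping cost, so the "today" figure is only an upper bound on what each SNP costs them. The variable names are mine.

```python
# Figures taken from the 23andMe quote.
SNPS_ON_CHIP = 580_000
COST_TWELVE_YEARS_AGO_USD = 600_000  # estimated cost to examine the same SNPs a dozen years ago
SERVICE_PRICE_USD = 999              # assumption: retail price used as a stand-in for current cost

# Cost per SNP then and now.
old_cost_per_snp = COST_TWELVE_YEARS_AGO_USD / SNPS_ON_CHIP
new_cost_per_snp = SERVICE_PRICE_USD / SNPS_ON_CHIP

# Overall fold reduction in the cost of examining the same set of SNPs.
fold_reduction = COST_TWELVE_YEARS_AGO_USD / SERVICE_PRICE_USD

print(f"Then: ${old_cost_per_snp:.2f} per SNP")
print(f"Now:  ${new_cost_per_snp:.4f} per SNP")
print(f"Roughly {fold_reduction:.0f}-fold cheaper")
```

That works out to about a dollar per SNP a dozen years ago against a fraction of a cent today, a drop of around 600-fold.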

Of course, the interpretation of large-scale sequencing data will bring its own set of challenges. A common genetic variant on a chip that is associated with, say, an elevated risk of prostate cancer, is comparatively easy to interpret: if you have the variant, you're at higher risk. But what if your androgen receptor gene turns out to contain a rare mutation in its regulatory region that might alter the expression of the gene? Because the mutation is rare, there's unlikely to be any solid data on its effect on disease risk. Multiply that uncertainty by the hundreds of variants of questionable functional effect that will likely be found in any genome, and the end result for a customer is likely to be confusion rather than enlightenment.

Nonetheless, the rapidly dropping cost of sequencing will revolutionise personal genomics - and as David says, the jostling for position over the next few years will certainly be a heck of a lot of fun to watch.

6 comments:

Tim said...

Complete sequencing will likely be a waste of time without better annotations and a positive family history, especially in minority populations such as the one where I live. We have a large number of "variants of unknown significance" for BRCA1 and 2 because Myriad hasn't done their homework. I wouldn't want to predict whether these variants are significant either, given the huge unknowns with upstream-downstream regulation, silencing RNAs, etc.

In addition, the sequencing labs are only looking at people with a positive family history, so those people would likely have a larger burden of "modifier genes" that push a variant into the disease state.

Simon Lin, MD said...

Great post! Thank you for sharing your concerns about the effectiveness of SNPs in the study of common diseases.

I agree that direct DNA sequencing is a promising technology. However, its utility in the study of common diseases is equally, or more, questionable, as partially discussed in the previous comment by Tim.

SNPs are by definition common alleles, and although using them to study diseases seems counter-intuitive, it does work in many cases! The theoretical basis was argued and established when the human SNP project was started.

We must still admit that the analysis of SNPs is rudimentary: most of the rules in use are based only on a single SNP. Rarely are we able to consider a combination thereof, because of the computational complexity involved.

By the way, how did you enlist your blog as part of the DNA Network? It is not only a cool logo but also a very active source of information!

Simon
http://retail-genomics.blogspot.com/

Daniel said...

Tim,

Great points. As I said in my post, I think the interpretation of genome sequence data will be extremely challenging - we simply don't know enough yet about the way genes operate to be able to predict the functional effects of newly discovered variations with any real confidence. Even variations that change protein sequence can be difficult to interpret, let alone variants in regulatory regions.

But our annotations are getting better, as are our models of gene function - in fact, BRCA1 is blazing a trail here, with a number of groups working on better ways to predict whether or not a novel variant is likely to affect the function of this gene. We're not going to have perfect predictive power for the foreseeable future, but at least we'll be able to make probabilistic estimates of risk for a subset of variants within the next five to ten years.

Daniel said...

Simon,

I agree that common variants have proved to be responsible for a reasonable proportion of risk for some common diseases. But our existing "common SNP" chips are still missing a lot of the risk variance - for instance, even after genome scans involving tens of thousands of patients, the genetic risk variants known for type 2 diabetes only predict around 10% of the total risk.

Next-generation SNP chips will sample rare variants as well (enabled by studies like the 1000 Genomes Project, which will have the power to detect essentially all SNPs with a frequency above 1%). But it's likely that some proportion of the heritable risk is due to very rare - perhaps even family-specific - alleles of relatively large effect. These won't be picked up by association studies using any SNP chip, however dense.

There is one other advantage of whole-genome sequencing: your data won't become obsolete in a hurry. In a few years, new association studies will be relying on super-dense chips, and the disease SNPs they find likely won't be covered by the current 23andMe or deCODEme chips. But if you have your full genome sequence you are guaranteed to know your genotype for whatever disease variant the new studies reveal.

Daniel said...

Simon,

To apply to join the DNA Network, email Rick Vidal at rvidal@gmail.com.

Tim said...

Daniel,

I agree that sequencing is much better than a SNP panel. However, until investigators take into account other interesting genetic phenomena such as copy number and structural variation, we have a long way to go in explaining phenotype-genotype correlations. For example, there are a number of null alleles (such as the GSTs) that don't pass the validation criteria for SNP chips since they don't follow the rules for Mendelian inheritance set forth in the algorithms used for chip development. This problem doesn't just go away when the gene is sequenced, since most sequencing strategies don't account for copy number.