Showing posts with label whole-genome sequencing.

Sunday, April 20, 2008

A new model for genetic privacy: you don't have any

In a perspective piece in Nature Reviews Genetics (subscription required, I think), Personal Genome Project leader George Church and colleagues advocate a revolutionary new approach to research subject privacy. Essentially, they argue that "the reality of the new genetics and genomics urges us to abandon the traditional concept of medical confidentiality". In other words, research participants must learn to accept the fact that the privacy of their genetic and health information cannot be guaranteed.

When I first heard of this concept in the context of the Personal Genome Project it struck me as pure insanity - who would volunteer for a project if it carried a significant risk of their genetic and health information being accessed by (say) insurance companies? Having thought it over, though, the need for such an approach is becoming more and more clear to me. The basic argument goes something like this:

  1. Your DNA sequence (or any sufficiently large set of genetic markers, like those used in modern genome-wide association studies) is enough by itself to unambiguously identify you.

  2. Thus even "anonymous" participants in large-scale genetic studies are vulnerable to having their identity revealed - all it would take is for someone to have a sample of your DNA and access to the individual data-points from the study; they would then have access to any health or life-style information recorded about you as part of that study.

  3. As such, there simply cannot be guarantees of anonymity given to participants in such studies, fundamentally undermining the traditional model of confidentiality.

  4. The best solution to this problem is to abandon the illusion of research subject privacy, and instead recruit participants with the explicit condition that all of the data collected about them as part of the study may in fact be revealed to the public.
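
The re-identification step in points 1 and 2 can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not a method taken from the Nature Reviews Genetics piece: the marker names, genotypes, health notes and the `reidentify` helper are all hypothetical, and a real attempt would have to cope with genotyping errors and close relatives.

```python
# Toy sketch: matching a DNA sample against "anonymous" study records.
# Every marker name, genotype and health note below is made up.

def reidentify(sample, study_records, min_markers=3):
    """Return IDs of study records that agree with the sample at every
    marker they share, requiring at least `min_markers` shared markers."""
    matches = []
    for record_id, record in study_records.items():
        shared = set(sample) & set(record["genotypes"])
        if len(shared) < min_markers:
            continue  # too little overlap to say anything
        if all(sample[m] == record["genotypes"][m] for m in shared):
            matches.append(record_id)
    return matches

study_records = {
    "anon-001": {
        "genotypes": {"m1": "AA", "m2": "CT", "m3": "GG", "m4": "TT"},
        "health": "type 2 diabetes",
    },
    "anon-002": {
        "genotypes": {"m1": "AG", "m2": "CC", "m3": "GG", "m4": "CT"},
        "health": "no diagnoses recorded",
    },
}

# Genotypes read from a separately obtained DNA sample of a known person.
sample = {"m1": "AG", "m2": "CC", "m3": "GG", "m4": "CT"}

for record_id in reidentify(sample, study_records):
    print(record_id, "->", study_records[record_id]["health"])
```

With only four markers a match could easily be coincidence; the argument above rests on studies recording enough markers that a perfect match effectively singles out one person.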

The authors aren't advocating a complete dump of participant genetic and health records on a publicly accessible website - although volunteers in the Personal Genome Project have the option of doing just that, should they choose to. Rather, they argue for a strategy of "maximizing data protection while informing people about its limits". In other words, doing your best to limit disclosure of individual health data, while clearly informing participants of the fact that their privacy can't be guaranteed.

It certainly is an audacious paradigm shift, and I'm having trouble predicting its consequences. For instance, will such a policy discourage people with a clear family history of genetic disease from participating in large-scale cohort studies (for insurance reasons), thus reducing the power of such studies to detect disease-associated variants? Will it create a generation gap in research participation, with conservative older people shunning studies while the children of the Facebook era - who engage in public disclosure of information with a wilfulness that seems shocking to their elders - embrace participation? I don't know, but I guess we'll all find out sooner rather than later...

Anyone interested in the Personal Genome Project (which is calling for volunteers for whole-genome sequencing, by the way) should check out their informative web-site. Misha Angrist, one of the "First Ten" participants who will have their genomes sequenced by the PGP, also has a blog that's well worth adding to your RSS reader.

Subscribe to Genetic Future.

Monday, March 31, 2008

Eye on DNA interviews Knome CEO

Hsien-Hsien Lei from Eye on DNA has an exclusive interview with Jorge Conde, CEO of Knome - the company that offers whole-genome sequencing to customers for a cool $350,000.

As I've said before, the first Knome customers will be getting a pretty rough deal: a vast sum of money forked out for a pretty minimal return in terms of useful information, given our currently dismal understanding of most of the genome. Conde does his best to make this prospect sound more attractive:

...these early adopters will also be pioneers in the personal genome revolution and will be amongst the first people in history to be fully sequenced. These participants will be on the cutting edge of science and medicine. They will have access to the latest information as it becomes available and those that are willing to learn as we learn (and can appreciate risk prediction and the changing nature of our scientific understanding) will be best positioned to benefit.

Certainly, the early adopters will experience the warm glow of the pioneer. And it's true that they'll have their genome sequence in hand to take advantage of each new research finding that pops up over the next five years. But by the time we have enough genetic information to make a genome sequence seriously useful - in, say, five to ten years - the cost of sequencing will be down by three orders of magnitude. That's when I'll be buying my sequence!

Of course, Dan Stoicescu and other Knome early adopters didn't decide to purchase their sequences through a cold, logical cost-benefit analysis. Stoicescu explained in a recent NY Times article that he views his purchase as "a kind of sponsorship" - in other words, his over-spending will pave the way for affordable genome sequencing for the rest of us.

In any case, as sequencing costs plummet the real money is going to lie in sequence interpretation - translating six billion DNA letters into useful medical information, and then conveying that complex information to a customer in terms they can understand. Conde's interview suggests that Knome has invested heavily in this process, which should put them in a good position to compete with the inevitable flotilla of genome sequencing companies that will pop up over the next five years.



Tuesday, March 4, 2008

Knome customer featured in NY Times

An article in the NY Times entitled "Gene Map Becomes a Luxury Item" introduces us to Dan Stoicescu, one of the first two customers to fork out $350,000 to get their genome sequenced by the personal genomics company Knome (pronounced "know-me" - get it?).

The article is well worth a read for anyone interested in the future of personal genomics, but one theme stands out for me:

Biologists have mixed feelings about the emergence of the genome as a luxury item. Some worry that what they have dubbed “genomic elitism” could sour the public on genetic research that has long promised better, individualized health care for all. But others see the boutique genome as something like a $20 million tourist voyage to space — a necessary rite of passage for technology that may soon be within the grasp of the rest of us.

I'm firmly in the second camp. In a previous post about Knome I noted that, "The willingness of wealthy early adopters to pay excessive amounts for untested technology is a big driver of progress" - in other words, Stoicescu and his fellow Knome customer are subsidising the costs of technology development that will eventually make genome sequencing cheaper for you and me.

Many new technologies start off as expensive yuppie toys, and rapidly tumble in price until they become accessible to the rest of the world. Worrying about "genomic elitism" is like someone back in 1981 worrying about "portable computer elitism". If a technology has broad appeal and utility - and in a few years' time, personal genomics will have both those things in spades - the price will come down quickly. You'll be chatting to your next-door neighbour about your kids' DRD4 genotypes before you know it.

Unfortunately for the early adopters, we currently know so little about the function of most genetic variants that their full sequence won't give them much more information than they could get from 23andMe or deCODEme, for 0.3% of the price. It's true that they're likely to find a few severe recessive disease variants, but these will have little or no effect on their own health, and are unlikely to affect their children (unless they're unfortunate enough to mate with someone who also carries a mutation in the same gene). There's also a low probability that they'll find something really nasty like a Huntington's disease mutation. But overall, the expected utility of this information is low - certainly not worth the $350,000 price tag, unless you're wealthy enough to not have to worry about that kind of money.
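
The "0.3% of the price" figure is simple arithmetic, shown here as a rough check. The $999 chip price is the one quoted for 23andMe's service elsewhere on this blog; the variable names are purely illustrative.

```python
# Prices in US dollars, as quoted in these posts (early 2008).
KNOME_WHOLE_GENOME = 350_000
CHIP_SERVICE = 999  # 23andMe's quoted price for its genotyping service

ratio = CHIP_SERVICE / KNOME_WHOLE_GENOME
print(f"A chip-based service costs about {ratio:.1%} of a Knome genome")
```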

The true value of a genome sequence - identifying and deciphering the thousands of small changes that influence our risk of both rare and common diseases during our lifetime - won't come until we have complete sequences from hundreds of thousands of people, along with thorough medical information to find associations between variants and diseases (cue the Personal Genome Project). By that time, costs for full genome sequencing will be dramatically lower - hell, even poor scientists like me will be able to afford it!



Saturday, March 1, 2008

Google backs the Personal Genome Project

Bloomberg.com reports that Google has invested an unspecified amount in George Church's Personal Genome Project, which plans to sequence protein-coding regions from the genomes of 100,000 humans and link the sequence data with information on health and other traits (such as facial and body measurements). This is in line with Google's $3.9 million investment in 23andMe; Google clearly sees the impact that analysing genetic variation will have on human health, and wants to position itself at the forefront of that wave.

The Bloomberg article is well worth a read for various other useful snippets of information. For instance, I liked Church's argument for why the first participants in the PGP are willing to take the risk of freely sharing their genetic information with the rest of the world: "The payoff is an unobstructed view of the next revolution in medicine".

HT: Genome Technology Online

Sunday, February 24, 2008

Knome signs up first two paying clients for whole-genome sequencing

Yesterday's press release from Knome has generated surprisingly little interest, but it's actually a pretty big deal: the company, in collaboration with the Beijing Genomics Institute, will be beginning whole-genome sequencing for its first two paying clients within the next few months. As the release says, these will be "the first individuals in the world to have their genome sequenced by a personal genomics firm".

The two clients have (wisely) chosen to remain anonymous at this stage. In return for around $350,000 in cold hard cash, they'll each receive "both sequencing and a comprehensive analysis from a team of leading geneticists, clinicians and bioinformaticians".

As I've noted before, the interpretation of whole-genome sequencing is complicated by the fact that no-one has a clue about the functional effects of most variations in the genome, and I wonder if these first clients will feel that they receive anywhere near enough useful information to warrant that hefty price tag.

It's true that over the next few years there will be much better systems developed for predicting functional effects, and these customers' sequences will be ready and waiting to take advantage of this progress (whereas the genotyping data provided by the current crop of personal genomics companies will become increasingly obsolete). However, while this progress in interpretation is being made, the cost of whole-genome sequencing will simultaneously be dropping by orders of magnitude. From a pure cost-benefit perspective the two customers would almost certainly be better off simply waiting for a few years, for a time when the cost of sequencing and the value generated by new analytical techniques start to meet half-way.

Of course, their loss is our gain. The willingness of wealthy early adopters to pay excessive amounts for untested technology is a big driver of progress: Knome (and everyone else keenly watching this experiment) will learn a great deal about the process of sequencing and interpreting genome sequences as a result.

And so, anonymous customers, I salute you: your willingness to spend large amounts of money for limited information will help to make my genome sequence cheaper and more useful, three to five years from now!

Tuesday, February 12, 2008

23andMe looks towards a sequencing future

Right now, personal genomics companies like the Me Two (23andMe and deCODEme) and their less well-advertised competitor SeqWright offer to give you your DNA sequence at up to one million positions throughout your genome - less than 0.05% of the total. While this approach is actually surprisingly informative about patterns of common genetic variation throughout the genome, it still provides a limited window into your genome as a whole.
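
As a rough check on that percentage, here is the arithmetic in Python. The genome size of roughly 3 billion base pairs (one haploid copy) is an assumption added for this sketch, not a figure from the companies; counting the 6 billion letters of both copies would halve the result.

```python
# Fraction of the genome read by a one-million-marker genotyping chip.
CHIP_POSITIONS = 1_000_000       # "up to one million positions"
GENOME_SIZE_BP = 3_000_000_000   # assumption: ~3 billion bp haploid genome

fraction = CHIP_POSITIONS / GENOME_SIZE_BP
print(f"{fraction:.3%} of the genome")
```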

Precisely how limited this window is has become clear from the recent results of large genome-wide association studies for common diseases like lupus or diabetes. While the successes of these studies have been well-publicised - dozens of new genetic variants that can be used to predict future risk of disease - the publicity has glossed over a slightly dirty little secret: the common genetic variation surveyed by chip-based approaches captures a relatively small proportion of the total genetic risk for most common diseases.

Where is the rest of the disease risk hiding? A large proportion of this risk is likely conferred by a large number of rare variants, each of which may be restricted to just a few families, but which add up to a huge amount of total risk. Such variants will be completely invisible to chip-based genotyping methods since they are not "tagged" by any of the common variations detected by the chips. The only realistic way to detect such variants will be through large-scale sequencing - determining the sequence at every position in the genome (or at least a substantial fraction of it).

So how long will it be before sequencing technologies can be brought down to the costs that personal genomics customers are willing to bear, as opposed to the $350,000 genome sequence currently offered by Knome? This is a difficult question to answer, as David Hamilton from VentureBeat explains in a great recent analysis centred around an article in the NY Times. But my best guess: we will see the first sequencing-based forays into personal genomics (possibly sequencing just a few dozen important genes) within the next twelve months, and I would be very surprised if whole-genome sequencing doesn't reach the broad personal genomics market (i.e. at a cost of less than $5000) well within the next three years. Given the competition in this area, and the money being pumped into development by both governments and private consortia, it's a fair bet that the technology will move fast.

Existing personal genomics companies are also well aware of the need to move fast to stay on top of the shifting technology and keep their grip on the market. In a recent blog entry, 23andMe's DarrenP spells out how cheap sequencing will change personal genomics, and explicitly foretells the entry of 23andMe into the sequencing market:

By some estimates, the cost of sequencing a human genome could be a few thousand dollars by 2014.

23andMe is already riding this wave. A dozen years ago it would have cost about $600,000 to examine the 580,000 points, known as SNPs, that we include in our $999 service. Eventually we’ll be able to give you your complete sequence for that price.

That may be somewhat disappointing for 23andMe's existing customers, who will watch their $1000 genetic data become rapidly obsolete over the next few years - but this is an experience familiar to anyone who buys a new computer or other high-tech device only to watch it superseded by cheaper, more powerful alternatives within a few months. In addition, I'd guess that 23andMe will offer a sequencing discount to current customers to help hold onto their share of the market.

Of course, the interpretation of large-scale sequencing data will bring its own set of challenges. A common genetic variant on a chip that is associated with, say, an elevated risk of prostate cancer, is comparatively easy to interpret: if you have the variant, you're at higher risk. But what if your gene for androgen receptor turns out to contain a rare mutation in its regulatory region that might alter the expression of the gene? Because the mutation is rare, there's unlikely to be any solid data on its effect on disease risk. Amplify that uncertainty by the hundreds of variants of questionable functional effect that will likely be found in any genome, and the end result for a customer is likely to be confusion rather than enlightenment.

Nonetheless, the rapidly dropping cost of sequencing will revolutionise personal genomics - and as David says, the jostling for position over the next few years will certainly be a heck of a lot of fun to watch.

Thursday, January 24, 2008

Criticism of the 1000 Genomes Project

Note: Post updated 31st January 2008 to correct errors (thanks to commenter Julia).

An article in Nature raises some criticism of the Project's proposed methodology:

Yet some scientists question how accurate the finished genomes will be, given the project's short timeline and low budget. Others say that the project should have included some phenotypic information about the participants — such as medical records or basic data such as height and weight. "It's curious that the disease-association studies don't exploit much sequencing — and the sequencing studies don't use the disease data. It would be helpful to hear a clear explanation of why, after 17 years and billions of dollars, these studies still aren't coordinated," says George Church, who is leading a venture called the Personal Genome Project out of his lab at Harvard University in Cambridge, Massachusetts. Church's project is collecting and releasing genetic and phenotypic data on ten individuals, including himself.

The data accuracy issue is a perfectly valid one, at least for the genomes sequenced at low coverage (180 individuals in the pilot phase, and probably over 1,000 individuals in the final project). These genomes will be sequenced at what is called 2X coverage, which means that each base in the genome will be sequenced on average two times. In practice, that means that in each of these individuals many regions of the genome will be sequenced more than twice, and many regions won't be sequenced at all; and that means that some rare variants will inevitably be missed.
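
To put rough numbers on "many regions won't be sequenced at all", here is a minimal sketch that assumes reads land uniformly at random, so that per-base depth follows a Poisson distribution. That is a textbook idealisation rather than the Project's own calculation, and real coverage is less even than this; the function names are just for illustration.

```python
import math

def poisson_pmf(k, mean_depth):
    """Probability that a base is covered by exactly k reads."""
    return math.exp(-mean_depth) * mean_depth ** k / math.factorial(k)

def fraction_uncovered(mean_depth):
    """Expected fraction of bases with no reads at all."""
    return poisson_pmf(0, mean_depth)

def fraction_deeper_than(depth, mean_depth):
    """Expected fraction of bases covered by more than `depth` reads."""
    return 1.0 - sum(poisson_pmf(k, mean_depth) for k in range(depth + 1))

# Compare the low-coverage genomes (2X) with deeper sequencing (20X).
for mean_depth in (2, 20):
    print(
        f"{mean_depth}X: {fraction_uncovered(mean_depth):.2%} of bases unsequenced, "
        f"{fraction_deeper_than(2, mean_depth):.2%} sequenced more than twice"
    )
```

Under this idealised model about 13.5% of bases in any one individual receive no reads at 2X, while at 20X the unsequenced fraction is negligible.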

This may be a real problem for the usefulness of the results from this stage of the project. As the Project's organisers discuss in their meeting report (PDF), there will need to be some careful quality control. Fortunately, sections of the genomes of all of the 180 individuals analysed in the pilot phase of the project have also been very well sequenced by the ENCODE project, so there will be an accurate comparison set to assess how well the sequencing methods are performing.

In any case, this won't affect the next stage of the pilot phase, in which much more comprehensive (~20X) coverage will be used to sequence the protein-coding regions of 1,000-2,000 genes. However, it may affect the final full genome sequences of the 1,000+ individuals generated in the final stage of the project, which at this stage are only planned to be sequenced at low coverage. Of course, this plan may change as the project develops.

As for Church's criticism about the failure to include disease samples, this seems like a real non-issue. For instance, the HapMap project didn't use disease samples: its purpose, like this project, was to learn more about the structure of human genetic variation to allow later researchers to study disease better, and it has achieved this goal admirably. Hundreds of studies have already used the HapMap data (directly or indirectly) to find common genetic variants that cause disease; the 1000 Genomes project will provide a catalogue of rare variants that can be used for similar studies in the future.


Wednesday, January 23, 2008

1000 Genomes Project launched

Note: Post updated 31st January to correct errors in original version.

A very exciting announcement:

An international research consortium today announced the 1000 Genomes Project, an ambitious effort that will involve sequencing the genomes of at least a thousand people from around the world to create the most detailed and medically useful picture to date of human genetic variation. The project will receive major support from the Wellcome Trust Sanger Institute in Hinxton, England, the Beijing Genomics Institute, Shenzhen (BGI Shenzhen) in China and the National Human Genome Research Institute (NHGRI), part of the National Institutes of Health (NIH).

They won't be sequencing 1,000 complete genomes immediately. Instead, the project will first move through a "pilot phase" in three parts:

  1. Complete genome sequencing for six individuals (two families, each of two parents and a child) at very high coverage;
  2. Complete genome sequencing of 180 people at much lower coverage; and
  3. Finally, the sequencing of the protein-coding portions of 1,000 genes in 1,000 people.

That's still a massive amount of sequencing work:

“At 6 trillion DNA bases, the 1000 Genomes Project will generate 60-fold more sequence data over its three-year course than have been deposited into public DNA databases over the past 25 years,” said Gil McVean, Ph.D., of the University of Oxford in England, one of the co-chairs of the consortium’s analysis group. “In fact, when up and running at full speed, this project will generate more sequence in two days than was added to public databases for all of the past year.”

This project will immediately add significantly to our understanding of human genetic variation, by increasing the detection of less common genetic variants only present in a small proportion of people. It will also help to refine sequencing techniques, bringing affordable personal genome sequences closer to reality. And importantly, all of the information generated by the project will be freely available online.

So who are they sequencing? Don't bother volunteering your own DNA - the project will be using the same anonymous DNA samples that are being used for the HapMap project (which looked at millions of common variations throughout their genome, but didn't sequence their DNA). That's great for two reasons: firstly, it will allow us to determine exactly how much genetic variation is captured by the genotyping approach used by the HapMap project; and secondly, we will be getting a picture of gene sequence diversity from populations around the world:

Among the populations whose DNA will be sequenced in the 1000 Genomes Project are: Yoruba in Ibadan, Nigeria; Japanese in Tokyo; Chinese in Beijing; Utah residents with ancestry from northern and western Europe; Luhya in Webuye, Kenya; Maasai in Kinyawa, Kenya; Toscani in Italy; Gujarati Indians in Houston; Chinese in metropolitan Denver; people of Mexican ancestry in Los Angeles; and people of African ancestry in the southwestern United States.

Exactly how the project will proceed after the pilot phase is still unclear (that's the point of the pilot phase!), but in an email to Nature writer Erika Check, NHGRI project director Lisa Brooks says:

The exact final project design will depend on the data from the 3 pilots...but we expect that in the full project about 1000 samples will be lightly (2X) sequenced across the entire genome, and these 1000 samples plus possibly additional samples would be sequenced more in the gene regions.

I really can't emphasise enough how much this project will alter the field of human genetics - like the Human Genome Project itself, and the HapMap project after that, this project will have truly profound implications for research into human variation and disease.



The ethical challenges of whole-genome sequencing part 1

A recent perspective article in Nature Reviews Genetics (sorry, subscriber only!) discusses the issues arising from the advent of individual whole-genome sequencing. The authors discuss three major issues facing researchers using this new technology: the return of data to participants, obligations to participants' relatives, and potential future uses of samples and data. Although I don't agree with all of the authors' arguments, it's great to see some informed discussion of these issues in advance of whole-genome sequencing technologies becoming widely available.

I'll start with a bit of background on whole-genome sequencing technology and discuss the first of these ethical challenges today; I plan to discuss the other two problematic areas in a separate post.

The power of whole-genome sequencing
It's important to remember just how difficult it was for us humans to obtain the first (almost) complete sequences of the DNA that resides within each of our cells - the sequences generated by the public Human Genome Project and by the private company Celera, both published in 2001. It took almost a decade and cost somewhere in the vicinity of $3 billion to obtain the public human genome sequence, a vast amount of money by any standards.

Over the last seven years the price of genome sequencing has plummeted. A private company, Knome, will now sequence your genome for the comparatively paltry sum of $350,000, and the notion that we will see a $1000 genome sequence within the next decade has become a cliché. Over the last year we saw the publication of several brand new genome sequences: those of James Watson and J. Craig Venter, as well as a Chinese volunteer analysed by the Beijing Genomics Institute.

The advantages of whole-genome sequencing (WGS) for research and for medicine are enormous. Your full genome sequence - including both the nuclear DNA you inherited from both your parents, and the mitochondrial DNA you inherited from your mother - contains all of the genetic information that resulted, through a complex process of interaction with your environment, in your adult form. This gives researchers a complete catalogue of the genetic differences between you and the people around you, a far superior data-set to the limited collection of common variants provided by genotyping methods (like those employed by 23andMe and deCODEme). This is because at least some of the differences between people are due to rare variations, perhaps found only in them and their immediate family, that simply won't show up at all on even the densest genotyping chips but will be revealed by complete sequencing.

How important are those rare variants? It's still difficult to say, but at least for some traits these uncommon genetic quirks probably play a major role. For instance, we know that variation in height is around 80% determined by our genes, but the common variants identified by recent (and quite well-powered) genome-wide scans for height differences explain only about 1% of this variation. It's likely that much of the residual variation is made up of less common variants that each confer only a small proportion of the total effect.

The same is likely to be true for many common diseases, for which genome scans to date have uncovered only a fraction of the total genetic risk. Such rare, small-effect variants could only realistically be identified by sequencing, either of a selected set of "candidate genes", or - more comprehensively - by whole-genome sequencing.

Return of genome data to participants
I suspect that the majority of the three readers of this blog (one of whom is me) would be very interested in getting a free copy of their own genome sequence, should we be fortunate enough to be part of a study in which this was generated. Indeed, the authors of the NRG review drily note that in the age of 23andMe and hugely popular genetic ancestry sites, "the desire for information and the expectations of research participants for receiving their results are likely to increase."

However, they also note that "in most jurisdictions there are still no definitive research ethics policies regarding the return of research results." In practice, there are a number of logistical and ethical hurdles that need to be overcome before researchers start handing back data to their unprepared research subjects:

Data format. A DVD containing gigabytes of text files of As, Ts, Cs and Gs is unlikely to satisfy most WGS research participants. However, providing fully annotated sequences (with lay descriptions of the meaning of every potential disease allele) would be far beyond the means of most research groups, as well as triggering potential regulatory restrictions and litigation risk.

This is a tricky dilemma, and the advice of the NRG authors is irritatingly vague: they simply suggest that any research project involving WGS "should be conducted under a formal research protocol, and ought to include the development of a data return and counselling policy".

That's pretty unhelpful to anyone actually trying to come up with such a policy. My suggestion: if researchers can't afford to provide the annotation themselves, they should at least return the data in a standardised format that makes it easy for participants to get that annotation from other sources. Over the next year or so we will see a profusion of private companies seeking to decipher our genomes for us, for a price; at the same time, I fully expect that online communities and publicly funded research institutes will set about designing browsers that will let us do the same thing gratis. If the research participant has their data in a standard format recognised by all these systems they can decide for themselves who they trust to peer inside their genes.

Clinical follow-up. Anyone who has their genome sequenced will almost certainly learn that they carry several recessive disease variants - variants which cause no harm to them, since they are each complemented by a normal healthy copy of the gene, but may result in severe deformity and disease in their children if they are unlucky enough to mate with someone who carries nasty versions of the same genes. In addition, each of us will carry any number of common variants which are associated with an increased risk of complex diseases such as coronary artery disease or diabetes. Finally, a few of us will find that we carry variants of the worst sort: things like a Huntington disease mutation, which will result in an incurable slide into dementia and death within a few decades. Whichever of these we turn out to carry, it's likely that all of us will need someone to explain what these things mean, and point us towards specialist care if this is needed.

The review notes that the medical community is massively unprepared for this: there is a serious shortage of clinicians with the required training to effectively communicate genetic risks. They recommend, quite reasonably, that governments invest in further training of primary care physicians to this end. Surprisingly, although the authors mention "an expanded role for geneticists and genetic counsellors", there is no discussion of increasing the number of university places for non-physician genetic counsellors - despite the fact that these individuals are likely to take on a substantial part of the burden.

Integration of data into medical records. Obviously genetic information that impacts on a study subject's health is just as important as other sources of information - cholesterol, blood pressure and the like. But equally obviously, if a variant has not been well-validated as a genetic risk factor, it shouldn't be described as such to a research participant, and it shouldn't be included in that person's medical records.

The NRG authors make some good recommendations:

  1. only validated data of known clinical relevance should be included in the health record;
  2. practice guidelines should be outlined for determining what constitutes validated and clinically relevant data; and
  3. there should be a process by which health records are updated with new knowledge about the clinical relevance of specific genes.

The first two points are fairly self-evident, although it would have been great to see some realistic suggestions regarding those practice guidelines. The third recommendation, updateable records, will become more realistic if and when we start seriously moving into an era of centralised, electronic medical records.

That's more than enough for today. Later I'll discuss the other major ethical challenges discussed in the NRG review: obligations to close relatives of study participants, and future uses of samples and data.

