Thursday, January 31, 2008

Researchers forced to share

Genome-wide association studies (GWAS), in a nutshell, compare patterns of genetic variation between people with a disease ("cases") and healthy people ("controls"), and identify genetic variants that are more common in cases than controls. Such variants are said to be "associated" with the disease, and there's an assumption that they probably play a role in causing it.
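
To make "more common in cases than controls" concrete, here is a minimal sketch of the odds ratio calculation that sits at the heart of an association test. The allele counts are invented purely for illustration (they do not come from any real study), and the function name is my own.

```python
def allele_odds_ratio(case_risk, case_other, control_risk, control_other):
    """Odds ratio for carrying the risk allele, from a 2x2 table of allele counts."""
    return (case_risk * control_other) / (case_other * control_risk)


# Hypothetical allele counts: 2,000 cases and 2,000 controls, two alleles each.
odds_ratio = allele_odds_ratio(
    case_risk=1360, case_other=2640,
    control_risk=1200, control_other=2800,
)
print(f"odds ratio = {odds_ratio:.2f}")
```

An odds ratio above 1 means the allele is over-represented in cases; whether that excess is statistically convincing depends on sample size, which is where the replication problems discussed next come in.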

Much criticism of GWAS has stemmed from frequent failures to replicate their findings in follow-up studies. This quote from a recent Nature news feature (subscription only) explains one of the reasons why this occurs:
Nilesh Samani, chair of cardiology at the University of Leicester, UK, and one of two lead investigators responsible for coronary heart disease with the WTCCC, explains that even studies with many samples will miss variants with modest effects. Suppose, he says, that there are 10 loci in a genome that each increase the likelihood of a condition by 20%. Statistically, an examination of 2,000 cases and 2,000 controls would pick up at most three of these loci. An independent group with similar sample sizes might also find two or three loci, but they might be different loci, and the plague of false positives would make results inconclusive. “It's only when we pool all of these studies together that we have a realistic chance of picking up all of those loci,” Samani says.
Those are some pretty amazing (and dismaying) statistics, which illustrate just how large GWAS will need to be for researchers to have any hope of confidently identifying genetic variants with small effects. Collecting enough people with a specific disease is often too much for a single group, so international collaborations are becoming the norm in this area. The Nature article describes one initiative to speed up such collaborative efforts: the Database of Genotype and Phenotype (dbGaP), operated by the US National Institutes of Health (NIH).
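
Samani's point can be roughed out with a textbook power calculation. This is only a sketch under assumptions that are mine, not his: I read "increase the likelihood by 20%" as a per-allele odds ratio of 1.2, assume a risk-allele frequency of 0.3 in controls, use a genome-wide significance threshold of 5e-7, and apply a simple two-proportion z-test on allele counts.

```python
from math import sqrt
from statistics import NormalDist


def allelic_test_power(n_cases, n_controls, control_freq, odds_ratio, alpha):
    """Approximate power of a two-sided two-proportion z-test on allele counts."""
    # Risk-allele frequency in cases implied by the assumed odds ratio.
    odds = odds_ratio * control_freq / (1 - control_freq)
    case_freq = odds / (1 + odds)
    # Each person contributes two alleles.
    alleles_cases, alleles_controls = 2 * n_cases, 2 * n_controls
    std_err = sqrt(
        case_freq * (1 - case_freq) / alleles_cases
        + control_freq * (1 - control_freq) / alleles_controls
    )
    normal = NormalDist()
    z_crit = normal.inv_cdf(1 - alpha / 2)
    z_effect = abs(case_freq - control_freq) / std_err
    return normal.cdf(z_effect - z_crit) + normal.cdf(-z_effect - z_crit)


# Assumed inputs (see the paragraph above); none of these come from the article.
single = allelic_test_power(2000, 2000, control_freq=0.3, odds_ratio=1.2, alpha=5e-7)
pooled = allelic_test_power(20000, 20000, control_freq=0.3, odds_ratio=1.2, alpha=5e-7)
print(f"power per locus, one study of 2,000 + 2,000: {single:.2f}")
print(f"power per locus, ten such studies pooled:    {pooled:.2f}")
```

With these assumed inputs the power for any one locus in a single study comes out at roughly one in ten, so finding only a handful of ten such loci is what you would expect, while pooling ten studies pushes the power close to one. Different allele frequencies or thresholds will shift the numbers, but not the overall picture.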

As of last week, all GWAS funded by the NIH are required to deposit their data in dbGaP. Researchers who deposit the data have exclusive publishing rights for the next nine months, although during this time their data can be downloaded and analysed by other researchers. Once the nine months is up, the data are free for anyone to combine with their own datasets and publish at will. This will give researchers access to datasets far bigger than they could ever generate themselves, and that will boost progress in identifying and characterising disease genes.

I think most of us would agree that this is a good thing. However, some researchers object to the NIH compelling them to deposit their data in dbGaP:
Kári Stefánsson, chief executive of deCODE Genetics, says that researchers are already doing a good job of finding collaborators but he resents what he calls the “Soviet flavour” of the NIH mandate. “I don't want to share my data with anyone because the NIH decides I should,” he says. “I want to do it because I decide to do it.”
Well, maybe. It's true that researchers tend to be very good at creating and fostering collaborative research - we need to be to have any chance of survival in the current funding environment. Nonetheless, science is also a highly competitive environment, and there is a very human urge to hold on to data for as long as we can to lessen the odds of being scooped, or of having our rivals point out errors in our analyses (this urge is discussed in an excellent recent NY Times article). The NIH requirement to submit data to dbGaP ensures that this urge is not allowed to overwhelm the need to share - in other words, that critical information is not left to gather dust in someone's hard drive when it could be useful to the community as a whole. In my opinion this can only be healthy for the field of human genetics.

Wednesday, January 30, 2008

Public tools for interpreting human genetic variation

A while back I began discussing the ethical challenges of whole-genome sequencing (a theme that will be continued soon). One of the challenges I mentioned was the question of the format in which genome sequence data should be returned to study participants. My suggestion was:
...if researchers can't afford to provide the annotation themselves, they should at least return the data in a standardised format that makes it easy for participants to get that annotation from other sources. Over the next year or so we will see a profusion of private companies seeking to decipher our genomes for us, for a price; at the same time, I fully expect that online communities and publicly funded research institutes will set about designing browsers that will let us do the same thing gratis. If the research participant has their data in a standard format recognised by all these systems they can decide for themselves who they trust to peer inside their genes.
Since posting that, I've been exploring the various public tools and databases that already exist for interpreting genetic information. The basic question is: if someone had their own genetic data sitting on a DVD, what public resources could they use to get information about what that data meant for their health?

The oldest compendium of human genotype-to-phenotype associations I know of is the Online Mendelian Inheritance in Man (OMIM) database, which is maintained by the National Center for Biotechnology Information (NCBI). OMIM is a tremendous repository of gene and disease information, and an invaluable resource for researchers working on rare Mendelian disorders (i.e. diseases in which individuals carrying a particular mutation nearly always contract a specific disease). It's less helpful for genetically complex diseases such as lupus, and it would be extremely hard to query for someone who had a list of genotypes or DNA sequence variants in hand and wanted to know their risk of contracting a given disease - particularly given that OMIM doesn't accept dbSNP ID numbers (codes beginning with "rs"), now the most widely-used format for genetic variants.

A recent upstart that does accept dbSNP ID numbers is SNPedia, a Wikipedia-like community-driven effort to annotate genetic variants with information about their frequency and their effects on human variation and disease. Unlike OMIM, SNPedia doesn't rely on curation of its database by genetic experts - instead, information is added by volunteers from the community. As is the case for Wikipedia, this means that SNPedia loses accuracy in exchange for being cheaper to run and more up-to-date. The information on most SNPs is very limited and the format needs to be more standardised, but it's a promising start.

SNPedia member cariaso notes in the comments that the Promethease software can be used by 23andMe and deCODEme customers to cross-reference their genotype results against the SNPedia database. Some example output (from a "random" genotype sample) gives you an idea of what to expect.

Some time back we heard from Elaine at Genetics & Health about GEN2PHEN, a 12 million Euro (US$17.7 million) EU-funded project that "aims to harness the web to capture and unify genetic information that fundamentally impacts on a person’s health and disease processes," according to GEN2PHEN's press release (PDF). The details are still pretty sketchy, but their press release says:
GEN2PHEN will build a set of database components, tools and technologies that will help all research results pertaining to genome variation and disease to be properly integrated and immediately available for holistic analysis via the internet. The project will deploy a major internet portal, called the “GEN2PHEN Knowledge Centre”, which will prominently profile the solutions generated by the project and set these in the context of powerful search capabilities for genotype-phenotype data and the very latest expertise on genotype-phenotype databases.
This sounds promising, but as always the devil will be in the details. I'll certainly be keeping a close eye on this project.

Finally, today's issue of Nature Genetics includes a letter to the editor (subscriber access only) describing a "navigator for human genome epidemiology" developed by the Human Genome Epidemiology Network (HuGENet). The HuGE Navigator allegedly "provides access to a continuously updated knowledge base in human genome epidemiology, including information on population prevalence of genetic variants, gene-disease associations, gene-gene and gene-environment interactions, and evaluation of genetic tests." Whereas OMIM uses expert curators and SNPedia relies on community participants, the HuGE Navigator employs data-mining algorithms that search the literature for new information, as well as manual curation.

I know it's still in its early stages, but the database is currently rather irritating to use. The HuGEpedia function seems to contain plenty of information, but it's difficult to access - for instance, a search for "APOE" revealed no hits, even though a search for "Alzheimer" shows this gene as the top candidate gene for this disease. [Added in edit: in the comments, Andro points out the fairly basic error I was making during this search!] Similarly, searching for dbSNP ID numbers doesn't generate anything useful. Although this database seems like a great resource for genetic epidemiologists with specific questions in mind, it would be a difficult tool for anyone armed with a genome sequence or a set of genotypes and looking for a way to interpret it.

The take-home message: although a clever lay-person (with a lot of time on their hands, and preferably with some programming skills) could manage to extract some useful information out of their own raw genetic information, at this stage there's really no public database that can compete with the proprietary browsers of personal genomics companies like 23andMe and deCODEme - Promethease certainly comes the closest, but at this stage it lacks the polish and intuitive interface. We'll have to wait and see if Promethease or one of the other public efforts manages to comprehensively fill this void, or if ultimately consumers need to turn to the private sector to figure out what their genome means.
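
As a rough idea of what "some programming skills" might involve, here is a minimal sketch that cross-references a file of genotypes against a small annotation table keyed by dbSNP ID. Everything specific in it is an assumption for illustration: the tab-separated layout (rsID, chromosome, position, genotype), the placeholder rs numbers, and the placeholder notes are all made up, and a real annotation table would have to be assembled from a source such as SNPedia.

```python
import csv
import io


def parse_genotypes(handle):
    """Read tab-separated lines of rsid, chromosome, position, genotype."""
    genotypes = {}
    for row in csv.reader(handle, delimiter="\t"):
        # Skip blank lines, comment lines and anything too short to parse.
        if len(row) < 4 or row[0].startswith("#"):
            continue
        rsid, _chromosome, _position, genotype = row[:4]
        genotypes[rsid] = genotype
    return genotypes


def annotate(genotypes, annotations):
    """Return (rsid, genotype, copies, note) for each annotated allele carried."""
    hits = []
    for rsid, genotype in genotypes.items():
        entry = annotations.get(rsid)
        if entry is None:
            continue
        copies = genotype.count(entry["allele"])
        if copies:
            hits.append((rsid, genotype, copies, entry["note"]))
    return hits


# Placeholder data: these rs numbers and notes are invented for the example.
sample_file = io.StringIO(
    "# rsid\tchromosome\tposition\tgenotype\n"
    "rs0000001\t1\t1000\tAG\n"
    "rs0000002\t2\t2000\tCC\n"
    "rs0000003\t3\t3000\tTT\n"
)
annotations = {
    "rs0000001": {"allele": "G", "note": "placeholder note A"},
    "rs0000003": {"allele": "C", "note": "placeholder note B"},
}

hits = annotate(parse_genotypes(sample_file), annotations)
for rsid, genotype, copies, note in hits:
    print(f"{rsid} ({genotype}): {copies} copy/copies of annotated allele - {note}")
```

The lookup itself is trivial; the hard part, as the databases above illustrate, is building an annotation table that is both comprehensive and trustworthy.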

I'll update this post as new databases arise. Comments on my list, or suggestions regarding other public databases that would be useful for personal annotation of genome sequence or genotype data, would be greatly appreciated.

Personal genomics "positively disruptive"

The latest issue of Nature Genetics has an optimistic editorial on the rise of personal genomics. The editorial makes several excellent points:
Giving individuals their own genotype is not so much premature as truly disruptive. The individual gains a personal stake in the ongoing research effort and a huge incentive to find out more. A personal stake in finding out something that was not previously known is the key to getting students into research and may well be a powerful tool to educate and interest members of the public in the details of their own health and functioning.
Knowledge is power. Over the next few decades our world will be changed forever by advances in genetic technologies, and people who have some understanding of genetics - who are capable of making informed decisions about genetic issues, rather than simply taking the advice of their health providers - will benefit the most.

Early personal genomics companies like the Me Two (23andMe and deCODEme) are important not because of the information they provide about individual genetic risk - at this stage this information is of marginal benefit at best. Rather, these companies are having two important effects:
  1. They are encouraging the community to engage with issues related to genetic information; and
  2. They are training people to think like geneticists - to understand the language of SNPs, alleles and odds ratios.
That training is both direct and indirect. Those people who have the money and inclination to purchase a kit from one of the Me Two get the benefit of inside information on their own genomes; but even those who are not early adopters are benefiting from the informed discussion permeating the blogosphere and the mainstream media. Anyone who takes this opportunity to learn more about genetics will benefit over the next few decades, as we are faced with increasingly complex decisions about the effect of genetics on our own health, the health of our children, and the shape of society.

The Nature Genetics article makes an excellent point about genetic counselling:
The pressure of information also creates a need for genetic counselors, but if uptake and use of individual genomics spreads as fast or widely as it seems likely to do, the counseling curriculum will undergo a rapid shift of emphasis away from rare mendelian diseases to both rare and common genetic determinants of common diseases and will acquire a new set of courses to deal with evaluating environmental risks.
This shift of emphasis is going to have to happen fast - in just a few years the technology to examine millions of common variants throughout the genome will be affordable to most citizens of the industrialised world, and at the same time our understanding of the effects of these variants on human health will have exploded, thanks to large association studies. A few years after that whole genome sequencing will be within the grasp of the man on the street, and even more powerful information will have begun trickling out of absolutely enormous studies like the 500,000-person UK Biobank. Genetic counsellors will have to work incredibly hard to adapt to these changes or the public will be left adrift in a sea of worthless information.

I also love the final paragraph of the article:
In the meantime, individual genomics will have informed thousands participating in one of the most exciting areas of biomedical research, and it may recruit participants in prospective studies that they will have funded partially from their own pockets. That being said, they are co-investigators, not patients, and the experiment will be conducted on their own terms!
In other words, the future will be a marketplace of genetic information, with thoroughly informed individuals exchanging their genetic and medical data in return for further information. This may sound rather utopian, but in fact this is exactly what the customers of the Me Two are doing right now: 23andMe and deCODEme provide genetic information to their clients, and in return they get to aggregate that information with other data about health and environment to sell on to pharmaceutical and biotech companies. It's a world-changing business model, and no doubt one we will see increasingly often.

Added in edit: In a comment on the Nature Genetics article on the 23andMe blog, this model is laid out explicitly:
Once our database is large enough, we plan to ask our customers to provide additional information beyond their genetic data – it could be anything from symptoms of autism to shoe size. That information would be used in research that could discover even more genetic links to traits and diseases.

Monday, January 28, 2008

Ann Turner on the Me Two

In my last post I mentioned the "underwhelmed" response of VentureBeat's David Hamilton to deCODEme's demonstration genome. Genetic genealogist Ann Turner has a thoughtful response to David's piece (as a guest poster at Eye on DNA) that is well worth reading. Here's an excerpt:
I agree that the website does not make for easy pickings — it takes some thought to grasp the principles behind the reports and graphics. The whole notion of relative risk is not something many people have even thought about. But, as Kevin Kelly said in a WIRED article reviewing the Genographic Project and my book Trace Your Roots with DNA, “a basic level of genetic literacy will be essential… ” and learning about our own DNA is a great motivator.
This is a perfectly valid point - but it's going to take a lot of time before genetic literacy is widespread, and in the meantime companies like the Me Two (23andMe and deCODEme) are going to have to make it as easy as possible for newcomers to the field to make sense of their own genomes. Based on what I've seen so far, 23andMe is doing a better job of this.

David has now written a second article on the deCODEme system in which he criticises the limited number of genetic variants (SNPs) used for the disease risk predictions generated by the company. This doesn't seem entirely fair - to a large extent deCODEme is simply covering itself against litigation by only including SNPs with high-confidence associations, and as I said in my last post, "there just aren't enough well-validated genetic associations out there".

As for why some high-confidence SNPs are not included in the deCODEme analysis: at least in some cases this is simply because they are not included on the chip that deCODEme uses to analyse customer DNA samples, as Ann Turner (who seems to be everywhere these days) notes in the comments to David's post.

Friday, January 25, 2008

deCODEme "underwhelming"

David Hamilton at Venture Beat writes about his disappointing experience with the deCODEme "reference genome":
For now, though, I have to say that if I’d just plunked down $985 for deCODEme, I’d be royally pissed, both at the waste of money and at the lack of information, flexibility and user-friendly functionality here. I’m kind of astonished that deCODEme thinks this version of its service resembles a finished product in any way, shape or form.
I haven't had time to play around much with this feature on deCODEme yet - it's available here - but what I have seen so far has not exactly been spectacular. Partly this is due to some problems with the interface, which I think will probably be fixed up as deCODEme starts to listen to customer complaints, but the main problem is with the data: there just aren't enough well-validated genetic associations out there.

What we're seeing right now are the very earliest days of a gold-rush of genetic information, and it won't be long before the number of solid associations starts climbing exponentially. Trust me - in five years' time people will be complaining about too much data.

Thursday, January 24, 2008

Criticism of the 1000 Genomes Project

An article in Nature raises some criticism of the Project's proposed methodology:
Yet some scientists question how accurate the finished genomes will be, given the project's short timeline and low budget. Others say that the project should have included some phenotypic information about the participants — such as medical records or basic data such as height and weight. "It's curious that the disease-association studies don't exploit much sequencing — and the sequencing studies don't use the disease data. It would be helpful to hear a clear explanation of why, after 17 years and billions of dollars, these studies still aren't coordinated," says George Church, who is leading a venture called the Personal Genome Project out of his lab at Harvard University in Cambridge, Massachusetts. Church's project is collecting and releasing genetic and phenotypic data on ten individuals, including himself.
The data accuracy issue is a perfectly valid one, at least for the genomes sequenced at low coverage (180 individuals in the pilot phase, and probably over 1,000 individuals in the final project). These genomes will be sequenced at what is called 2X coverage, which means that each base in the genome will be sequenced on average two times. In practice, that means that in each of these individuals many regions of the genome will be sequenced more than twice, and many regions won't be sequenced at all; and that means that some rare variants will inevitably be missed.
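
A quick way to see why 2X coverage leaves holes is to treat the number of reads covering each base as Poisson-distributed around the average depth. That is an idealised assumption of mine (real sequencing has additional biases), so treat the numbers as a rough guide only.

```python
from math import exp, factorial


def poisson_pmf(k, mean_depth):
    """Probability that a base is covered by exactly k reads."""
    return exp(-mean_depth) * mean_depth ** k / factorial(k)


def fraction_covered_more_than(k, mean_depth):
    """Probability that a base is covered by more than k reads."""
    return 1 - sum(poisson_pmf(i, mean_depth) for i in range(k + 1))


for mean_depth in (2, 20):
    missed = poisson_pmf(0, mean_depth)
    more_than_twice = fraction_covered_more_than(2, mean_depth)
    print(
        f"{mean_depth}X: {missed:.2%} of bases never sequenced, "
        f"{more_than_twice:.2%} sequenced more than twice"
    )
```

Under this simple model, about 13.5% of bases in a 2X genome are never read at all and about a third are read more than twice, whereas at 20X essentially nothing is missed. Note that this counts reads per position only; seeing both copies of a heterozygous site is a stricter requirement still.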

This may be a real problem for the usefulness of the results from this stage of the project. As the Project's organisers discuss in their meeting report (PDF), there will need to be some careful quality control. Fortunately, sections of the genomes of all of the 180 individuals analysed in the pilot phase of the project have also been very well sequenced by the ENCODE project, so there will be an accurate comparison set to assess how well the sequencing methods are performing.

In any case, this won't affect the next stage of the pilot phase, in which much more comprehensive (~20X) coverage will be used to sequence the protein-coding regions of 1,000-2,000 genes. However, it may affect the final full genome sequences of the 1,000+ individuals generated in the final stage of the project, which at this stage are only planned to be sequenced at low coverage. Of course, this plan may change as the project develops.

As for Church's criticism about the failure to include disease samples, this seems like a real non-issue. For instance, the HapMap project didn't use disease samples: its purpose, like this project, was to learn more about the structure of human genetic variation to allow later researchers to study disease better, and it has achieved this goal admirably. Hundreds of studies have already used the HapMap data (directly or indirectly) to find common genetic variants that cause disease; the 1000 Genomes project will provide a catalogue of rare variants that can be used for similar studies in the future.

Subscribe to Genetic Future.

Wednesday, January 23, 2008

1000 Genomes Project launched

A very exciting announcement:
An international research consortium today announced the 1000 Genomes Project, an ambitious effort that will involve sequencing the genomes of at least a thousand people from around the world to create the most detailed and medically useful picture to date of human genetic variation. The project will receive major support from the Wellcome Trust Sanger Institute in Hinxton, England, the Beijing Genomics Institute, Shenzhen (BGI Shenzhen) in China and the National Human Genome Research Institute (NHGRI), part of the National Institutes of Health (NIH).
They won't be sequencing 1,000 complete genomes immediately. Instead, the project will first move through a "pilot phase" in three parts:
  1. Complete genome sequencing for six individuals (two families, each of two parents and a child) at very high coverage;
  2. Complete genome sequencing of 180 people at much lower coverage; and
  3. Finally, the sequencing of the protein-coding portions of 1,000 genes in 1,000 people.
That's still a massive amount of sequencing work:
“At 6 trillion DNA bases, the 1000 Genomes Project will generate 60-fold more sequence data over its three-year course than have been deposited into public DNA databases over the past 25 years,” said Gil McVean, Ph.D., of the University of Oxford in England, one of the co-chairs of the consortium’s analysis group. “In fact, when up and running at full speed, this project will generate more sequence in two days than was added to public databases for all of the past year.”
This project will immediately add significantly to our understanding of human genetic variation, by increasing the detection of less common genetic variants only present in a small proportion of people. It will also help to refine sequencing techniques, bringing affordable personal genome sequences closer to reality. And importantly, all of the information generated by the project will be freely available online.

So who are they sequencing? Don't bother volunteering your own DNA - the project will be using the same anonymous DNA samples that are being used for the HapMap project (which looked at millions of common variations throughout their genome, but didn't sequence their DNA). That's great for two reasons: firstly, it will allow us to determine exactly how much genetic variation is captured by the genotyping approach used by the HapMap project; and secondly, we will be getting a picture of gene sequence diversity from populations around the world:
Among the populations whose DNA will be sequenced in the 1000 Genomes Project are: Yoruba in Ibadan, Nigeria; Japanese in Tokyo; Chinese in Beijing; Utah residents with ancestry from northern and western Europe; Luhya in Webuye, Kenya; Maasai in Kinyawa, Kenya; Toscani in Italy; Gujarati Indians in Houston; Chinese in metropolitan Denver; people of Mexican ancestry in Los Angeles; and people of African ancestry in the southwestern United States.
Exactly how the project will proceed after the pilot phase is still unclear (that's the point of the pilot phase!), but in an email to Nature writer Erika Check, NHGRI project director Lisa Brooks says:
The exact final project design will depend on the data from the 3 pilots...but we expect that in the full project about 1000 samples will be lightly (2X) sequenced across the entire genome, and these 1000 samples plus possibly additional samples would be sequenced more in the gene regions.
I really can't emphasise enough how much this project will alter the field of human genetics - like the Human Genome Project itself, and the HapMap project after that, this project will have truly profound implications for research into human variation and disease.


The ethical challenges of whole-genome sequencing part 1

A recent perspective article in Nature Reviews Genetics (sorry, subscriber only!) discusses the issues arising from the advent of individual whole-genome sequencing. The authors discuss three major issues facing researchers using this new technology: the return of data to participants, obligations to participants' relatives, and potential future uses of samples and data. Although I don't agree with all of the authors' arguments, it's great to see some informed discussion of these issues in advance of whole-genome sequencing technologies becoming widely available.

I'll start with a bit of a background on whole-genome sequencing technology and discuss the first of these ethical challenges today; I plan to discuss the other two problematic areas in a separate post.

The power of whole-genome sequencing
It's important to remember just how difficult it was for us humans to obtain the first (almost) complete sequences of the DNA that resides within each of our cells - the sequences generated by the public Human Genome Project and by the private company Celera, both published in 2001. It took almost a decade and cost somewhere in the vicinity of $3 billion to obtain the public human genome sequence, a vast amount of money by any standards.

Over the last seven years the price of genome sequencing has plummeted. A private company, Knome, will now sequence your genome for the comparatively paltry sum of $350,000, and the notion that we will see a $1000 genome sequence within the next decade has become a cliché. Over the last year we saw the publication of several brand new genome sequences: those of James Watson and J. Craig Venter, as well as a Chinese volunteer analysed by the Beijing Genomics Institute.

The advantages of whole-genome sequencing (WGS) for research and for medicine are enormous. Your full genome sequence - including both the nuclear DNA you inherited from both your parents, and the mitochondrial DNA you inherited from your mother - contains all of the genetic information that resulted, through a complex process of interaction with your environment, in your adult form. This gives researchers a complete catalogue of the genetic differences between you and the people around you, a far superior data-set to the limited collection of common variants provided by genotyping methods (like those employed by 23andMe and deCODEme). This is because at least some of the differences between people are due to rare variations, perhaps found only in them and their immediate family, that simply won't show up at all on even the densest genotyping chips but will be revealed by complete sequencing.

How important are those rare variants? It's still difficult to say, but at least for some traits these uncommon genetic quirks probably play a major role. For instance, we know that variation in height is around 80% determined by our genes, but the common variants identified by recent (and quite well-powered) genome-wide scans for height differences explain only about 1% of this variation. It's likely that much of the residual variation is made up of less common variants that each confer only a small proportion of the total effect.

The same is likely to be true for many common diseases, for which genome scans to date have uncovered only a fraction of the total genetic risk. Such rare, small-effect variants could only realistically be identified by sequencing, either of a selected set of "candidate genes", or - more comprehensively - by whole-genome sequencing.

Return of genome data to participants
I suspect that the majority of the three readers of this blog (one of whom is me) would be very interested in getting a free copy of their own genome sequence, should we be fortunate enough to be part of a study in which this was generated. Indeed, the authors of the NRG review drily note that in the age of 23andMe and hugely popular genetic ancestry sites, "the desire for information and the expectations of research participants for receiving their results are likely to increase."

However, they also note that "in most jurisdictions there are still no definitive research ethics policies regarding the return of research results." In practice, there are a number of logistical and ethical hurdles that need to be overcome before researchers start handing back data to their unprepared research subjects:

Data format. A DVD containing gigabytes of text files of As, Ts, Cs and Gs is unlikely to satisfy most WGS research participants. However, providing fully annotated sequences (with lay descriptions of the meaning of every potential disease allele) would be far beyond the means of most research groups, as well as triggering potential regulatory restrictions and litigation risk.

This is a tricky dilemma, and the advice of the NRG authors is irritatingly vague: they simply suggest that any research project involving WGS "should be conducted under a formal research protocol, and ought to include the development of a data return and counselling policy".

That's pretty unhelpful to anyone actually trying to come up with such a policy. My suggestion: if researchers can't afford to provide the annotation themselves, they should at least return the data in a standardised format that makes it easy for participants to get that annotation from other sources. Over the next year or so we will see a profusion of private companies seeking to decipher our genomes for us, for a price; at the same time, I fully expect that online communities and publicly funded research institutes will set about designing browsers that will let us do the same thing gratis. If the research participant has their data in a standard format recognised by all these systems they can decide for themselves who they trust to peer inside their genes.

Clinical follow-up. Anyone who has their genome sequenced will almost certainly learn that they carry several recessive disease variants - variants which cause no harm to them, since they are each complemented by a normal healthy copy of the gene, but may result in severe deformity and disease in their children if they are unlucky enough to mate with someone who carries nasty versions of the same genes. In addition, each of us will carry any number of common variants which are associated with an increased risk of complex diseases such as coronary artery disease or diabetes. Finally, a few of us will find that we carry variants of the worst sort: things like a Huntington disease mutation, which will result in an incurable slide into dementia and death within a few decades. Either way, it's likely that all of us will need someone to explain what these things mean, and point us towards specialist care if this is needed.

The review notes that the medical community is massively unprepared for this: there is a serious shortage of clinicians with the required training to effectively communicate genetic risks. They recommend, quite reasonably, that governments invest in further training of primary care physicians to this end. Surprisingly, although the authors mention "an expanded role for geneticists and genetic counsellors", there is no discussion of increasing the number of university places for non-physician genetic counsellors - despite the fact that these individuals are likely to take on a substantial part of the burden.

Integration of data into medical records. Obviously genetic information that impacts on a study subject's health is just as important as other sources of information - cholesterol, blood pressure and the like. But equally obviously, if a variant has not been well-validated as a genetic risk factor, it shouldn't be described as such to a research participant, and it shouldn't be included in that person's medical records.

The NRG authors make some good recommendations:
  1. only validated data of known clinical relevance should be included in the health record;
  2. practice guidelines should be outlined for determining what constitutes validated and clinically relevant data; and
  3. there should be a process by which health records are updated with new knowledge about the clinical relevance of specific genes.
The first two points are fairly self-evident, although it would have been great to see some realistic suggestions regarding those practice guidelines. The third recommendation, updateable records, will become more realistic if and when we start seriously moving into an era of centralised, electronic medical records.

That's more than enough for today. Later I'll discuss the other major ethical challenges discussed in the NRG review: obligations to close relatives of study participants, and future uses of samples and data.


Subscribe to Genetic Future.

Tuesday, January 22, 2008

UK controversy over 23AndMe

A report in The Guardian gives a pretty negative reaction to 23AndMe's European launch today:
The service is likely to provoke controversy in the UK, where authorities have warned that genetic tests are often meaningless yet can provoke needless anxiety among those who take them. Last month the Human Genetics Commission condemned them as a dangerous waste of money and called for regulations to control their marketing.

[...]

In the UK, however, the Human Genetics Commission's report on direct-to-consumer tests warned that neither exact nor complete knowledge of what differences in the chromosome pairs mean exist yet. "Our advice to the public is that with many of the tests currently on the market people are wasting their money," said Dr Christine Patch, co-author of the report. "At the moment the science is simply not strong enough. The tests could be positively harmful if the results caused unnecessary anxiety or gave false reassurance."

Dr Helen Wallace, director of the pressure group Gene Watch, is equally concerned. "Our main concern is that the human genome is set to become a massive marketing scam," she said, adding that special diet foods and pills had been promoted on the back of tests. "Genetic tests like these are not regulated and the science is still poorly understood - so there is a real danger people could be misled about their health."
Ouch. There are certainly some valid criticisms in there, but the alarmist tone is a little over the top. I don't think anyone would argue with the statement that "neither exact nor complete knowledge of what differences in the chromosome pairs mean exist yet" - in fact, it's likely that it will be decades before we can even come close to fully understanding the effects of human genetic variation. That doesn't mean it's not interesting to look at what we do know.

I liked this statement from 23AndMe's founders:
Wojcicki and Avey argue that those who want to know about their genetic make-up should be treated as adults and given the data, together with careful explanations of what it means.
I couldn't agree more, and I hope that 23AndMe lives up to this ideal (so far, what I've seen from the company has been largely reasonable, but I'd welcome examples to the contrary).

Appropriate regulatory frameworks will help, but one factor that I think will eventually rein in the scammers is the raw power of the internet. If providers of direct-to-consumer genetic testing fail to provide clear and accurate information about the predictive power of their testing, they should (and will) be called on this by independent bodies and external reviewers (such as genetics bloggers!). With a few Google searches consumers should be able to get a fair idea of how much they can trust a testing company - and eventually, most companies will find it is in their own best interests to provide accurate information up-front.

Monday, January 21, 2008

Australian state government outsources forensic DNA testing

The Sydney Morning Herald reports on a DNA-related story from my neck of the woods:
CRIMINAL DNA testing will be outsourced to a private company in an attempt to clear a backlog of thousands of police samples at NSW government laboratories and keep up with rapidly growing demand.
The company, Genetic Technologies, will be paid up to AU$5 million (US$4.4 million) to use its automated genotyping facilities - endearingly referred to in the article as "CSI-style robotic technology"! - to help break the 12-month backlog of samples building up in police facilities.

The outsourcing is part of an AU$22 million four-year plan to boost forensic DNA analysis capacity in the state. It sounds as though they've left it a little late:
A NSW Ombudsman's report, from October 2006, warned that DNA analysis was not meeting its potential. The number of samples sent for testing rose from 1046 in 2000 to 9113 in 2004, causing a backlog of more than 7000 cases.

[...]

The report, quietly released last January, also found at least 13 cases in which identities had been muddled. In one, a man was jailed for break and enter but was adamant that he had not committed the offence. He provided another sample and was released.

An over-stressed system will always make mistakes. Of course, it's far from clear that exporting the problem into the private domain will prevent these mistakes from occurring in the future. Instead, there's a danger that it will simply reduce transparency and accountability, and increase the risk of privacy issues.

Ultimately, the success of this strategy will depend on the NSW government successfully instituting appropriate procedures to minimise these dangers - and given the track record of the NSW government in other areas this seems like a rather forlorn hope.

Friday, January 18, 2008

New large-scale study searches for depression genes

A new article in the European Journal of Human Genetics reports the early stages of a new large-scale project searching for the genetic factors influencing risk of major depressive disorder (MDD). The study will look at markers throughout the genomes of 1862 people who have suffered from major depression and 1857 controls and identify genetic variants that are more common in patients than in unaffected individuals.

Depression is a major health burden in Western countries, and has a fairly hefty genetic component. Identifying the genes that influence our risk of developing this disease may help patients and clinicians to recognise and treat depression early, before it creates havoc in sufferers' lives. In addition, genes that predispose to depression will be seen as tempting new drug targets by pharmaceutical companies looking to create the next Prozac. I'll be keeping a close eye on the progress of this study.

Thursday, January 10, 2008

Personal genomics: is your doctor ready?

A perspective piece in the latest New England Journal of Medicine offers a review of the new field of personal genomics from a clinician's point of view.
It may happen soon. A patient, perhaps one you have known for years, who is overweight and does not exercise regularly, shows up in your office with an analysis of his whole genome at multiple single-nucleotide polymorphisms (SNPs). His children, who were concerned about his health, spent $1,000 to give him the analysis as a holiday gift. The test report states that his genomic profile is consistent with an increased risk of both heart disease and diabetes, and because the company that performed the analysis stated that the test was "not a clinical service to be used as the basis for making medical decisions," he is in the office for some "medical direction." What should you do?
The authors highlight a major emerging problem: clients taking advantage of the new commercial genome-wide genotyping services such as 23AndMe and deCODEme (hereafter called the "Me Two") are going to want to use that information to make health and life-style decisions, regardless of the disclaimers that companies offer to discourage this. But how do they make sense of the reams of data pouring out of their own genomes? Are clinicians in any way prepared to guide their patients through these decisions?

In general, the medical profession (perhaps with some exceptions) is far from prepared for the challenges that personal genomics will bring. What suggestions can the authors offer to clinicians faced with the scenario above?
For the patient who appears with a genome map and printouts of risk estimates in hand, a general statement about the poor sensitivity and positive predictive value of such results is appropriate, but a detailed consumer report may be beyond most physicians' skill sets. For the patient asking whether these services provide information that is useful for disease avoidance, the prudent answer is "Not now — ask again in a few years." More information is needed on the clinical utility of this information in the light of existing disease-specific opportunities for prevention or early detection and the potential value that genomic profiles can add to that of simpler tools, such as the family health history. Finally, given the risk of commercial exploitation, if patients are determined to proceed, perhaps because they are simply curious, are genetic hobbyists, or are "early adopters" of new technology, it would make sense to encourage them to enroll in formal scientific studies. [my emphasis]
The first two bolded statements are good advice - the usefulness of the information from the first wave of personal genomics companies is still pretty marginal, but it's clear that it won't be long before that changes.

I'm curious about the third suggestion, though - what scientific studies do they have in mind? I don't know of many that would provide their clients with full access to their own genome-wide data-set (as deCODEme does) or that would be willing to provide the type of user-friendly interface that both companies have put together. The clinical value of the Me Two is still on the dubious side, but at least their clients can get some useful information in return for their time and money.

Thursday, January 3, 2008

Breast cancer genetics: why don't we know more?


A recent article by Michael Stratton and Nazneen Rahman in the journal Nature Genetics reviews "the emerging landscape of breast cancer susceptibility."

Breast cancer has a public image as a terrifying disease, suddenly striking down previously healthy women in their prime. Although the risk of death from breast cancer is in fact generally exaggerated in the public mind (a woman's risk of contracting breast cancer before the age of 60 is just 3%; in terms of life-time risk, women are far more likely to die from heart disease), the disease nonetheless has horrific effects on sufferers and their families.

It has long been known that the risk of breast cancer is influenced to some degree by genetic factors: first-degree female relatives of women who have contracted breast cancer have twice the risk of contracting the disease compared to the general population. In the mid-1990s, researchers discovered that women who carried mutations in either of two genes - BRCA1 and BRCA2 (BReast CAncer 1 and 2) - were subject to a massively increased risk of breast cancer, about 10 to 20 times higher than the risk for non-carriers. In real terms, that means that women who carry one of these mutations are at a 30-60% risk of developing breast cancer before the age of 60, compared to 3% in the general population. However, these mutations are rare, being carried by only one in 1,000 individuals in the UK.
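The arithmetic behind those figures can be checked with a quick back-of-envelope sketch. It uses only the numbers quoted above (3% baseline risk before 60, a 10- to 20-fold relative risk, one carrier per 1,000 people); the function names are mine, and the calculation assumes that the 3% baseline is a fair stand-in for the risk in non-carriers, so treat the output as a rough illustration rather than an epidemiological estimate.

```python
BASELINE_RISK = 0.03        # risk of breast cancer before age 60, general population
CARRIER_FREQUENCY = 0.001   # one in 1,000 individuals carries a BRCA1/2 mutation
RELATIVE_RISKS = (10, 20)   # carriers' risk relative to non-carriers


def carrier_risk(baseline_risk: float, relative_risk: float) -> float:
    """Absolute risk for a carrier: baseline risk scaled by the relative risk."""
    return baseline_risk * relative_risk


def share_of_cases_in_carriers(carrier_frequency: float, relative_risk: float) -> float:
    """Rough fraction of all cases that occur in carriers.

    Assumption: non-carriers have relative risk 1, so the baseline risk
    cancels out and only the frequency and relative risk matter.
    """
    carriers = carrier_frequency * relative_risk
    non_carriers = 1.0 - carrier_frequency
    return carriers / (carriers + non_carriers)


if __name__ == "__main__":
    for rr in RELATIVE_RISKS:
        risk = carrier_risk(BASELINE_RISK, rr)
        share = share_of_cases_in_carriers(CARRIER_FREQUENCY, rr)
        print(f"{rr}x relative risk: {risk:.0%} risk before 60, "
              f"about {share:.1%} of all cases")
```

Under these assumptions a carrier's own risk is very high, yet carriers make up only around one or two in every hundred cases, which is why such rare mutations say so little about risk in the population as a whole.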

Since the discovery of BRCA1 and BRCA2, four other genes have been found in which mutations confer high risk of breast cancer (and other cancers), but again these mutations are rare. The low frequency of these mutations in the population means that although they have high predictive value for the few individuals who carry them, they are not particularly informative for the population as a whole: taken together, these six genes account for less than 20% of the total familial risk of breast cancer.

So where does the remaining 80% of familial risk come from?

It turns out that this risk comes from a whole range of other genetic variations, most of which are yet to be identified. Each genetic variation differs in terms of its frequency within the population and its penetrance (that is, the proportion of individuals carrying the mutation who actually go on to develop breast cancer), but the mutations that increase the risk of breast cancer can be classified into three rough and overlapping categories: rare variants with high penetrance (like BRCA1 and BRCA2); rare variants with moderate penetrance; and common variants with low penetrance.

Finding the second and third class of variants is hard, as I'll explain in a series of upcoming posts. The new techniques developed to find these types of variants are now becoming standard approaches in modern human genetics, and with good reason: they are our only hope of identifying the genetic risk factors that underlie most of our risks of common diseases, including not only breast cancer but also heart disease, diabetes, arthritis, and a host of other ailments.



Wednesday, January 2, 2008

The purpose of this blog

Over the last two decades there have been countless stories in the mainstream news about human genetics. We are frequently told that the "gene for" something has been discovered - the gene for speech, the gene for schizophrenia, the gene for diabetes, even a gene for left-handedness. However, the mainstream media notion of a single "gene for" any complex human trait is misguided, since essentially all human traits are influenced by multiple genetic and environmental factors. The media rarely report accurately on this complexity; nor do they report on follow-up studies, which frequently fail to replicate the findings of early genetic studies.

If the media can't be trusted to provide reliable information on human genetics, who can? The academic literature is the ultimate source, but it is often difficult to access and always difficult to read - most research papers are written in jargon that makes it easier for scientists to communicate, but is inaccessible for laypeople. Genetic testing companies provide online information that sounds authoritative, but can be skewed to fit a corporate message. Increased public interest in genetics has created a niche for web-sites covering the area written by non-experts. It's easy to feel overwhelmed by the masses of conflicting information.

Yet it is now more important than ever to be informed about advances in human genetics. Recent studies involving thousands of participants have rigorously identified dozens of different genetic factors influencing the risk of common diseases, as well as the genes underlying common variable traits such as eye and hair colour. Many of these genetic influences have since been independently verified by other groups. As large research groups recruit more and more participants for genetic studies, their power to identify real genetic influences grows - and this has very real consequences for the early detection and prevention of diseases, and for family planning.

At the same time, our ability to obtain genetic information about ourselves is advancing with amazing speed. Ten years ago, commercial genetic tests were available for only a handful of rare diseases. Now, companies offer customers information on hundreds of thousands of different genetic variations scattered throughout their genome.

So, what will the new genetic technologies say about us? Interpreted correctly, these technologies can sometimes tell us things we already know (our eye and hair colour) and sometimes things we don't (our true paternity). They can tell us about our past, by mining our genome for the traces of our ancestors; and most importantly, they can tell us about our future, by calculating the impact of many small genetic influences on our risk of disease. Over the next decade, these technologies will have a profound effect on public healthcare, our legal system, and our society, for better or for worse.

The purpose of this blog is to cut through the hype associated with new discoveries in human genetics, and present you with the facts. Instead of exaggerated stories from the mainstream news or biased advertising from genetics companies, I'll explain to you - in plain language - what these findings tell us about ourselves, and the impact they will have on the lives of our children.

Daniel.

