Thursday, January 31, 2008

Researchers forced to share

Genome-wide association studies (GWAS), in a nutshell, compare patterns of genetic variation between people with a disease ("cases") and healthy people ("controls"), and identify genetic variants that are more common in cases than controls. Such variants are said to be "associated" with the disease, and there's an assumption that they probably play a role in causing it.
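
To make "more common in cases than controls" a little more concrete, here is a minimal sketch of the kind of test that might be run at a single variant: a plain chi-square test on a 2x2 table of allele counts. The function name and the counts are made up for illustration, and real GWAS pipelines involve far more (quality control, population structure correction, millions of variants), so treat this as a toy rather than anyone's actual method.

```python
import math


def allelic_chi_square(case_risk, case_other, control_risk, control_other):
    """Chi-square test (1 degree of freedom) on a 2x2 table of allele counts.

    Hypothetical helper for illustration only.
    Returns (chi-square statistic, p-value).
    """
    table = [[case_risk, case_other], [control_risk, control_other]]
    row_totals = [sum(row) for row in table]
    col_totals = [table[0][j] + table[1][j] for j in range(2)]
    total = sum(row_totals)

    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            # Count expected in this cell if the allele were unrelated to disease.
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (table[i][j] - expected) ** 2 / expected

    # For 1 degree of freedom, the upper tail of the chi-square distribution
    # equals erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Invented counts: 2,000 cases and 2,000 controls, two alleles per person.
# The risk allele has frequency 0.35 in cases and 0.30 in controls.
chi2, p_value = allelic_chi_square(1400, 2600, 1200, 2800)
print(f"chi-square = {chi2:.2f}, p = {p_value:.2e}")
```

A small p-value here says only that the allele frequencies differ between the two groups; whether the variant actually helps cause the disease is a separate question.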

Much criticism of GWAS has stemmed from frequent failures to replicate their findings in follow-up studies. This quote from a recent Nature news feature (subscription only) explains one of the reasons why this occurs:

Nilesh Samani, chair of cardiology at the University of Leicester, UK, and one of two lead investigators responsible for coronary heart disease with the WTCCC, explains that even studies with many samples will miss variants with modest effects. Suppose, he says, that there are 10 loci in a genome that each increase the likelihood of a condition by 20%. Statistically, an examination of 2,000 cases and 2,000 controls would pick up at most three of these loci. An independent group with similar sample sizes might also find two or three loci, but they might be different loci, and the plague of false positives would make results inconclusive. “It's only when we pool all of these studies together that we have a realistic chance of picking up all of those loci,” Samani says.

Those are some pretty amazing (and dismaying) statistics, which illustrate just how large GWAS will need to be for researchers to have any hope of confidently identifying genetic variants with small effects. Collecting enough people with a specific disease is often too much for a single group, so international collaborations are becoming the norm in this area. The Nature article describes one initiative to speed up such collaborative efforts: the Database of Genotype and Phenotype (dbGaP), operated by the US National Institutes of Health (NIH).
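
For a rough feel of why sample size matters so much in Samani's example, here is a back-of-the-envelope power calculation. It is a sketch under my own assumptions, not the calculation behind the quoted figures: I read "increase the likelihood by 20%" as a per-allele odds ratio of 1.2, assume a risk-allele frequency of 0.3 in controls, use a normal approximation to a two-sided test comparing allele frequencies, and take 5e-8 as the genome-wide significance threshold. The function name is hypothetical.

```python
import math
from statistics import NormalDist


def allelic_test_power(n_cases, n_controls, control_freq, odds_ratio, alpha=5e-8):
    """Approximate power of a two-sided test comparing allele frequencies.

    Hypothetical helper; every input is an assumption, not a value taken
    from the Nature article.
    """
    # Risk-allele frequency in cases implied by the per-allele odds ratio.
    case_odds = odds_ratio * control_freq / (1 - control_freq)
    case_freq = case_odds / (1 + case_odds)

    # Each person contributes two alleles, hence 2 * n in the denominators.
    std_err = math.sqrt(
        case_freq * (1 - case_freq) / (2 * n_cases)
        + control_freq * (1 - control_freq) / (2 * n_controls)
    )
    shift = (case_freq - control_freq) / std_err

    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    # Probability that the test statistic lands beyond either critical value.
    return std_normal.cdf(shift - z_crit) + std_normal.cdf(-shift - z_crit)


# Power at a single locus for a range of (equal) case and control counts.
for n in (2000, 5000, 10000, 20000):
    power = allelic_test_power(n, n, control_freq=0.3, odds_ratio=1.2)
    print(f"{n:>6} cases / {n:>6} controls: power = {power:.3f}")
```

Under these assumptions the chance of detecting any one such locus with 2,000 cases and 2,000 controls is well below 50%, and it only approaches certainty once the sample is several times larger, which is the argument for pooling studies.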

As of last week, all GWAS funded by the NIH are required to deposit their data in dbGaP. Researchers who deposit the data have exclusive publishing rights for the next nine months, although during this time their data can be downloaded and analysed by other researchers. Once the nine months are up, the data are free for anyone to combine with their own datasets and publish at will. This will give researchers access to datasets far bigger than they could ever generate themselves, which should boost progress in identifying and characterising disease genes.

I think most of us would agree that this is a good thing. However, some researchers object to the NIH forcing them to participate in dbGaP:

Kári Stefánsson, chief executive of deCODE Genetics, says that researchers are already doing a good job of finding collaborators but he resents what he calls the “Soviet flavour” of the NIH mandate. “I don't want to share my data with anyone because the NIH decides I should,” he says. “I want to do it because I decide to do it.”

Well, maybe. It's true that researchers tend to be very good at creating and fostering collaborative research - we need to be, to have any chance of surviving in the current funding environment. Nonetheless, science is also a highly competitive environment, and there is a very human urge to hold on to data for as long as we can to lessen the odds of being scooped, or of having our rivals point out errors in our analyses (this urge is discussed in an excellent recent NY Times article). The NIH requirement to submit data to dbGaP ensures that this urge is not allowed to overwhelm the need to share - in other words, that critical information is not left to gather dust on someone's hard drive when it could be useful to the community as a whole. In my opinion this can only be healthy for the field of human genetics.

1 comment:

Steve Murphy MD said...

Why care? Google will own it all soon enough.
-Steve
www.thegenesherpa.blogspot.com