In my recent post "Researchers forced to share", I noted that NIH-funded researchers will now be forced to immediately submit their genetic association data to an online database, dbGaP, with other researchers having free access and the right to publish new analyses of the data following a nine-month grace period. In the comments, Steve Murphy pessimistically replies:
Why care? Google will own it all soon enough.
I know Steve has a big axe to grind in this arena (and is happy to grind it out loud), but is there a grain of truth here? Will Google/23andMe manage to capture and control a substantial proportion of the genomic data generated over the next decade or so?
Unless the world of science is miraculously transformed within the next few years, the answer is no.
Don't get me wrong: if 23andMe succeeds in attracting customers (as I suspect it will), then they will quickly set about combining genetic data from customers with self-reported information about diseases and other physical traits - "anything from symptoms of autism to shoe size". This will eventually give 23andMe (and, potentially, Google) access to a moderately to extremely large data-set with which to find new associations between genetic markers and a range of traits. (How large, of course, will depend on exactly how successful 23andMe is at attracting customers.)
But there will be some major caveats with these data. Most importantly:
- the 23andMe data is likely to consist almost exclusively of upper-middle-class, and probably mainly white, suburbanites (unless they repeat their Davos free-kit splurge in Buffalo County, South Dakota, which seems unlikely!), and
- as far as I can tell their phenotype data will be entirely self-reported.
Given these limitations, although I think we will see some novel and interesting associations emerge from 23andMe over the next five years or so, I don't expect to see huge breakthroughs in the genetics of human health. In other words, we might see a new genetic variant associated with left-handedness or a fondness for broccoli - but it's unlikely that 23andMe will be able to find genes linked to type 2 diabetes that haven't already been scooped up by previous genome-wide scans.
Massive surveys like the UK BioBank, on the other hand, have vastly larger sample sizes, cover a much broader range of ethnicities and socioeconomic groups, and have access to direct clinical measurements. The major bodies that fund these projects, such as the UK's Wellcome Trust or the NIH, are increasingly committing the research groups they support to policies of free data release. This means that the identity and nature of the genetic risk variants identified in these studies will be freely available to anyone with an internet connection.
At least for the foreseeable future, 23andMe will be relying heavily on the results of these external studies rather than its own in-house data to provide information to its customers. And so long as these research groups make their data freely available, communities like SNPedia will be building databases and free tools into which people can input their own genetic data (generated by private companies or by researchers) in order to learn about their genomes. Google won't "own" this information, and neither will anyone else.
5 comments:
Daniel,
It's not so much that they will have the SNPs as it is that they will have the SNPs, Gmail accounts, search records, applications, mobile accounts, blogs (including mine), etc. etc. etc. How would we feel if we replaced the name Google with "The United States Government"? Think about it.
My axe is not with Google, nor 23andMe... it is with the educators, physicians and scientists who did not prepare the public for these technologies and instead left the wolf to teach the public. This creates a false sense of trust. Much like the pharma industry. Some of my contemporaries think MDs are up to the task of learning this subject material, including the president of the ACMG. I have seen community docs and worked with them day after day. They just don't have the time or interest. I love your blog and look forward to meeting you some day. I hope it helps educate a few of us along the way.
-Steve
www.thegenesherpa.blogspot.com
And I didn't see my comment as pessimistic. We need these huge datasets. We need them accessible in a FREE manner. Maybe google will let that happen, or maybe wikipedia will, or maybe microsoft...oh wait, they haven't entered this fray yet....or have they?
-Steve
I think you raise some excellent points. One that you didn't address is the extent to which companies like 23andMe will be licensing out access to their genotype/self-reported phenotype data to pharma companies and the like, who may find it much easier to deal with a like-minded corporate entity to run, say, drug-response studies than with the much larger datasets that are available but usually in public or non-profit hands.
I'm not saying that's right or wrong, of course, although I wonder if 23andMe customers have fully thought through what they're getting into. I had more to say on the subject here.
Hi Steve,
I got the impression you had an axe to grind with 23andMe when you explicitly drew an analogy between them and the Tuskegee syphilis experiments!
I share your concern about the ability of physicians to deal with emerging genetic technologies (Gerd Gigerenzer does a great job of showing how doctors can mess up even basic risk calculations in his wonderful book Calculated Risks). However, I'm not as harsh as you about the failure of the medical community to prepare for the coming of personal genomics. The field has simply moved so fast that it is hard enough for researchers working in the area to keep up with it - and certainly next to impossible for educators working within their bureaucratic constraints. The growing personal genomics market will create a strong financial incentive for medicos and counsellors to bring themselves up to date as quickly as possible - and I think we'll soon see a host of Helix Health competitors mushroom up to meet the challenge.
Finally, I think it's clear that we both want genetic health datasets to be freely available wherever possible, and as I said in my post I think the large public consortia will achieve this. Private genomics companies like 23andMe will certainly develop proprietary databases of their own, and we can't expect them to throw away their competitive advantage by releasing their data. However, private efforts will also spur more rapid progress by public consortia - just like Celera provided the incentive required to speed up the Human Genome Project.
Daniel,
Hyperbole is a useful tool. Clearly the Tuskegee investigators had less ethics than google.
-Steve