Wednesday, January 30, 2008

Public tools for interpreting human genetic variation

A while back I began discussing the ethical challenges of whole-genome sequencing (a theme that will be continued soon). One of the ethical challenges I mentioned was the question of the format in which genome sequence data should be returned to study participants. My suggestion was:
...if researchers can't afford to provide the annotation themselves, they should at least return the data in a standardised format that makes it easy for participants to get that annotation from other sources. Over the next year or so we will see a profusion of private companies seeking to decipher our genomes for us, for a price; at the same time, I fully expect that online communities and publicly funded research institutes will set about designing browsers that will let us do the same thing gratis. If the research participant has their data in a standard format recognised by all these systems they can decide for themselves who they trust to peer inside their genes.
Since posting that, I've been exploring the various public tools and databases that already exist for interpreting genetic information. The basic question is: if someone had their own genetic data sitting on a DVD, what public resources could they use to get information about what that data meant for their health?

The oldest compendium of human genotype-to-phenotype associations I know of is the Online Mendelian Inheritance in Man (OMIM) database, which is maintained by the National Center for Biotechnology Information (NCBI). OMIM is a tremendous repository of gene and disease information, and an invaluable resource for researchers working on rare Mendelian disorders (i.e. diseases in which individuals carrying a particular mutation nearly always develop a specific disease). It's less helpful for genetically complex diseases such as lupus, and it would be extremely hard to query for someone who had a list of genotypes or DNA sequence variants in hand and wanted to know their risk of developing a given disease, particularly given that OMIM doesn't accept dbSNP ID numbers (codes beginning with "rs"), now the most widely used format for genetic variants.
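Part of what makes rs numbers so useful is that they give every resource a shared key for the same variant. The sketch below is a minimal, hypothetical Python helper (`normalise_rsid` is an illustrative name, not part of OMIM, dbSNP or any other tool) that checks whether a query string looks like a dbSNP ID and puts it into a canonical form before it is used as a lookup key. It assumes only that an rs ID is the letters "rs" followed by digits.

```python
import re
from typing import Optional

# Assumption for this sketch: a dbSNP reference SNP ID is "rs" followed by digits,
# e.g. "rs429358". Nothing else is validated.
_RSID_PATTERN = re.compile(r"rs(\d+)", re.IGNORECASE)


def normalise_rsid(raw: str) -> Optional[str]:
    """Return a canonical lower-case rs ID, or None if `raw` is not one.

    Hypothetical helper for illustration: it trims surrounding whitespace and
    accepts any capitalisation of the "rs" prefix, so "RS429358" and
    " rs429358 " both map to "rs429358". Gene symbols such as "APOE" are rejected.
    """
    match = _RSID_PATTERN.fullmatch(raw.strip())
    if match is None:
        return None
    return "rs" + match.group(1)


if __name__ == "__main__":
    for query in ["rs429358", " RS429358 ", "APOE", "rs"]:
        print(repr(query), "->", normalise_rsid(query))
```

A query that fails this check (a gene symbol, say) would need to go through a different search path, which is exactly the kind of distinction a user-facing database has to make clear.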

A recent upstart that does accept dbSNP ID numbers is SNPedia, a Wikipedia-like community-driven effort to annotate genetic variants with information about their frequency and their effects on human variation and disease. Unlike OMIM, SNPedia doesn't rely on curation of its database by genetic experts; instead, information is added by volunteers from the community. As is the case for Wikipedia, this means that SNPedia trades some accuracy for being cheaper to run and more up-to-date. The information on most SNPs is very limited and the format needs to be more standardised, but it's a promising start.

SNPedia member cariaso notes in the comments that the Promethease software can be used by 23andMe and deCODEme customers to cross-reference their genotype results against the SNPedia database. Some example output (from a "random" genotype sample) gives you an idea of what to expect.
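At its simplest, cross-referencing a genotype file against SNPedia amounts to a join: match each rs ID and genotype in the raw-data file against annotations keyed the same way. The Python sketch below illustrates that idea only; it is not Promethease's code. The file layout (tab-separated rsid, chromosome, position and genotype columns, with '#' comment lines) is an assumption modelled loosely on consumer raw-data exports, and the function names, rs IDs and annotation text are hypothetical placeholders, not real SNPedia content.

```python
from typing import Dict, Iterator, List, Tuple

# Hypothetical annotation table: rs ID -> {genotype with alleles sorted -> note}.
Annotations = Dict[str, Dict[str, str]]


def read_genotypes(text: str) -> Iterator[Tuple[str, str]]:
    """Yield (rs ID, genotype) pairs from a tab-separated raw-data export.

    Assumed layout (an assumption for this sketch, not a documented format):
    lines starting with '#' are comments; data rows are
    rsid <TAB> chromosome <TAB> position <TAB> genotype.
    """
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) < 4:
            continue  # skip malformed rows instead of aborting the whole file
        rsid, _chromosome, _position, genotype = fields[:4]
        yield rsid.strip().lower(), genotype.strip().upper()


def cross_reference(text: str, annotations: Annotations) -> List[Tuple[str, str, str]]:
    """Return (rs ID, genotype, note) for every genotype that has an annotation."""
    hits = []
    for rsid, genotype in read_genotypes(text):
        if "-" in genotype:
            continue  # assumed to mark a failed call ("--"); nothing to look up
        # Sort the alleles so that "GA" and "AG" hit the same annotation key.
        key = "".join(sorted(genotype))
        note = annotations.get(rsid, {}).get(key)
        if note is not None:
            hits.append((rsid, key, note))
    return hits


if __name__ == "__main__":
    # Entirely made-up input and annotations, for illustration only.
    sample = (
        "# rsid\tchromosome\tposition\tgenotype\n"
        "rs1000001\t1\t12345\tGA\n"
        "rs1000002\t2\t67890\tCC\n"
        "rs1000003\t3\t13579\t--\n"
    )
    notes = {
        "rs1000001": {"AG": "placeholder note for the A/G genotype"},
        "rs1000002": {"TT": "placeholder note for the T/T genotype"},
    }
    for hit in cross_reference(sample, notes):
        print(hit)
```

Sorting the alleles is one simple way to treat unphased genotypes such as GA and AG as equivalent. A real tool would also have to deal with strand orientation (the same variant reported as A/G on one strand and T/C on the other), which this sketch ignores.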

Some time back we heard from Elaine at Genetics & Health about GEN2PHEN, a 12 million Euro (US$17.7 million) EU-funded project that "aims to harness the web to capture and unify genetic information that fundamentally impacts on a person’s health and disease processes," according to GEN2PHEN's press release (PDF). The details are still pretty sketchy, but their press release says:
GEN2PHEN will build a set of database components, tools and technologies that will help all research results pertaining to genome variation and disease to be properly integrated and immediately available for holistic analysis via the internet. The project will deploy a major internet portal, called the “GEN2PHEN Knowledge Centre”, which will prominently profile the solutions generated by the project and set these in the context of powerful search capabilities for genotype-phenotype data and the very latest expertise on genotype-phenotype databases.
This sounds promising, but as always the devil will be in the details. I'll certainly be keeping a close eye on this project.

Finally, today's issue of Nature Genetics includes a letter to the editor (subscriber access only) describing a "navigator for human genome epidemiology" developed by the Human Genome Epidemiology Network (HuGENet). The HuGE Navigator allegedly "provides access to a continuously updated knowledge base in human genome epidemiology, including information on population prevalence of genetic variants, gene-disease associations, gene-gene and gene-environment interactions, and evaluation of genetic tests." Whereas OMIM uses expert curators and SNPedia relies on community participants, the HuGE Navigator employs data-mining algorithms that search the literature for new information, as well as manual curation.

I know it's still in its early stages, but the database is currently rather irritating to use. The HuGEpedia function seems to contain plenty of information, but it's difficult to access - for instance, a search for "APOE" revealed no hits, even though a search for "Alzheimer" shows this gene as the top candidate gene for this disease. [Added in edit: in the comments, Andro points out the fairly basic error I was making during this search!] Similarly, searching for dbSNP ID numbers doesn't generate anything useful. Although this database seems like a great resource for genetic epidemiologists with specific questions in mind, it would be a difficult tool for anyone armed with a genome sequence or a set of genotypes and looking for a way to interpret it.

The take-home message: although a clever lay-person (with a lot of time on their hands, and preferably with some programming skills) could manage to extract some useful information from their own raw genetic data, at this stage there's really no public database that can compete with the proprietary browsers of personal genomics companies like 23andMe and deCODEme. Promethease certainly comes closest, but it still lacks their polish and intuitive interface. We'll have to wait and see whether Promethease or one of the other public efforts manages to fill this void comprehensively, or whether consumers will ultimately need to turn to the private sector to figure out what their genome means.

I'll update this post as new databases arise. Comments on my list, or suggestions regarding other public databases that would be useful for personal annotation of genome sequence or genotype data, would be greatly appreciated.

4 comments:

cariaso said...

Promethease reads 23andMe and deCODEme export formats, and cross references it against SNPedia.

Daniel said...

Thanks cariaso - I've added Promethease to the post.

Andro said...

To look for genes on HuGEpedia, you have to click the radio button for "Gene". Otherwise it looks for APOE in the list of diseases, which returns no hits.

If you do that, you get exactly one hit for APOE, and it leads you to this page.

Not terribly intuitive, unfortunately.

Daniel said...

andro,

Thanks for pointing that out - looks like people can get useful data out of the Navigator, as long as they're more observant than I am!