Yet some scientists question how accurate the finished genomes will be, given the project's short timeline and low budget. Others say that the project should have included some phenotypic information about the participants, such as medical records or basic data such as height and weight. "It's curious that the disease-association studies don't exploit much sequencing, and the sequencing studies don't use the disease data. It would be helpful to hear a clear explanation of why, after 17 years and billions of dollars, these studies still aren't coordinated," says George Church, who is leading a venture called the Personal Genome Project out of his lab at Harvard University in Cambridge, Massachusetts. Church's project is collecting and releasing genetic and phenotypic data on ten individuals, including himself.
The data accuracy issue is a perfectly valid one, at least for the genomes sequenced at low coverage (180 individuals in the pilot phase, and probably over 1,000 individuals in the final project). These genomes will be sequenced at what is called 2X coverage, which means that each base in the genome will be sequenced on average two times. In practice, that means that in each of these individuals many regions of the genome will be sequenced more than twice, and many regions won't be sequenced at all; and that means that some rare variants will inevitably be missed.
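To put rough numbers on what 2X coverage implies, here is a minimal sketch. It assumes that reads land uniformly at random across the genome, so that the depth at any one base follows a Poisson distribution with mean equal to the average coverage. That model is a common simplification and is not something the project itself states; real sequencing data are less even, so the true gaps are likely to be larger. The function names are purely illustrative.

```python
import math


def poisson_pmf(k: int, mean: float) -> float:
    """Probability that a base is covered by exactly k reads (Poisson model)."""
    return math.exp(-mean) * mean ** k / math.factorial(k)


def coverage_summary(mean_depth: float) -> dict:
    """Fraction of bases with no reads, and with more than two reads.

    Assumption: per-base depth ~ Poisson(mean_depth). This idealised model
    ignores GC bias, repeats and other sources of uneven coverage.
    """
    uncovered = poisson_pmf(0, mean_depth)
    at_most_two = sum(poisson_pmf(k, mean_depth) for k in range(3))
    return {"uncovered": uncovered, "more_than_twice": 1.0 - at_most_two}


if __name__ == "__main__":
    for depth in (2, 20):
        summary = coverage_summary(depth)
        print(
            f"{depth}X: {summary['uncovered']:.2e} of bases uncovered, "
            f"{summary['more_than_twice']:.2%} covered more than twice"
        )
```

Under this idealised model, at 2X roughly 14% of bases in any one individual receive no reads at all and about 32% are read more than twice, whereas at 20X the uncovered fraction is negligible.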
This may be a real problem for the usefulness of the results from this stage of the project. As the Project's organisers discuss in their meeting report (PDF), there will need to be some careful quality control. Fortunately, sections of the genomes of all of the 180 individuals analysed in the pilot phase of the project have also been very well sequenced by the ENCODE project, so there will be an accurate comparison set to assess how well the sequencing methods are performing.
In any case, this won't affect the next stage of the pilot phase, in which much more comprehensive (~20X) coverage will be used to sequence the protein-coding regions of 1,000-2,000 genes. However, it may affect the full genome sequences of the 1,000+ individuals generated in the final stage of the project, which are currently planned to be sequenced only at low coverage. Of course, this plan may change as the project develops.
As for Church's criticism about the failure to include disease samples, this seems like a real non-issue. For instance, the HapMap project didn't use disease samples: its purpose, like that of this project, was to learn more about the structure of human genetic variation to allow later researchers to study disease better, and it has achieved this goal admirably. Hundreds of studies have already used the HapMap data (directly or indirectly) to find common genetic variants that cause disease; the 1000 Genomes project will provide a catalogue of rare variants that can be used for similar studies in the future.