Even 10 years after its official completion, the human reference genome still contains gaps that are not easy to close. A report published online this week in Nature Methods describes an approach for discovering some of these missing pieces and for better reflecting the genetic diversity of humans.
An ever-increasing number of human genomes are being sequenced by high-throughput technologies that fragment the genome into small pieces. To assemble these pieces into chromosomes, the human reference genome, a mosaic built from the genomes of several individuals, is essential as a blueprint. This reference correctly depicts more than 90% of the human genome, yet some regions are still not well represented, and any medically relevant genetic information they contain is not accessible. To sequence such regions, Evan Eichler and his team used a set of clones that contained the full genomes of nine humans in 40-kilobase segments. The scientists used dideoxy sequencing, the same technology that led to the first draft of the human genome in 2001, to sequence either the full clones or just their ends. After aligning the sequences to the reference genome, they identified new insertions at 720 sites in the genome. Around 25% of these novel sequences vary considerably among the European, Asian and African individuals in the group. This indicates that, to assess the true extent of human genetic diversity, more groups should be sampled and analyzed.