Proteogenomics for gene sequencing
doi:10.1038/nindia.2011.174 Published online 30 November 2011
An international team of scientists has annotated the genome of the malaria vector Anopheles gambiae using high-accuracy mass spectrometry data. They report that such proteomic analysis could become a complementary technology for genome annotation. The genome sequence of the mosquito was reported in 2002.
Genome annotation is a continuing effort, and many of the approximately 13,000 genes of A. gambiae have not been validated by any other method. The joint team of Indian, British, American and German scientists tried to identify protein-coding genes of A. gambiae based on its genomic sequence. They undertook deep proteomic analysis using high-resolution Fourier transform mass spectrometry for both precursor and fragment ions.
Based on peptide evidence, they were able to support or correct more than 6000 gene annotations, including 80 novel gene structures and about 500 translational start sites. They also validated 105 selected genes by RT-PCR and cDNA sequencing.
"Our proteogenomic analysis led to the identification of 2682 genome search–specific peptides," the researchers say. They documented numerous cases of protein-coding sequence in regions annotated as intergenic, intronic or untranslated. Using a database created to contain potential splice sites, they also identified 35 novel splice junctions.
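The idea of a "genome search-specific" peptide can be illustrated with a toy sketch in Python. It is not the researchers' pipeline. It assumes, for illustration only, that such a peptide is one that matches a six-frame translation of the genome but none of the annotated proteins. The sequences, the peptide list and the function names are all hypothetical, and real searches work on mass spectra and enzyme-digested peptides rather than on exact string matches.

```python
# Toy sketch (assumption): a peptide is "genome search-specific" if it occurs in a
# six-frame translation of the genome but in none of the annotated proteins.
from itertools import product

BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, built from the conventional TCAG codon ordering.
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(dna):
    """Return the reverse complement; unknown bases become N."""
    return "".join(COMPLEMENT.get(base, "N") for base in reversed(dna))


def translate(dna):
    """Translate complete codons; stop codons are '*', unknown codons are 'X'."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frame_translations(genome):
    """Translate three reading frames on each strand."""
    genome = genome.upper()
    strands = (genome, reverse_complement(genome))
    return [translate(strand[offset:]) for strand in strands for offset in range(3)]


def genome_specific_peptides(peptides, genome, annotated_proteins):
    """Keep peptides found in the genome translation but in no annotated protein."""
    frames = six_frame_translations(genome)
    return [
        pep for pep in peptides
        if any(pep in frame for frame in frames)
        and not any(pep in protein for protein in annotated_proteins)
    ]


# Hypothetical data: one annotated gene, plus an unannotated coding stretch
# that sits in a different reading frame.
annotated_cds = "ATGAAAACCGCCTATCGTTAA"   # encodes MKTAYR, then a stop codon
novel_cds = "ATGGGCCTGTGGAAA"             # encodes MGLWK, not in the annotation
genome = annotated_cds + "G" + novel_cds
annotated_proteins = ["MKTAYR"]
observed_peptides = ["KTAYR", "MGLWK", "PPPPP"]

print(genome_specific_peptides(observed_peptides, genome, annotated_proteins))
# prints ['MGLWK']
```

In this toy case, KTAYR supports an existing annotation, MGLWK points to coding sequence the annotation missed, and PPPPP matches nothing and is discarded.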
The researchers say that the study illustrates the application of high-resolution mass spectrometry data to the analysis and annotation of genomes. Their study supports the proteogenomic approach as a valuable tool to complement genome sequencing, they add.