25 November 2020
Improved characterization of genetic variation by machine-learning algorithm
Published online 22 May 2014
A difference of just a single base pair in a gene can alter the structure of the encoded protein, with effects on the organism that range from harmless to lethal disease. Predicting how an organism might be affected by a single base pair substitution is a challenging task. In a paper appearing in Bioinformatics, researchers combined machine learning with a genetic algorithm to overcome the limitations of existing methods.
The team, including Trisevgeni Rapakoulia of the King Abdullah University of Science and Technology in Saudi Arabia, used an approach that combined the output of several classification models and also developed a new genetic algorithm to evolve each of the individual classifiers towards its optimal performance.
“Our method uses a new fitness function that balances the sensitivity and specificity trade-off, while at the same time reduces the complexity of the extracted prediction models and improves their generalization properties,” says Rapakoulia.
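To make the idea of such a fitness function concrete, here is a minimal sketch in Python. It is not the formula from the paper: the geometric mean of sensitivity and specificity, the use of the fraction of selected features as a stand-in for model complexity, and the names `fitness`, `n_features_used`, `n_features_total` and `complexity_weight` are all assumptions made for illustration.

```python
from math import sqrt


def fitness(y_true, y_pred, n_features_used, n_features_total,
            complexity_weight=0.1):
    """Score a candidate classifier; higher is better.

    Hypothetical fitness: rewards balanced sensitivity and specificity,
    and subtracts a penalty that grows with model complexity.
    Labels are 1 (damaging variant) and 0 (neutral variant).
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)

    # Guard against division by zero when a class is absent.
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0

    # The geometric mean is high only when both rates are high, so a
    # classifier that labels everything as one class scores zero here.
    balance = sqrt(sensitivity * specificity)

    # Assumed complexity proxy: the fraction of available features used.
    complexity = n_features_used / n_features_total

    return balance - complexity_weight * complexity


# Two candidates with identical predictions; the simpler one scores higher.
labels = [1, 1, 0, 0]
predictions = [1, 0, 0, 0]
simple = fitness(labels, predictions, n_features_used=2, n_features_total=10)
complex_ = fitness(labels, predictions, n_features_used=9, n_features_total=10)
```

In a genetic algorithm, a score of this kind would be computed for every candidate in the population, and the higher-scoring candidates would be favoured when selecting parents for the next generation. The penalty term is what nudges the search towards simpler models that are more likely to generalise.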
As a result, the new algorithm characterises genetic variations more successfully than existing approaches. It also produces a score that can predict the severity of a variation’s impact. The research team emphasises that the relationship between this score and the severity of the single base pair substitution still needs to be confirmed by further testing with larger datasets.
Rapakoulia, T. et al. EnsembleGASVR: A novel ensemble method for classifying missense Single Nucleotide Polymorphisms. Bioinformatics (2014) doi: 10.1093/bioinformatics/btu297