Research Press Release

Scientific community: Machine-learning model predicts potential impact of research

Nature Biotechnology

May 18, 2021

A machine-learning model can be used to predict the future ‘impact’ of work published in the scientific literature, according to a paper in Nature Biotechnology. The model, whose score is used to predict the ‘top 5% of papers’ published in any year, could complement existing bibliographic systems that rely on metrics employing paper citations to gauge the potential impact of a scientist’s work.

Many systems have been employed to assess the scientific output of researchers, including metrics based on the number of citations accrued by the papers they author. With the advent of machine learning, the opportunity exists to use more aspects of researcher output in determining the potential impact of their published work. This has led to the proposal that a machine-learning model that predicts time-scaled ‘PageRank’ scores, similar to the metric used to rank the importance of webpages, could be applied to researcher output.
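
As a rough illustration of the kind of score involved, the following minimal Python sketch computes ordinary PageRank on a toy citation graph, where a paper gains score when other papers cite it. It is not the time-scaled variant and not the model described in the paper; the graph, the paper labels A to D, the function name `pagerank`, and the damping value of 0.85 are assumptions made for this example only.

```python
def pagerank(citations, damping=0.85, iterations=100):
    """Plain PageRank by power iteration.

    `citations` maps each paper to the list of papers it cites.
    All names and values here are illustrative assumptions.
    """
    # Collect every paper that cites or is cited.
    papers = set(citations)
    for cited_list in citations.values():
        papers.update(cited_list)
    papers = sorted(papers)
    n = len(papers)

    # Start with the score spread evenly over all papers.
    rank = {p: 1.0 / n for p in papers}

    for _ in range(iterations):
        # Papers that cite nothing share their score with every paper.
        dangling = sum(rank[p] for p in papers if not citations.get(p))
        new_rank = {
            p: (1.0 - damping) / n + damping * dangling / n for p in papers
        }
        # Each citing paper passes an equal share of its score
        # to every paper it cites.
        for citing, cited_list in citations.items():
            if not cited_list:
                continue
            share = damping * rank[citing] / len(cited_list)
            for cited in cited_list:
                new_rank[cited] += share
        rank = new_rank
    return rank


# Toy graph (hypothetical): B cites A, C cites A and B, D cites A and C.
toy_graph = {"A": [], "B": ["A"], "C": ["A", "B"], "D": ["A", "C"]}
scores = pagerank(toy_graph)
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)
```

In this toy graph, paper A is cited by every other paper and so receives the highest score, while paper D, which nothing cites, receives the lowest.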

James Weis and Joseph Jacobson implemented this idea in a model called DELPHI (Dynamic Early-warning by Learning to Predict High Impact), which was trained on the scientific research graph. Using a pool of 1,687,850 unique papers published between 1980 and 2019, a set of 29 features relating to each paper, author, journal and network was derived for 1 to 5 years post-publication. The features for each paper were then used to train a machine-learning model that produced an ‘early warning’ score of impact.

The authors’ model correctly identified 19 out of 20 seminal biotechnologies from the 1980–2014 period in a blind, retrospective study. The model also flagged 50 papers, published in 2018 across 42 biotechnology-related journals, as likely to appear in the top 5% in the future, and it could be used to identify and channel funding to ‘hidden gem’ research in a data-driven manner. Further extensive testing will be needed to evaluate the performance of the approach in fields outside of biotechnology against traditional impact indicators, such as field-normalized citation scores, before such models can be adopted in other areas of research.

doi:10.1038/s41587-021-00907-6

