Scientific community: Machine-learning model predicts potential impact of research
Nature Biotechnology
May 18, 2021
A machine-learning model can be used to predict the future ‘impact’ of work published in the scientific literature, according to a paper in Nature Biotechnology. The model, whose score is used to predict the ‘top 5% of papers’ published in any year, could complement existing bibliographic systems that rely on metrics employing paper citations to gauge the potential impact of a scientist’s work.
Many systems have been employed to assess the scientific output of researchers, including metrics based on the number of citations accrued by the papers they author. With the advent of machine learning, it is now possible to draw on more aspects of researcher output when estimating the potential impact of published work. This has led to the proposal that a machine-learning model that predicts time-scaled ‘PageRank’ scores, similar to the metric used to rank the importance of webpages, could be applied to researcher output.
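To make the PageRank idea concrete, here is a minimal sketch of the standard, unscaled PageRank iteration applied to a tiny citation graph. It is an illustration only, not the authors' method: the function name, the toy graph, the damping factor of 0.85 and the iteration count are all assumptions, and the time scaling mentioned above is left out because this summary does not describe it.

```python
def pagerank(citations, damping=0.85, iterations=50):
    """Plain PageRank by power iteration.

    `citations` maps each paper to the list of papers it cites.
    All names and defaults here are illustrative assumptions.
    """
    # Collect every paper that cites or is cited.
    papers = set(citations)
    for cited in citations.values():
        papers.update(cited)
    n = len(papers)

    # Start from a uniform score.
    rank = {p: 1.0 / n for p in papers}

    for _ in range(iterations):
        # Papers that cite nothing spread their score evenly over all papers.
        dangling = sum(rank[p] for p in papers if not citations.get(p))
        new_rank = {
            p: (1.0 - damping) / n + damping * dangling / n for p in papers
        }
        # Each paper passes an equal share of its score to every paper it cites.
        for p, cited in citations.items():
            for q in cited:
                new_rank[q] += damping * rank[p] / len(cited)
        rank = new_rank
    return rank


# Hypothetical toy graph: A and B cite C, C cites D, D cites nothing.
toy_graph = {"A": ["C"], "B": ["C"], "C": ["D"], "D": []}
scores = pagerank(toy_graph)
ranked = sorted(scores, key=scores.get, reverse=True)
```

In this toy graph, D ends up with the highest score even though only one paper cites it, because that one citation comes from the well-cited paper C. This is the property that separates PageRank-style scores from raw citation counts.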
James Weis and Joseph Jacobson implemented this idea in a model called DELPHI (Dynamic Early-warning by Learning to Predict High Impact), which was trained on the scientific research graph. Using a pool of 1,687,850 unique papers published between 1980 and 2019, they derived a set of 29 features relating to each paper, author, journal and network for 1 to 5 years post-publication. The features for each paper were then used to train a machine-learning model that produced an ‘early warning’ score of impact.
The authors’ model correctly identified 19 out of 20 seminal biotechnologies from the 1980–2014 period in a blind, retrospective study. The model also flagged 50 papers, published in 2018 in 42 biotechnology-related journals, that it predicts will appear in the top 5% in the future. Such a model could be used to identify and channel funding to ‘hidden gem’ research in a data-driven manner. Further extensive testing will be needed to evaluate the performance of the approach in fields outside of biotechnology against traditional impact indicators, such as field-normalized citation scores, before such models can be adopted in other areas of research.
doi: 10.1038/s41587-021-00907-6