Research Press Release

Computational biology: Machine learning approach to identify new designer drugs

Nature Machine Intelligence

November 16, 2021

The chemical structure of unknown, new psychoactive substances, also known as ‘designer drugs’ or ‘legal highs’, can be determined from their mass spectrum alone thanks to an automated, generative, machine learning approach presented in Nature Machine Intelligence. Knowledge of these structures could help forensic laboratories to more quickly identify suspected designer drugs.

A large number of new psychoactive substances appear on the illicit market every year. These substances provide psychoactive effects similar to those of known illicit drugs; however, as they are synthesized in a way that makes them chemically different, they avoid existing drug legislation and even detection. Forensic laboratories use mass spectrometry to identify known designer drugs in seized pills or powders. Elucidating the structure of an entirely new designer drug, however, typically requires weeks or months of work by expert chemists and the use of additional experimental techniques.

Michael Skinnider and colleagues used confidential data crowdsourced from forensic laboratories around the world to train a machine learning model to generate molecules with structures and properties similar to those of recent designer drugs. The model then produced a database of one billion structures of potential new psychoactive substances. Testing the model with new data, collected after it was trained, revealed that this approach could determine the chemical structure of unknown designer drugs from their mass spectrum alone. Where the exact structure could not be determined, the model suggested structures that were very similar to that of the unknown designer drug.

The authors conclude that similar generative approaches, trained on other datasets, could also help to identify the structure of unknown molecules in other specialized domains, such as the identification of new performance-enhancing drugs or environmental pollutants.

doi:10.1038/s42256-021-00407-x
