The recent boom in ‘machine learning’ or ‘artificial intelligence’-driven approaches to materials discovery has been largely underpinned by structured property databases, often encompassing vast amounts of numerical data. But of course there is more to the literature than just data points and numbers. Now, Vahe Tshitoyan, Anubhav Jain, Gerbrand Ceder and colleagues argue that implicit connections and relationships between words in the literature could also be harnessed to discover new materials. They take 3 million abstracts of published materials science articles and apply natural language processing algorithms to uncover relationships between words, material compositions and properties: some obvious, some less so. By projecting the vector representations (word embeddings) of material compositions onto that of the word ‘thermoelectric’, they predict potential new thermoelectric materials. They also show that the literature already contained enough information to predict current ‘top performers’ several years before they were actually discovered.
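The ‘projection’ step can be illustrated with a minimal sketch. It assumes that each word and each material formula has already been mapped to a dense vector, as word-embedding models do, and that projecting onto ‘thermoelectric’ is measured as cosine similarity between vectors. The material names and three-dimensional vectors below are hypothetical toy values, not data from the study; a real model would learn much longer vectors from the abstracts.

```python
import math

# Hypothetical toy embeddings (made-up values, NOT taken from the study).
# In practice these vectors would be learned from the abstracts.
embeddings = {
    "thermoelectric": [0.9, 0.1, 0.3],
    "material_A": [0.8, 0.2, 0.35],
    "material_B": [0.7, 0.15, 0.4],
    "material_C": [0.1, 0.9, 0.2],
}


def cosine_similarity(a, b):
    """Cosine of the angle between vectors a and b (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_by_projection(target, candidates, vectors):
    """Rank candidate words by cosine similarity to the target word's vector."""
    t = vectors[target]
    scores = {c: cosine_similarity(vectors[c], t) for c in candidates}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


ranking = rank_by_projection(
    "thermoelectric", ["material_A", "material_B", "material_C"], embeddings
)
for material, score in ranking:
    print(f"{material}: {score:.3f}")
```

Candidates whose vectors point in nearly the same direction as ‘thermoelectric’ rank highest; with these toy values, material_A and material_B score close to 1 while material_C, whose vector points elsewhere, scores much lower.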