March 16, 2012

A quantitative analysis of fluctuations in the usage frequencies of more than 10 million English, Spanish and Hebrew words is published in the journal Scientific Reports. Over 4% of the world's literature, comprising seven languages and dating back to the 16th century, has now been digitized, offering a unique opportunity to study the evolution of language systematically. Alexander Petersen and colleagues analyzed the dynamic properties of words in English, Spanish and Hebrew texts from 1800 to 2008 recorded in Google’s n-gram database.

They observed that the death rate of words has recently increased, while new lexical additions have become less common; digital spell-checkers may play a role in this, boosting the ‘fitness’ of accepted words at the expense of their misspelled or non-standard counterparts. A communication-efficiency bias towards shorter words and the adoption of English as the leading language of science could explain other lexical changes: the word X-ray, for example, outcompeted its synonyms radiogram and roentgenogram.

The team also report that fluctuations in the growth rate of a new word peak roughly 30-50 years after the word first appears, which corresponds to the typical time it takes for a word to enter a standard dictionary. This is also close to the generational time scale for humans, supporting evidence that languages can evolve drastically within a single generation.

The study also highlights how international conflicts and other social, cultural and political phenomena can affect language use. During World War II, the languages of participating countries appear to have been influenced, perhaps through common media attention and increased lexical diffusion, whereas non-participating regions, such as Spain and Latin America, were minimally affected.

DOI: 10.1038/srep00313

