An analysis of large collections of digitised texts dating from the 12th to the 21st century provides new insights into the evolution of the English language, a Nature paper reports.
Both languages and genes are transmitted between generations, and both are subject to change at each step as a result of random fluctuation (drift) and natural selection. Joshua Plotkin and colleagues have assessed the relative contributions of these two mechanisms to language evolution by analysing large corpora of digitally annotated texts. They studied three well-known grammatical changes in the English language: the regularisation of past-tense verbs, the use of the periphrastic ‘do’ (as in the phrase ‘do you know?’), and variation in verbal negation.
They find that both evolutionary mechanisms are operating in English. Random drift is stronger in rare words than in common words, which could explain why rare words are more likely than common words to be replaced in a language. However, selection for the irregular forms of some past-tense verbs (such as ‘lit’ being favoured over ‘lighted’ and ‘dove’ over ‘dived’) seems to buck this trend. The authors suggest that this might be driven by changes in rhyming patterns over time; the increased selection for ‘dove’ coincides with a marked increase in the use of the irregular past-tense form ‘drove’, for example.
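The claim that drift is stronger in rare words can be illustrated with a toy simulation. The sketch below is a minimal illustration, assuming a Wright-Fisher-style resampling model from population genetics; it is not presented as the authors' analysis. The function names, the token counts (20 for a "rare" verb, 1,000 for a "common" one) and the number of generations are hypothetical values chosen only for the demonstration.

```python
import random
import statistics


def wright_fisher(n_tokens, start_freq, generations, selection=0.0, rng=None):
    """Track the frequency of one variant (e.g. 'dove' vs 'dived').

    n_tokens  : hypothetical number of usages of the verb per generation
    selection : bias towards the focal variant (0.0 means pure drift)
    """
    rng = rng or random.Random()
    freq = start_freq
    trajectory = [freq]
    for _ in range(generations):
        # Selection: reweight the chance that a new usage copies the focal form.
        weighted = freq * (1 + selection)
        p = weighted / (weighted + (1 - freq))
        # Drift: each of the n_tokens usages is resampled at random.
        count = sum(1 for _ in range(n_tokens) if rng.random() < p)
        freq = count / n_tokens
        trajectory.append(freq)
    return trajectory


def spread_of_final_freq(n_tokens, runs=100, generations=20, seed=0):
    """Standard deviation of the final frequency across repeated runs."""
    rng = random.Random(seed)
    finals = [
        wright_fisher(n_tokens, 0.5, generations, rng=rng)[-1]
        for _ in range(runs)
    ]
    return statistics.pstdev(finals)


# Assumed illustrative sizes: a rare verb (20 usages) and a common one (1000).
rare_spread = spread_of_final_freq(n_tokens=20)
common_spread = spread_of_final_freq(n_tokens=1000)
print(f"rare verb spread:   {rare_spread:.3f}")
print(f"common verb spread: {common_spread:.3f}")
```

With no selection, the final frequency of a variant wanders much more widely from run to run when the verb is used only a few times per generation, which is the sense in which drift is stronger for rare words. Passing a positive `selection` value adds a consistent push towards one variant, the kind of effect the authors infer for forms such as ‘dove’.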
The findings demonstrate how combining large digital corpora with inference methods from population genetics can yield valuable clues to the forces that drive language evolution.