23 September 2020
The challenge of predicting life’s paths
Published online 3 April 2020
A mass collaboration of international researchers demonstrates the pitfalls of using machine learning to predict the lives that individuals may lead.
The ability to predict an individual’s life course from an early age and thus identify families who may need support is an appealing concept for social scientists and policymakers. However, a large-scale study has suggested there will always be practical limits to how accurately life trajectories can be predicted.
To explore the viability of models used to predict life trajectories, Matthew Salganik and Sara McLanahan at Princeton University, USA, and co-workers used a research design known as the common task method. They gave 160 international research teams the same task: to develop and run models predicting outcomes, such as school performance or parental unemployment, for a large group of young people and their families by the time the children turn 15.
The teams were given the actual outcomes of half the group to train their models and had to predict the outcomes for the other half. All data came from the longitudinal Fragile Families and Child Wellbeing Study, which followed children from 4,242 American families throughout their first 15 years.
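The train-and-holdout design described above can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the data are synthetic rather than drawn from the study, the single-predictor linear model stands in for whatever models the teams actually built, and the holdout score (improvement over simply predicting the training-set mean) is one plausible way to measure accuracy, not necessarily the one used in the study.

```python
import random

def split_half(records, seed=0):
    """Shuffle records and split them into a training half and a holdout half."""
    rng = random.Random(seed)
    shuffled = records[:]
    rng.shuffle(shuffled)
    mid = len(shuffled) // 2
    return shuffled[:mid], shuffled[mid:]

def fit_line(train):
    """Ordinary least squares with one predictor: y = a + b * x."""
    n = len(train)
    mean_x = sum(x for x, _ in train) / n
    mean_y = sum(y for _, y in train) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in train)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in train)
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b

def holdout_r2(train, holdout, predict):
    """Return 1 - SSE(model) / SSE(baseline) on the holdout half.

    The baseline predicts the training-set mean outcome for everyone.
    """
    baseline = sum(y for _, y in train) / len(train)
    sse_model = sum((y - predict(x)) ** 2 for x, y in holdout)
    sse_base = sum((y - baseline) ** 2 for _, y in holdout)
    return 1 - sse_model / sse_base

# Synthetic stand-in data (hypothetical): a weak signal buried in noise.
data_rng = random.Random(42)
records = []
for _ in range(1000):
    x = data_rng.gauss(0, 1)          # an early-life measurement
    y = 0.3 * x + data_rng.gauss(0, 1)  # a later outcome
    records.append((x, y))

train, holdout = split_half(records)
a, b = fit_line(train)                 # the model only ever sees the training half
score = holdout_r2(train, holdout, lambda x: a + b * x)
print(f"slope={b:.3f}, holdout score={score:.3f}")
```

A score near 0 means the model does little better than guessing the average outcome for everyone, while a score near 1 would mean near-perfect prediction. Scoring only on the half the model never saw is what keeps the comparison between teams fair.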
The results, which included outputs from models built by researchers in Saudi Arabia and the UAE, suggest that even the most sophisticated models cannot replicate the wide variability in such a rich and complex dataset.
“Even with these data and complex machine learning, researchers were unable to make accurate predictions of life outcomes,” says Salganik. “In practice, policymakers should not assume that complex models will automatically produce accurate predictions about an individual’s future.”
Salganik hopes that the study will inspire other mass collaborations using longitudinal survey data in social science.
Salganik, M.J. et al. Measuring the predictability of life outcomes with a scientific mass collaboration. PNAS http://dx.doi.org/10.1073/pnas.1915006117 (2020).