Genes can reveal where we come from

Published online 29 April 2014

Changing an accent may no longer be enough to disguise a person's origins, scientists say. They have developed an algorithm that predicts a person's likely birthplace to within 50 kilometres.

Mohammed Yahia

Small coloured circles with a matching colour to geographical regions represent the 54 reference points used for GPS predictions.
© Eran Elhaik et al./Nature Communications
Researchers have produced a biogeographical algorithm that uses genetic information to reliably infer a person's country of origin. The international team of researchers, led by Eran Elhaik of the University of Sheffield, United Kingdom, and including Pierre Zalloua from the Lebanese American University, Beirut, tested the Geographic Population Structure (GPS) algorithm on a sample of more than 1,650 individuals who had been previously genotyped.

The algorithm, which is presented in Nature Communications, accurately placed 83% of individuals within their country of origin, with some individuals pinpointed down to their cities.

The researchers compared GPS to spatial ancestry analysis (SPA), an algorithm published in Nature Genetics in 2012 (ref. 2) that worked only on individuals already known to be of European origin. This defeats the purpose of the tool in the first place, Elhaik explains. The inclusion of one non-European would drop the prediction accuracy of the entire cohort to zero.

"By contrast, GPS is using a fixed reference population panel and is sample-independent. Analysing a sample alone or with 10,000 other samples would always yield the same result for that sample," adds Elhaik.

"If the goal of the study is to understand the patterns of genetic variation over space, then SPA would be more useful," says Eleazar Eskin, co-author of the SPA algorithm. "GPS on the other hand, is a method designed specifically for predicting the location of individuals. If highly accurate reference samples with known locations are available, then GPS would be more accurate."

"GPS's novelty is in implementing a new paradigm in population genetics whereby all populations are considered mixed and can thus be represented by coefficients of admixture, in contrast to alternative tools modelling populations as mixtures of two populations," says Elhaik.
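The idea of describing every individual by admixture coefficients and placing them against a fixed reference panel can be illustrated with a minimal sketch. This is not the published GPS implementation: the panel, its coefficients and coordinates, and the names `REFERENCE_PANEL`, `admixture_distance` and `predict_location` are all hypothetical, and the inverse-distance weighting of the nearest reference populations is a simplified stand-in for whatever the authors actually do. Averaging latitude and longitude directly is also a simplification that only behaves well for nearby points.

```python
import math

# Hypothetical reference panel: name -> (admixture coefficients, (latitude, longitude)).
# All values are invented for illustration; they are not data from the study.
REFERENCE_PANEL = {
    "ref_A": ((0.70, 0.20, 0.10), (30.0, 31.0)),
    "ref_B": ((0.20, 0.60, 0.20), (35.0, 51.0)),
    "ref_C": ((0.10, 0.30, 0.60), (34.0, 36.0)),
}


def admixture_distance(a, b):
    """Euclidean distance between two admixture-coefficient vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def predict_location(sample, panel=REFERENCE_PANEL, k=2):
    """Place a sample at the inverse-distance-weighted mean of the coordinates
    of its k genetically closest reference populations."""
    scored = sorted(
        (admixture_distance(sample, coeffs), coords)
        for coeffs, coords in panel.values()
    )[:k]
    # A small constant avoids division by zero on an exact match.
    weights = [1.0 / (dist + 1e-9) for dist, _ in scored]
    total = sum(weights)
    lat = sum(w * coords[0] for w, (_, coords) in zip(weights, scored)) / total
    lon = sum(w * coords[1] for w, (_, coords) in zip(weights, scored)) / total
    return lat, lon


# Each sample is placed using only its own coefficients and the fixed panel,
# so the result does not depend on which other samples are analysed with it.
samples = [(0.45, 0.40, 0.15), (0.70, 0.20, 0.10)]
alone = predict_location(samples[0])
in_batch = [predict_location(s) for s in samples][0]
print(alone == in_batch)
```

Because the function reads nothing but the sample and the fixed panel, analysing one sample or many gives the same placement for each, which is the sample-independence property Elhaik describes. Adding more reference populations to the panel gives the weighting more nearby anchors to work with.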

The algorithm was created to pinpoint the exact locations of individuals, but for admixed communities it is more precise at tracking ancestry than at pinpointing recent migrations and present-day residence. The failure to place the remaining 17% of individuals in their home countries is probably due to their migratory history, suggests Elhaik.

Migratory populations


Of the Middle Eastern samples, which came from Egypt, Iran, Kuwait, Lebanon and Tunisia, the Kuwaitis were the hardest to track. Kuwaiti individuals with recent ancestors in Saudi Arabia or Iran, for example, were pinpointed there instead.

"Placing Kuwaitis in Iran and surrounding countries is not an error, because this is where they came from," explains Elhaik. "We'll have the same issue with modern-day Americans, most of them will not be mapped to America, but to anywhere else."

The researchers say the accuracy of GPS and its ability to identify trajectory changes in ancestry will improve as more reference population data sets are added. "It is like with the satellite navigation system, the more satellites around you, the better they can pinpoint your exact location," says Elhaik.

The tool can be used in genealogical research to study family histories or help people who were adopted to find their home region. It can also help researchers understand how populations' size, genetic diversity and the environment may have shaped certain communities.

"We are most interested in using GPS to promote the field of personalized medicine where a growing number of medications are found to have different therapeutic effects for different populations," says Elhaik. He explains that knowing a person's origins through genetic means would improve the chances of selecting the best medication for them.

"Although GPS presents very nice results, there is still room for improvement and since this is an active research area, I am confident that even better algorithms will be developed in the next few years," adds Eskin.

Tatiana Tatarinova, co-author of the paper, developed a website to engage the public in learning about their past. The public is welcome to upload their genotype data and find their origin, while learning about the science behind it.


  1. Elhaik, E. et al. Geographic population structure analysis of worldwide human populations infers their biogeographical origins. Nature Communications (2014) doi:10.1038/ncomms4513
  2. Yang, W.N. et al. A model-based approach for analysis of spatial structure in genetic data. Nature Genetics (2012) doi:10.1038/ng.2285