Using math to visualize ancestral relationships between ancient DNA samples

On September 17, 2020
Researchers from Université Grenoble Alpes (TIMC Laboratory - UGA/CNRS/Grenoble INP) and Paris-Saclay University (Research Lab in Computer Science - CNRS/Inria) have published a study (Factor analysis of ancient population genomic samples) in the journal Nature Communications on a mathematical method that enables a geometric representation of the DNA of ancient populations while preserving their family relationships. In the future, this method could be applied to all organisms - from viruses to large animals - in order to better understand the history of populations and the evolution of organisms based on changes in their environments.
Stimulated by the boom in high-throughput sequencing technologies, paleogenomics makes it possible to analyze the DNA of organisms dating back thousands of years. The molecules sequenced (ancient DNA) come notably from archaeological sites, museum collections, and samples preserved in ice or permafrost. Paleogenomics has enabled a number of advances in the study of the evolutionary history of ancient populations, their past migrations, and their relationships with modern populations. In this field, however, the mathematical analysis of relationships between ancient populations using DNA is complicated by the unpredictable nature of the evolution of gene frequencies, known as genetic drift, whose variability depends in particular on the size of the populations studied. Current approaches measure the effects of this phenomenon using reference populations whose genomes are already available. Researchers do not always have access to adequate reference genomes, however, and their choice of references can influence the estimation of ancestral relationships between the samples studied.
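The link between population size and the variability of genetic drift can be illustrated with a small simulation. The sketch below is not taken from the study: it assumes a textbook Wright-Fisher model (each generation, gene copies are drawn at random from the previous generation), and the function name, population sizes, and number of generations are arbitrary choices made for illustration.

```python
import random
import statistics


def simulate_drift(pop_size, p0=0.5, generations=30, replicates=200, seed=0):
    """Return final allele frequencies from independent Wright-Fisher runs.

    Assumption (illustrative only): diploid population of constant size,
    no selection, mutation, or migration.
    """
    rng = random.Random(seed)
    n_copies = 2 * pop_size  # two gene copies per diploid individual
    final_freqs = []
    for _replicate in range(replicates):
        p = p0
        for _generation in range(generations):
            # Each gene copy in the next generation carries the allele
            # with probability equal to its current frequency.
            count = sum(rng.random() < p for _copy in range(n_copies))
            p = count / n_copies
        final_freqs.append(p)
    return final_freqs


small = simulate_drift(pop_size=25)
large = simulate_drift(pop_size=500)

var_small = statistics.pvariance(small)
var_large = statistics.pvariance(large)
print(f"variance of final frequency, N=25:  {var_small:.4f}")
print(f"variance of final frequency, N=500: {var_large:.4f}")
```

Under these assumptions, the final frequencies spread out much more in the small population than in the large one, which is why drift makes samples from small or long-separated populations harder to compare directly.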

In a study published this week, researchers from Université Grenoble Alpes and Paris-Saclay University proposed a mathematical and computational method that produces a visual representation of ancient DNA samples reflecting their ancestral relationships. This representation makes it possible to bypass the problem of selecting reference genomes and to estimate the coefficients of genetic ancestry for the samples studied more accurately. The authors applied their approach to a database of ancient DNA provided by the David Reich laboratory at Harvard University. For samples dating from the Mesolithic period to the Middle Ages, the study specifies the relative contributions of groups of hunter-gatherers from Western Europe, the first farmers of Anatolia, and herders from the Pontic Steppe (Yamnaya culture). The study confirms that a major migration occurred 4,500 years ago, from the Pontic Steppe to northern and central Europe, and then to the British Isles and France. The most recent European populations have retained in their genetic heritage a substantial contribution from the populations of the Pontic Steppe, notably in Scandinavia and the British Isles.
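To give a sense of how a geometric representation can yield ancestry coefficients, the sketch below treats three source groups as the corners of a triangle in a two-dimensional plane and expresses a sample as a weighted average of those corners (its barycentric coordinates). This is only a simplified illustration of the geometric idea, not the algorithm published by the authors, which additionally corrects for genetic drift over time. The function name and all coordinates are hypothetical.

```python
def ancestry_coefficients(target, sources):
    """Barycentric coordinates of `target` with respect to three 2D points.

    Illustrative assumption: each source group is summarized by a single
    point in the plane, and the sample is an exact mixture of the three.
    """
    x, y = target
    (x1, y1), (x2, y2), (x3, y3) = sources
    det = (y2 - y3) * (x1 - x3) + (x3 - x2) * (y1 - y3)
    if det == 0:
        raise ValueError("source points are collinear; coefficients are undefined")
    a1 = ((y2 - y3) * (x - x3) + (x3 - x2) * (y - y3)) / det
    a2 = ((y3 - y1) * (x - x3) + (x1 - x3) * (y - y3)) / det
    a3 = 1.0 - a1 - a2  # the three coefficients sum to 1
    return a1, a2, a3


# Hypothetical positions on two factor axes (made-up numbers).
hunter_gatherers = (-1.0, 0.0)
early_farmers = (1.0, 0.0)
steppe = (0.0, 1.0)
sample = (0.1, 0.4)

coeffs = ancestry_coefficients(sample, [hunter_gatherers, early_farmers, steppe])
for name, value in zip(["hunter-gatherer", "early farmer", "steppe"], coeffs):
    print(f"{name}: {value:.2f}")
```

A coefficient below 0 or above 1 would indicate that the sample lies outside the triangle, meaning the three chosen sources do not fully account for it in this simplified picture.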

The approach proposed in this study has shed light on several details concerning the genetic structure of European populations. With previous methods, these details were obscured by the phenomenon of genetic drift.

The methodology proposed in the study makes it possible to analyze the genetic variation of populations while taking into account the different ages of the DNA samples. In the future, this method could be applied to all organisms - from viruses to large animals - in order to better understand the history of populations and the evolution of organisms based on changes in their environments. According to one of the two authors, the key to the success of this work - combining population genomics, probability calculations, molecular anthropology, new algorithms, and an open-source computer program - was the support of research laboratories in France, which enabled high-level interdisciplinary work that would not otherwise have been possible.

Figure 1
Factor analysis of 704 ancient Eurasian individuals dated to between 400 and 14,000 years ago. The first axis separates western hunter-gatherers (Serbia) from early farmers (Anatolia), while the second axis corresponds to Steppe ancestry. A major change in the genetic composition of the populations of Great Britain and central Europe is seen around 4,500 years ago (dotted line). HG: Hunter-Gatherers, N: Neolithic, MN: Middle Neolithic, C: Copper Age, EBA: Early Bronze Age.
Published on October 7, 2020
Updated on October 7, 2020