Pre-Roman Population Structure

October 14, 2021

There's a new study on Etruscans that makes a ridiculous claim about Modern Italians. Because the Etruscans were West Mediterranean (Iberian-like) and Imperial Romans were shifted toward the East Mediterranean, the authors stupidly assume that Modern Italians must be a ~50/50 mix of Etruscans and slaves/barbarians from the Roman Empire, with a North-to-South cline going from Western Europe to the Near East/Levant. But that ignores the entire Pre-Roman demographic history of Italy and Europe, and contradicts the findings of Antonio et al. (2019) that Roman Republicans already resembled Modern Italians, and that foreigners disappeared after Rome fell.

As five other new studies show, population structure in Italy is much more ancient. Modern Italians, like other Europeans, are actually in a cline from Sardinian-like Anatolian Neolithic (AN) Farmers, with varying Western Hunter-Gatherer (WHG) admixture, to the Bronze Age "Steppe" — a mix of divergent components including Caucasus Hunter-Gatherer (CHG) and Iranian Neolithic (IN) from the "Middle East", and Eastern Hunter-Gatherer (EHG) and Ancient North Eurasian (ANE) from "Northeast Asia". The reason Italians (especially in the South) appear closer to the "Levant" is that they have higher AN and CHG/IN, and lower WHG and EHG/ANE, than most other Europeans, because they were more affected by early farmers and then by migrations that brought Indo-European languages and later Ancient Greek culture from the Balkans in Southeastern Europe.

As you can see in the PCA of the first study, Iron Age Eastern Europeans (dark blue plus signs) were very "diverse", with a few still mostly CHG/IN and some way over in France. About 10 of them already cluster with Modern Italians and Southern Europeans. The cline away from AN/WHG is very obvious, going from bottom to top, with samples of Iron Age Apulians (IAA) from Southern Italy moving over time toward the broad "Steppe" zone and up to where Italians and other Europeans are today:

In order to shed light on this and to provide a genetic picture of Iron Age Southern Italy, we collected and sequenced human remains from three archaeological sites geographically located in Northern Apulia (the area historically inhabited by Daunians) and radiocarbon dated between 1157 and 275 calBCE. We find that Iron Age Apulian samples are still distant from the genetic variability of modern-day Apulians, they show a remarkable genetic heterogeneity, even though a few kilometers and centuries separate them, and they are well inserted into the Iron Age Pan-Mediterranean genetic landscape. Our study provides for the first time a window on the genetic make-up of pre-imperial Southern Italy, whose increasing connectivity within the Mediterranean landscape, would have contributed to laying the foundation for modern genetic variability. In this light, the genetic profile of Daunians may be compatible with an autochthonous origin, with plausible contributions from the Balkan peninsula.


To explore the genetic make-up of the IAA population, we performed a PCA projecting the ancient individuals onto the genetic variation of modern Eurasian samples (Figure 1C, Data S2 and Data S3). Our samples are largely scattered between modern peninsular Italians and Sardinians, and, in contrast to what was generally described for other European Iron Age populations (e.g., Northern_Europe_IA, Western_Europe_IA and Levant_IA in Figure 1C), they are still clearly distant from the genetic variability of modern-day inhabitants of Apulia. The downward shift of Iron Age Apulians from the present-day ones is further confirmed by the significantly negative f4(Modern Apulians, IAA; X, Mbuti), where X is a Neolithic/Chalcolithic/Copper Age population (Figure 2A, Data S4). Within the remarkable heterogeneity reported by the PCA, which does not mirror the archaeological sites, the two medieval individuals are shifted towards modern Middle Eastern and Caucasus populations (ORD010 and SGR001), while the others are stretched along the PC2. This pattern partially mirrors the chronological date with the most recent being more similar to present-day Southern Europeans, and is further strengthened when considering the PC3 distribution (Figure S2). Three samples located at the bottom of the PCA (ORD004, ORD019, SAL007) and one (SAL010) falling in the middle did not include modern Apulians among the top 25 results of an f3 outgroup analysis (Figure S3). All of them showed an affinity to Copper and Bronze Age Italians as well as the Aegean and the Mediterranean worlds (including Minoans, Greece, Croatians, and Gibraltar). A similar distribution is mirrored in the Multi-Dimensional scaling (MDS) built from the f3 outgroup measures, where the oldest IAA individual (SAL001; 1235 - 1048 calBCE (95.4%)) lies farthest from the modern samples, while the medieval ones (ORD010: 1078 - 1156 cal CE (95.4%) and SGR001: 670 - 774 cal CE (95.4%)) are the closest (Figure S4).
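For anyone unsure what "projecting the ancient individuals onto the genetic variation of modern Eurasian samples" involves, here is a minimal sketch in Python. It assumes a complete genotype matrix coded 0/1/2 and uses made-up toy populations; the function names `fit_pca` and `project` are illustrative and not from the paper. Real pipelines also deal with missing genotypes and projection shrinkage, which this skips. The key point is that the axes are learned from the modern panel only, and the ancient samples are then placed on those axes without influencing them.

```python
import numpy as np

def fit_pca(modern, n_components=2):
    """Learn principal axes from the modern reference panel only."""
    mean = modern.mean(axis=0)
    # Rows of vt are the principal axes of the centered modern genotypes.
    _, _, vt = np.linalg.svd(modern - mean, full_matrices=False)
    return mean, vt[:n_components]

def project(samples, mean, axes):
    """Place samples on axes that were fitted without them."""
    return (samples - mean) @ axes.T

rng = np.random.default_rng(0)
n_snps = 2000
# Toy allele frequencies for two hypothetical modern populations.
freq_a = rng.uniform(0.1, 0.9, size=n_snps)
freq_b = np.clip(freq_a + rng.normal(0.0, 0.2, size=n_snps), 0.05, 0.95)
modern = np.vstack([
    rng.binomial(2, freq_a, size=(20, n_snps)),
    rng.binomial(2, freq_b, size=(20, n_snps)),
]).astype(float)
# Toy "ancient" individuals drawn from intermediate frequencies.
ancient = rng.binomial(2, (freq_a + freq_b) / 2, size=(5, n_snps)).astype(float)

mean, axes = fit_pca(modern)
modern_pcs = project(modern, mean, axes)    # coordinates of the reference panel
ancient_pcs = project(ancient, mean, axes)  # ancient samples placed on the same axes
print(ancient_pcs.round(2))
```

Because the ancient samples never enter the SVD, their positions only describe where they fall relative to modern variation, which is why PCA positions alone cannot establish who descends from whom.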


The geographic location of Apulia, a narrow peninsula stretching out in the sea at the South of Italy, has made this region an important Mediterranean crossroads connecting Western Europe, the Balkans, the Aegean, and Levant worlds. This is reflected in the PCA where IAA individuals are closely related to other Iron Age populations from the Mediterranean and surrounding areas (e.g., Montenegro, Bulgaria and Sardinia) (Figure 1C and Figure S4).


In order to shed light onto the genetic composition of the IAA individuals, we modelled them as a combination of the main ancestries documented across Western Europe at that time: Western Hunter-Gatherers (WHG), Anatolian Neolithic (AN), Steppe-related and, interchangeably, Caucasus Hunter-Gatherers (CHG) or Iranian Neolithic (IN) using the qpWave/qpAdm framework (Figure 2B, Figure S5A, STAR Methods). Broadly, the contributions of such ancestries to the genetic variability of ancient European populations vary according to their geographical positions: in particular, northernmost locations received higher proportions of WHG, Steppe-related ancestry and, consequently, CHG ancestries, while Southern European groups carried variable Iranian Neolithic or CHG traces. In view of this, we observed that while the IAA individuals could generally be modelled as a two-way admixture between AN and Steppe (0.63±0.08 and 0.37±0.08, respectively), the alternative model AN + CHG/IN could also fit for a subset of them, particularly in case of the samples ORD004, ORD010 and SAL010 with higher or comparable p-values (Figure S5A, first row with two sources). When three or four sources were tested, the presence of WHG ancestry in the majority of our individuals emerges, which, together with AN, Steppe and CHG/IN, forms a supported model for IAA samples (Figure 2B, Figure S5A and Data S5). Notably, for the individuals stretching downwards in the PCA (ORD004, ORD019, SGR002 and the Medieval ORD010) a three-way admixture involving AN, Steppe and CHG/IN is generally preferable.
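To make the idea of a "two-way admixture" concrete, here is a rough sketch in Python. This is not qpAdm, which works on f4 statistics with outgroup populations and reports formal p-values. It is only a plain least-squares fit of target allele frequencies as a weighted mix of two sources, on simulated toy data. The function name `two_way_alpha` and the simulated "farmer" and "steppe" frequencies are assumptions for illustration; the 0.63 is borrowed from the quoted estimate purely as the toy's true value.

```python
import numpy as np

def two_way_alpha(target, src1, src2):
    """Least-squares alpha in: target ~= alpha * src1 + (1 - alpha) * src2."""
    diff = src1 - src2
    return float(np.sum((target - src2) * diff) / np.sum(diff ** 2))

rng = np.random.default_rng(2)
n_snps = 5000
# Hypothetical source allele frequencies (toy data, not real populations).
farmer = rng.uniform(0.1, 0.9, size=n_snps)
steppe = rng.uniform(0.1, 0.9, size=n_snps)

true_alpha = 0.63  # toy value echoing the quoted AN proportion
noise = rng.normal(0.0, 0.02, size=n_snps)
target = true_alpha * farmer + (1 - true_alpha) * steppe + noise

alpha = two_way_alpha(target, farmer, steppe)
print(f"source 1: {alpha:.2f}, source 2: {1 - alpha:.2f}")
```

The two proportions sum to one by construction, so the result depends entirely on which sources are offered to the model. That is the sense in which the choice of reference populations drives the answer.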


We also investigated whether the PCA scattering was due to varying African or Levantine contributions with f4(Rome Republican, IAA, Levant_N/YRI, Mbuti) and tried the same on Medieval ancient Apulians (ORD010 and SGR001). However, none of the tested ancient Apulians shows a significant excess of YRI ancestry when compared to the contemporary Roman Republicans, even though ORD014, SAL007 and SAL011 show negative f4 values with a Z-score between 2 and 3 (Figure S8B).
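The f4 tests quoted above are judged by their Z-scores, and values between 2 and 3 fall short of the |Z| > 3 threshold conventionally used in this field. The sketch below shows the basic calculation in Python: f4(A, B; C, D) as the average product of allele-frequency differences, with a standard error from a delete-one-block jackknife. It assumes equal-sized blocks of SNPs and simulated toy frequencies; the function name `f4_with_z` and the toy populations are made up for illustration and are not the paper's data or software.

```python
import numpy as np

def f4_with_z(a, b, c, d, n_blocks=50):
    """f4(A, B; C, D) from allele frequencies, with a block-jackknife Z-score."""
    per_snp = (a - b) * (c - d)
    estimate = per_snp.mean()
    # Leave out one contiguous block of SNPs at a time and recompute the mean.
    blocks = np.array_split(per_snp, n_blocks)
    total, n = per_snp.sum(), per_snp.size
    loo = np.array([(total - blk.sum()) / (n - blk.size) for blk in blocks])
    # Jackknife standard error for (near) equal-sized blocks.
    se = np.sqrt((n_blocks - 1) / n_blocks * np.sum((loo - loo.mean()) ** 2))
    return estimate, estimate / se

rng = np.random.default_rng(1)
n_snps = 20000
root = rng.uniform(0.2, 0.8, size=n_snps)

def drift():
    """Independent toy drift away from the root frequencies."""
    return rng.normal(0.0, 0.05, size=n_snps)

outgroup = root
pop_c = root + drift()
pop_b = root + drift()
pop_a = root + drift()
# Hypothetical version of A that received 30% of its ancestry from C.
pop_a_mixed = 0.7 * pop_a + 0.3 * pop_c

f4_clean, z_clean = f4_with_z(pop_a, pop_b, pop_c, outgroup)
f4_mixed, z_mixed = f4_with_z(pop_a_mixed, pop_b, pop_c, outgroup)
print(f"no gene flow: Z = {z_clean:.1f}; with gene flow from C: Z = {z_mixed:.1f}")
```

When A and B are equally related to C the statistic hovers around zero, and it only moves decisively away from zero when one of them shares extra drift with C.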

The second study shows that the descendants of those Iron Age Balkan peoples (blue squares) in a Serbian sample dated to a later period (but before they were changed by Slavic migrations) all cluster with Modern Italians, forming the same kind of cline as in Italy from Iron Age Slovenia and Croatia in the North to Iron Age Bulgaria and Aegean in the South. This represents the original Pre-Roman ancestry of Southeastern Europe (including most of Italy).

Individuals from the first cluster fall on an area of the PCA delimited by the "Balkan Iron Age cline" (Figure 1A). Consistent with this, we model the ancestry of this Balkans Iron Age Cluster as predominantly deriving from Iron Age (IA) groups from nearby areas in the Balkans, with 67% Aegean Bronze Age-related ancestry and the remainder Slovenia Iron Age-related ancestry (Figure 2; Supplementary section 12.1). A local origin is supported by a high frequency of Y-chromosome lineage E-V13, which has been hypothesized to have experienced a Bronze-to-Iron Age expansion in the Balkans and is found in its highest frequencies in the present-day Balkans. We interpret this cluster as the descendants of local Balkan Iron Age populations living at Viminacium, where they represented an abundant ancestry group during the Early Imperial and later periods (~47% of sampled individuals from the 1-550 CE).

The third study looks back even farther, showing that population structure already existed in the Neolithic, with many Balkan and Italian farmers being more AN and less WHG, and that it kept increasing in the Bronze Age, with Sicilians starting to shift toward CHG/IN long before the Roman Empire and even before the Greek settlements of Magna Graecia. Other Italians were also starting to shift (in slightly different directions) toward their current positions.

It is globally accepted that this [Neolithic farming] demic diffusion followed both a Mediterranean route (following the Mediterranean coasts, including Italy) and a continental route (Balkan route reaching Central and Western Europe through the Danubian valley), with the terminal fringes of this migration admixing again.


However, with the increase of ancient samples examined, subtle differences between Central-West and Central-East Europe started to emerge, with samples from Western and Eastern Europe forming two distinct clusters in the PCA (Fig. 2C) (Olalde et al. 2015; Rivollat et al. 2020; Marcus et al. 2020; Saupe et al. 2021). F4 statistics and related method (qpAdm) showed that this separation is possibly due to different amounts of ancestry related to WHG. It may be possible that the populations spreading through Central and Mediterranean routes mixed differently with residing hunter-gatherer populations, with the former witnessing a higher number of admixture events, in line with some archaeological evidence suggesting a sparser mesolithic hunter-gatherer population in Southern Italy, although new archaeological surveys are needed (Mussi 2006; Martini 2019).

The Bronze Age Aegean civilizations, mainly represented by Minoans and Mycenaeans, also harboured up to a quarter of their ancestry ultimately connected with ancient populations from Caucasus and Iran. Nonetheless, only the Mycenaean individuals also showed the EHG-related component (Lazaridis et al. 2017). For this reason, at least in the Minoans, the three-way split ancestral contribution is not completely fulfilled and raises other issues about who brought the Caucasus/Iran-related component into Europe.


The work from Antonio and colleagues examined the genetic transitions experienced by Central Italians which, during the Iron Age, would have given origin to the founders of the Roman civilization (Antonio et al. 2019). While they detected the suggestive presence of a small amount of the Caucasus/Iran-related component as early as the Neolithic period, they observed an increase of that component during the Bronze and Iron Age (between ~4800 and 2850 ya), possibly due to increased trade-driven mobility.


The Steppe ancestry started appearing in Sicily around 4150 ya, during the Early Bronze Age, as demonstrated by qpAdm modelling on autosomal genetic variation with two notable outliers carrying up to 20% and 40% of this genetic component (Sicily_EBA8561 and Sicily_EBA11443; Fernandes et al. 2020). Starting from the Middle Bronze Age (3750–3450 ya), Sicilian samples show a considerable shift in the PCA towards Minoans and Mycenaeans (Fig. 3A, Minoans and Mycenaeans are grouped in the Balkan area in green) and can be modelled with the Iranian-related component in a percentage around 15% (Fernandes et al. 2020). This data opens up the fascinating possibility that this mysterious ancestry must have reached Southern Italy before the occupation of the southern coastal areas of Italy (Magna Graecia). Conversely, ancient Sardinians showed a higher degree of genetic continuity from the Neolithic to the Bronze Age, with almost all individuals showing similar proportions of Anatolian Neolithic and WHG ancestries.


Multiple Iron Age populations characterized by different languages and cultures lived in Italy but, genetically speaking, they started to approximate the modern Italians (Aneli et al. 2021; Antonio et al. 2019).

The fourth study focuses on the same process in new samples from Northern and Central Italy (previously published samples are the circles with black borders). It shows again the AN/WHG structure in Neolithic Europe, but the Bronze/Iron Age Italian cline this time is shifted slightly away from the Balkan kind of Steppe ancestry toward the kind with more EHG/ANE that likely came across the Alps from Central Europe.

The human remains are from one necropolis (Necropoli di Gattolino; hereafter, "Gattolino") and three cave sites located in Northeastern (Grottina dei Covoloni del Broion: "Broion") and Central Italy (Grotta La Sassa: "La Sassa" and Grotta Regina Margherita: "Regina Margherita"; Figures 1A and S1).


A separation within the EN [European Neolithic] cluster (Figure 2A) clearly differentiates Anatolian and Eastern Europe N (right) from Western Europe N, defined as samples west of Germany (left, toward western hunter-gatherer [WHG]; Data S1D). A similar separation has already been reported and interpreted as a difference in WHG proportion of these samples. We note that most of our Chalcolithic individuals fall on the right side of the cluster (Anatolia and Eastern Europe).


Our analyses show the expected signature of peri- and post-BA movements from Steppe-related populations across Italy: absent in Italian individuals from the N and Chalcolithic, emerging in the Early BA (Italian Bell Beaker [I2478: 2195–1940 calBCE], Italian Remedello [RISE486: 2134–1773 calBCE], and Broion [BRC010: 1952–1752 calBCE (95.4%)]) and increasing through time in the individuals from Broion and Regina Margherita (GCP003: 1626–1497 calBCE [95.4%]). These samples confirm the date of arrival in Northern Italy to at least 2000 BCE and its presence in Central Italy by 4 centuries later, although denser sampling strategies are needed to assess the dynamics of this spread. Our qpAdm results suggest that the Steppe-related ancestry component could have arrived through Late N/Bell Beaker groups from Central Europe, though what remains unknown due to small sample size and limited geographical and chronological distribution is whether there were multiple Steppe population sources and the exact timing and diffusion of this ancestry component through the Italian Peninsula. The R1b subtype found in BA Broion is a lineage found in both ancient Sicilian samples and Italian Bell Beakers. Together with the autosomal affinity of North and Central Italian BA groups with Late N Germany, the Ychr data point to a possibly Northern-, transalpine-, and potentially Bell-Beaker-associated source of the Italian Steppe-related ancestry.

The fifth study has admixture profiles for several Modern Italian groups: Sardinians, Northerners, three Southern samples, and "Aspromonte" (old Greek-speaking communities in the South). It shows that the same pan-Italian (and pan-European) components used to model the ancient samples are also the best fits for modelling the modern ones, with the levels of each in different parts of Italy matching the clines and positions of populations in all the PCAs above.

Ancient genetic heritage of Southern Italian populations.

Since present-day patterns of genetic variation reflect both local dynamics of differentiation and the ancestral population history, in order to provide a temporal overview on the ancestral genetic legacy of analyzed Southern Italian groups we finally compared the genetic landscape defined by modern populations with a large panel of ancient DNA samples extracted from the literature and timewise spanning from the Mesolithic to the Iron Age (Suppl. Table S7).

Consistently with previous results, the PCA performed by projecting ancient samples onto the modern genetic variation reveals specific patterns of population relationships (Suppl. Figure S8). In fact, all the Southern Italian groups, besides showing a general high affinity with Anatolian and European Neolithic farmers, cluster also closely with the Chalcolithic and Bronze Age samples from Anatolian and Aegean (Minoan and Mycenaean) populations. Differences in affinity patterns were formally tested with the outgroup-f3 statistic measuring the extent of shared drift between modern Italian groups and the main ancient genetic components represented by Western European Hunter-Gatherers (WHG), Eastern European Hunter-Gatherers (EHG), Caucasian Hunter-Gatherers (CHG), Anatolian Neolithic farmers (AN) and Pontic-Steppe Yamnaya (EMBA). Overall Sardinia shows the highest levels of shared drift with samples of Neolithic-related ancestry compared to Northern and Southern Italy. Both Sardinians and Northern Italians show higher affinity to WHG than Southern Italians, who instead appear more affected by CHG-related groups. On the other hand, Yamnaya Steppe and EHG share more affinity to North Italy than to both Southern Italians and Sardinians (Suppl. Figure S9). In addition, qpGraph-based phylogenies consistently recapitulate the observed genetic patterns, with Sardinians showing a good fit to a two-way mixture model between populations representing Early European Farmers and West European Hunter-Gatherers (Suppl. Figure S10a), and North Italy instead achieving a successful fit to a graph model with an additional admixture event from an EHG-related lineage (Suppl. Figure S10b). Interestingly, when fitting present-day Southern Italian populations into the tested qpGraph models we find them compatible with an additional contribute that, differently from Northern Italy, does not originate from an EHG-related source but instead from a CHG-related lineage (Suppl. Figure S10c).
This fits to the data in the sense that there are no f-statistics more than |Z| > 3 different between model and expectation.
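The outgroup-f3 statistic mentioned above measures "shared drift", and a small Python sketch shows what that means. It computes f3(Outgroup; A, B) as the average product of the allele-frequency differences between the outgroup and each of two populations, so a larger value means A and B share more history after splitting from the outgroup. The data are simulated, the function name `outgroup_f3` is made up for illustration, and the finite-sample bias correction used by real software is left out.

```python
import numpy as np

def outgroup_f3(outgroup, pop_a, pop_b):
    """f3(Outgroup; A, B): mean product of differences from the outgroup."""
    return float(np.mean((outgroup - pop_a) * (outgroup - pop_b)))

rng = np.random.default_rng(3)
n_snps = 20000
outgroup = rng.uniform(0.2, 0.8, size=n_snps)

# Toy history: A and B share a stretch of drift before separating; C does not.
shared = rng.normal(0.0, 0.05, size=n_snps)
pop_a = outgroup + shared + rng.normal(0.0, 0.03, size=n_snps)
pop_b = outgroup + shared + rng.normal(0.0, 0.03, size=n_snps)
pop_c = outgroup + rng.normal(0.0, 0.05, size=n_snps)

f3_ab = outgroup_f3(outgroup, pop_a, pop_b)
f3_ac = outgroup_f3(outgroup, pop_a, pop_c)
print(f"f3(O; A, B) = {f3_ab:.5f}, f3(O; A, C) = {f3_ac:.5f}")
```

In this toy setup the pair with shared drift scores higher. The study applies the same logic when it ranks which ancient components each modern Italian group has most affinity with.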

Finally, to better characterize the ancestral composition of Southern Italian populations, we inferred their mixture proportions with respect to a four-population model of admixture including all the above-mentioned WHG, Neolithic, CHG/Iran_N and Steppe-related main sources, using qpAdm. All Italian populations were successfully modeled as characterized by a relatively high amount of Anatolian Neolithic ancestry, with the major contribution observed in Sardinians (Fig. 4, Suppl. Table S8). The remaining ancestries were assigned to a lower WHG contribution and to differential influences of Steppe_EMBA and CHG/Iran_N in the profiles of Northern and Southern Italians, respectively (Fig. 4, Suppl. Table S8). In fact, while Steppe ancestry is greater in North Italy (~ 27%), the Iran_N/CHG-related source is more present in South Italy with the highest values (~ 29%) observed in the populations from the Aspromonte area.

Finally, even the Etruscan study itself has a PCA showing Copper-Bronze Age Sicilians (pink triangles) creeping up to where Modern Sicilians and Southern Italians are. They don't veer left to first become like Etruscans so that they can later "mix with slaves" and do a U-turn toward the Levant. They were always Balkan-like, because Iberian-like people were not the only (or even the original) inhabitants of Italy.

So the alleged cline of Italians from Etruscans to Levantines is an illusion created by coincidental PCA positions and bad choices of reference populations for modelling. Italians are on the same Farmer-to-Steppe cline as all other Europeans. It's just that, because they sit geographically and genetically between Southwestern and Southeastern Europe, they're increasingly shifted toward AN/CHG/IN (peaking in the South), whereas the rest of Europe (including, to some extent, Northern Italy and the Northern Balkans) is increasingly shifted toward WHG/EHG/ANE. And it's been that way since at least the Neolithic, up through the Bronze and Iron Ages, long before the slaves and barbarians of the Roman Empire, who had almost no impact on the pre-existing population structure.

Posth et al. "The origin and legacy of the Etruscans through a 2000-year archeogenomic time transect". Sci Adv, 2021.

Aneli et al. "The genetic origin of Daunians and the Pan-Mediterranean southern Italian Iron Age context". bioRxiv, 2021.

Olalde et al. "Cosmopolitanism at the Roman Danubian Frontier, Slavic Migrations, and the Genomic Formation of Modern Balkan Peoples". bioRxiv, 2021.

Aneli et al. "Through 40,000 years of human presence in Southern Europe: the Italian case study". Hum Genet, 2021.

Saupe et al. "Ancient genomes reveal structural shifts after the arrival of Steppe-related ancestry in the Italian Peninsula". Curr Biol, 2021.

Sarno et al. "Genetic history of Calabrian Greeks reveals ancient events and long term isolation in the Aspromonte area of Southern Italy". Sci Rep, 2021.