Population Structure Within Italy

October 29, 2012

The good news is there's a new study on population structure in Italy. The bad news is that it's not very well done. The authors failed to sample populations from elsewhere in Southern Europe (like Iberia and the Balkans), which Italians are most related to in other studies, and they didn't collect 4-grandparental information on each individual, resulting in several outliers apparently with recent origins in different parts of the country. They also describe genetic components that are probably very old and have a wide distribution as "Northern European ancestry" and "Middle Eastern ancestry" as if they came from modern populations. Besides all that, the results aren't too surprising.

According to the study, Italians have similar proportions of the same genetic components, varying slightly from North to South, but distinct from both Northern/Central Europe and the Middle East/North Africa. The island of Sardinia is unique in having an excess of one of those components.

In terms of PCA, Italians plot as expected from geography, between the Western and Eastern Mediterranean (the authors say "France" and "the Middle East" because of the lack of Southern European reference samples), with Sardinians out to the side.

The position of the Italian population samples suggests that genetic distances between these populations and other European and Middle East populations has a good correlation with geographic distances. At the same time, Sardinia was confirmed to be a genetic "outlier".


The relative position of the samples reflected their geographic location: the close correlation between PC and geography was previously reported by several authors. When compared to other European populations, Sardinia was confirmed to be a genetic "outlier", whereas the Northern Italian population was genetically close to the French population, and the Southern Italians had some similarities with other Mediterranean populations such as those from Middle East. Unfortunately, lack of data from other relevant reference populations from the South-East Europe, e.g. from the Balkan peninsula, made it impossible to fully analyze the extent of the Eastern contribution in Italian populations.

Our main goal was to investigate the genetic structure of the Italian population considering four main macro-areas (Northern, Central, Southern Italy and Sardinia). We carried out PC analysis on the Italian samples and plotted the eigenvectors 1 and 2 in Figure 2. Most samples fell within a main cluster which seems to be indicative of Italian peninsula individuals. The first PC divided Italian populations in two clusters, one for Sardinia and the other for the remaining three Italian macro-areas. The Sardinian population is highly dispersed along the first eigenvector.

The second PC divided Italian mainland population into two clusters, with a certain degree of overlapping between Northern and Central Italy, and a separate cluster for Southern Italy, suggesting that genetic variation is generally continuous rather than discrete, at least within Italian mainland.
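For readers unfamiliar with how such a PCA plot is produced, here is a minimal sketch in Python. It is not the authors' pipeline: the function name `genotype_pca`, the simulated populations, and the choice to scale each SNP by its binomial standard deviation are all assumptions made for illustration. The input is a samples-by-SNPs matrix of 0/1/2 allele counts.

```python
import numpy as np

def genotype_pca(genotypes, n_components=2):
    """Project samples onto the top principal components of a 0/1/2 genotype matrix."""
    g = np.asarray(genotypes, dtype=float)
    freq = g.mean(axis=0) / 2.0              # allele frequency per SNP
    keep = (freq > 0) & (freq < 1)           # drop monomorphic SNPs (zero variance)
    g, freq = g[:, keep], freq[keep]
    # Center each SNP and scale by its binomial standard deviation (an assumed convention).
    z = (g - 2 * freq) / np.sqrt(2 * freq * (1 - freq))
    u, s, _ = np.linalg.svd(z, full_matrices=False)
    return u[:, :n_components] * s[:n_components]   # sample coordinates on PC1, PC2, ...

# Hypothetical data: two simulated populations whose allele frequencies have drifted apart.
rng = np.random.default_rng(0)
freq_a = rng.uniform(0.1, 0.9, size=500)
freq_b = np.clip(freq_a + rng.normal(0, 0.15, size=500), 0.05, 0.95)
pop_a = rng.binomial(2, freq_a, size=(30, 500))
pop_b = rng.binomial(2, freq_b, size=(30, 500))
coords = genotype_pca(np.vstack([pop_a, pop_b]))
```

Plotting the first column of `coords` against the second gives the kind of figure described above, with each point being one individual. Because the result depends on which samples go into the matrix, uneven sampling can stretch or compress the apparent gaps between groups.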

The overlap of Northern and Central Italy, and the gap between Central and Southern Italy, is explained by the uneven distribution of the samples.

ADMIXTURE analysis confirms that there was no clear separation between Northern and Central Italy, at least as considered as macro-areas. Additional comparison of the distribution of pair-wise identity-by-state within each of the four populations and ADMIXTURE analysis clarified that this is not an artifact of the PC analysis. However, the PC and ADMIXTURE analysis results could be due to the sparse geographical coverage of our samples, especially for the Central and Northern macro-areas. In fact, many of the individuals (N = 413) in the North Italian sample analyzed in this study were from Piedmont — a North West Italian region that has historically been affected by intense migration. At the same time, many individuals in the Central Italy macro-area (113 samples) are settled in Tuscany, an administrative region which is at the border with northern regions.
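The pair-wise identity-by-state (IBS) comparison mentioned here can be illustrated with a short sketch. This is one common definition, assumed for illustration rather than taken from the paper: for each pair of individuals, average over SNPs the proportion of alleles they share, which for 0/1/2 genotypes is 1 minus half the absolute difference. The function name `mean_ibs` and the toy matrix are hypothetical.

```python
import numpy as np

def mean_ibs(genotypes):
    """Pairwise mean proportion of alleles shared identical-by-state (0/1/2 coding)."""
    g = np.asarray(genotypes, dtype=float)
    # |g_i - g_j| is 0, 1 or 2 at each SNP; dividing by 2 gives the unshared fraction.
    diff = np.abs(g[:, None, :] - g[None, :, :])
    return 1.0 - diff.mean(axis=2) / 2.0

# Toy data: three individuals typed at four SNPs.
toy = np.array([
    [0, 1, 2, 2],
    [0, 1, 2, 0],
    [2, 1, 0, 0],
])
ibs = mean_ibs(toy)
```

Comparing the distribution of the off-diagonal values within each macro-area is then a simple way to check whether a pattern seen in PCA also shows up in a method that makes no use of eigenvectors.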

Within each macro-area, there isn't much substructure, meaning that a Sicilian, for example, is not particularly differentiated from a Campanian or a Puglian.

A finer view of the Italian substructure, can be seen in Figure S2 where the hidden population structure within the Italian dataset is appreciable. Subjects are labeled by municipality, or in the case of the Sardinian samples, by the main linguistic area. In this figure we can appreciate the lack of clustering at the municipality level, also within Sardinia. Individuals seem to cluster within the main macro-area, but the geographic patterning is less obvious for the municipality (or in the case of Sardinia, linguistic) division, and in our opinion this pattern indicates no substructure within regions among municipalities, while the structuring between regions can be easily detected. It is also possible to appreciate a certain genetic homogeneity within Sardinia.

Di Gaetano et al. "An Overview of the Genetic Structure within the Italian Population from Genome-Wide Data". PLoS One, 2012.

Related: More Italian Population Structure, Much Better Population Structure