Worldwide Genetic Structure Elucidates the Eurasian Origin and Invasion Pathways of Dothistroma septosporum, Causal Agent of Dothistroma Needle Blight

Dothistroma septosporum, the primary causal agent of Dothistroma needle blight, is one of the most significant foliar pathogens of pine worldwide. Its wide host and environmental ranges have led to its global success as a pathogen and severe economic damage to pine forests in many regions. This comprehensive global population study elucidated the historical migration pathways of the pathogen to reveal the Eurasian origin of the fungus. When over 3800 isolates were examined, three major population clusters were revealed: North America, Western Europe, and Eastern Europe, with distinct subclusters in the highly diverse Eastern European cluster. Modeling of historical scenarios using approximate Bayesian computation revealed the North American cluster was derived from an ancestral population in Eurasia. The Northeastern European subcluster was shown to be ancestral to all other European clusters and subclusters. The Turkish subcluster diverged first, followed by the Central European subcluster, then the Western European cluster, which has subsequently spread to much of the Southern Hemisphere. All clusters and subclusters contained both mating-types of the fungus, indicating the potential for sexual reproduction, although asexual reproduction remained the primary mode of reproduction. The study strongly suggests the native range of D. septosporum to be in Eastern Europe (i.e., the Baltic and Western Russia) and Western Asia.


Introduction
Dothistroma needle blight (DNB) is one of the most important and damaging diseases of pines worldwide, affecting over 109 Pinaceae taxa [1]. The disease causes foliar necrosis, premature needle cast, reduction in growth, and in extreme cases, tree death [2]. DNB causes large economic losses (e.g., NZD $24 million per year in New Zealand [3] and GBP £8.6 million per year in the UK [4]) due to timber losses from forest plantations, but also negatively affects the landscape and recreational value of forests and the esthetic value of ornamentals, with non-timber losses estimated at GBP £50 million per year in the UK alone [5].
The disease is caused by two species: Dothistroma septosporum (Doroguine) Morelet and D. pini Hulbary, which are only distinguishable using molecular methods [6]. Dothistroma septosporum has a worldwide distribution, being present on all continents where available hosts grow in habitats ranging from tropical to sub-arctic on a large host range [1]. On the other hand, D. pini has a much more restricted host and geographical range, being present in parts of east-central USA and a limited number of European locations [1,7]. DNB achieved notoriety in the 1950s and 1960s in the Southern Hemisphere, where it caused and continues to cause extensive damage to non-native pine plantations [1,8]. Since the 1990s, however, DNB has increased in incidence and severity in the Northern Hemisphere, with severe largescale outbreaks occurring in Canada, the UK, and France, and with the disease being reported for the first time from much of central and northern Europe [1]. Reasons for this increase in incidence and severity in the last 30 years has been ascribed to various aspects of the disease triangle: host genotype, pathogen genotype and environment. A number of studies have concluded that contributing factors to this disease escalation are increased host availability through the expansion of plantation forests, often of more susceptible non-native hosts, and changing climatic conditions, particularly aboveaverage precipitation [1,[9][10][11]. However, a comprehensive global population study to investigate the role of pathogen genotype and population has not been conducted to date.
DNB has been observed in some regions for over 100 years, for example, Northeastern France [12], European Russia [13], Ukraine [1], and parts of the USA [14][15][16], with dendrochronological studies indicating its presence in British Columbia, Canada since 1831 [10,17]. However, the origin of the genus and both species is still unknown, with the native area hypothesized to be Central America [14], the Himalayas [18], and parts of North America and Europe [14,19]. Recent genetic studies have not been able to resolve the question of origin, partly due to a lack of substantial samples from Asia and Central America. These studies have, however, shown support for the pathogen being native to parts of both North America and Europe [20][21][22][23].
Individual population studies have shown unexpected levels of diversity in many populations [20][21][22][23][24][25][26][27][28][29]. This diversity indicates that sexual reproduction is more common than previously thought, particularly as the sexual stage (previously known as Mycosphaerella pini) has only rarely been observed, and the asexual stage is ubiquitous. Yet, it has also provided support for the hypothesis that the fungus may be native to these areas. However, most of these studies have focused on a single country or region within a country, and it is unclear how these various populations relate to each other, including their pathways of introduction and migration. Only a few studies have included samples from more than one country, and only two [20,21] have explored intercontinental population patterns. Given the global distribution of D. septosporum, the objective of this study was to combine isolates from the previous population studies with newly obtained isolates to provide a unified overview and large-scale context of its populations and to investigate the role of pathogen genotype and population in contributing to the increase in serious outbreaks.
Microsatellite data from an unprecedented number of samples were collected from across the worldwide range of D. septosporum, encompassing all but three countries from which the pathogen has been reported. The data were modeled using approximate Bayesian computation (ABC), a method used to understand population history, including invasion pathways of fungi, e.g., [30,31]. This likelihood-free technique manages an arbitrary number of populations and samples that are employed in complex evolutionary scenarios and is particularly suited to inferences about introduction histories of invasive species [32]. The specific aims of this study were to (i) tie together and contextualize previous individual population studies by including isolates from previous population studies and newly obtained isolates, (ii) elucidate the phylogeographic relationships of individual regions and populations, (iii) investigate the mating-type and prevalence of sexual recombination in these populations, (iv) determine whether the pathogen origins lie in North America or Eurasia, and (v) determine the source of introduced populations in the Southern Hemisphere.

Sample Collection and Fungal Isolation
In order to cover the largest geographical range possible and to facilitate comparison with previous D. septosporum population studies, isolates and/or DNA from earlier studies [20][21][22][23]25,[27][28][29]33] were included in this study. Additional needle samples with typical symptoms of DNB were collected from various Pinus species across six continents. Many samples were opportunistically collected where infected pine trees were found. Single spore isolations of D. septosporum were made using the methods outlined in [34].
Isolates were grouped according to their country of origin and are listed in Table S1. In cases where large unsampled areas occurred between samples, distinct population groups were created to reflect geographical population groups more accurately (e.g., multiple population groups in Canada).

Haplotype and Mating-Type Determination
Isolates were grown in the dark for ca. two weeks at 20 • C on autoclaved cellophane discs (Innovia Films, Wigton, UK) placed on Dothistroma medium [35] to obtain mycelium for DNA extraction. DNA was extracted using a Kingfisher Flex magnetic particle processor (Thermo Scientific, Waltham, MA, USA) using Kingfisher Plant DNA extraction kits (Thermo Scientific). Species-specific mating-type primers [36] were used to determine the Dothistroma species and mating-type of each isolate as outlined in [27]. Eleven microsatellite markers developed by [37] were used for multilocus haplotyping. Multiplex PCR of the markers (Doth_DS1, Doth_DS2, Doth_E, Doth_F, Doth_G, Doth_I, Doth_J, Doth_K, Doth_L, Doth_M, Doth_O) and fragment analysis were conducted as described by [27].
Isolates with identical multilocus haplotypes (MLHs, i.e., alleles identical at all 11 loci) were considered clones. Two data sets were created: one containing all individuals (nonclone-corrected data set), another containing only one individual of each multilocus haplotype per population (clone-corrected data set).

Genetic Diversity and Differentiation
Five indices were used to evaluate genotypic diversity and were calculated in the R packages poppr [38] and vegan [39] using the non-clone-corrected dataset: (i) Shannon-Wiener index, H [40,41]; (ii) Stoddart and Taylor's index, G [42]; (iii) Simpson's index, λ [43]; (iv) genotypic richness, eMLG, the expected number of multilocus genotypes (eMLG) calculated by rarefaction to the smallest sample size (≥10); and (v) genotypic richness, E 5 , an estimation of evenness which is equal to 0 when a single genotype is dominant and increases to 1 as genotypes become more equally represented [41]. The proportion of isolates derived from clones, or asexual reproduction, is known as the clonal fraction (CF) and was calculated according to the method of [44].
The clone-corrected dataset was used to calculate further indices: Nei's gene diversity, H exp [45], calculated in poppr; the total number of alleles, number of private alleles, and the mean haploid genetic diversity (h) calculated in GENALEX 6.5 [46]; and allelic richness (A R ) (i.e., the number of distinct alleles in a group) and private allele richness (PA R ) (i.e., the number of alleles unique to a particular group) calculated in ADZE [47]. The A R and PA R were computed using a rarefaction procedure to adjust them to a specific sample size that allowed comparison between populations having different sample sizes. Calculations were standardized to a uniform size corresponding to the size of the smallest group.
Pairwise F ST values, used as a measure of population differentiation, and Nm, the predicted number of migrants between population groups, were calculated in ARLEQUIN 3.5 [48].

Mating-Type Distribution and Sexual Recombination
An equal proportion of mating-type idiomorphs indicates that sexual reproduction could be frequent enough to maintain equilibrium. To determine whether groups differed significantly from the null hypothesis of a 1:1 ratio of mating-type idiomorphs, an exact binomial test, using two-tailed p-values, was used [49].
Poppr [38] was used to calculate the index of association (I A ) together with its associated measure (r d ). The I A is a measure of multilocus linkage disequilibrium and r d is a modification of it that facilitates comparisons between studies by removing the dependency on the number of loci used [50,51]. Sexual populations are expected to have linkage equilibrium due to no linkage among loci, while clonal populations are expected to have significant disequilibrium due to linkage among loci. The I A and r d from the observed data were compared to values obtained after 1000 randomizations to simulate random mating.
Both clone-corrected and non-clone-corrected data sets were used for mating tests in order to reduce the chance of rejecting the null hypothesis of random mating that a smaller clone-corrected data set might carry [52].

Population Structure
The population structure of the clone-corrected dataset was assessed using both STRUCTURE and DAPC. STRUCTURE 2.3.4 [53] implements a Bayesian, model-based clustering algorithm to assign individuals to a specified number of clusters (K), minimizing linkage disequilibrium and maximizing Hardy-Weinberg equilibrium within the clusters [54]. To estimate the optimal number of clusters, 60 independent runs of K = 1-15 were carried out in STRUCTURE using no priors (i.e., no information on geographical location or host was provided). Each run had a burn-in of 100,000 iterations followed by 500,000 data-collecting iterations, using a model of correlated allele frequencies and with admixture among populations allowed. CLUMPAK [55] was used to determine the optimal value of K using the ∆K method of [56]. CLUMPAK was used to align all optimum K STRUCTURE runs to the permutation with the highest H-value. The MCL threshold for similarity scores was set to 0.9. The DISTRUCT program [57] was used to visualize the CLUMPP output. Individual haplotypes were assigned to a particular cluster if their membership probability to that cluster was ≥0.8. Additionally, a hierarchical STRUCTURE analysis was done in which the isolates from each of the three major clusters were run in a separate STRUCTURE analysis, with the settings identical to those described above.
To complement the Bayesian approach implemented in STRUCTURE, a multivariate technique that makes no assumptions regarding the population model or data structure was used [58]. Discriminant analysis of principal components (DAPC) was conducted in the R package ADEGENET [58,59]. It is particularly suited to identifying clusters (K) of genetically related individuals as it minimizes variation within groups and maximizes variation between groups [58]. A sequential K-means procedure followed by an assessment of the Bayesian information criterion (BIC) to assess the optimal number of clusters precedes the DAPC analysis itself. Cross-validation was used to determine the optimal number of principal components retained in the analysis [60].

Modeling of Evolutionary History
The STRUCTURE clusters were used to inform and develop historical scenarios describing the evolutionary relationships among populations. These scenarios were investigated using approximate Bayesian computation (ABC) conducted in DIYABC v2.1.0 [61]. The real observed dataset of microsatellite haplotypes is compared with large numbers of simulated datasets (one million per scenario) based on competing for evolutionary scenarios (models). The topology of the scenarios is designed as a composition of events such as separation of one population from another, merging of two populations or change of effective population size. Furthermore, each scenario is characterized by a set of demographical parameters (time of events in the number of generations, effective population size, admixture rate) and a mutational model. Model selection (scenario comparison) is performed via relative posterior probabilities assigned to each scenario resulting from their relative vicinity (of the appropriate simulated datasets) to the observed dataset in a multidimensional space of summary statistics (i.e., usual population genetic characteristics such as gene diversity or Fst which decrease the complexity of the multilocus dataset).
As the number of potential scenarios between a large number of populations is large, cumbersome, and computationally onerous, a stepwise procedure was adopted to build evolutionary scenarios to address specific questions about relationships among two or three STRUCTURE (sub)clusters, where the best scenario in the first step was used to inform the scenarios of the second step (sensu [62]).
The first question about relationships among the North American (NA), Western European (WE) and Eastern European (EE) clusters (K = 3 STRUCTURE result) was assessed via a set of 19 scenarios in ABC analysis 1. The scenarios tested whether all three population clusters were derived separately from an ancestral population or if one cluster was derived from one of the other two, from an unsampled population, or from an admixture event between populations. The dataset contained all samples from Canada and USA, Western and Eastern Europe (243, 546 and 1041 clone-corrected samples for NA, WE and EE, respectively; Table S2).
Subsequently, as the position of the Western European cluster was revealed, the evolutionary relationship between North American and Eastern European clusters was addressed in the fourteen scenarios of ABC analysis 2. These scenarios were based on the premise that one sexual cycle per year was possible and that any genetic exchange between North America and Eurasia was either less than 500 years ago (i.e., the European discovery of America) or 11,000-30,000 years ago (i.e., across the Bering Land Bridge). Tested scenarios included the North American cluster being derived from the Eastern European cluster, and vice versa, either with a bottleneck or without, and either with restrictions on the timing (up to 500 years ago or 11,000-30,000 years ago) or without. Further scenarios included both populations being derived from an unsampled ancestral population, with and without a bottleneck, and with the restrictions on timing described above. Bottleneck events were allowed to range from 0 to 40 generations as continental translocation could be expected to include a longer than usual bottleneck (see Table S2 for a detailed explanation and the historical interpretation of each scenario).
The third question (analysis 3) regarded relationships among the three subclusters of the Eastern European cluster (K = 5 STRUCTURE result), i.e., Northeastern Europe (NEE, 223 samples), Central Europe (CEE, 641 samples) and Turkey (TUR, 82 samples). The same 19 scenarios as in analysis 1 were used.
As the relationship of the Central European subcluster was clarified, analysis 4 centered on determining the relationship between the Northeastern European subcluster and the Turkish subcluster using five scenarios. The scenarios tested whether both populations were derived from an ancestral population independently, if one derived from the other, or from admixture with an unsampled population.
Once the relationship between the three Eastern European subclusters was clarified, the position of the Western European cluster was evaluated in analysis 5 using 11 scenarios. The topology resulting from the previous analyses was retained, and scenarios tested whether the Western European cluster derived from any of the three Eastern European subclusters or an unsampled population or from admixture between any two of these populations.
The final three analyses (6)(7)(8) aimed to determine the origins of the introduced populations (with over 10 MLHs) in the Southern Hemisphere. Twenty-two scenarios tested whether the Southern Hemisphere population derived from any of five main populations delimited by STRUCTURE (North America, Western Europe, Central Europe, Northeastern Europe, Turkey) or an unsampled population, or from admixture between any two of these populations. As the Southern Hemisphere populations are known to be recent introductions, the time of their formation was limited to between 1 and 300 generations ago. Three Southern Hemisphere populations were considered-South Africa Hogsback (n = 16; analysis 6), South Africa Tzaneen (n = 13, analysis 7), and Chile (n = 11, analysis 8).
A list and full description of all scenarios is provided in Table S2. Demographic priors of the tested scenarios included effective population size (10 to 10,000), the time of the split or admixture event (in the number of generations ago; 1 to 10,000), the duration of the bottleneck event (in number of generations; 0 to 20), the effective number of founders of a population (2 to 100), and the rate of admixture (0 to 1), except where these differ as specified above. One million datasets were simulated for each scenario. The generalized stepwise model (GSM) was followed for the microsatellite loci, and the default DIYABC values for the priors of the mutation model parameters were used, with the exception of the mean mutation rate, which was extended to a minimum of 1 × 10 −5 . Only classic dinucleotide microsatellite markers fitting the GSM were used in the ABC datasets. Additionally, the highly polymorphic dinucleotide marker L was excluded, resulting in seven markers (DS1, DS2, F, G, I, J, and K) used in the analyses.
For each simulation, a number of commonly used genetic summary statistics (mean number of alleles for one sample and between two samples, mean heterozygosity, F st between two samples, mean index of classification between two samples, and (δµ) 2 distance between two samples) were used to compare it to the observed dataset using Euclidian distances. The posterior probability of each scenario was then estimated by polychotomous logistic regression on the 1% of simulated datasets closest to the observed dataset [63,64]. Posterior distributions of parameters, model checking using the posterior based error and summary statistics not used in model selection, and confidence in scenario choice using 1000 pseudo-observed test data sets were calculated using the options in DIYABC v2.1.0.

Isolates and Haplotypes
In total, 3872 D. septosporum isolates from 44 countries on six continents were used in this study (Table S1). The isolates were grouped into 56 population groups based on geographical proximity (Figure 1a,b, Table S1). In the vast majority of cases, this corresponded to the county. In some cases, isolates from the same country were placed into separate population groups (e.g., distant geographical groups separated by large areas without isolates in Canada and Norway). Full details of isolates, including the host, geographic location, and population group, are provided in Table S1. Based on the 11 microsatellite markers, these isolates consisted of 1913 unique multilocus haplotypes. All loci were polymorphic with a total of 377 alleles, ranging from 6 at Doth_O to 97 at Doth_L. stepwise model (GSM) was followed for the microsatellite loci, and the default DIYABC values for the priors of the mutation model parameters were used, with the exception of the mean mutation rate, which was extended to a minimum of 1 × 10 −5 . Only classic dinucleotide microsatellite markers fitting the GSM were used in the ABC datasets. Additionally, the highly polymorphic dinucleotide marker L was excluded, resulting in seven markers (DS1, DS2, F, G, I, J, and K) used in the analyses.
For each simulation, a number of commonly used genetic summary statistics (mean number of alleles for one sample and between two samples, mean heterozygosity, Fst between two samples, mean index of classification between two samples, and (δµ) 2 distance between two samples) were used to compare it to the observed dataset using Euclidian distances. The posterior probability of each scenario was then estimated by polychotomous logistic regression on the 1% of simulated datasets closest to the observed dataset [63,64]. Posterior distributions of parameters, model checking using the posterior based error and summary statistics not used in model selection, and confidence in scenario choice using 1000 pseudo-observed test data sets were calculated using the options in DI-YABC v2.1.0.

Isolates and Haplotypes
In total, 3872 D. septosporum isolates from 44 countries on six continents were used in this study (Table S1). The isolates were grouped into 56 population groups based on geographical proximity (Figure 1a,b, Table S1). In the vast majority of cases, this corresponded to the county. In some cases, isolates from the same country were placed into separate population groups (e.g., distant geographical groups separated by large areas without isolates in Canada and Norway). Full details of isolates, including the host, geographic location, and population group, are provided in Table S1. Based on the 11 microsatellite markers, these isolates consisted of 1913 unique multilocus haplotypes. All loci were polymorphic with a total of 377 alleles, ranging from 6 at Doth_O to 97 at Doth_L.

Population Structure
Assessment of the delta K statistic clearly indicated three clusters best explained the data from the STRUCTURE analysis ( Figure S1). All 60 independent STRUCTURE runs were concordant (Figures 2 and S2a,b). K-means clustering and inspection of the BIC (Figure S3) from the DAPC analysis also supported three clusters as the best split ( Figure 3). The STRUCTURE and DAPC clusters were highly congruent. The clusters were named according to their major geographical occurrences as the North American, Western European and Eastern European clusters. Higher values of K can also be biologically relevant and were examined to discern the substructuring of the populations. At K = 4, a clear subcluster formed of Central European isolates, and at K = 5, the Turkish isolates formed a distinct subcluster ( Figure 2). The hieriarchical STRUCTURE analysis, in which each of the three major STRUCTURE clusters was run separately, produced patterns identical to those of the main run at higher values of K. The exception to this was the North American cluster, in which substructuring was seen in a roughly east-west pattern ( Figure S4).

Population Structure
Assessment of the delta K statistic clearly indicated three clusters best explained the data from the STRUCTURE analysis ( Figure S1). All 60 independent STRUCTURE runs were concordant ( Figure 2 and Figure S2a,b). K-means clustering and inspection of the BIC ( Figure S3) from the DAPC analysis also supported three clusters as the best split ( Figure 3). The STRUCTURE and DAPC clusters were highly congruent. The clusters were named according to their major geographical occurrences as the North American, Western European and Eastern European clusters. Higher values of K can also be biologically relevant and were examined to discern the substructuring of the populations. At K = 4, a clear subcluster formed of Central European isolates, and at K = 5, the Turkish isolates formed a distinct subcluster ( Figure 2). The hieriarchical STRUCTURE analysis, in which each of the three major STRUCTURE clusters was run separately, produced patterns identical to those of the main run at higher values of K. The exception to this was the North American cluster, in which substructuring was seen in a roughly east-west pattern ( Figure S4).

2.
Bayesian clustering of D. septosporum multilocus haplotypes inferred using the program STRUCTURE at K = 3, K = 4 and K = 5. Each multilocus hap esented by a vertical line partitioned into colored sections that represent the isolate's estimated membership fractions in each cluster. Black lines se s from different population groups.

3.
Scatterplot of the discriminant analysis of principal components (DAPC) of D. septosporum multilocus haplotypes. Only the first two principal compo DAPC are displayed. The first axis is the horizontal axis; the second axis is the vertical axis. Group 1 is equivalent to the Eastern European STRUC , group 2 to the Western European STRUCTURE cluster, and group 3 to the North American STRUCTURE cluster. Individual multilocus haplotyp nted by dots and groups as inertia ellipses. At the top right, the PCA eigenvalues are represented, with the number of principal components used zed analysis in black. Only the first two principal components of the DAPC are displayed. The first axis is the horizontal axis; the second axis is the vertical axis. Group 1 is equivalent to the Eastern European STRUCTURE cluster, group 2 to the Western European STRUCTURE cluster, and group 3 to the North American STRUCTURE cluster. Individual multilocus haplotypes are represented by dots and groups as inertia ellipses. At the top right, the PCA eigenvalues are represented, with the number of principal components used in the optimized analysis in black.

Genetic Diversity
Genotypic diversity measures (H, G, λ; Table 1a) showed the lowest diversity in South Croatia and Spain, while the highest was in Southern Poland and Canada West BC. Genetic diversity measures (H exp , h, A R ; Table 1a), on the other hand, showed the highest diversity in Lithuania, Latvia, and South Estonia while the lowest tended to be in South Croatia or in population groups in the Southern Hemisphere such as Australia and New Zealand. While general trends can be seen from these values, they are influenced by the large variation in sample size between individual population groups; therefore, it is more informative to consider the diversity values for the STRUCTURE clusters (K-3) and subclusters (K = 4 and 5). The Eastern European cluster showed the highest level of diversity by almost all measures, both genotypic and genetic (eMLG, H, G, λ, H exp , h, A R ; Table 1b). The Western European and North American clusters had broadly similar levels of diversity. In terms of subclusters, the Central European subcluster had the highest levels of genotypic diversity (eMLG, H, G, lambda) as well as the highest total number of alleles and private alleles, and Turkey had the lowest values for these measures (Table 1c). However, the Central European subcluster also had the greatest number of samples. When rarefied to an equal number of samples (n = 82), the Northeastern European subcluster had the highest allelic richness (A R ), private allele richness (PA R ), and mean haploid genetic diversity (h) ( Table 1c). Values of F st varied greatly and ranged between 0 and 1, with larger values between North American and European population groups and Nm (number of migrants) ranging from 0 to infinity (Table S3).

Mating-Type Distribution and Sexual Recombination
The vast majority of population groups contained both mating-types (Table 2a). The exceptions to this were population groups with a low number of isolates or MLHs (<5) and two Southern Hemisphere population groups (New Zealand, Chile). Sexual recombination is therefore possible in most population groups, and this was supported by the exact binomial test on the mating-type ratios, using both non-clone-corrected and clone-corrected datasets (Table 2a). However, the more sensitive I A and r d tests indicate linkage equilibrium, and therefore routine sexual reproduction, in far fewer population groups, particularly using the clone-corrected dataset.
Of the main STRUCTURE clusters, only in the North American cluster was sexual recombination frequently occurring according to the mating-type ratios (Table 2b). The I A and r d tests did not support frequent sexual recombination in any of the other STRUCTURE clusters. Regarding the STRUCTURE subclusters, frequent sexual recombination was supported in the Turkish subcluster by both non-clone-corrected and clone-corrected mating-type ratios and in Central Europe by the clone-corrected I A and r d tests.

Modeling of Evolutionary History
To determine the evolutionary history of D. septosporum from the current set of isolates representing a global collection, 8 sets of scenarios were considered, each addressing a unique question regarding the evolution of D. septosporum populations. The STRUCTURE results revealed three main population clusters (North American, Western European and Eastern European) and the first set of ABC scenarios (analysis 1, Table S2) aimed to elucidate the relationship between them. The posterior probabilities were highest for scenarios 1.12 and 1.9 (p = 0.4012 and p = 0.3821, respectively; Figure S5, Table S2) in which the Western European cluster derived from the Eastern European cluster, either directly (scenario 1.9) or through a weak admixture event with an unsampled population (scenario 1.12), and both the Eastern European and North American clusters derived independently from the ancestral population. Posterior probabilities for scenario 1.9 indicate the Western European population cluster derived from the Eastern European cluster a median of 95.8 and a mode of 45.2 generations ago. For scenario 1.12, an admixture of the Eastern European population cluster and an unsampled population occurred at a median of 46.1 and a mode of 18.7 generations ago (Table S4). Therefore, the posterior probabilities indicate the creation of the Western European cluster occurred around 18-96 generations ago.
The next set of scenarios (analysis 2) was designed to elucidate the relationship between the North American and Eastern European populations. The most supported scenario involved the North American population deriving from an unsampled/ancestral population less than 500 generations ago and the Eastern European population deriving from an unsampled/ancestral population up to 50,000 generations ago (p = 0.7593; Table S2). The inverse scenario, with Eastern Europe deriving from an unsampled/ancestral population less than 500 generations ago and the North American population deriving from an unsampled/ancestral population up to 50,000 generations ago was unsupported (p = 0.0031), as were scenarios where the North American and Eastern European populations were directly derived from each other or those where the genetic exchange between Eurasia and North America corresponded to the presence of the Bering Land Bridge (p < 0.05).
Posterior probabilities indicate the North American population derived from the ancestral population a median of 316 and a mode of 434 generations ago with a strong bottleneck (few founders and long duration), whereas the Eastern European population derived from the ancestral population a median of 6460 and a mode of 3210 generations ago (Table S4). Table 2. Mating type ratio and index of association tests for the D. septosporum isolates of (a) the population groups; (b) the three main STRUCTURE clusters; (c) the three Eastern European subclusters. Bold p values (i.e., those that are non-significant) indicate random mating is supported by the test. (a)

Non-Clone-Corrected
Clone-Corrected  Subsequent sets of scenarios dealt with the relationship between subclusters of the three main STRUCTURE clusters in Eastern Europe. Analysis 3 investigated whether the Central European subcluster, Northeastern European subcluster and Turkish subcluster arose from the ancestral population separately or whether one derived from the other, from an unsampled population, or from admixture, possibly with an unsampled population. This analysis revealed that the Central European subcluster was clearly derived. In the most supported scenario (S3.16, p = 0.2218), the Central European subcluster derived from the Northeastern European subcluster. However, confidence intervals overlapped with two other scenarios, which were also well supported (S3.19, p = 0.2079 and S3.17, p = 0.1957, Figure 4, Table S2). The Central European subcluster was also derived in these scenarios, in the case of S3.19 from admixture between the Northeastern European subcluster and an unsampled population, or in the case of S3.17 from admixture between the Northeastern European subcluster and the Turkish subcluster. Posterior probabilities for the formation of the Central European subcluster were a median of 232 and a mode of 136 generations ago for S3.16, a median of 152 and a mode of 71.2 generations ago for S3. 19, and a median of 187 and a mode of 109 generations ago for S3.17. Therefore, the Central European subcluster is likely to have arisen in the range of 70 to 190 generations ago with a weak bottleneck occurring (short duration and a higher number of founders) (Table S4). contrast, both the South Africa Hogsback and Chile population groups originated from the Western European population (p = 0.3602 and p = 0.7727, respectively). Other population groups in the Southern Hemisphere contained too few multilocus haplotypes to confidently determine their origin using ABC analyses, yet most of them (Australia, New Zealand, Ecuador and Colombia) belonged strongly to the Western European STRUC-TURE cluster, as did Chile, suggesting they too were introduced from the Western European population. Only Kenya differed in its STRUCTURE membership, which is almost identical to the South Africa Hogsback population at all values of K ( Figure 2, Figure  S2a,b), suggesting a similar origin. A graphical representation of relative timings of D. septosporum population divergence events is given in Figure 4, and a graphical representation of geographical migration events from all historical scenarios is given in Figure 5. Visual inspection of the model checking results confirmed all winning scenarios and their priors fit the real observed dataset well ( Figure S6). Confidence in scenario choice for each analysis is given in Table S2, and all posterior distributions of parameters are given in Table S4.  The next set of scenarios (analysis 4) considered only Northeastern Europe and Turkey (without the purely derived Central European subcluster) and revealed that the Turkish subcluster was derived from the Northeastern European subcluster (p = 0.4116, Figure S5, Table S2). Posterior probabilities indicated the Turkish subcluster arose a median of 985 and a mode of 444 generations ago with a strong bottleneck occurring (long duration and a low number of founders) (Table S4).
Analysis 5 revealed that the Western European cluster derived from the Northeastern European subcluster directly and not from the Central European or Turkish subclusters (p = 0.4318, Figure S5, Table S2). Posterior probabilities closely corresponded to those in other analyses and demonstrated that the youngest population to diverge was the Western European cluster, followed by the Central European subcluster and finally the Turkish subcluster, which was the oldest population divergence event to take place ( Figure 4, Table S4).
Investigation of the origin of the Southern Hemisphere populations revealed that the South Africa Tzaneen population group originated from admixture between the Central European and Northeastern European populations (p = 0.4322, Figure S5, Table S2). In contrast, both the South Africa Hogsback and Chile population groups originated from the Western European population (p = 0.3602 and p = 0.7727, respectively). Other population groups in the Southern Hemisphere contained too few multilocus haplotypes to confidently determine their origin using ABC analyses, yet most of them (Australia, New Zealand, Ecuador and Colombia) belonged strongly to the Western European STRUCTURE cluster, as did Chile, suggesting they too were introduced from the Western European population. Only Kenya differed in its STRUCTURE membership, which is almost identical to the South Africa Hogsback population at all values of K ( Figure 2, Figure S2a,b), suggesting a similar origin.
A graphical representation of relative timings of D. septosporum population divergence events is given in Figure 4, and a graphical representation of geographical migration events from all historical scenarios is given in Figure 5. Visual inspection of the model checking results confirmed all winning scenarios and their priors fit the real observed dataset well ( Figure S6). Confidence in scenario choice for each analysis is given in Table S2, and all posterior distributions of parameters are given in Table S4.

Discussion
This study constitutes the largest population study ever done on D. septosporum, encompassing new and all previously studied populations representing collections from 44 countries, comprising all but three of the countries where the pathogen has been reported from, with over 3800 isolates-with the aim of inferring global evolutionary histories. The population study on this large world-wide collection of D. septosporum revealed the presence of three major population clusters (North America, Western Europe and Eastern Europe) and their historical relationships, as well as more fine-scale substructuring of these populations and their interrelations. The North American population cluster was the most genetically distinct and geographically restricted, while the Western European population cluster has spread to much of the Southern Hemisphere. Analysis of historical scenarios crucially revealed that D. septosporum is an Old World species being introduced into North America from an ancestral population of Eurasian origin. Historical scenarios also showed many of the European populations ultimately derived from the Northeastern European subcluster indicating that Northeastern Europe and western Asia are likely to be the center of origin of the pathogen. The Turkish subcluster is derived from the Northeastern European subcluster first (i.e., the oldest divergence event), followed by the Central European subcluster and then the Western European cluster (the most recent divergence event).
DNB has been present in North America since at least 1917, when infected needles were collected in Idaho, USA [6,15]. However, dendrochronological studies by [17] suggest that DNB was causing a noticeable growth reduction of P. contorta Dougl. ex Loud. var. latifolia Engelm. (lodgepole pine) in British Columbia, Canada, as early as the 1830 s. In Europe, on the other hand, DNB was first recorded from needles collected in 1860 in France (as Hypostomum flichianum Vuill.), in 1878 in lower Austria (as Leptostroma pinastri in de Thümen Mykotheca universalis n • 1278) (Holdenrieder and Queloz, personal communication), in 1880 in Denmark (as Mycosphaerella pini Rostr. apud Munk) and in 1910 in European Russia (as Cytosporina septospora Dorogin) [6]. The first records in North America and Europe are roughly concurrent; thus, it has long been debated whether Dothistroma and D. septosporum, in particular, is an Old or New World species. This question is especially relevant as D. septosporum was long considered a non-native quarantine organism on the EPPO A2 list of quarantine organisms. However, since the end of 2019, due to its now widespread occurrence in Europe, it has been considered a Regulated Non-Quarantine pest (RNQP) in the European Union.
Historical scenarios developed to elucidate the relationship between North American and Eurasian populations using DIYABC centered around the only known connections between the two continents, i.e., either ca. 11,000 to 30,000 years ago when the Bering Land Bridge allowed human, plant, and animal contact between North America and Asia or ca. 500 years ago till present when European rediscovery of North America again made the transfer of the fungus possible. If the Eurasian population split from an ancestral population less than 500 years ago, or between 11,000 and 30,000 years ago, the ancestral population could have been in the Americas. Conversely, if the North American population split from an ancestral population either less than 500 years ago, or between 11,000 and 30,000 years ago, then the ancestral population could have been in Eurasia. The analysis revealed that the North American population cluster is relatively young and was clearly derived from an ancestral population less than 500 years ago (i.e., ca. 300 generations ago). The Eastern European population cluster was estimated to have derived from the ancestral population in the order of several thousand years ago (i.e., three to seven thousand generations), making a New World ancestral population impossible. The data, therefore, demonstrate that D. septosporum is an Old World species. This is supported by the greater genetic and genotypic diversity of the Eastern European population cluster compared to the North American population cluster. Furthermore, all North American isolates formed a distinct, well-supported population cluster in STRUCTURE and DAPC analyses with no further substructuring of the population even at very high values of K (up to K = 15). Conversely, the Eastern European population cluster displayed clear substructuring and high genetic diversity typical of native or long-established populations, whereas no substructuring with lower diversity suggests a derived, or more recently established, population. Only when a hierarchical STRUCTURE analysis was conducted, running the North American isolates independently, were distinct clusters delimited, which resembled those found by [25] (Figure S4). This suggests the structuring of the North American populations is much weaker than that of the European populations.
Nonetheless, the use of a larger number of markers, e.g., SNPs from genotyping-bysequencing or whole-genome sequencing, and isolates from as yet unsampled regions may reveal new relationships and links. Particular progress could be made elucidating the timing of divergence events. Such progress would help reconcile the findings of Capron et al. (2020), who found Canadian D. septosporum populations diverged from each other between 31 and 7 thousand years ago, with the findings of the present study where the North American population diverged more recently.
The distinct Western European population cluster dominates the range from northwestern France through the British Isles and Ireland to western Norway. Even though this is a European population, it is clearly differentiated from the Eastern European population cluster, which occurs throughout the rest of Europe and Asia. Modeling of historical scenarios showed that the Western European population arose from the Eastern European population cluster and had no connection with the North American population cluster. Such a connection between Western Europe (i.e., France and Britain) and its ex-colonies in North America would not have been unexpected considering the close trade links and movement of potentially infected material. However, the results using microsatellite markers indicate the Western European cluster derived relatively recently, less than 100 generations ago, and if we consider one generation to occur in one year, then this is after the first evidence of DNB in North America.
Regardless of its recent origin, the Western European population cluster has been spread to much of the Southern Hemisphere where it has caused, and in some countries continues to cause, severe damage to pine plantations [1,9]. Pine species are native to the Northern Hemisphere, with only P. merkusii's native range just crossing the equator [65], and have been introduced to the Southern Hemisphere primarily for timber production [66]. Introductions often followed colonization routes, and afforestation programs began in many areas from the 1870s onward (e.g., Australia, New Zealand, South Africa) [67][68][69]. Introduction of pines was accompanied by or followed by the introduction of D. septosporum, and it is noteworthy that many of the Southern Hemisphere countries containing the Western European population cluster were old British colonies (i.e., Australia, New Zealand, South Africa, Kenya), demonstrating a clear route of introduction through increased trade volumes with the colonial power and Western Europe in general. Nonetheless, South American countries that were never British colonies also harbor the Western European population cluster (e.g., Chile, Ecuador and Colombia). Additionally, the Central European and Northeastern European subclusters have been introduced to South Africa Tzaneen, showing that not only the Western European population has been introduced to the Southern Hemisphere. It is clear that there have been at least two to three separate introductions of D. septosporum to South Africa as the Hogsback population group originates from the Western European cluster while the Tzaneen population group originates from admixture between the Central European and Northeastern European subclusters.
Although the causal agent of DNB was first clearly described from European Russia in 1911 [13], the disease only achieved notoriety in the 1950s and 1960s due to its severe impact on pine plantations in the Southern Hemisphere, particularly East Africa, New Zealand and Chile [1,8]. Sporadic reports of the disease occurred in Europe until the 1980s, yet without it causing significant damage [1]. It was not until the 1990s that the disease caused severe damage in Europe-in France and Britain [2,70]. The areas where DNB has caused the most damage in both the Southern and Northern Hemispheres are plantations of often exotic species, yet these areas also share the same population cluster of D. septosporum-the Western European population cluster. This population is well-differentiated, based on microsatellite markers, from the Eastern European population from which it derived, and differences doubtless occur throughout the genome. It is possible, given the severe damage this population inflicts on its host, that the Western European population cluster has higher virulence compared to the other population clusters. The Western European population cluster is found on 37 host taxa (species, subspecies and varieties) while the Eastern European population cluster has only been found on 23 host taxa and the North American population cluster on only P. contorta and its varieties. This increased host range is also suggestive of increased virulence and, coupled with the era of colonization and increased global trade from Western Europe to the Southern Hemisphere, may have contributed to its global success. Variation in virulence levels of the populations should be investigated by rigorous artificial inoculation tests which would also help define the risk further introductions and spread of this population would pose to pine forests worldwide.
At the edge of their distribution ranges, species are known to vary from their core distribution range due to limited gene flow with the core population and increased genetic drift within the edge population, resource limitation, new interspecific interactions, environmental pressures, etc. [71][72][73]. Populations that colonize new habitats beyond their original distribution limits are likely to suffer strong selection pressure due to the intensification of these factors and exposure to new pressures and species competitions [74]. Those populations that survive in new or edge habitats are likely to become locally adapted, with studies of mollusks to mammals and plants showing populations adapted to their local environments [74][75][76], which in some cases are extremely, and unexpectedly, successful when transplanted outside of their local environment [75].
Dothistroma septosporum populations are by no means exempt from these pressures and tendencies. Separate populations in new areas are exposed to new host taxa, different foliar microbial communities and competitors, and disparate environmental conditions. This sudden exposure to novel conditions, along with the associated genetic bottleneck, places the fungus under intense pressure to adapt rapidly. A shift to more sexual reproduction in such challenging conditions is likely as sexual reproduction in many ascomycete fungi is associated with adverse conditions [77]. The generation of new genotypes by sexual recombination facilitates survival and adaptation, particularly advantageous in new or changing conditions [78,79]. We hypothesize that the novel host and environmental conditions faced by D. septosporum in North America drove the population towards an increased rate of sexual reproduction, as evidenced by the equal proportion of both matingtypes, a situation not seen in the Western or Eastern European population clusters. This increased rate of sexual recombination hastened local adaptation and divergence from the Eurasian populations.
This local adaptation may reflect a possible host preference of the North American population group to lodgepole pine, the sole host on which this population cluster was found and a species native to North America. Although the majority of isolates from North America were from lodgepole pine, the two non-lodgepole pine isolates originating from North America did not cluster with the North American population cluster. Additionally, there is strong evidence that the North American population cluster has subsequently been reintroduced into Europe, to Scotland, where it occurs almost exclusively on lodgepole pine [23,80,81]. Such specialization suggests the population cluster could be distinct enough to form a geographical and physiological race, a term used by [80,81] when working with isolates of this population. Once again, artificial inoculation tests, based on the populations presented here, would help clarify not only the virulence but also the degree of host specialization of the major D. septosporum global populations.
While the affinity of the North American population group for lodgepole pine is clear, many of the other population clusters and subclusters present a preponderance for certain host taxa. For example, the Western European population cluster is found predominantly on P. nigra subsp. laricio, P. sylvestris and P. contorta, the Central European subcluster mainly on P. nigra and P. mugo, the Northeastern European subcluster on P. sylvestris and P. nigra and the Turkish subcluster on P. brutia. This may indicate some degree of host adaptation, but it is much more likely to be a result of the geographical distribution of the subclusters and the hosts that are grown in these areas, either native species or the non-native species often used in commercial forestry plantations where DNB thrives. Therefore, the presence of D. septosporum on particular hosts is typically more influenced by the geographical distribution of the particular D. septosporum populations rather than the host specification of the pathogen itself. To what extent specific populations exhibit local adaptation, and whether this is more influenced by host species or environmental conditions (e.g., climate) will require extensive testing under controlled conditions.
The Eastern European population cluster dominates the D. septosporum population in much of Europe and Asia from northern Norway and Sweden throughout Europe to Turkey and across to Bhutan and the Russian Far East in Asia. This population cluster has the highest genotypic and genetic diversity of all the population clusters and exhibits significant substructuring. The most prominent subcluster is that of Central Europe, which dominates the Czech Republic, Slovakia, southern Poland, Hungary, Switzerland, Eastern Austria, Slovenia, and Romania. Further subclustering reveals Turkey as a distinct subcluster. Historical scenarios show the Central European subcluster to be derived from the Northeastern European subcluster, although with possible genetic contributions from the Turkish subcluster or an unsampled population. The Central European subcluster was formed relatively recently, ca. 70-190 generations ago, and underwent a weak bottleneck event. In contrast, the Turkish subcluster, clearly derived from the Northeastern European subcluster, was formed long ago, ca. 400-1000 generations ago, and underwent a strong bottleneck event. This weak bottleneck and proximity to Northeastern Europe suggest the Central European subcluster could be derived from the natural spread of the pathogen.
The result that all European population clusters and subclusters (Western European cluster, Central European subcluster, Turkish subcluster) are derived from the Northeastern European subcluster, along with this group's high genetic diversity strongly suggests the region is part of the center of origin of D. septosporum as proposed by [20]. However, the same STRUCTURE population cluster is present in southern, particularly south-eastern, Europe. These population groups were not used for ABC modeling due to the lower numbers of samples from this region. Therefore, this region could encompass the native range of the pathogen as well. This same population cluster is present in Bhutan and the Russian Far East and, although no samples were obtained from much of central Asia, this population could occur throughout the boreal forests of Asia and Europe. The region possesses similar environmental conditions and abundant host species, particularly P. sylvestris, whose native range extends across the entire region. Dothistroma septosporum produces only mild symptoms on P. sylvestris in the region [1,20,82], a situation typical of co-evolved hosts and pathogens [83,84], further supporting the pathogen's indigenous status in the region.
Supplementary Materials: The following are available online at https://www.mdpi.com/2309-6 08X/7/2/111/s1, Figure S1: Delta K plot of the STRUCTURE analysis, showing K = 3 as the best clustering of individuals, Figure S2: Bayesian clustering of D. septosporum multilocus haplotypes inferred using the program STRUCTURE at K = 6, K = 7 and K = 8 (a) and K = 9, K = 10 and K = 11 (b). Each multilocus haplotype is represented by a vertical line partitioned into colored sections that represent the isolate's estimated membership fractions in each cluster. Black lines separate isolates from different population groups, Figure S3: Plot of the Bayesian information criterion (BIC) vs. the number of clusters from k-means clustering showing a break at K = 3, indicating three clusters best describe the dataset for DAPC analysis, Figure S4: Bayesian clustering of only the North American D. septosporum multilocus haplotypes inferred using the program STRUCTURE at K = 2, K = 3 and K = 4. Each multilocus haplotype is represented by a vertical line partitioned into colored sections that represent the isolate's estimated membership fractions in each cluster. Black lines separate isolates from different population groups. The inset shows a graph of delta K, Figure S5: Graphical representation of the winning scenarios of demographic history for each of the eight DIYABC analyses. Abbreviations used on time scales refer to time parameters used during simulations, Figure S6: Model checking of the winning DIYABC scenarios for each of the eight conducted analyses, Table S1: Details of D. septosporum isolates used in this study including the host, geographic location, population group, microsatellite allele calls and mating-types, Table S2: Scenario descriptions, posterior probabilities with 95% credibility intervals of each scenario, and posterior predictive error of models, Table  S3: Pairwise population F ST (below the diagonal) and Nm, the estimated number of migrants per generation (above the diagonal) for the D. septosporum population groups, Table S4: Posterior distributions of parameters for winning scenarios of DIYABC analyses 1 to 8.

Data Availability Statement:
The data presented in this study are available in Table S1.