The first relies on quantitative probe-based PCR, while the second diagnostic uses a newly developed CRISPR-Cas12a system, which enables rapid detection with simpler equipment than quantitative real-time PCR. In summary, my thesis provides genomic resources and knowledge of migration history and patterns for these two agricultural pests with significant economic implications. This information, combined with our new molecular diagnostics, should enable agricultural agencies to make better decisions on shipping and quarantine policy and to respond rapidly and effectively to future invasion events.

Over the past decade, Drosophila suzukii, also known as the spotted-wing drosophila or the Asian vinegar fly, has become a highly invasive pest species and a threat to soft-skinned fruit production worldwide. Unlike the large majority of Drosophilidae, which preferentially breed in decaying plant material, female D. suzukii possess a serrated ovipositor, enabling them to lay eggs in fresh ripening soft-skinned fruits. First described in Japan as an agricultural pest of cherries, D. suzukii was primarily distributed across East Asia until researchers found wild specimens in Hawaii in 1980. In 2008, D. suzukii was detected in California, and by 2009 it was widespread across the Western U.S. coast. In the Eastern U.S., D. suzukii first appeared in Florida in 2009, before again rapidly spreading across the entire East Coast within a few years.
Meanwhile in Europe, D. suzukii was first detected in Spain and Italy in 2008 and rapidly spread across the continent, appearing in France, Switzerland, Austria, Germany, and Belgium by 2012. Subsequently, D. suzukii arrived in South America, where it was detected in Brazil in 2013, Argentina in 2014, and Chile in 2015. Its rapid spread across continents suggests that human transportation is likely a major factor, as eggs laid in fresh fruit are difficult to detect before shipment. Once established on a new continent, D. suzukii rapidly disperses to neighboring regions, aided by its ability to adapt to a wide range of climates through phenotypic plasticity. In the Western U.S. coastal states alone, estimated economic losses were as high as 511 million dollars per year, assuming a 20% average yield loss. Thus, there is much interest in understanding the patterns of migration and origin of these invasive populations, as these data can be used to inform shipping and quarantine policies and to identify routes of entry.

Previous research on the population genomics of D. suzukii was performed using a relatively small number of molecular markers. Adrion et al. used six X-linked gene fragments from flies collected across the world and detected signals of differentiation between European, Asian, and U.S. populations. However, they found no evidence of differentiation within the 12 U.S. populations sampled, possibly due to the limited power provided by a small number of markers. A follow-up study using 25 microsatellite loci in samples collected between 2013 and 2015 greatly improved estimates of migration patterns worldwide; the authors found evidence for multiple invasion events from Asia into Europe and the U.S., as well as an East-West differentiation in the 7 populations sampled in the continental U.S.
However, using microsatellites alone may miss more subtle signals of population structure compared to genome-wide datasets, as increasing the number of independent loci genotyped increases the accuracy of population parameter estimates, even when the number of biological samples is low. With the advent of affordable whole genome sequencing (WGS), it has become feasible to sequence hundreds of individuals to study population genomics, enabling improved inference of population structure using hundreds of thousands to millions of single nucleotide polymorphism (SNP) markers. A study of D. suzukii in Hawaii used double digest restriction-site associated DNA sequencing to identify several thousand SNPs and observed population structure between islands. However, a comprehensive survey of D. suzukii in the continental U.S. using a large number of SNPs enabled by WGS has not been conducted. In this study, we leverage the power of WGS to individually sequence hundreds of D. suzukii samples to determine whether U.S. populations are stratified along a north-south cline corresponding to varying winter climates, as well as to detect whether migration is freely occurring between the Eastern and Western U.S. In addition, we include several populations from Asia, Europe, and Brazil to determine the frequency and source of international migrations and to compare genetic diversity between invasive and native populations. We expect these analyses and the large sequencing dataset will be of value in developing policies and furthering research into mitigating the harmful effects of D. suzukii worldwide.

To determine if population structure exists in D. suzukii living in recently invaded locations, we sequenced wild-caught individual D. suzukii flies collected from the continental U.S., Brazil, Ireland, Italy, South Korea, and China, as well as laboratory strains from Hawaii and Japan.
After aligning sequences to the reference genome, we found that average read coverage was low for some individuals and populations, with mean coverage per cluster ranging from 5X to 11X. As low coverage can cause biases in genotype calling, we used methods based on genotype likelihoods wherever possible. We first used PCA and admixture proportion estimates to search for signs of population structure. When examining our Asian samples, we were surprised to discover that all the Namwon, South Korea samples as well as one Sancheong, South Korea sample clustered tightly with the Kunming, China population, rather than with the rest of the Sancheong samples. As several sister species to D. suzukii with similar morphological appearances occupy the same geographic ranges, we performed a phylogenetic analysis using the mitochondrial COX2 gene sequence to evaluate species identity. Based on phylogenetic inference, we determined that the Namwon, South Korea samples; Kunming, China samples; and one Sancheong, South Korea sample may actually be D. pulchrella. For this reason, these samples were excluded from further analyses.

As sampling was heavily concentrated in the U.S., we first conducted PCA and admixture proportion estimation on each broad geographic region separately before analyzing all populations together. Among the Eastern U.S. samples, PCA did not separate samples by state or latitude, and no distinct populations emerged in admixture plotting at multiple clustering values. Among the Western U.S. samples, both the first principal component and varying values of k for admixture proportions separate Hawaii from the other sample sites; however, higher values of k and further principal components do not partition the remaining Western U.S. samples. Thus, there appears to be no strong population structure along a north-south cline in the U.S. 
Using a similar approach, we see that in the European samples, collections from Ireland and Italy partition as separate clusters in the first PC and when k=2 in admixture plotting. We also observe that samples from Asia partition into Japan and South Korea, which is unsurprising as the Japanese samples originate from a lab population. We then used PCA to analyze all samples together to examine how differentiated invasive populations were from each other and from the ancestral Asian samples. As subtler signals can be obscured by unequal population sampling, we also analyzed a reduced dataset by subsampling 5 individuals from each region. When using all samples, the first principal component separates Eastern and Western U.S. populations, with Asian and European samples in between. Samples from Pelotas, Rio Grande do Sul, Brazil, appear more related to Eastern U.S. samples, although one individual clusters more closely with the Western U.S. flies. We also noticed that all samples collected from the Alma Research Farm, Georgia clustered with the Western rather than Eastern U.S. samples, despite two other Georgia sites nearby that followed the expected pattern.
The second principal component then separates the European samples. When the data are subsampled to 5 individuals per cluster, the first and second components strongly separate Hawaii and Japan, respectively; this signal was likely obscured by the large number of U.S. samples when all samples are analyzed together, but it is expected, as these two populations were lab strains and have likely experienced significant genetic drift relative to their wild relatives. The observations made from PCA are largely recapitulated when using subsampled data to estimate admixture at varying levels of k. At k=3, we observe that Japanese and Hawaiian samples form their own clusters, while all the wild collections form a third cluster. As k is increased up to 7, we see the appearance of Europe, Brazil, Eastern U.S., and South Korea samples as their own clusters, before samples from Europe are split into Ireland and Italy at k=8. We notice increased variability in cluster assignment in the U.S. populations, particularly when subsampling, which likely reflects the large sample size and high within-population diversity. However, analysis using all individuals still clearly supports Eastern and Western U.S. samples as distinct genetic populations. In addition, we also see that the AR Georgia population again clusters with the Western U.S. As we were unsure if this could be the result of a very recent migration or mislabeled samples, we decided to exclude this population from further analyses. To further quantify the amount of differentiation present between regions, we estimated Fst values between regions using the 20 largest contigs, spanning all 4 chromosomes and covering 54% of the reference genome. As expected, Fst between Hawaii or Japan and any wild population was high. Irish and Italian populations had intermediate levels of differentiation with the other wild populations and with each other, while Fst values between Brazil, South Korea, and both U.S. clusters were lower. 
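As an illustration of how pairwise Fst can be computed from allele frequencies, the following is a minimal sketch. It assumes Hudson's estimator in its ratio-of-averages form, which may differ from the estimator actually used here; the function name, the inputs, and the toy frequencies are hypothetical.

```python
def hudson_fst(p1, p2, n1, n2):
    """Hudson's Fst between two populations (assumed estimator, not
    necessarily the one used in this study).

    p1, p2: per-SNP allele frequencies in populations 1 and 2
    n1, n2: number of sampled chromosomes in each population
    """
    num = 0.0
    den = 0.0
    for a, b in zip(p1, p2):
        # Between-population difference, corrected for sampling noise
        num += (a - b) ** 2 - a * (1 - a) / (n1 - 1) - b * (1 - b) / (n2 - 1)
        # Probability that alleles drawn from different populations differ
        den += a * (1 - b) + b * (1 - a)
    if den == 0:
        raise ValueError("no variable sites between the two populations")
    # Ratio of sums across SNPs rather than a mean of per-SNP ratios
    return num / den


# Toy frequencies, made up for illustration only
fixed_difference = hudson_fst([1.0, 1.0], [0.0, 0.0], 20, 20)
same_frequencies = hudson_fst([0.5, 0.5], [0.5, 0.5], 20, 20)
print(fixed_difference, same_frequencies)
```

Summing the numerator and denominator across SNPs before dividing keeps low-frequency sites from dominating the estimate, which matters when the number of individuals per population is small.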
These groupings broadly match those observed from PCA.

While PCA and admixture proportion estimates were able to identify population clusters, they are unable to provide more detailed depictions of population history or migration events. To estimate the population history of these invasive populations, we used Treemix to generate a population admixture graph with inferred migration events based on covariance of allele frequencies between clusters, testing models allowing between 0 and 10 migrations. We found that the model using six migrations captured the most variance of the data. Residuals of the model at m=6 are within +/- 5 standard errors between populations, suggesting the model fits the data well, despite the variance of Hawaii with itself appearing less well modeled. The strongest signal of admixture was found in the Western U.S., with an estimated Hawaiian admixture proportion of 41.0%, and was also observed in most models. To formally test for admixture, we used the F3 admixture statistic with the Western U.S. as the target population and Hawaii and popX as the source populations, where popX represents any third population, and found significantly negative values for all populations, strongly supporting admixture of Hawaii into the Western U.S. We also used the F4 statistic over populations A, B, C, and an outgroup, arranged such that a negative value supports “B” and “C” admixture, while a positive value supports “A” and “C” admixture, assuming no migration occurred between the outgroup and either A or B. Using either D. biarmipes or D. subpulchrella as the outgroup, both F4 tests were significantly positive, again supporting this admixture. Thus, the Western U.S. population sampled is composed of nearly equal ancestry from a Hawaiian ancestor and the common ancestor of the U.S./Brazil populations. As Treemix assigns the edge with smaller weight to be the “migrant” edge by default, it may not be identifiable whether the Hawaiian ancestor or the U.S./Brazil common ancestor should be called the migration source. We also observed two countries with U.S. 
admixture in the m=6 model. Ireland had an Eastern U.S. admixture of 25.3%, although at varying values of “m” the source of this admixture fluctuates between the Eastern U.S., Brazil, and the Eastern U.S./Brazil ancestor. However, in all cases the admixture strength and significance remain consistent. While no F3 statistic support was found, the F4 statistics were significantly negative, supporting Ireland’s Eastern U.S./Brazilian and European ancestry. As the U.S./Brazilian admixture weight is much less than the European admixture weight, this was likely due to a migration event from the Americas into Irish populations. The other out-of-U.S. admixture event, from the Western U.S. to South Korea, was seen when m=6, 8, and 10. The F3 statistics all have significantly negative values, and the F4 statistics are significantly positive, supporting a Western U.S./South Korea admixture.
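The sign logic of the F3 and F4 tests used above can be illustrated with a minimal sketch computed directly from allele frequencies. It assumes the simple definitions f3(C; A, B) = mean[(c - a)(c - b)] and f4(A, B; C, O) = mean[(a - b)(c - o)]. The argument order for f4 is an assumption, chosen because it reproduces the sign convention described in the text. The function names and toy frequencies are hypothetical, and the sketch omits the sample-size corrections and block-jackknife standard errors that a real analysis would need.

```python
from statistics import mean


def f3(target, source1, source2):
    """f3(target; source1, source2) from per-SNP allele frequencies.

    A negative value suggests the target is a mixture of populations
    related to the two sources.
    """
    return mean((c - a) * (c - b) for c, a, b in zip(target, source1, source2))


def f4(pop_a, pop_b, pop_c, outgroup):
    """f4(A, B; C, outgroup) under the assumed argument order.

    Positive: C shares excess drift with A. Negative: C shares it with B.
    """
    return mean(
        (a - b) * (c - o)
        for a, b, c, o in zip(pop_a, pop_b, pop_c, outgroup)
    )


# Toy allele frequencies, made up for illustration only
pop_a = [0.1, 0.9, 0.3]
pop_b = [0.8, 0.2, 0.6]
outgroup = [0.5, 0.5, 0.5]

# A population built as an equal mixture of A and B gives a negative f3
admixed = [0.5 * a + 0.5 * b for a, b in zip(pop_a, pop_b)]
print(f3(admixed, pop_a, pop_b))

# A population C identical to A gives a positive f4; identical to B, negative
print(f4(pop_a, pop_b, pop_a, outgroup))
print(f4(pop_a, pop_b, pop_b, outgroup))
```

For an exact mixture c = 0.5a + 0.5b, each per-SNP term reduces to -0.25(a - b)^2, which is why F3 can only be negative in the presence of admixture. Drift in the target population after admixture pushes F3 back toward positive values, which is one reason a non-negative F3 (as for Ireland) does not rule out admixture.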