The last important component in the design of a FAIR-compliant, sustainable information system is that it be useful to a large and diverse group of users. Like data producers, users have an important contribution to make in specifying the data models and the goals of the repositories and of the whole GrapeIS infrastructure. Data users can be very diverse; the priority of the IGGP is researchers in plant biology at public institutions or private companies, breeders from the public and private sectors, engineers from extension services for grape and wine production, teachers and students. Some data may also be of interest to growers or to the general public, and the GrapeIS initiative might in time also help to transfer more of the knowledge produced by the scientific community to a broader public. Again, the IGGP international consortium will have an important role in organizing two-way interactions among all the stakeholders of the initiative: users, partners building the GrapeIS and funding agencies.
Grapes, one of the first domesticated perennials, originated in the Near East 5000–8000 years ago [1] and remain an economically and culturally important crop. In 2015, grapevines covered 7.5 million hectares and produced 76 million tons of grapes globally. Over the past millennia, human selection for traits of interest, especially those important to fruit production, has shaped the appearance of grapes. In particular, selection for hermaphroditic flowers increased grape production, as propagating both male and female plants was no longer required. While nearly half of all grapes grown are vinified into wine, 36% are consumed fresh and the rest are dried or used for juice. Desirable berry traits differ depending on the use of the grapes and, thus, the different breeding targets for table and wine grapes have led to differences in berry and bunch size.
There is also evidence of selection for white berry color. While grape breeding has resulted in selection for several traits over the past millennia, current consumer preference is focused on a small number of elite cultivars. As a result, most grape cultivars have been grown for centuries using vegetative propagation; ‘Pinot Noir’, for example, has existed for more than a millennium. These genetically frozen cultivars are highly susceptible to continually evolving pathogens. Selection for new traits, including disease resistance, is a slow and expensive process in grapes. Breeding of new grape cultivars is hindered by high inbreeding depression as well as a lengthy juvenile phase lasting 3–5 years. Even after fruit production, additional time may be required to assess traits important for wine production. Fortunately, using genetic markers linked to phenotypes of interest can decrease the time required to develop new cultivars by up to 10 years. In addition, recent work estimated that the use of marker-assisted selection (MAS) in grapes offered a cost saving of 16–34%. The ability to save time and money when breeding makes grapes an attractive candidate for MAS [7].
Using genetic markers, individuals can be tested for a trait at the seed or seedling stage. Thus, MAS offers the greatest potential for traits that are difficult and expensive to phenotype, such as disease resistance, or time-consuming to measure, such as fruit traits only visible after several years. Wild Vitis relatives have previously been used for hybrid grape breeding and are a promising source of resistance loci for introgression through MAS. For example, V. arizonica was used in the development of Pierce’s disease-resistant wine grapes, while Muscadinia rotundifolia was used to pyramid resistance to both powdery and downy mildew into V. vinifera. Markers have also been identified for many other traits in grape, including berry color, flower sex, seedlessness and muscat aroma.
The discovery of markers for agriculturally important traits has facilitated the use of MAS in grapes; however, the technique is only worthwhile when the cost of phenotyping is higher than the cost of discovering new markers and genotyping cultivars. Decreasing DNA sequencing costs will continue to accelerate both marker discovery and the implementation of MAS in grape breeding. While sequencing costs have decreased, phenotyping remains a slow and expensive process. Fortunately, historical phenotype information already available in gene banks can be linked with genomic information for genetic mapping of important traits. The ability to leverage historical data from gene banks for genetic mapping has previously been demonstrated in potato, barley and apple. Similarly, in grape, years of phenotype information may already be available and exploitable for the purposes of genetic mapping. Unfortunately, standardized data formatting and annotation are not yet widely adopted in grapevine and remain an essential goal.
To investigate the history of selection in grape, as well as the future potential of MAS, we evaluated associations between phenotypes and 6114 genome-wide single-nucleotide polymorphisms (SNPs) in 580 V. vinifera accessions from the United States Department of Agriculture (USDA) grape germplasm collection. We report several significant genome-wide association study (GWAS) results, demonstrate the use of signatures of selection as a complement to GWAS and find that phenotype relationships as well as patterns of genetic variation have been shaped by human culture and geography.
Phenotype data were downloaded from the USDA Germplasm Resources Information Network (GRIN). Only accessions reliably identified as V. vinifera in Myles et al. were included. Measurements for flower sex were combined across years, and samples with discordant values for flower sex across years were removed.
Additional phenotype data including skin color, berry length, berry width, berry size and cluster density were collected as part of the present study. Our cluster density measures were merged with measurements available from GRIN, and when discrepancies between measurements existed, those values were removed. In some cases, phenotype data were recoded to facilitate genetic mapping. A complete description of the phenotype data, including recoding procedures, is available in Supplementary Table S1. Phenotype data were only included in downstream analyses if measurements existed for at least 100 accessions, resulting in a final set of 33 unique phenotypes. While 2 years of data were available for four phenotypes, the correlation of trait values between years was often poor. The correlation between years was estimated using Pearson’s correlation for binary and quantitative phenotypes and Kendall’s rank correlation for ordinal phenotypes. For clarity, when 2 years of data were available for a given phenotype, data from the year with the greater sample size were included in the main portion of the manuscript. However, results for each year are presented separately in the Supplementary Material. Pairs of accessions were considered to have a clonal relationship if π̂, the estimated proportion of the genome shared identical by descent, calculated using PLINK [28], exceeded 0.95. To avoid pseudoreplication, for each phenotype only the accession with the least missing genotype data in each clonal group was included in downstream analyses. However, that accession’s phenotype was calculated as the average across all accessions within its clonal group. A Box–Cox transformation was applied to quantitatively measured traits when the distribution of observed values differed significantly from normality. The untransformed and transformed distributions for each phenotype are shown in Supplementary Figure S1. The phenotype distribution for ordinal traits is shown in Supplementary Figure S2.
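The clonal-group handling described above can be sketched as follows. This is a minimal illustration and not the authors' code: the function names, the dictionary-based inputs and the decision to merge clonal pairs transitively into groups (via union-find) are all assumptions, and only the averaging used for quantitative traits is shown.

```python
from statistics import mean

def clonal_groups(accessions, pi_hat, threshold=0.95):
    """Group accessions linked by pairwise pi_hat > threshold.

    Assumption: clonal pairs are merged transitively (union-find).
    pi_hat maps (accession_a, accession_b) -> estimated IBD proportion.
    """
    parent = {a: a for a in accessions}

    def find(a):
        # Follow parent links to the group root, compressing the path.
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a

    for (a, b), value in pi_hat.items():
        if value > threshold:
            parent[find(a)] = find(b)

    groups = {}
    for a in accessions:
        groups.setdefault(find(a), []).append(a)
    return list(groups.values())

def collapse_clones(groups, missing_rate, phenotype):
    """Keep one representative per clonal group for a single phenotype.

    The representative is the member with the least missing genotype data;
    its value is the mean over all phenotyped members of the group.
    """
    result = {}
    for members in groups:
        values = [phenotype[m] for m in members if phenotype.get(m) is not None]
        if not values:
            continue  # no member of this group was phenotyped
        representative = min(members, key=lambda m: missing_rate[m])
        result[representative] = mean(values)
    return result
```

With hypothetical accessions A, B and C forming one clonal group, the accession with the lowest missing-genotype rate is retained and carries the group mean.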
For binary traits, the majority phenotype was used instead of the mean when combining clones, and the distributions of these phenotypes are shown in Supplementary Figure S3. After all filtering steps, the final data set consisted of 33 phenotypes scored across 580 accessions and genotyped for 6114 SNPs.
The correlations between all pairwise phenotype comparisons were computed using R v3.2.0 [29]. Correlations between binary/binary, quantitative/quantitative and quantitative/binary phenotype pairs were tested using Pearson’s correlation. Correlations between quantitative/ordinal and binary/ordinal phenotype pairs were tested using Spearman’s rank correlation coefficient. Finally, correlations between ordinal/ordinal phenotype pairs were tested using Kendall’s rank correlation. To correct for multiple comparisons, a Bonferroni correction was applied by multiplying P-values by the number of pairwise comparisons. Accessions were divided based on use as well as geographic origin. The East geographic region includes the Middle East as well as Russia, while the Central region includes Eastern Europe, including Serbia, Hungary and Greece. Finally, the West region includes Western European countries such as France, Italy and Germany. A full list of the geographic origin of V. vinifera accessions in the USDA collection can be found in Myles et al. [26] For each phenotype, we tested whether accessions differed according to use and geographic origin. We used Fisher’s exact test for binary phenotypes and a Mann–Whitney U-test for ordinal and quantitative phenotypes. For Fisher’s exact test, we report the odds ratios. For the Mann–Whitney U-test, we report the W test statistic.
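The rule for choosing a correlation method by phenotype type, and the Bonferroni adjustment by multiplication, can be sketched as below. The names `correlation_method` and `bonferroni` are hypothetical. Capping adjusted P-values at 1 is a common convention assumed here and is not stated in the text.

```python
from math import comb

# Correlation method for each unordered pair of phenotype types,
# following the assignment described in the text.
METHOD_BY_PAIR = {
    frozenset(["binary"]): "pearson",
    frozenset(["quantitative"]): "pearson",
    frozenset(["binary", "quantitative"]): "pearson",
    frozenset(["quantitative", "ordinal"]): "spearman",
    frozenset(["binary", "ordinal"]): "spearman",
    frozenset(["ordinal"]): "kendall",
}

def correlation_method(type_a, type_b):
    """Return the correlation method name for a pair of phenotype types."""
    return METHOD_BY_PAIR[frozenset([type_a, type_b])]

def bonferroni(p_values, n_tests=None):
    """Multiply each P-value by the number of tests.

    Assumption: adjusted values are capped at 1.0.
    """
    n = len(p_values) if n_tests is None else n_tests
    return [min(1.0, p * n) for p in p_values]

# If every pair among 33 phenotypes were compared (an assumption),
# the number of tests would be comb(33, 2).
n_pairs = comb(33, 2)
```

Under that assumption, a raw P-value would need to fall below 0.05 divided by `n_pairs` to remain significant at the 0.05 level after correction.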
P-values were Bonferroni-corrected for multiple comparisons and all analyses were performed in R.
Before assessing population structure, the genotype data were pruned for linkage disequilibrium (LD) using PLINK by considering a window of 10 SNPs, removing one of a pair of SNPs if LD > 0.5, and then shifting the window by 3 SNPs and repeating the procedure. Principal component analysis (PCA) was performed on the resulting 3196 SNPs genotyped in 580 accessions using the smartpca program in the EIGENSOFT package.
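The sliding-window LD pruning described above (window of 10 SNPs, step of 3, threshold 0.5) presumably corresponds to PLINK's `--indep-pairwise 10 3 0.5` option. The sketch below shows the idea in simplified form and is not PLINK's implementation. It assumes LD is measured as the squared correlation of allele counts, and it always drops the later SNP of an offending pair, which is a simplification of PLINK's own pruning rule.

```python
from statistics import mean

def r_squared(a, b):
    """Squared Pearson correlation between two vectors of 0/1/2 allele counts."""
    ma, mb = mean(a), mean(b)
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    if saa == 0 or sbb == 0:
        return 0.0  # monomorphic SNP: correlation undefined, treated as unlinked
    return sab * sab / (saa * sbb)

def ld_prune(snps, window=10, step=3, threshold=0.5):
    """Return indices of SNPs kept after sliding-window LD pruning.

    snps: list of genotype vectors, ordered by genomic position.
    Assumption: within each window, the later SNP of any pair with
    r^2 above the threshold is removed.
    """
    removed = set()
    start = 0
    while start < len(snps):
        end = min(start + window, len(snps))
        in_window = [i for i in range(start, end) if i not in removed]
        for pos, i in enumerate(in_window):
            if i in removed:
                continue
            for j in in_window[pos + 1:]:
                if j not in removed and r_squared(snps[i], snps[j]) > threshold:
                    removed.add(j)
        start += step  # shift the window and repeat
    return [i for i in range(len(snps)) if i not in removed]
```

In a toy input where two SNPs carry identical genotypes, the second of the pair is removed and unlinked SNPs are retained.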
To investigate the degree to which population structure accounts for phenotypic variance within V. vinifera, we conducted linear regression for continuous and ordinal phenotypes, and logistic regression for binary phenotypes, using trait values as response variables and eigenvalues for the first 10 principal components (PCs) as predictors. McFadden’s pseudo-R² was calculated for logistic regression using the ‘pscl’ package [32] in R v3.0.1. We define the phenotypic variance explained as the R² of these models, for PC1, PC2 and PCs 3–10.
Phenotypes that are strongly correlated with population structure are more likely to have been targeted by selection. Moreover, as population structure is a confounding effect in GWAS, phenotypes strongly correlated with population structure can be problematic to map using association mapping. We therefore examined the degree to which each phenotype is correlated with population structure. We found that the proportion of the phenotypic variance explained by genetic PCs 1 through 10 ranged from 2 to 43% across phenotypes. Most notably, PC1 explained a large proportion of the variance for berry shape and size measurements. This relationship is expected, given that PC1 is significantly correlated with berry size and all berry size and shape measurements are significantly correlated with each other. These observations suggest that selection for table grapes in the East and wine grapes in the West has resulted in berry size being strongly correlated with the overall genetic structure of grapes. In addition to berry traits, the only other phenotype for which the first 10 genotypic PCs explain over 30% of the phenotypic variance is seedlessness. In contrast to berry phenotypes, only a small proportion of the variance in seedlessness is explained by PC1. Instead, PCs 3–10 explain 31% of the total variance.
Seedlessness is a valued trait in commercially grown table grapes. A single grape cultivar, ‘Sultanina’, is a primary source of seedlessness in table grapes and is a parent of many commercial seedless table grape varieties. Consistent with these observations, previous work on the accessions studied here found that ‘Sultanina’ has 28 first-degree relationships with other accessions in our dataset. The repeated use of ‘Sultanina’ in the breeding of seedless accessions, and the resulting high degree of relatedness among all seedless accessions, is a likely contributor to the correlation between seedlessness and population structure observed here.
An extension of using PCs to explain phenotypic variance is to perform genomic prediction, which uses all markers to predict phenotypes. Especially for complex traits controlled by numerous small-effect loci, genomic prediction is emerging as a powerful tool in genomics-assisted breeding. Using fivefold cross-validation, we calculated prediction accuracies for all phenotypes. Prediction accuracies range from 0.10 for leaf size to 0.76 for berry length. We detected the highest prediction accuracies for phenotypes describing berry traits, including berry length, size, shape, width, skin color, weight and firmness. These prediction accuracies are slightly higher than those previously reported in apple and rice, which had maximum values of 0.55 for harvest season and 0.63 for flowering time, respectively. Complex quantitative traits such as those describing berry shape and size are better targets for improvement through genomic prediction than through single-marker MAS.
A genomics-assisted breeding scheme in which both MAS and genomic selection (GS) are incorporated has been proposed in apple and may be a viable option for selecting both monogenic and polygenic traits in grape. Finally, similar to previous work in apple by Migicovsky et al., genomic prediction accuracy was also highly correlated with the proportion of phenotypic variance explained by genetic PCs 1–10.
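The fivefold cross-validation used to estimate genomic prediction accuracy can be sketched as follows. The text does not name the prediction model or define accuracy, so both are assumptions here: a small ridge regression on all markers stands in for the model, and accuracy is taken as the Pearson correlation between predicted and observed values in each held-out fold, averaged over folds. All function names are hypothetical.

```python
import random
from statistics import mean

def pearson(x, y):
    """Pearson correlation; assumes both vectors have non-zero variance."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - tail) / M[r][r]
    return x

def fit_ridge(X, y, lam=1.0):
    """Stand-in model: ridge regression on centred marker genotypes."""
    p = len(X[0])
    col_means = [mean(row[j] for row in X) for j in range(p)]
    y_mean = mean(y)
    Xc = [[row[j] - col_means[j] for j in range(p)] for row in X]
    yc = [v - y_mean for v in y]
    # Normal equations: (X'X + lam * I) beta = X'y
    A = [[sum(r[i] * r[j] for r in Xc) + (lam if i == j else 0.0)
          for j in range(p)] for i in range(p)]
    b = [sum(r[i] * v for r, v in zip(Xc, yc)) for i in range(p)]
    return y_mean, col_means, solve(A, b)

def predict(model, X):
    y_mean, col_means, beta = model
    return [y_mean + sum(bj * (xj - mj) for bj, xj, mj in zip(beta, row, col_means))
            for row in X]

def cross_validate(X, y, k=5, lam=1.0, seed=0):
    """Mean held-out prediction accuracy over k folds."""
    idx = list(range(len(y)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # each sample falls in exactly one fold
    accuracies = []
    for test_idx in folds:
        held_out = set(test_idx)
        train_idx = [i for i in idx if i not in held_out]
        model = fit_ridge([X[i] for i in train_idx], [y[i] for i in train_idx], lam)
        predicted = predict(model, [X[i] for i in test_idx])
        accuracies.append(pearson(predicted, [y[i] for i in test_idx]))
    return mean(accuracies)
```

Here `X` holds one row of 0/1/2 allele counts per accession and `y` the corresponding phenotype values. The key design point is that the model is refitted on the training folds only, so each accuracy reflects prediction of accessions the model has not seen.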