Cultivar differences in berries are often ascribed to differences in aroma compounds

One focused on expression at one similar sugar level in both locations. Another identified a common set of genes whose transcript abundance changed in both locations. And the third one was a more comprehensive network analysis using all of the sugar levels and the two locations. We chose two very similar sugar levels to determine the differential gene expression between the two locations, since sugar levels were not exactly the same at harvest. We identified 5528 differentially expressed genes between the two locations in approach 1 at the sugar level closest to the 22°Brix level using DESeq2. DEGs will refer to this set of differentially expressed genes throughout this manuscript. Gene set enrichment analysis using topGO determined the top gene ontology categories for biological processes for these 5528 genes . Based on the number of genes identified, the top GO categories were cellular metabolic process , biosynthetic process , and response to stimulus . Other important and highly significant categories were response to stress and developmental process . There were 910 GO categories in total that were significantly enriched . The relationship between the top 25 GO categories can be seen in Additional file 4. We use the term “significantly” throughout this text to mean statistically significant at or below a padj-value of 0.05. Amongst the top stimulus subcategories with the largest number of genes were response to abiotic stimulus , flower bucket response to endogenous stimulus , response to external stimulus , and biotic stimulus .

Some other significant environmental stimuli GO categories included response to light stimulus , response to osmotic stress , and response to temperature stimulus . In approach 2, we examined which gene expression was changing with °Brix level in both locations to identify a common set of genes differentially expressed during berry development with very different environmental conditions. The significant differences in transcript abundance in each location was determined with DESeq2 using the lowest °Brix sampling as the control. For example, the control sample in RNO was the lowest sugar sampling at 20 °Brix; the transcript abundance of the three higher °Brix samplings were compared to the transcript abundance of the control. The genes that had significantly different transcript abundance relative to control in at least one of the comparisons were identified in RNO and BOD. These gene lists were compared and the common gene set consisting of 1985 genes for both locations was determined . Comparing this common gene list to the DEGs from approach 1 identified 907 genes that were common to both sets, indicating that this subset was differentially expressed between the locations at 22°Brix. The other 1078 genes did not differ significantly between locations. This 1078 gene subset list can be found in Additional file 5 . The GO categories most enriched in this gene set included response to inorganic substance, response to abiotic stimulus and drug metabolic process.

In approach 3, using a more powerful approach to finely distinguish the expression data for all sugar levels, Weighted Gene Coexpression Network Analysis identified gene sets common to and different gene expression profiles between BOD and RNO. All expressed genes for all °Brix levels were used in this analysis. Additional details of the analysis are described in the Materials and Methods section. Twenty-one modules or gene subnetworks were defined and a heat map was generated displaying the module-trait relationships . The grey module is not a real module but a place to put all genes not fitting into a real module; thus, it was not counted as one of the twenty-one gene modules above. Eight modules had similar gene expression profiles for BOD and RNO ; these included cyan, midnight blue, pink, green yellow, salmon, blue, grey60, and royalblue. This gene set consisted of 8017 genes . Comparing this common gene set from the WGCNA with the DEGs from approach 1 revealed that 524 genes in common were found in both sets. This subset was removed from the WGCNA to produce a gene list of 7492 common to both locations and not differing in their transcript abundance at 22°Brix . This represents 25% of the total 29,929 genes in all of the modules. This gene set was compared with the ap2-ap1 gene set from approach 1 and 845 genes were found in common in both sets. The remainder from ap2-ap1 provided an additional 232 genes to the common set of genes from ap3-ap1 not affected by location giving a total number of 7724 genes, representing 25.8% of the genes expressed. This gene set is listed in ap2-ap1_union_ ap3-ap1 tab in Additional file 5.

The GO categories most enriched in this gene set included general categories such asorganic substance bio-synthetic process and organelle organization. There were 785 enriched GO categories in total. In approach 3, further analysis of the gene modules using gene set enrichment analysis was performed with genes that had a kME > 0.80 for each module in the WGCNA . The similar gene sets in common with both locations with decreasing transcript abundance as sugar levels increased were enriched with the GO categories involving growth and water transport , and translation . The common gene sets with increasing transcript abundance as sugar levels increased were enriched with the GO categories involving gene silencing , aromatic compound metabolism , organic substance catabolism , and DNA metabolism . Most modules were positively or negatively correlated with BOD and RNO berries . The turquoise module was the largest module and consisted of 5029 genes; it had the most positive and negative correlations for BOD and RNO, respectively . This gene set was similar to the DEGs defined by DESeq2 with the largest differences between BOD and RNO at 22°Brix. Gene set enrichment analysis of genes within the turquoise module having a kME of 0.80 or higher revealed many common GO categories with the DEGs ; 81% of the GO categories from the turquoise module subset were also found in the 910 GO categories of the DEGs . Some of the most enriched GO categories in the turquoise module were organic acid metabolism, flavonoid metabolism, lipid biosynthesis, response to abiotic stimulus, isoprenoid metabolism, response to light stimulus and photosynthesis. The gene expression profiles of this module declined in transcript abundance with increasing sugar levels . The yellow module was another large module that was the second most positively correlated with BOD. This module was highly enriched with GO categories involving biosynthesis, defense responses and catabolic processes. The WRKY75 gene was in the top 4 hub genes in the yellow module . WRKY75 is a transcription factor that positively regulates leaf senescence. It is induced by ethylene, ROS and SA and is a direct target of EIN3. The green and brown modules were also large modules that were most positively correlated with RNO . The green module was highly enriched in the GO category involving response to chemical. The brown module was highly enriched in GO categories involving multiple catabolic processes. Thus, the WGCNA results, square flower bucket which utilized all of the expressed genes from all °Brix levels were consistent with the DESeq2 results that only compared transcript abundances at 22°Brix or between locations. The WGCNA results were more comprehensive and complemented the results of approaches 1 and 2 by identifying hub genes and gene subnetworks. These subnetworks were linked and further defined by their highly correlated coexpression profiles and enriched GOs.As a first approach to examining the 5528 DEG dataset, differences between the transcript abundance in berries with the lowest and highest °Brix levels at the two locations were determined . Eight examples of the many DEGs with the largest transcript abundance differences from BOD and RNO , HB12 , BSMT1 , HAD , STS24 , NAC073 , TPS35 , and MAT3 were selected and are presented in Fig. 2. The major point of showing this plot is to highlight the general trends of continuously increasing transcript abundance with sugar levels for these genes; half of these genes start at similar transcript abundance levels around 20°Brix for both BOD and RNO berries and increase in transcript abundance at a higher rate in BOD grapes as sugar levels increase. The other half increased at approximately the same rate for both locations but had higher amounts at the BOD location at the same sugar levels. These data were fitted by linear regression to compare the slopes of the lines. The slopes were significantly higher for EXL2, BSMT1, STS24, and TPS35 for BOD as compared to RNO berries, but not for the other four genes .

The significantly increased rate of change for transcript abundance in the BOD berries indicated that the berries in BOD may have ripened at a faster rate relative to sugar level. To get deeper insights into these dynamic gene sets from BOD and RNO, gene set enrichment analysis of the top 400 DEGs with the greatest increase in transcript abundance from the lowest sugar level to the highest sugar level was performed for each location. The top 400 DEGs for BOD berries were highly enriched in biosynthetic processes involving amino acid and phenylpropanoid metabolism; defense responses, response to fungus and response to ethylene stimulus were other highly enriched categories. The top 400 DEGs for RNO berries were enriched in response to oxygen-containing compound, response to hormone and response to abscisic acid .Eight examples of DEGs with the greatest decrease in transcript abundance with increasing sugar levels are presented in Fig. 3. They include lipid and cell wall proteins and an aquaporin . The data were fitted to linear regressions and the slopes statistically compared between BOD and RNO berries. In some cases, the slopes of the linear regression lines of the DEGs were not statistically different ; but in the other cases presented, there were similar amounts of transcript abundance around 20°Brix, but there were significantly different slopes. Again, there is a trend for the transcript abundance of many of these genes to change more significantly in BOD berries than RNO berries relative to sugar level.Berry ripening involves autophagy. Generally, there was an increase in the transcript abundance of genes involved in autophagy as sugar level increased and the transcript abundance in BOD berries was higher relative to RNO berries at the same sugar level . These trends are consistent with the hypothesis that BOD berries ripened faster than RNO berries relative to sugar level.Many aroma and flavor-associated compounds are synthesized in the late stages of berry ripening and sensitive to the environment. The major metabolic pathways affecting flavor and aromas in grapes and wines include the terpenoid, carotenoid, amino acid, and phenylpropanoid pathways. These pathways were identified by topGO to be highly enriched in the DEGs and the turquoise module. Some of the DEGs are involved in these pathways and will be presented in the following subsections. One of the main classes of cultivar specific aroma compounds is the terpene group. The transcript abundance of a number of terpene synthases were higher in BOD berries as compared to RNO berries . All but one of these increased in transcript abundance with increasing sugar levels. TPS35 is a β- ocimine synthase . TPS08 is a γ-cadinene synthase, TPS26 is a cubebol/δ-cadinene synthase, TPS4 and 10 are -α-bergamotene synthases, and TPS07 and 28 are germacrene-D synthases; these enzymes produce sesquiterpenes found in essential plant oils . TPS55 is a linalool/nerolidol synthase which synthesizes acyclic terpene alcohols; linalool contributes significantly to the floral aromas of grape berries and wines. TPS68 is a copalyl diphosphate synthase involved in diterpenoid biosynthesis and TPS69 is an ent-kaurene synthase. Both TPS68 and 69 are diterpene synthases and are part of the ent-kaurene biosynthesis pathway.Carotenoid metabolism is another biosynthetic pathway that contributes to flavor and aroma in grapes. There are a number of key genes that contribute to terpenoid and carotenoid metabolism that have a higher transcript abundance in BOD berries as compared to RNO berries. For example, DXR catalyzes the first committed step and HDS enzyme controls the penultimate steps of the biosynthesis of isopentenyl diphosphate and dimethylallyl diphosphate via the methylerythritol 4- phosphate pathway. Other examples are two phytoene synthases , g180070 and g493850; PSY is the rate-limiting enzyme in the carotenoid biosynthetic pathway.Amino acids contribute to the aroma and flavor of grapes and wines.