Protein sequences from Arabidopsis thaliana, Actinidia chinensis, and the UniProtKB plant database were also used as evidence for genome annotation. We predicted a total of 128,559 protein-coding genes. Benchmarking Universal Single-Copy Orthologs (BUSCO) v.3 analysis was performed to assess the completeness of the assembly and the quality of the genome annotation. The annotated gene set contains 1,394 of 1,440 BUSCO genes (96.8%). Functional annotation was assigned using Blast2GO, with mapping to reference pathways in the Kyoto Encyclopedia of Genes and Genomes (KEGG) database. Comparative genomic analyses assigned genes to 16,909 orthogroups shared by six phylogenetically diverse plant species, including five eudicots, each with distinct fruit types, and Zea mays as the outgroup. Transposable elements (TEs), both Class I and II, were identified and classified in the genome using the protocol described by Campbell et al. Overall, 44.3% of the blueberry genome is composed of TEs. Consistent with previous reports, the most abundant Class I TEs were long terminal repeat retrotransposons (LTR-RTs), specifically the superfamily LTR/Gypsy followed by LTR/Copia, while for Class II transposons, the miniature inverted repeat superfamily hAT was the most abundant. The quality of the genome was further assessed by examining the assembly continuity of repeat space using the LTR Assembly Index (LAI) deployed in the LTR_retriever package.
The adjusted LAI score of this blueberry genome is 14, and based on the LAI classification, this score is within the range of "reference" quality. Estimation of the regional LAI in 3 Mb sliding windows also showed that assembly continuity is uniform and of high quality across the entire genome.
The origin of highbush blueberry from either a single or multiple diploid progenitor species is a long-standing question. Previous reports have suggested that highbush blueberry may be an autotetraploid based on the segregation ratios of certain traits. However, an analysis of chromosome pairing among different cultivars revealed largely bivalent pairing during metaphase I, similar to patterns observed in known allopolyploids. To gain further insights into the polyploid history of highbush blueberry, we calculated sequence similarity and synonymous substitution rates (Ks) between genes in homoeologous regions across the genome. The average sequence similarity is ∼96.3% among syntenic homoeologous genes. The average Ks divergence between syntenic homoeologous genes is ∼0.036 per synonymous site. The average Ks divergence between homoeologous genes can be used not only to identify polyploid events but also to estimate the divergence of the diploid progenitors from their most recent common ancestor (MRCA). The Ks divergence between homoeologs in highbush blueberry is six times higher than that between orthologs of two A. thaliana lines that diverged roughly 200,000 years ago. The relatively high Ks between homoeologous regions across the genome suggests that tetraploid blueberry is unlikely to be an autopolyploid formed from somatic doubling or failure during meiosis involving a single individual.
Furthermore, comparative genomics revealed that homoeologous regions are highly collinear, except for a few notable chromosome-level translocations. These translocations were manually inspected and verified with both the raw sequence and Hi-C data. Rapid changes among homoeologous chromosomes are known to occur in newly formed allopolyploids. We also assessed the level of similarity and content of LTR transposable elements among the four haplotypes. As the most prevalent transposable elements in plants, LTR-RTs undergo continual "bloat and purge" cycles within most plant genomes, resulting in a unique signature that may distinguish subgenomes in an allopolyploid. To examine the evolutionary history of LTR-RTs in the highbush blueberry genome, we calculated the mean sequence identity of LTR sequences among the four haplotypes. This analysis revealed that the majority of more recent LTRs are subgenome specific in highbush blueberry. In other words, the data suggest that LTRs proliferated independently in the genomes of each diploid progenitor, following the divergence from their MRCA but prior to polyploidy. The pair-wise LTR difference of the two ancestors is 2.4%–2.6%. With Jukes-Cantor correction and synonymous substitution rate of , the estimated time of divergence for the diploid progenitors from their MRCA is between 0.94 and 1.02 million years ago. These date estimates and the average speciation rate for temperate angiosperms suggest that highbush blueberry is either an allopolyploid derived from two closely related species or an autopolyploid derived from the hybridization of two highly divergent populations of a single species.
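The dating step above can be illustrated with a minimal sketch. It applies the standard Jukes-Cantor correction to a pair-wise difference and divides the corrected distance by twice the substitution rate, since substitutions accumulate along both lineages. The substitution rate is not given in the text here; the value `ASSUMED_RATE = 1.3e-8` substitutions per site per year is an assumption, chosen only because it reproduces the reported range, and the function names are hypothetical.

```python
import math

# ASSUMPTION: rate in substitutions per site per year. The text does not
# state the rate; this value is used only because it reproduces the
# reported 0.94-1.02 million year range.
ASSUMED_RATE = 1.3e-8


def jukes_cantor_distance(p):
    """Jukes-Cantor corrected distance for an observed proportion p of differing sites."""
    if not 0 <= p < 0.75:
        raise ValueError("p must be in [0, 0.75) for the Jukes-Cantor correction")
    return -0.75 * math.log(1 - 4 * p / 3)


def divergence_time_years(p, rate_per_site_per_year):
    """Time since divergence: corrected distance divided by twice the rate."""
    return jukes_cantor_distance(p) / (2 * rate_per_site_per_year)


# Pair-wise LTR differences reported for the two ancestral genomes.
for p in (0.024, 0.026):
    t = divergence_time_years(p, ASSUMED_RATE)
    print(f"{p:.1%} difference -> {t / 1e6:.2f} million years")
```

Because the estimate scales inversely with the rate, a different substitution rate for LTR sequences would shift these dates proportionally.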
To date the most recent polyploid event in highbush blueberry, we analyzed the unique LTR insertions present in each haplotype. Based on the pair-wise LTR difference between the four haplotypes, which is 0.81%–0.89%, the polyploid event occurred approximately 313 to 344 thousand years ago. The substitution rate of LTR sequences is likely different from that of protein-coding genes. Thus, more accurate date estimates will be possible once the LTR substitution rate in highbush blueberry becomes available from future studies. After allopolyploidization, one of the parental genomes often emerges with significantly greater gene content and a greater number of more highly expressed genes. The emergence of a dominant subgenome in an allopolyploid is hypothesized to resolve genetic and epigenetic conflicts that may arise from the merger of highly divergent subgenomes into a single nucleus. However, classic autopolyploids, formed by somatic doubling, are not expected to face these challenges or exhibit subgenome dominance since all genomic copies were contributed by a single parent. This was recently supported by genome-wide analyses of a putative ancient autopolyploid. It is important to note that subgenome expression dominance could still be observed in intraspecific hybrids and autopolyploids formed by parents with highly differentiated genomes. To explore this in highbush blueberry, we compared gene content and expression-level patterns between homoeologous chromosomes. While gene content levels were largely similar among homoeologous chromosomes, with a few notable exceptions, gene expression levels were highest for one of the four chromosome copies in the majority of gene expression libraries. Notably, in the three fruit libraries, the most dominantly expressed copy often became the least expressed among the four homoeologous chromosomes or was among the two lowest expressed copies.
The copy most dominantly expressed in other tissues remained so in developing fruit for only two of the chromosomes. These homoeologous chromosome sets have undergone the most structural variation, which may have modified gene expression patterns. These analyses are based on a single biological replicate from a plant grown in a growth chamber. Thus, the findings reported here should be considered preliminary. Future studies should further explore subgenome expression dominance in highbush blueberry, including at the individual homoeolog level, with additional biological replicates and across multiple environments.
The progression of fruit development in blueberry is marked by visible external and internal morphological changes, including in size and color. We profiled gene expression in fruit across seven developmental stages, from the earliest stage through the final stage, to identify genes differentially expressed during fruit development. Distinctive transitions in gene expression were observed from early fruit growth to the start of color development and from complete color change to ripened fruit. We found that the majority of genes upregulated during early fruit development were involved in phenylpropanoid biosynthesis, nitrogen metabolism, and cutin, suberin, and wax biosynthesis. In contrast, genes involved in starch and sugar metabolism were highly expressed at the onset of and during fruit ripening. Moreover, principal component analysis showed that the first two components accounted for 84% of the variation and separated the developmental stages into three groups: early developmental stages (petal fall and small green fruit); middle developmental stages (expanding green and pink fruit); and late developmental stages (complete fruit color change, unripe and ripe fruit).
Genes associated with cell division, cell wall synthesis, and transport were most highly expressed during the earliest developmental stages, which is consistent with previous work on other fruit species. In addition to genes regulating cell proliferation, defense response-related genes were also highly upregulated during the earliest developmental stages.
During the middle developmental stages, genes regulating cell expansion, seed development, and secondary metabolite biosynthesis were highly expressed. During late developmental stages and as the berry transitions to ripening, late embryogenesis, transmembrane transport, defense, secondary metabolite biosynthesis, and abscisic acid-related genes were highly overrepresented. Blueberry is considered a climacteric fruit; however, unlike the ethylene-driven fruit ripening in other climacteric species, abscisic acid has been demonstrated to regulate fruit ripening in blueberry. In summary, global gene expression patterns mirror the morphological and physiological changes observed during blueberry development.
We assessed the total antioxidant capacity in mature fruit across a blueberry diversity panel and the abundance of secondary metabolites responsible for its antioxidant activity in developing fruit. A diversity panel, composed of 71 highbush blueberry cultivars and 13 wild Vaccinium species, was evaluated for total antioxidant capacity in mature fruit using the oxygen radical absorbance capacity (ORAC) assay. Similar to previous reports, we observed a wide range in antioxidant capacity across cultivars, with "Draper" having the highest levels of antioxidants. Consistent with our results, the observed variation in antioxidants among highbush blueberry cultivars was previously shown not to correlate with fruit weight or size. However, in another study, a correlation between fruit size and total anthocyanin levels was identified within a few select highbush blueberry cultivars but not across other Vaccinium species or blackberry. This inconsistency is likely due to sample size differences between studies. To further examine the antioxidant capacity in "Draper" during fruit development, fruits from the seven aforementioned developmental stages were assayed for antioxidant levels.
The highest level of antioxidants was observed at the earliest "petal fall" stage, after which the level of antioxidants declined during the middle and late developmental stages. This is consistent with previous reports on the antioxidant activity in blueberry during fruit maturation and similar to observations in blackberry and strawberry, wherein green fruit have the highest ORAC values. The antioxidant capacity in blueberry is influenced by various metabolites, including anthocyanins. Using the same fruit development series, we quantified anthocyanin and flavonol aglycones in "Draper" using liquid chromatography-mass spectrometry. Overall, as the fruit changed its exocarp color from pink to dark blue during ripening, delphinidin-type anthocyanins started to accumulate and were the most abundant compounds in ripe fruit, followed by cyanidin, malvidin, and petunidin. Flavonols were also detected in all developmental stages, with quercetin glycoside being the most abundant, while myricetin glycoside and rutin were present at very low levels. Blueberry also has high levels of phenolic acids; among phenolics, chlorogenic acid (CGA) was the most abundant. High levels of CGA were observed throughout fruit development, with the highest accumulation detected in young fruits. This correlates with the pattern of antioxidant capacity across different fruit stages, suggesting that CGA is one of the major metabolites contributing to high ORAC values in young developing fruit. CGA is derived from caffeic acid and quinic acid and has vicinal hydroxyl groups that are associated with scavenging reactive oxygen species. The antioxidant properties of CGA have been associated with preventing various chronic diseases.
To better understand the biosynthesis of antioxidants in blueberry fruit, we identified homologs of genes previously characterized in other species that are involved in ascorbate, flavonol, chlorogenic acid, and anthocyanin biosynthesis.
The key biosynthetic genes for these compounds exhibited a distinct development-specific pattern of expression. For example, genes involved in the conversion of leucoanthocyanidins to proanthocyanidins were highly expressed in the earliest and middle developmental fruit stages but not in ripening fruit. Conversely, genes involved in the conversion of leucoanthocyanidins to anthocyanins were highly expressed in mature and ripe fruit but not during early fruit developmental stages. Additionally, paralogs encoding the same anthocyanin pathway enzymes and genes involved in vacuolar localization of proanthocyanidins and aldehydes -2-hexenal, -2-hexenol, -3-hexenol. Both linalool and geraniol are associated with sweet floral flavor. However, linalool was reported to largely impart the characteristic blueberry flavor when combined with certain aldehydes. Here, we also identified and examined the expression of genes involved in the biosynthesis of linalool. Four of the linalool synthase homologs in tetraploid blueberry are highly expressed during late fruit development. This pattern of expression coincides with previous reports of linalool accumulation in ripened blueberry fruit. On the other hand, one homolog of linalool synthase, although expressed during fruit growth, did not show a clear fruit development-specific pattern. Investigating the underlying factors regulating these enzymes will facilitate genetic manipulations that may further improve blueberry flavor in the future.