Some volatiles have been lost during domestication and breeding as a combined result of negative selection and linkage drag in tomato and watermelon . Likewise, gain and loss of terpene compounds during strawberry domestication and its genetic causes have been investigated . Recent advances in sequencing technology and analytical approaches have opened new opportunities to understand the chemistry and genetics of fruit flavor. Genome-wide association studies have revealed loci for flavor in a variety of fruit crops . Meanwhile, genomes-wide expression quantitative trait loci studies have the capability to bridge the gaps between GWAS signals and their underlying causative genes. Integration of GWAS and eQTL studies has led to discovery of a master metabolite regulator in tomato and a flesh-color-determining gene in melon . Long-read sequencing now allows assembly of genomes with high contiguity, and when coupled with parental short-read data , the two haplotypes of a heterozygous individual can be fully resolved. Phased assemblies have improved variant discovery, plant pot with drainage especially for large structural variants . The extent, diversity and impact of SVs increasingly are being studied in horticultural crops and have been shown to alter fruit flavor, fruit shape and sex determination .
Great opportunity exists to coherently integrate these multi-omics resources for the discovery of flavor genes. Garden strawberry is an allo-octoploid species with highly palatable non-climacteric fruit . It increasingly has been utilized as a model for Rosaceae fruit crops genomics and flavor research as a result of its short generation time, wide cultivation and high value. Through exploration of spatiotemporal changes in gene expression and homolog search, several flavor genes have been cloned and validated, including an alcohol dehydrogenase and several alcohol acyltransferases for esters, a nerolidol synthase 1 for terpenes and a quinone oxidoreductase for furaneol. Recently, QTL studies and transcriptome data analyses for strawberry volatiles using biparental crosses have detected QTL and causative genes for mesifurane and gamma-decalactone . Nevertheless, low mapping resolution and a lack of subgenome-specific markers have hampered further characterization of causal genes underlying other QTL. This problem recently was addressed by the development of 50K Fana SNP array using probe DNA sequences physically anchored to the octoploid ‘Camarosa’ genome . High heterozygosity combined with an allopolyploid genome presents difficulties for resolving causative genes and their haplotypes. To further the goal of discovering causative genes affecting flavor in strawberry, association studies with larger sample sizes and additional genetic resources such as eQTL and additional genomes are required.
Furthermore, these resources must span the breadth of natural variation in breeding germplasm. Here we present multi-omics resources consisting of an eQTL study representing the genetic diversity of strawberry breeding programs in the US, phased genome assemblies of a highly- flavored University of Florida breeding selection, a structural variation map in octoploid strawberry and a volatile GWAS of 305 individuals. These are combined to leverage the extensive metabolomic, genomic and regulatory complexity in strawberry for the discovery of natural variation in genes affecting flavor. Ultimately, the functional alleles identified will be selected in breeding to achieve superior flavor.The eQTL population consisted of 196 genotypes including 133 newly sequenced accessions . The University of Florida genotypes were grown at GCREC and collected in the spring of 2020 and 2021. The University of California-Davis collection of diverse selections from multiple breeding programs were grown at either Santa Maria CA or Oxnard CA, for day-neutral and short-day accessions, respectively, and collected in the spring of 2021. Four UC genotypes were collected at both sites to ensure sequencing and SNP quality. Total RNA was extracted from a bulk of three fully ripe fruits using a Spectrum™ Plant Total RNA Kit , after flash freezing in liquid nitrogen. Illumina 150-bp pair-end sequencing was performed on the Illumina NovoSeq platform by Novogene Co. . On average, 6.9 Gb of sequence data were obtained for each sample. Raw RNA-Seq data of 63 samples from previous published studies were retrieved from the NCBI SRA database . In order to quantify gene expression, short reads were trimmed for adapter sequences and low-quality reads with TRIMMOMATIC v.0.39 and aligned against the reference genome using STAR v.2.7.6a in the two-pass mode .
Only unique aligned reads were scored by HTSEQ v.0.11.2 in the union mode with the ‘–nonunique none’ flag supplied with the latest Fragaria_ananassa_v1.0.a2 annotation . All count files were compiled in R and normalized with the DESEQ package . To generate the marker dataset for eQTL mapping, SNPs and InDels were called using the mpileup and call commands. Markers were further hard-filtered using BCFTOOLS with the following steps: individual calls with lower than sequencing depth of three were set to missing using + setGT plugin; marker sites with quality < 30, missing rate > 0.3, heterozygous call rate > 0.98, minor allele frequency < 0.05, or number of alternative alleles > 1 were purged; the filtered markers were imported and analyzed in R, and only markers showing more than three matched calls in four duplicated sample pairs were retained. A total of 491 896 markers passed the three stages of filtering. The missing calls were imputed, and all calls were phased using BEAGLE v.5.2 using the default settings . The eQTL mapping was performed for 62 181 fruit expressed genes using the filtered markers. Linear mixed models implemented in GEMMA were used for association analysis . The relationship matrix was computed in GEMMA and supplied to explain relationship within populations, and the top five principal components with a total of 25.0% variance explained were imported as covariates to reduce effects from population stratification to signify the genetic variance underlying the target traits. The Bonferroni corrected 5% significance threshold was used, determined the by number of LD-pruned markers . The approach to define an eQTL was similar to that used in previous studies . Briefly, we first clustered all significant markers with distance < 100 kb and purged clusters with fewer than three markers. The lead marker with lowest P-value was used to identify the eQTL, and boundaries of eQTL were defined as the furthest flanking significant markers. Clusters in LD were merged and boundaries were updated. The longest distance between cis-eQTL boundaries and eGene boundaries was limitedto 500 kb. Because a substantial number of regulatory elements were found for fruit-expressed genes, a structural variant map would greatly facilitate the identification of potential causative SVs underlying the regulatory elements. To construct an SV map, pot with drainage holes we first assembled a phased genome of an UF accession. The genome of FL 15.89-25 was assembled into 1480 and 672 phased contigs with N50 of 12.8 and 12.4 Mb, respectively , with similar contiguity to other recent high-quality octoploid strawberry genomes . A Kmer-based approach revealed 97.1% and 99.2% completeness for the haploid assemblies based on parental Illumina short reads, which were corroborated by 98.1% and 98% completeness of the BUSCO eudicots odb10 genes . Phasing quality was evaluated by parent-specific Kmers; the average switching error and hamming error were 0.19% and 0.18% for the F12 haploid assembly , comparable to phased genomes in other species . The phased contigs were scaffolded into pseudochromosomes based on alignment to the ‘Camarosa’ reference genome, with 96.0% and 92.8% of phased contigs placed on 28 pseudochromosomes for the respective F12 and Bea haploid assemblies , consistent withprevious flow cytometry estimations . There were only 88 and 79 gaps in the final scaffolds, averaging 3.14 and 2.82 per chromosome for the respective F12 and Bea assemblies . Scaffolding quality was evaluated by a linkage map and public Hi-C data . High collinearity was observed between haplotypes . The FL 15.89-25 assemblies and three additional haploid assemblies were utilized to explore SV diversity in garden strawberry. These geographically and genetically diverse accessions empowered the discovery of SVs across all chromosomes except for a large portion of Chr 4B which may be under strong purifying selection .
Individual haplotypes had between 31 574 and 60 453 SVs relative to the PHASE1 assembly of ‘Royal Royce’ , with the WONG haplotype harboring the most SVs, consistent with the larger genetic distance of Asian populations to North American populations . Insertions and deletions were the most common SV types, together consisting of 88.3– 94.1% of SVs. All SVs across haplotypes werethen merged into a nonredundant set of SVs . In total, 56 342 deletions, 60 983 insertions, 12 016 translocations, 166 interspersed duplications, 236 tandem duplications and 137 inversions were identified. Unlike the SV composition of a tomato population in which the majority of SVs were singletons , an average of 62.6% strawberry SVs were shared by at least two haplotypes . We observed a gradually reduced number of new SVs every time a new haplotype was merged , suggesting this SV map surveys a substantial portion of SV diversity in cultivated strawberry. The majority of SVs were < 1 kb , whereas only 3.3% were > 10 kb . Structural variations were present extensively in exons , introns and promoter regions . Transposable elements were rich resources of SVs. We identified 34 379 deletions overlapped with TEs, especially inverted tandem repeats and long terminal repeats , consisting of 61.0% of total deletions, significantly higher than the genome-wide TE percentage of 38.42% .In order to investigate the genetic control of fruit volatiles, we performed volatile phenotyping and SNP array genotyping with 49 330 markers on a panel of 305 accessions from the UF strawberry breeding program, with 59 individuals overlapped with the eQTL panel . A total of 97 volatiles including esters, terpenes, aldehydes, alcohols, acids, ketones and lactones were quantified . Based on relationships among volatiles, we identified at least five clusters belonging to the same chemical class or biosynthetic pathway, including clusters of eight aldehydes, three ethyl esters, three hexanoic acid derivatives, seven medium-chain esters and three terpenes . Generally high narrow-sense heritability was observed across volatiles , ranging from 0.212 to 0.916, with a mean of 0.660. The highest value of h2 was found for mesifurane and the lowest for octanoic acid, ethyl ester . The lead SNP effects varied from 0.27 to 2.44 , with the largest effect for methyl anthranilate . Two hotspots which contained multiple signals of volatiles belonging to the same class or pathway were found for medium chain esters and for terpenes , which also were detected to in previous studies and reflected in chemical relationships . Our GWAS results confirmed previous homoeologous group assignments for these volatile QTL and clarified their subgenome and physical positions. The SNP AX-166515537 was the lead SNP for three esters, and a 14 Mb region on Chr 6A shared signals for six medium-chain esters. An LD analysis revealed three linkage blocks . The distal region of Chr 3C was associated with six volatiles including five terpenes . This 3.1-Mb region did not display clear LD block separation . Two significant markers for medium-chain ester hotspot and methyl thiolacetate were tested for their predictability of flavor characteristics . Some abundant volatiles including: 2-hexenal, -; butanoic acid, 2-methyl-; and pentanal were associated with multiple DNA variants , suggesting polygenic inheritance. Pentanal was associated with threeIn this study we leveraged eQTL, GWAS and haplotype-resolved genome assemblies of a heterozygous octoploid to identify allelic variation in flavor genes and their regulatory elements. Finetuning of metabolomic traits such as amylose content in rice and sugar content in wild strawberry recently were made possible via CRISPR-Cas9 gene-editing technology. Similar approaches can be taken in cultivated strawberry for flavor improvement, but not before the biosynthetic genes responsible for metabolites production and their regulatory elements are identified. Our pipeline has proven to be effective in identification of novel causal mutations for flavor genes responsible for natural variation in volatile content and can be further applied to various metabolomic and morphological aspects of strawberry fruit such as anthocyanin biosynthesis , sugar content and fruit firmness. These findings also will help breeders to select for genomic variants underlying volatiles important to flavor. New markers can be designed from regulatory regions of key aroma volatiles, including multiple medium-chain volatiles shown to improve strawberry flavor and consumer liking , methyl thioacetate contributing to overripe flavor and methyl anthranilate imparting grape flavor . In the present study, a new functional HRM marker for mesifurane was developed and tested in multiple populations . These favorable alleles of volatiles can be pyramided to improve overall fruit flavor via marker assisted selection. Strawberry also shares common volatiles with a variety of fruit crops.