Several additional wild octoploid subspecies have since been used as parents in breeding, creating an admixed population of F. × ananassa individuals with genomes that are mosaics of phylogenetically and demographically diverse progenitor genomes. While several subgenome origin hypotheses have emerged from cytogenetic, phylogenetic, and comparative genetic mapping studies, a complete hypothesis for the origin and evolution of the octoploid genome was only recently proposed with the publication of the “Camarosa” reference genome. Through phylogenetic analyses of the transcriptomes of all described extant diploid species, including four subspecies of Fragaria vesca, the putative subgenome donors found in the octoploid were identified as F. vesca subsp. bracteata, Fragaria iinumae, Fragaria viridis, and Fragaria nipponica. Edger et al. provided strong support for earlier hypotheses that F. vesca and F. iinumae were two of the four subgenome donors. Until the octoploid reference genome was published, the origin of the other diploid subgenome donors had remained unclear, although multiple hypotheses had been proposed.
Liston et al. then reasoned that Edger et al. may have misidentified two of the progenitors due to bias from excluding in-paralogs in their phylogenetic analyses. To address this concern, Edger et al. developed a chromosome-scale assembly of the F. iinumae genome and reanalyzed the original data with in-paralogs. The revised analysis supported the original model that the genome of octoploid strawberry originated through successive stages of polyploidization involving four progenitor species: diploid × diploid → tetraploid × diploid → hexaploid × diploid → octoploid ancestor. In addition, the chromosome-scale genome assembly showed that the diploid subgenomes were not static building blocks walled off from one another. Rather, they have dynamically evolved through homoeologous exchanges, which are well known in neopolyploids. Homoeologous exchanges in octoploid strawberry were found to be highly biased toward the F. vesca subsp. bracteata subgenome replacing substantial portions of the other subgenomes. However, homoeologous exchanges are not unidirectional. Although the chromosomes are architectural mosaics of the four diploid subgenome donors and their octoploid descendants, F. × ananassa is strongly allo-octoploid. Because the F. × ananassa chromosomes are complex admixtures of genes with different phylogenetic histories via homoeologous exchanges, Edger et al. developed a nomenclature that precludes oversimplified one-to-one assignments to a specific diploid progenitor.
The F. × ananassa genome has been reshaped not only by polyploidization events, especially homoeologous exchanges, gene conversion, and selection, but also by repeated interspecific hybridization in breeding that has resulted in the introgression of alleles from phylogenetically and demographically diverse F. chiloensis and F. virginiana ecotypes. At this point in time, the decades-long debate among geneticists and evolutionary biologists about the origin of the F. × ananassa genome seems to have reached an initial zenith. Remaining disagreements might only be settled when chromosome-scale assemblies of the other hypothesized diploid progenitors are assembled and analyzed. Aside from the question of subgenome origin, what other evolutionary questions might be worthy of exploration at this juncture? First, while the four extant relatives of the diploid progenitors have been putatively identified, the history and timing of the intermediate polyploids remain poorly understood. When and where were the tetraploid and hexaploid ancestors formed? Are any of the known wild polyploids endemic to Asia descendants of these intermediate polyploids? Which subgenome is dominant in these polyploids? Second, a single dominant subgenome was uncovered in Fragaria × ananassa that controls many important traits including fruit quality. Just how deterministic is subgenome dominance? In other words, is it possible to resynthesize the octoploid with a different degree of subgenome dominance, or with a different subgenome becoming dominant?
The answer to this question could have implications for genetic improvement of the cultivated species.
Genotyping advances in strawberry have naturally followed advances in humans, model organisms, and row crops. The development of the Affymetrix Axiom® iStraw90 single-nucleotide polymorphism (SNP) genotyping array was a significant advance that enabled the facile production and exchange of genotypic information across laboratories with high reliability, minor amounts of missing data, and negligible genotyping errors. The ease of use, speed of analysis, simplicity of data management, and outstanding reproducibility of SNP genotyping arrays have been important factors in their continued adoption in strawberry and other plant species with complex genomes. Underlying computational challenges associated with genotyping by sequencing (GBS) and other approaches facilitated by next-generation sequencing (NGS) have limited their widespread application in octoploid strawberry thus far. The challenges are similar across species, but obviously exacerbated in allogamous polyploids: uneven and inadequate sequencing depth, copy number uncertainty, heterozygote miscalling, missing data, sequencing errors, etc., all of which challenge the integration of DNA variant information across studies. As with the other DNA marker genotyping approaches reviewed here, the first GBS study in octoploid strawberry utilized the diploid F. vesca reference genome in combination with a phylogenetic approach for aligning, classifying, and calling DNA variants. Recently, Hardigan et al. whole-genome shotgun (WGS) sequenced 88 F. × ananassa, 23 F. chiloensis, and 22 F. virginiana germplasm accessions. Strikingly, 80% of the short-read DNA sequences uniquely mapped to single subgenomes in the octoploid reference. Approximately 90M putative DNA variants were identified among F. × ananassa, F. chiloensis, and F. virginiana individuals, whereas 45M putative DNA variants were identified among F. × ananassa individuals.
An ultra-dense framework of genetically mapped DNA variants across the octoploid genome was then developed by WGS sequencing 182 full-sib individuals from a cross between F. × ananassa “Camarosa” and F. chiloensis subsp. lucida “Del Norte”. Large expanses of homozygosity within the commercial hybrid parent prevented complete end-to-end mapping of all 28 octoploid chromosomes in F. × ananassa, as was accomplished with the wild parent, further demonstrating the value of dense NGS data for understanding sources of genotyping and mapping challenges in the octoploids.
As these WGS-GBS and GBS mapping results demonstrate, several NGS-based genotyping approaches should work well in combination with the octoploid reference genome. In summary, while the complexity of the octoploid genome has historically complicated DNA variant genotyping and genetic mapping in strawberry, the chief technical challenges were addressed with: the development of a high-quality octoploid genome assembly; WGS resequencing of numerous octoploid individuals that shed light on the extent of intra- and inter-homoeologous nucleotide variation; identification and physical mapping of DNA variants across the octoploid genome; and comparative genetic mapping of the wild octoploid progenitors of F. × ananassa using SNPs anchored to the octoploid reference genome. DNA variants genotyped with different platforms and approaches predating the octoploid reference genome were independent and disconnected, resulting in the proliferation of linkage group nomenclatures, absence of a universal linkage group nomenclature, uncertainty in the completeness of genome coverage, and inability to cross-reference physical and genetic mapping information across studies, populations, and laboratories. The DNA marker sequences from many of the previously published mapping experiments were either not readily available or too short or nonspecific to enable unambiguous mapping to the octoploid reference genome. The one exception was the genetically mapped double digest restriction-associated DNA sequence markers described by Davik et al., which were used by Edger et al. for scaffolding the octoploid reference genome. Most F. vesca DNA probe sequences used to assay SNPs on the iStraw35 and iStraw90 SNP arrays were too short and nonspecific to unambiguously determine their physical marker locations in the octoploid genome.
Hence, genotypes produced with these SNP arrays could not always be effectively utilized for genome-wide association studies or other applications requiring subgenome resolution. Moreover, none of the previously published iStraw90-based genetic mapping studies have shared SNP marker genetic locations, complete genetic maps, or other critical enabling information needed to identify corresponding linkage groups across laboratories. These long-standing issues were resolved with the development of a new 850,000-SNP genotyping array populated exclusively with DNA variants and reference DNA sequences that unambiguously mapped to single homoeologous chromosomes in the octoploid reference genome. Using the 850,000-SNP array, a second array with 50,000 subgenome-specific SNPs, including 5819 genetically mapped SNPs from the iStraw35 array, was developed, facilitating the integration of genetic and physical mapping information across studies. These new arrays provide telomere-to-telomere coverage and target common DNA variants within and among domesticated populations. Although the full set of iStraw SNP probe DNA sequences could not be unambiguously aligned to a single octoploid subgenome, the true physical positions of 97% of the retained iStraw probes were identified using linkage disequilibrium with the newly developed SNP probes anchored to the octoploid reference genome.
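The idea of positioning an ambiguous probe through linkage disequilibrium with anchored SNPs can be illustrated with a minimal sketch. It assumes biallelic genotypes coded as allele dosages (0, 1, 2), uses the squared Pearson correlation as the LD statistic, and assigns the probe to the location of the anchored SNP in strongest LD. The function names, the chromosome labels, the toy genotypes, and the 0.8 threshold are all hypothetical; the studies reviewed here do not specify their procedure at this level of detail, and it may differ.

```python
def r_squared(x, y):
    """Squared Pearson correlation between two genotype dosage vectors."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a monomorphic marker carries no LD information
    return cov * cov / (var_x * var_y)


def place_probe(probe_genotypes, anchored_snps, min_r2=0.8):
    """Return (chrom, pos, r2) of the anchored SNP in strongest LD with the
    probe, or None if no anchored SNP reaches the (assumed) min_r2 threshold."""
    best = None
    for snp in anchored_snps:
        r2 = r_squared(probe_genotypes, snp["genotypes"])
        if best is None or r2 > best[2]:
            best = (snp["chrom"], snp["pos"], r2)
    if best is None or best[2] < min_r2:
        return None
    return best


# Toy data: two anchored SNPs on hypothetical homoeologous chromosomes,
# genotyped in the same eight individuals as the unplaced probe.
anchored = [
    {"chrom": "chr_x", "pos": 1200000, "genotypes": [0, 1, 2, 1, 0, 2, 1, 0]},
    {"chrom": "chr_y", "pos": 950000, "genotypes": [2, 2, 0, 1, 1, 0, 2, 1]},
]
probe = [0, 1, 2, 1, 0, 2, 1, 1]
print(place_probe(probe, anchored))
```

In this toy case the probe is placed on the hypothetical chromosome whose anchored SNP shares its segregation pattern, while a probe with no strong LD partner is left unplaced rather than forced onto a subgenome.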
Comparative mapping of SNPs in several wild and domesticated populations facilitated the integration of earlier linkage group nomenclatures and the development of a universal linkage group nomenclature substantiated by the observation of genome-wide synteny among diverse octoploid genetic backgrounds. These recent advances in genotyping and mapping are expected to have tremendous and immediate impacts on applied research in genetics and breeding of strawberry. But other research questions arise that have bearing on the utility of these new tools and resources, particularly with regard to diversity among genomes that is currently undescribed. For example, what large-scale structural variations exist in octoploid Fragaria germplasm? Recent advances in long-read sequencing platforms have resulted in significant decreases in costs and increases in read lengths and should soon permit inexpensive assessments of structural variants across the cultivated strawberry pangenome. On a smaller scale, what percentage of genes in cultivated strawberry exhibit presence–absence variation? Recent pangenome studies in plants have revealed that a significant proportion of gene content exhibits presence–absence variation. For example, nearly 20% of the genes in Brassica oleracea are found in only certain genotypes and are enriched with functions encoding major agronomic traits. This suggests that genes in strawberry will be missed when utilizing a single octoploid reference genome and genotyping resources based on that genome alone. To construct a useful pangenome, how many individuals need to be included to capture most variation in gene content? These questions will soon be addressed as additional octoploid genomes become available.
For many years genome-assisted breeding in strawberry lagged behind agronomic crops and even many specialty crops.
However, surveys conducted by the RosBREED consortium and funded by the NIFA Specialty Crop Research Initiative have documented the rapid rise in the use of DNA information in strawberry breeding in the last decade. In 2010, only 43% of surveyed strawberry breeders had employed DNA markers or other genomics-based tools. By early 2019, data on 12 of the 14 active strawberry breeding programs in the U.S. indicated that all but one of these 12 programs had used DNA information for at least one of four purposes. The most common application was for verifying the identity or better understanding the lineage of plant materials used in the program. Two-thirds of the programs had used DNA markers or other genomics-based tools to choose parents and plan crosses, and seven of the 12 had used DNA information for seedling selection. Two-thirds of the programs were involved in upstream research of direct relevance to their programs, e.g., creating or validating DNA tests of particular applicability for their plant materials and breeding goals. Some of these were one-time or infrequent applications; however, seven of the 12 programs reported using at least one application of DNA information “on an ongoing, routine basis”. Among the many breeding-relevant loci discovered in the cultivated strawberry genome, flowering and fruit quality loci have been prominent, as would be expected in a high-value fruit commodity. These include the discovery of the locus controlling day-neutrality or perpetual flowering (PF) and its subgenome localization, as well as multiple loci controlling volatile compounds such as gamma-decalactone, mesifurane, and methyl anthranilate. For uncovering disease resistance loci, quantitative trait locus (QTL) mapping has been the most prominent approach.
While traditional biparental populations have been effective for QTL discovery, pedigree-based analysis in multiparental populations using FlexQTL™ has been increasingly applied, as pedigree breeding and maintenance of clones across generations are common in strawberry. Pedigree-based analysis in complex family structures has allowed the simultaneous detection of multiple QTL alleles and the quantification of their phenotypic effects across diverse genetic backgrounds, as demonstrated for the FaRPc2 locus. The use of DNA tests in breeding has been greatly enhanced by RosBREED efforts in marker development and validation. Assays for SNP detection such as kompetitive allele-specific polymerase chain reaction and high-resolution melting have become the tests of choice for breeding applications due to an abundance of SNP information from array genotyping, accuracy and ease of scoring, and resilience to crude strawberry DNA extracts.