For whole genome sequencing, a wild-type sugar apple fruit was purchased from a retail source in the United States, seeds from the fruit were planted, and one plant grown in the UC Davis Conservatory was sampled for sequencing, with voucher herbarium samples stored as DAV225058 and DAV225059. The Hawaiian seedless line (Hs) was obtained as budwood from Frankie’s Nursery, grafted to a wild-type A. squamosa rootstock and grown in the UC Davis Conservatory, with a voucher sample stored as DAV225060.

For genetic inheritance studies, three different wild-type lines (M1, M2 and M3) and a seedless Bs line were used. The authors planned the crosses with different wild types for two purposes: inheritance studies and initial steps in the production of desirable seedless lines. Plants were grown at the Experimental Farm and molecular analysis was performed at the Molecular Biology laboratory of the State University of Montes Claros, latitude 15°48′09″S, longitude 43°18′32″W and altitude 516 m. For phenotypic characterization of seedless versus seeded plants, two strategies were applied: fruits were harvested, pulped, and examined for the presence or absence of seeds; or flowers, either fresh or fixed in formalin-acetic acid-alcohol, were dissected to separate the ovules from the carpel tissue.
The wild-type ovules present a domed shape opposite the funiculus, while the mutant ovules come to a point at this position. First filial generations (F1), self-fertilized progeny (F2), backcrosses with the wild-type parents M1, M2 and M3 (BCM), and backcrosses with the mutant parent Bs (BCBs) were obtained. Segregations were evaluated for conformity to predicted ratios with the Chi-square test using the Genes statistical software.

DNA was extracted from young leaf samples with hexadecyltrimethylammonium bromide (CTAB) buffer as described by Doyle and Doyle and separated from polysaccharides as described by Cheung et al. Primers used in PCR are listed in Supplementary Table 1. PCR was performed with DreamTaq and the included reagents with an initial denaturation at 94 °C for 3 min; 35 cycles of denaturation at 94 °C for 30 s, annealing at 56 °C for 30 s, and extension at 72 °C for 1.5 min; and a final extension at 72 °C for 4 min. For reactions using the AsINODel primers, a 60 °C annealing temperature was used. PCR products were electrophoresed on 1.2% agarose buffered with 1×TBE or SB, and DNA was visualized by staining with ethidium bromide and illumination with ultraviolet light. For sequencing, PCR products were processed with ExoSAP-IT or Quiapure and sequenced using the amplification primers on an ABI 3500 or 3730 genetic analyzer at Análises Moleculares Ltda. or the University of California Davis CBS DNA Sequencing Facility.

Whole genome sequencing was performed by the North American author prior to initiating the current collaborative effort.
The lines available for sequencing were a wild-type North American commercial line and the Hs line, and these were used for this part of the analysis. DNA for whole genome sequencing was isolated from young leaves by grinding in 100 mM TRIS–Cl, 20 mM EDTA, 1.4 M NaCl, 2% CTAB, 1% each polyvinylpyrrolidone and sodium metabisulfite, pH 8.0. Samples were treated with 70 µg/ml RNase A, extracted with a 1:24 mixture of isoamyl alcohol and chloroform, and precipitated with isopropanol. Samples were dissolved in 10 mM TRIS pH 8.0, 1 mM EDTA, adjusted to 0.3 M Na acetate, pH 4.8, precipitated with 2 volumes of ethanol and dissolved in 10 mM TRIS pH 8. Wild-type A. squamosa DNA was processed and sequenced at the University of California, Davis Genome Center. For PacBio sequencing, DNA fragments greater than 10 kb were selected by BluePippin electrophoresis and sequenced on a PacBio RSII or Sequel Single Molecule, Real-Time device. This resulted in 2.46 million reads with an average read length of 8 kb comprising more than 29 Gbases, or approximately 37× genome representation. For Illumina sequencing, the DNA was sheared, and fragments with an average size of 400 bp were selected and sequenced on a HiSeq 4000 apparatus by the paired-end 150 bp method, resulting in approximately 390 million sequences. The sequences were trimmed of poor-quality regions and primer sequences with Sickle and Scythe, respectively, resulting in 229 Gbases or approximately 124× genome representation. Hs DNA was similarly processed and sequenced by QuickBiology, resulting in 408 million sequences and approximately 130× genome representation.
The PacBio reads of wild-type DNA were assembled using Canu with default settings, producing 3519 contigs. The wild-type Illumina reads were aligned with the assembly using BWA MEM with default settings, and Pilon used the alignment to correct the contigs, changing 148 k single-nucleotide errors and adding a net of more than 1.8 Mbases of insertions for a final assembly of 707.7 Mbases with an average contig length of 201 kb and an N90 of 93.9 kb. A BLAST search with the known A. squamosa INO gene sequence identified a 587 kb contig containing the INO gene. BWA MEM aligned the Hs Illumina reads with the assembly, and Tablet was used to examine the alignment with the 587 kb contig containing INO. BLAST was used to search one half of the set of Hs Illumina sequence reads for those extending across a detected deletion, and the resulting sequences were aligned and assembled using Sequencher 5.4.1.

Genetic diversity was assessed among the seedless sugar apple varieties Bs, Ts, and Hs, with the fertile parent M2 as a contrasting control. Sixty-seven SSR (microsatellite) primer pairs described for A. cherimola were used, with fifteen having been described by Escribano et al. and fifty-two by Escribano et al. DNA extraction was performed as described above for the analysis of the association of the seedless trait with the INO deletion. Amplification utilized an initial denaturation at 94 °C for 1 min; 35 cycles of denaturation at 94 °C for 30 s, annealing at 48–57 °C depending on the primer, and extension at 72 °C for 1 min; and a final extension at 72 °C for 7 min. The amplification products were separated by electrophoresis on 3.0% agarose gels that were buffered, stained and visualized as above. To calculate diversity, the amplification data of the SSR primers were converted into a numerical code per locus for each allele. The presence of a band was designated by 1 and the absence by 0.
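The 1/0 band coding lends itself to pairwise similarity measures such as the Jaccard coefficient (bands shared by two genotypes divided by bands present in at least one of them). The following is a minimal sketch of that calculation, not the implementation used by the Genes software; the genotype labels G1 to G3 and their band scores are hypothetical and are not data from this study.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity of two equal-length 0/1 band vectors."""
    if len(a) != len(b):
        raise ValueError("band vectors must have the same length")
    shared = sum(1 for x, y in zip(a, b) if x and y)    # band present in both
    present = sum(1 for x, y in zip(a, b) if x or y)    # band present in either
    if present == 0:
        raise ValueError("no bands scored in either genotype")
    return shared / present


def pairwise_similarity(matrix):
    """Return {(genotype1, genotype2): similarity} for every genotype pair."""
    return {
        (g1, g2): jaccard(matrix[g1], matrix[g2])
        for g1, g2 in combinations(sorted(matrix), 2)
    }


# Hypothetical presence/absence scores for six amplified fragments.
band_matrix = {
    "G1": [1, 0, 1, 1, 0, 1],
    "G2": [1, 0, 1, 1, 0, 1],
    "G3": [1, 1, 0, 1, 1, 0],
}

similarity = pairwise_similarity(band_matrix)
for pair, value in similarity.items():
    print(pair, round(value, 3))
```

Joint absences (0 in both genotypes) do not enter the coefficient, which is one reason it is often chosen for presence/absence marker data.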
Although the microsatellite markers can be codominant, molecular analyses of the loci were performed based on the presence/absence of each amplified fragment. The resulting binary matrix was used to obtain estimates of genetic similarity between genotype pairs, based on the Jaccard coefficient. The Genes statistical program was used for data processing.

The results of the phenotypic analysis of the parents M2 and Bs, the F1 and F2 generations, and the backcrosses with the wild-type parent M2 and with the mutant parent Bs are displayed in Table 1. In the F1 generation, all individuals presented fruits with seeds. In the F2 population, among the plants in the reproductive stage during the evaluation period, 48 formed fruits with fully developed seeds and ten presented only seed rudiments, characterized by the absence of seeds. Considering the segregation hypotheses expected for one, two and three genes, the Chi-square test revealed that the trait under study segregated at a 3:1 ratio, consistent with monogenic inheritance. These results were corroborated by data from backcrosses with the parent M2, where all plants that produced fruits had seeds, consistent with the 1:0 ratio, while the plants from backcrosses with the mutant parent Bs showed a 1:1 segregation for the presence and absence of seeds. Taken together, these results corroborate the monogenic inheritance found in the analyses of the F2 generations, indicating that a single recessive locus controls the seedless trait in Bs A. squamosa.
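As a worked illustration of the goodness-of-fit reasoning, the F2 counts reported above (48 seeded, 10 seedless) can be tested against a 3:1 expectation. This is a minimal sketch using only the Python standard library, not the Genes software procedure; the 15:1 comparison is an assumed example of a two-gene ratio, since the specific alternative ratios tested are not listed in the text.

```python
import math


def chi_square_gof(observed, ratio):
    """Chi-square statistic for observed counts against an expected ratio."""
    total = sum(observed)
    ratio_sum = sum(ratio)
    expected = [total * r / ratio_sum for r in ratio]
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return stat, expected


def p_value_df1(stat):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(stat / 2.0))


CRITICAL_5PCT_DF1 = 3.841  # chi-square critical value, alpha = 0.05, df = 1

f2_counts = [48, 10]  # seeded, seedless (counts from the text)

stat_3_1, expected_3_1 = chi_square_gof(f2_counts, [3, 1])
p_3_1 = p_value_df1(stat_3_1)

# Assumed alternative: 15:1, a ratio commonly associated with two genes.
stat_15_1, _ = chi_square_gof(f2_counts, [15, 1])

print(f"3:1  chi2 = {stat_3_1:.3f}, p = {p_3_1:.3f}")
print(f"15:1 chi2 = {stat_15_1:.3f}")
```

With these counts the 3:1 statistic is about 1.86, below the 3.841 critical value, so the single-gene ratio is not rejected, whereas the assumed 15:1 ratio gives a statistic well above that threshold.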
Previously described molecular markers for the presence of the INO gene were tested on parents M1, M2, M3, and Bs and displayed the expected band patterns. These markers generated amplification products only in the three wild-type parents, with no amplification of any fragment in Bs for any of the primer pairs used. The dominant-marker LMINO primer set was also used to amplify DNA from F1 plants obtained from crosses between genotypes of A. squamosa and the mutant Bs. All evaluated F1 individuals produced fruits with seeds in the field and yielded amplification products with all primer pairs, as shown in Supplementary Fig. 3B. The same procedure was applied to genotype the segregating generations at the seedling stage in the nursery. Figure 2 shows a sample of individuals amplified with the LMINO1/2 primers, and the results confirm the discriminatory capacity of these genetic markers. The field confirmation of the presence/absence of seeds in the fruits of the F2, BCM and BCBs generations was obtained later. In the F2 generations of the three crosses, the amplification products of the LMINO markers segregated in a pattern that correlated exactly with the presence/absence of seeds. Fertile plants in this generation uniformly produced an amplification product with the LMINO1/2 primer set, while plants yielding no amplification product bore only seedless fruit. The same complete cosegregation of the presence/absence of seeds and PCR product seen in F2 individuals was also observed in the BCBs backcross populations. For BCM backcross plants, INO amplification products were observed in all DNA samples tested for these uniformly fertile, seed-bearing plants. The χ2 test was performed on the data generated in the molecular analysis to confirm the segregation of the dominant amplification product. F1 plants displayed the expected genotypic ratio of 1:0 that had been linked to the trait of seeded fruits.
In the F2 generations, six segregation hypotheses expected for one, two and three genes were tested. At a significance level of 5%, the genotype frequencies fit a ratio of 3:1 but allowed rejection of the other predicted ratios, confirming the hypothesis that a single locus confers the phenotype for the trait under study, with the dominant allele responsible for the presence and the recessive allele for the absence of the amplification product. To assess the homogeneity among the F2 crosses, a heterogeneity test was applied to verify whether the differences observed in the results could be explained by chance. The heterogeneity test was not significant and indicated, with a 55% likelihood, that the χ2 results were consistent for the populations of the three families studied, confirming the expected segregation. To further support the hypothesis of segregation in the F2 generations, the BCM and BCBs backcrosses were used. Similarly, the heterogeneity of segregation between the families of the BCBs backcross was not significant. BCM and BCBs progenies analyzed separately segregated in a manner consistent with the hypothesis of a single gene. In the BCBs backcross, carried out between the F1 generation and the parent Bs, the proportion was close to 1:1 for the presence/absence of seeds in the fruits. The χ2 test was applied and the deviations between the observed and expected frequencies were not significant. In the BCM backcross between the F1 generation and the wild-type parents, the proportion was 1:0 for the presence/absence of seeds in the fruits. These results confirmed the monogenic inheritance found in the analyses of the F2 generations and are consistent with a single recessive allele being responsible for the seedless trait in A. squamosa under the 3:1 segregation hypothesis.

Whole genome shotgun sequencing was used to determine the characteristics of the INO gene deletion event. A draft wild-type A.
squamosa genome was assembled through sequencing of total DNA isolated from a plant grown from seed derived from commercially available A. squamosa fruit. Genomic DNA was sequenced by both long-read Single Molecule, Real-Time sequencing and short-read paired-end 150 base methods. The long reads were assembled into a draft sequence that was corrected with the higher-coverage short-read sequences. The resulting assembly comprised 707 Mbases of DNA in 3,519 contigs, with an average contig length of 201 kb. A BLAST search with a previously published A. squamosa INO gene sequence was used to identify a 587 kb contig that included the INO gene. Total Hs A. squamosa DNA was used to produce a second short-read sequence set, and this was aligned with the assembled wild-type sequence. Visualization of the alignment of the Hs sequences with the 587 kb contig including INO revealed a clear absence of reads over a region of 16,020 bp, indicating a 16 kb deletion that included the INO gene. The alignment program truncates read sequences where they do not align with the reference sequence, so a deletion or a deletion with a heterologous insertion would appear similar in this visualization.
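The deletion described above was found by visual inspection of the alignment. Purely as an illustration of the underlying idea, the sketch below scans a per-base read-depth profile for stretches with no aligned reads. The function name, the `min_length` parameter and the toy depth values are hypothetical and are not taken from the study; in practice the depth profile would come from an alignment tool's per-position coverage output.

```python
def zero_coverage_runs(depth, min_length=1):
    """Return half-open (start, end) intervals where read depth is zero.

    depth: sequence of per-base read depths along one contig (0-based).
    min_length: shortest run of zero-depth positions to report.
    """
    runs = []
    start = None
    for pos, d in enumerate(depth):
        if d == 0:
            if start is None:
                start = pos            # a zero-depth run begins here
        elif start is not None:
            if pos - start >= min_length:
                runs.append((start, pos))
            start = None               # run ended at the previous position
    # Close a run that extends to the end of the contig.
    if start is not None and len(depth) - start >= min_length:
        runs.append((start, len(depth)))
    return runs


# Toy depth profile: covered, a 40-base gap, covered, a 2-base dropout, covered.
toy_depth = [30] * 100 + [0] * 40 + [28] * 60 + [0] * 2 + [31] * 10

candidate_gaps = zero_coverage_runs(toy_depth, min_length=10)
print(candidate_gaps)
```

As the text notes, a zero-coverage interval alone cannot distinguish a simple deletion from a deletion accompanied by a heterologous insertion, so reads spanning the junction still need to be examined.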