One was then added to each abundance value because of the presence of zero entries.

The identification of annotated and novel miRNAs was carried out using a conservative and robust pipeline, as described by Jeong et al. and Zhai et al., which has been successfully deployed in various published studies. Briefly, to recognize the conserved miRNAs, all small RNAs sequenced in the libraries were first compared against all annotated vvi-miRNAs deposited in miRBase. Subsequently, the whole set of small RNAs was passed through five filters designed according to the properties of validated plant miRNAs and their precursors, keeping track of known miRNAs throughout the filtering. The filters included minimum abundance, size range, maximum number of hits to the grapevine genome, strand bias, and abundance bias (ratio to total abundance ≥ 0.7). For each candidate precursor, the most abundant read was retained as the biologically active miRNA; where both the 5′-end and the 3′-end reads were highly abundant, both tags were kept. All known vvi-miRNAs identified by the pipeline were manually inspected to ensure that the tags identified as known miRNAs were correctly assigned to their actual precursor, and to retrieve the most abundant isoform among the tags mapping to each precursor.
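
The five filters can be illustrated with a minimal Python sketch. It assumes that each candidate precursor is represented by its mapped tags (sequence, strand, abundance) plus the genome hit count of its dominant tag. Apart from the 0.7 abundance-bias cutoff, every threshold below is a hypothetical placeholder, and counting the two most abundant tags in the abundance-bias numerator is an assumption, not the published definition.

```python
from dataclasses import dataclass


@dataclass
class Tag:
    seq: str          # small RNA sequence
    strand: str       # "+" or "-" relative to the precursor locus
    abundance: float  # normalized abundance of this tag


# Hypothetical placeholder thresholds: only the 0.7 abundance-bias cutoff
# is taken from the text.
MIN_ABUNDANCE = 10.0
SIZE_RANGE = (18, 26)
MAX_GENOME_HITS = 20
MIN_STRAND_BIAS = 0.8
MIN_ABUNDANCE_BIAS = 0.7


def passes_filters(tags, genome_hits):
    """Return True if a candidate precursor passes all five filters."""
    total = sum(t.abundance for t in tags)
    if total <= 0:
        return False
    top = max(tags, key=lambda t: t.abundance)
    # 1. Minimum abundance of the dominant tag.
    if top.abundance < MIN_ABUNDANCE:
        return False
    # 2. Size range of the dominant tag.
    if not SIZE_RANGE[0] <= len(top.seq) <= SIZE_RANGE[1]:
        return False
    # 3. Maximum number of hits to the genome.
    if genome_hits > MAX_GENOME_HITS:
        return False
    # 4. Strand bias: share of abundance on the dominant tag's strand.
    same_strand = sum(t.abundance for t in tags if t.strand == top.strand)
    if same_strand / total < MIN_STRAND_BIAS:
        return False
    # 5. Abundance bias: assumed here to be the two most abundant tags over the total.
    top_two = sum(sorted((t.abundance for t in tags), reverse=True)[:2])
    return top_two / total >= MIN_ABUNDANCE_BIAS


# Example: one dominant 21-nt tag on the "+" strand.
candidate = [Tag("A" * 21, "+", 900.0), Tag("C" * 21, "+", 50.0),
             Tag("G" * 22, "+", 30.0), Tag("U" * 21, "-", 20.0)]
print(passes_filters(candidate, genome_hits=1))  # True
```

Ordering the cheap per-tag checks first means most candidates are rejected before the strand and abundance ratios are computed.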

Among the novel miRNA candidates identified by this pipeline, only those whose most abundant tag was 20, 21, or 22 nt long were retained. These were compared with all known mature plant miRNAs in miRBase to identify homologs. Finally, novel candidates underwent a manual evaluation of precursor secondary structures using the plant version of the UEA sRNA hairpin folding and annotation tool and the Mfold web server, with default settings.

A miRNA was considered “expressed” only when the values of both biological replicates were greater than or equal to a threshold of 10 TP4M. We defined a miRNA as “vineyard-, cultivar-, or stage-specific” when it was expressed only in a given vineyard, a given cultivar, or one specific developmental stage.

Differentially expressed miRNAs were identified by multiple comparison analysis in the CLCbio Genomics Workbench. We loaded the total raw redundant reads from our 48 libraries into CLCbio and trimmed the adaptors, retaining only reads between 18 and 34 nt. We annotated miRNAs against a user-defined database comprising our set of 122 MIRNA loci and their corresponding mature sequences. For each library, the total count of reads perfectly mapping to the miRNA precursors was used as the input for the expression analysis. Given the main focus of our work, we aimed to identify miRNAs differentially expressed between the two cultivars in the same environment and developmental stage, or among the three vineyards for the same cultivar and developmental stage. For this reason, we analyzed each developmental stage separately and performed the Empirical Analysis of Digital Gene Expression, an implementation of the “Exact Test” from the edgeR Bioconductor package, as implemented in CLCbio, estimating tagwise dispersion with pairwise comparisons and setting the significance threshold at FDR-adjusted p ≤ 0.05.
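
The expression call and the specificity definition can be made concrete with a short sketch. It assumes that TP4M is a raw count scaled to a library of 4 million reads (the normalization itself is described elsewhere) and that each library group is keyed by a hypothetical (vineyard, cultivar, stage) tuple; the stage labels "green" and "ripe" are illustrative only.

```python
from collections import namedtuple

# One library group: vineyard, cultivar and developmental stage (illustrative labels).
Sample = namedtuple("Sample", "vineyard cultivar stage")

THRESHOLD_TP4M = 10.0  # expression threshold stated in the text


def tp4m(count, library_total):
    """Assumed TP4M definition: raw count scaled to a library of 4 million reads."""
    return count * 4_000_000 / library_total


def is_expressed(replicates, threshold=THRESHOLD_TP4M):
    """Expressed only if every biological replicate reaches the threshold."""
    return len(replicates) > 0 and all(v >= threshold for v in replicates)


def specific_level(profile, factor, threshold=THRESHOLD_TP4M):
    """Return the single vineyard, cultivar or stage in which a miRNA is expressed.

    profile maps Sample -> list of replicate TP4M values; factor is one of
    "vineyard", "cultivar" or "stage". Returns None when the miRNA is expressed
    in more than one level of that factor, or nowhere.
    """
    levels = {getattr(s, factor) for s, reps in profile.items()
              if is_expressed(reps, threshold)}
    return levels.pop() if len(levels) == 1 else None


profile = {
    Sample("Ric", "CS", "green"): [12.0, 15.0],
    Sample("Ric", "SG", "green"): [11.0, 9.0],  # one replicate below 10 TP4M
    Sample("Mont", "CS", "ripe"): [20.0, 18.0],
}
print(specific_level(profile, "cultivar"))  # CS
print(specific_level(profile, "vineyard"))  # None (expressed in Ric and Mont)
```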

The normalized reads of all miRNAs identified in this study, as well as the cluster abundances obtained from the static clustering analysis, were submitted to an additional ad hoc normalization [log10(TP4M + 1) or log10(HNA + 1)] for correlation analysis. This normalization was chosen because the enormous range of abundance values produced a log-unimodal distribution, which may cause significant biases in the correlation analysis when performed directly on TP4M or HNA values. Because one is added before taking the logarithm, a value of zero still maps to zero (log10(1) = 0), keeping the comparisons among profiles consistent. The dendrogram was generated using the function hclust, and the Pearson correlation was calculated using the function cor in R, based on the log10(TP4M + 1) and log10(HNA + 1) values for miRNAs and sRNA-generating loci, respectively. Pearson's correlation coefficients were converted into distance coefficients to define the height of the dendrogram. Heat maps were produced using MeV based on the TP4M values of miRNA abundance. Venn diagrams were produced using the function vennDiagram in R, based on the miRNA list for each cultivar, environment, and developmental stage.

Small RNA libraries were constructed and sequenced for 48 samples of grapevine berries. We obtained a total of 752,020,195 raw redundant reads. After adaptor trimming, 415,910,891 raw clean reads ranging from 18 to 34 nt in length were recovered. After eliminating reads mapping to rRNA, tRNA, snRNA, and snoRNA sequences, 199,952,950 reads, represented by 20,318,708 distinct (i.e., non-redundant) sequences across the 48 libraries, perfectly mapped to the V. vinifera PN40024 reference genome. The size distribution of mapped reads was assessed for each library. Distinct peaks at 21 and 24 nt were observed in all libraries.
Consistent with previous reports in grapevine and other plant species, the 21-nt peak was the highest, comprising a higher proportion of redundant reads, whereas the 24-nt peak was less abundant.
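
The log transformation and correlation-based clustering described in the methods can be sketched in plain Python. The sketch assumes that correlation coefficients are converted into distances as 1 - r and uses complete linkage (the default method of R's hclust); the text states neither choice, and the toy profiles are invented for illustration.

```python
import math


def log_transform(values):
    """log10(x + 1): a zero abundance stays zero because log10(1) = 0."""
    return [math.log10(v + 1) for v in values]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length, non-constant vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def complete_linkage(profiles):
    """Cluster libraries from raw abundances (TP4M or HNA, same feature order).

    Returns the merge steps as (members_a, members_b, height), where the
    height is the largest assumed 1 - r distance between members of the two groups.
    """
    logged = {name: log_transform(v) for name, v in profiles.items()}
    names = list(logged)
    dist = {}
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            dist[frozenset((a, b))] = 1 - pearson(logged[a], logged[b])

    def group_distance(g1, g2):
        return max(dist[frozenset((a, b))] for a in g1 for b in g2)

    groups = [frozenset([n]) for n in names]
    merges = []
    while len(groups) > 1:
        pairs = [(g1, g2) for i, g1 in enumerate(groups) for g2 in groups[i + 1:]]
        g1, g2 = min(pairs, key=lambda p: group_distance(*p))
        merges.append((sorted(g1), sorted(g2), group_distance(g1, g2)))
        groups = [g for g in groups if g not in (g1, g2)] + [g1 | g2]
    return merges


profiles = {
    "A1": [0, 10, 100, 1000], "A2": [1, 12, 90, 1100],
    "B1": [1000, 100, 10, 0], "B2": [900, 120, 8, 1],
}
for step in complete_linkage(profiles):
    print(step)
```

Here A1/A2 and B1/B2 merge first at small heights, and the final merge joins the two anti-correlated groups at a height close to 2.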

A few exceptions regarding the highest peak in the small RNA size profile were observed: Ric_SG_ps had its highest peak at 24 nt, whereas Mont_CS_ps and Mont_SG_bc did not show a clear difference between the 21- and 24-nt peaks. The high Pearson correlation coefficients between biological replicates indicated a strong association between them. To facilitate access to and use of these data, we have incorporated the small RNA data into a website. This website provides a summary of the library information, including sample metadata, mapped reads, and GEO accession numbers. It also includes pages for data analysis, such as a quick summary of the abundances of annotated microRNAs from grapevine or other species. Small RNA-related tools are also available, for example target prediction for user-specified small RNA sequences and matching criteria. Finally, and perhaps most importantly, a customized browser allows users to examine specific loci for the position, abundance, length, and genomic context of matched small RNAs; with this information, coupled with the target prediction output, users can develop and assess hypotheses about whether there is evidence for small RNA-mediated regulation of grapevine loci of interest.

To investigate whether the overall distribution and accumulation of small RNAs is affected by the interaction between different V. vinifera genotypes [Cabernet Sauvignon and Sangiovese] and environments [Bolgheri, Montalcino, and Riccione], we examined the regions of the grapevine genome from which a high number of small RNAs were produced, applying a proximity-based pipeline to group and quantify clusters of small RNAs as described by Lee et al. The nuclear grapevine genome was divided into 972,413 adjacent, non-overlapping, fixed-size windows, or clusters.
To determine small RNA cluster abundance, we summed, for each library, the hits-normalized abundance (HNA) values of all small RNAs mapping to each 500-bp cluster. To reduce false positives, we considered a cluster expressed when its abundance exceeded the threshold for a given library, thereby eliminating regions where few small RNAs were generated, possibly by chance. Libraries from bunch closure, representing green berries, and from 19 °Brix, representing ripened berries, were used in this analysis. Of the 972,413 clusters covering the whole grapevine genome, 4408 were identified as expressed in at least one sample. As shown in Figure 1, CS-derived libraries had more expressed clusters than SG-derived libraries of the same developmental stage and from the same vineyard. The exceptions were Sangiovese green berries collected in Riccione and Sangiovese ripened berries collected in Montalcino, which had more expressed clusters than the corresponding CS libraries. The two cultivars showed distinct small RNA profiles across environments. When Cabernet Sauvignon berries were green, more sRNA-generating regions were active in Bolgheri than in Montalcino and Riccione. In contrast, ripened berries had the highest number of expressed sRNA-producing regions in Riccione, while Bolgheri and Montalcino showed similar numbers of expressed clusters. Sangiovese green berries, instead, showed the highest number of active sRNA-generating regions in Riccione, twice the number found in Bolgheri and Montalcino, which were similar to each other. Ripened berries collected in Montalcino and Riccione showed almost the same high number of sRNA-generating clusters, whereas those collected in Bolgheri showed a lower number.
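
The window-based quantification can be sketched as follows. The sketch assumes that each small RNA is assigned to a window by its 0-based start coordinate, and that HNA is the abundance divided by the number of genomic matches, so a multi-mapping read is shared equally among its loci. The per-library threshold value is not given in the text; 5.0 below is a hypothetical placeholder.

```python
from collections import defaultdict

WINDOW_BP = 500  # cluster size stated in the text


def window_of(position):
    """Map a 0-based genomic coordinate to its non-overlapping 500-bp window."""
    return position // WINDOW_BP


def cluster_abundance(hits):
    """Sum hits-normalized abundance (HNA) per (chromosome, window) for one library.

    hits: iterable of (chromosome, start, abundance, n_genome_hits), one entry
    per genomic match of a small RNA. HNA is assumed to be abundance / n_genome_hits.
    """
    totals = defaultdict(float)
    for chrom, start, abundance, n_hits in hits:
        totals[(chrom, window_of(start))] += abundance / n_hits
    return dict(totals)


def expressed(clusters, threshold):
    """Clusters whose abundance exceeds the (library-specific) threshold."""
    return {key for key, value in clusters.items() if value > threshold}


library = [
    ("chr1", 10, 100.0, 1),
    ("chr1", 499, 50.0, 2),   # same window as position 10
    ("chr1", 500, 30.0, 3),   # first base of the next window
    ("chr2", 10, 8.0, 4),
]
abund = cluster_abundance(library)
print(abund)
print(expressed(abund, threshold=5.0))  # hypothetical threshold
```

Storing only windows that receive at least one read keeps the table small, since most of the 972,413 windows never reach the expression threshold.
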
We also noted that, when cultivated in Bolgheri, neither Cabernet Sauvignon nor Sangiovese changed the number of expressed clusters dramatically during ripening, whereas in Riccione Cabernet Sauvignon showed a 2-fold increase in sRNA-producing clusters, which was not observed in Sangiovese. Next, the small RNA-generating clusters were characterized on the basis of the genomic regions to which they map, i.e., genic, intergenic, and transposable elements. In general, when berries were green, the numbers of sRNA-generating loci located in genic and intergenic regions were roughly equal in all environments and for both cultivars, except for Sangiovese berries collected in Riccione, which showed a slight bias toward intergenic sRNA-producing regions. In contrast, in ripened berries on average 65% of the sRNA-generating loci were in genic regions, indicating a strong genic bias of the sRNA-producing clusters. The shift of sRNA-producing clusters from intergenic to mostly genic regions was more pronounced in Cabernet Sauvignon berries collected in Riccione, with an increase of approximately 20% in expressed clusters in genic regions as berries passed from the green to the ripened stage. When comparing cluster abundance among libraries, we found that 462 clusters were expressed in all libraries. The remaining 3946 expressed clusters were either shared among groups of libraries or specific to individual libraries. Interestingly, 1335 of the 4408 expressed clusters were specific to Riccione-derived libraries. The other two environments showed far fewer specific clusters: 263 in Bolgheri and 140 in Montalcino. When comparing expressed clusters between cultivars or developmental stages, we did not observe a similar skew of specific clusters toward one cultivar or developmental stage; roughly the same proportion of specific clusters was found for each cultivar and each developmental stage.
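
The counts of shared and environment-specific clusters reduce to set operations over the expressed clusters of each library (for instance, 462 shared plus 3946 remaining clusters gives the 4408 expressed clusters). A minimal sketch, assuming each library is keyed by a hypothetical (vineyard, cultivar, stage) tuple and that "specific" means expressed in at least one library of a group and in no library outside it:

```python
def shared_by_all(expressed):
    """Clusters expressed in every library. expressed: dict library -> set of cluster IDs."""
    sets = list(expressed.values())
    return set.intersection(*sets) if sets else set()


def specific_to(expressed, in_group):
    """Clusters expressed in a library where in_group(library) is True and in no other library."""
    inside = set().union(*(s for lib, s in expressed.items() if in_group(lib)))
    outside = set().union(*(s for lib, s in expressed.items() if not in_group(lib)))
    return inside - outside


# Toy data: keys are (vineyard, cultivar, stage); values are expressed cluster IDs.
expressed = {
    ("Bol", "CS", "green"): {1, 2, 3},
    ("Ric", "CS", "green"): {1, 2, 4, 5},
    ("Ric", "SG", "ripe"): {1, 5, 6},
    ("Mont", "SG", "ripe"): {1, 3},
}
print(shared_by_all(expressed))                             # {1}
print(specific_to(expressed, lambda lib: lib[0] == "Ric"))  # {4, 5, 6}
```

Passing the group as a predicate lets the same function answer vineyard-, cultivar-, stage-, or cultivar-by-stage-specific questions.
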
Among the 1335 Riccione-specific clusters, 605 were specific to Cabernet Sauvignon ripened berries and 499 to Sangiovese green berries. Other, smaller groups of expressed clusters were specific to one cultivar, one developmental stage, or one cultivar at a specific developmental stage. When comparing the expressed clusters with the transposable elements (TEs) annotated in the grapevine genome, we noticed that approximately 23% of the sRNA-generating regions were TE-associated. Sangiovese green berries from Riccione had the highest proportion of TE-associated expressed clusters, while Cabernet Sauvignon ripened berries, also from Riccione, had the lowest. Sangiovese berries had the highest percentage of expressed clusters located in TEs when cultivated in Riccione, compared with the other two vineyards. Interestingly, Cabernet Sauvignon berries showed the lowest proportion of TE-associated clusters when grown in Riccione, independently of the berry stage. In all libraries, Long Terminal Repeat (LTR) retrotransposons were the most represented TEs. More specifically, the gypsy family was the LTR class associated with the highest number of sRNA hotspots. The other classes of TEs associated with the sRNA-generating regions can be visualized in Figure 3B.

To determine the global relationship of small RNA-producing loci across environments, cultivars, and developmental stages, we performed a hierarchical clustering analysis. As shown in Figure 4, the libraries clustered clearly according to developmental stage and cultivar, not according to environment. Ripened and green berries had clearly distinct profiles of sRNA-generating loci.
Within each branch of green and ripened samples, Cabernet Sauvignon and Sangiovese were also well separated, indicating that the cultivar and the developmental stage at which berries were sampled shape the profile of sRNA-producing loci more than the environment does. Notwithstanding the evidence that developmental stage and variety have the strongest effect on sample clustering, we were interested in assessing the environmental influence on small RNA locus expression in the two cultivars. Thus, for each sRNA-generating cluster we calculated the ratio of cluster abundance in Cabernet Sauvignon to that in Sangiovese in each environment and developmental stage, thereby revealing the genomic regions with regulated clusters, using a 2-fold change threshold, a minimum abundance of 5 HNA in each library, and a minimum summed abundance of 30 HNA. Figure 5 shows how different environments affect the production of small RNAs.
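
The Cabernet Sauvignon/Sangiovese comparison can be written as a small filter. The sketch takes the three thresholds from the text (2-fold change, 5 HNA, 30 HNA) and treats a ratio of at least 2 or at most 0.5 as regulated; reading "in each library" and "sum of abundance" as referring to the two libraries being compared is an assumption.

```python
def regulated_clusters(cs, sg, fold=2.0, min_each=5.0, min_sum=30.0):
    """Clusters whose CS / SG abundance ratio passes a fold-change threshold.

    cs, sg: dict cluster ID -> HNA in the CS and SG library of the same vineyard
    and developmental stage. Assumes "in each library" means both libraries being
    compared and that the summed abundance is taken over those two libraries.
    """
    result = {}
    for cluster in cs.keys() & sg.keys():
        a, b = cs[cluster], sg[cluster]
        if a < min_each or b < min_each or a + b < min_sum:
            continue
        ratio = a / b
        if ratio >= fold or ratio <= 1 / fold:
            result[cluster] = ratio
    return result


cs = {"c1": 40.0, "c2": 10.0, "c3": 6.0, "c4": 4.0, "c5": 20.0}
sg = {"c1": 10.0, "c2": 30.0, "c3": 5.0, "c4": 40.0, "c5": 15.0}
print(sorted(regulated_clusters(cs, sg)))  # ['c1', 'c2']
```

Requiring a minimum abundance in both libraries also avoids infinite ratios when a cluster is absent from one cultivar.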