The other two methods are probabilistic graphical models (PGMs) and meta-prediction.

Consistently, six genes involved in chlorophyll biosynthesis (CrPORA, CrCAO, CrCHIH, CrCHLM, CrGGDR and CrMPEC) were more highly expressed, while two genes involved in chlorophyll degradation (CrCLH1 and CrRCCR) were less expressed, in WT than in MT. Similarly, auxin could also affect the expression of carotenoid and chlorophyll metabolism genes during tomato fruit ripening. In this study, we did not measure the IAA content in fruit peels for technical reasons. Although peel coloration and pulp maturation are two different processes in the same ripening fruit, they are normally synchronized in the majority of the world's citrus-growing areas. Similarly, in our case, the changes in peel color were closely coupled with the changes in pulp sugar and acid content. It is therefore relatively safe to assume that the changes in IAA content in pulps similarly occur in peels, and that the lack of fruit peel IAA data should not interfere with drawing a reliable conclusion.

It must be pointed out that the citrus fruit may delay or stop CC during fruit ripening in rare cases, such as in some very early satsuma mandarins, and in hot, tropical regions.

Organisms attain their form and function by readouts from an intricate web of regulatory relationships between DNA, RNA, proteins and metabolites. The era of large-scale biology promises to provide insights into this web of regulation at the whole genome level and has spurred growth of computational methods that allow us to look at diverse readouts and generate a comprehensive framework for how molecules generate morphological phenotypes. The number of sequenced genomes grows apace; NCBI currently lists >2500 genomes, and the number of plant genomes is currently listed at >100. However, the challenges associated with whole genome sequencing and assembly have caused many researchers to turn to other types of genome scale data. Since 2009, when RNAseq was described as a recently developed technology that had the potential to revolutionize our understanding of the complexities of eukaryotic transcriptomes, the technology has evolved and proved useful for identifying links between transcription factor activity and transcript abundance, for the generation of transcriptomes in non-model species through de novo assembly methods, for detection of genomic variants, and for identification of splicing variants.

Continued improvements in efficiency, combined with reductions in the cost of sequencing, have made sequencing technology available even to fields that traditionally did not rely on it. In a recent review, Alvarez and co-authors reported on >500 studies that relied on either microarray or RNAseq methods in the last 10 years and presented the potential to analyze gene expression in an ecological context across multiple taxa. In the field of evolutionary developmental biology the numbers are even more staggering. How can this explosion of genome scale data be leveraged to better understand how organisms develop, evolve, respond to biotic and abiotic stimuli and function in the context of their environment? Network analysis, an offshoot of graph theory used in mathematics and computer science to model relationships between objects, was utilized extensively in the social sciences and has become a method of choice for identifying relationships between units of biological data.

Early gene networks, such as those produced by Davidson et al., were generated using perturbation assays and direct experimental data to create a directed network of developmental regulatory control. While these networks were small, they were a large step in advancing our understanding of developmental processes, one that was not obtainable by analysis of just one or two genes at a time.

Many early network models consisted of one of two types of mathematical analysis: the Logical or Boolean model, and the dynamic systems model. The Boolean (switch) model consisted of genes being either in an ‘On’ or ‘Off’ state, which could be regulated by other genes. This method also demonstrated feedback loops, with genes regulating the activity of other genes, but could not allow for variable states of expression. Dynamic systems utilizing differential equations allowed for variable expression states of genes and nonlinear interactions, but were limited by computational power and the lack of large-scale transcriptomic data. Transcription factor–promoter interaction studies provided an additional method of gene regulatory network construction. These interactions showed a multi-tiered, or hierarchical, structure to gene regulation in networks, with a top, core, and bottom tier of transcription factors and their targets. This tiered structure revealed an interesting aspect of gene networks and biological processes: the top-tier transcription factors and their targets tended to be noisy, or have a high degree of variability in expression, while the bottom tier showed very low noise and stricter regulation of expression states. Jothi et al. hypothesized that the increased variability in top-level gene regulation allowed for greater adaptability, while low variability in the bottom tier acted as a buffer against inadvertent changes in the higher tiers that could be detrimental.

In the post-genomic era, and with the large volume of whole genome transcriptional data available, gene network construction has become readily available to most researchers. Figure 1 represents a flowchart of the potential analyses discussed below. Before constructing a network, genes often need to be subset into interest groups in order to facilitate data visualization and focus analysis on specific biological questions.
This involves differential expression analysis in conjunction with dimensionality reduction and clustering methods such as principal component analysis (PCA), k-means, hierarchical clustering, self-organizing maps (SOM), and t-distributed stochastic neighbor embedding (t-SNE). Each of these methods attempts to reduce high-dimensional data, such as gene expression patterns across time points, tissue types or treatments, into a representative and more easily interpretable two-dimensional structure.
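As a rough illustration of what such a reduction does, the sketch below projects a small genes-by-conditions matrix onto its first two principal components. It is a minimal, standard-library-only sketch: the function names (`center`, `covariance`, `leading_eigenpair`, `pca_2d`) and the toy expression values are hypothetical, and power iteration with deflation is used here only to avoid external dependencies, not because it is the method used in any of the studies discussed.

```python
import math

def center(rows):
    """Subtract each column (condition) mean so the data are centered."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[r[j] - means[j] for j in range(d)] for r in rows]

def covariance(centered):
    """Sample covariance matrix between conditions."""
    n, d = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
            for i in range(d)]

def leading_eigenpair(matrix, iterations=500):
    """Power iteration for the largest eigenvalue and its eigenvector.

    Assumes the fixed start vector is not orthogonal to that eigenvector.
    """
    d = len(matrix)
    start = [float(i + 1) for i in range(d)]
    scale = math.sqrt(sum(x * x for x in start))
    v = [x / scale for x in start]
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    eigenvalue = sum(v[i] * matrix[i][j] * v[j]
                     for i in range(d) for j in range(d))
    return eigenvalue, v

def pca_2d(rows):
    """Project each row (gene) onto the first two principal components."""
    centered = center(rows)
    cov = covariance(centered)
    d = len(cov)
    components = []
    for _ in range(2):
        eigenvalue, v = leading_eigenpair(cov)
        components.append(v)
        # Deflation: remove the variance already explained by this component.
        cov = [[cov[i][j] - eigenvalue * v[i] * v[j] for j in range(d)]
               for i in range(d)]
    return [[sum(r[j] * c[j] for j in range(d)) for c in components]
            for r in centered]

# Hypothetical expression of five genes across three conditions.
expression = [[1, 0, 1], [2, 1, 3], [3, 0, 3], [4, 1, 5], [5, 0, 5]]
coordinates = pca_2d(expression)
```

In this toy matrix the third condition is the sum of the first two, so the genes lie in a plane and the two-dimensional projection keeps their pairwise Euclidean distances. With real expression data, some of that structure is generally lost.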

PCA has been the most utilized dimensionality reduction method, using Euclidean distances to measure dissimilarity and determine placement within a two-dimensional space. However, the output often does not represent the actual relationship of objects in higher dimensional space, as each PC is a distinct orthogonal component representing the greatest amount of remaining variance, without regard to overall gene-to-gene correlations. With k-means, a user-defined number of clusters is used; a mean vector is calculated for each cluster and used to assign members, and then the mean is recalculated. This iterative process reduces the level of dissimilarity of objects within the cluster, thus giving a better representation of object relationships in higher dimensional space within the limitation of the defined number of clusters. Despite this improvement, k-means is highly susceptible to noise distortion, that is, the influence of outliers on the overall mean and structure of a cluster. Hierarchical clustering builds a tree with nodes that represent clusters through multiple different methods, including matrix construction by gene pair similarity measures and then identifying those genes with the highest degree of similarity. While hierarchical clustering provides a more informed output, merging errors or smaller cluster merging can result in the loss of more interesting local groups of genes. Each of the previous methods relies on Euclidean distances for similarity/dissimilarity measures between genes within higher dimensional space, which by its nature does not conform to a linear relationship. SOM and t-SNE employ non-linear distance measures to approximate the relationship between genes within higher dimensional space, often providing a much more realistic representation of gene similarity in two dimensions.
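The assign-then-recalculate loop described for k-means can be sketched as follows. This is a minimal illustration of the standard Lloyd iteration in plain Python; the function name `kmeans`, the toy expression profiles, and the choice of the first and fourth profiles as starting centroids are all assumptions made for the example (in practice starting centroids are usually chosen at random or by a seeding heuristic).

```python
import math

def kmeans(points, initial_centroids, max_iterations=100):
    """Alternate between assigning points to the nearest mean and updating means."""
    centroids = [list(c) for c in initial_centroids]
    assignments = None
    for _ in range(max_iterations):
        # Assignment step: each profile joins the cluster with the closest mean
        # (Euclidean distance).
        new_assignments = [
            min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            for p in points
        ]
        if new_assignments == assignments:
            break  # memberships are stable, so the means will not change either
        assignments = new_assignments
        # Update step: recalculate each cluster mean from its current members.
        for k in range(len(centroids)):
            members = [p for p, a in zip(points, assignments) if a == k]
            if members:  # an emptied cluster keeps its previous mean
                centroids[k] = [sum(col) / len(members) for col in zip(*members)]
    return assignments, centroids

# Hypothetical profiles over three time points: three rising, three falling genes.
profiles = [
    [1.0, 2.0, 3.0], [1.1, 2.1, 2.9], [0.9, 1.8, 3.2],
    [3.0, 2.0, 1.0], [3.1, 1.9, 1.2], [2.8, 2.1, 0.9],
]
labels, means = kmeans(profiles, [profiles[0], profiles[3]])
```

Because each cluster mean is a plain average, a single extreme profile assigned to a cluster shifts that mean directly, which is the outlier sensitivity noted above.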
Once distinct clusters of genes with similar expression patterns have been identified, gene ontology or gene set enrichment tests can be performed to identify the nature of genes within clusters. Networks can then be constructed from an individual cluster or multiple clusters sharing similar biological functions. There are three primary network types: gene regulatory networks (GRNs), which give directionality to the interaction between nodes or genes; association networks, which are non-directional but show direct interaction between associated genes; and gene coexpression networks (GCNs), which are non-directional and can show direct or indirect interactions between associated genes. With transcriptomic datasets, GCNs offer the most versatile gene interaction exploratory tool, using gene expression patterns to determine potential associations and modularity. This is especially useful in non-model organisms where the function of most or many genes has not been determined, and regulatory interactions remain unknown. Of the four primary network construction methods, the two most commonly utilized are correlation and supervised networks. Correlation network construction consists of determining a correlation between two genes based on expression changes, with Pearson’s product-moment correlation coefficient (PMCC) being the most common method. PMCC identifies linear correlations, but suffers from the inability to deal with outliers or genes which may have a nonlinear relationship. Spearman’s rank correlation coefficient deals with both of these issues, as it is more robust to outliers and accommodates monotonic non-linear relationships. The maximal information coefficient allows detection of the strength of any type of linear or nonlinear correlation between genes, and the partial correlation coefficient can be employed to quantify the association between two genes when conditioning on other genes, to infer direct dependencies among variables in a network.
In addition, it has been reported that network deconvolution can allow one to infer direct effects from an observed correlation matrix containing both direct and indirect effects. On the other hand, supervised network construction utilizes regression models, which deal with the response of genes to a set of predictor genes. Supervised network construction deals well with cascade expression changes, but is less reliable when dealing with feedback loops, because in regression analysis the response and predictor variables are set and are not necessarily interchangeable during construction. A combination of several mathematical techniques is preferable to obtain a more accurate representation of the gene associations.
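The difference between Pearson and Spearman correlation, and the step from pairwise correlations to a coexpression network, can be sketched in a few lines of plain Python. The function names, the gene labels, the toy expression values and the 0.8 cut-off are all hypothetical choices for illustration; published workflows typically use dedicated packages and more careful thresholding.

```python
import math

def pearson(x, y):
    """Pearson product-moment correlation; returns 0.0 if either input is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0.0 or sd_y == 0.0:
        return 0.0
    return cov / (sd_x * sd_y)

def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

def coexpression_edges(expression, threshold=0.8, method=spearman):
    """Undirected edges between gene pairs whose |correlation| meets the threshold."""
    genes = sorted(expression)
    edges = []
    for i, gene in enumerate(genes):
        for other in genes[i + 1:]:
            r = method(expression[gene], expression[other])
            if abs(r) >= threshold:
                edges.append((gene, other, round(r, 3)))
    return edges

# Hypothetical expression values across five samples.
expression = {
    "geneA": [1, 2, 3, 4, 5],
    "geneB": [1, 4, 9, 16, 25],   # monotonic but non-linear in geneA
    "geneC": [5, 4, 3, 2, 1],     # mirror image of geneA
    "geneD": [2, 1, 2, 1, 2],     # unrelated oscillation
}
network = coexpression_edges(expression)
```

Here geneA and geneB are perfectly rank-correlated even though their relationship is not linear, so Spearman scores the pair higher than Pearson does. The resulting edges are undirected and do not distinguish direct from indirect association, as described for GCNs above.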

There are two types of probabilistic graphical model (PGM) methods, Bayesian and Markov, with the former providing interaction directionality of gene relationships, and the latter using neighborhood selection methods similar to linear regression in supervised learning. Bayesian PGM is highly sensitive to experimental design and requires computationally intensive methods for interpreting Bayesian networks. The possibility of misinterpreting causal relationships among genes from gene expression data makes this method less appealing. However, when applied correctly, the method can provide gene relationship information not obtained with some other methods with large scale, high dimensionality data. Meta-prediction includes meta-analysis and ensemble learning; each utilizes multiple methods of network construction, and then creates a consensus relationship among gene expression patterns. Meta-prediction methods, through the use of multiple methods, may provide a more robust network than any one method on its own.

Once the GCN has been constructed, the interaction among genes can be determined, and other information such as gene function and the biological processes regulated can be obtained. Since transcriptionally coordinated genes are often functionally related, GCNs can be used for gene function prediction. In particular, a comparative GCN analysis across species can yield more accurate gene function predictions, because conserved gene modules are more likely to be functionally relevant. Hub genes, modularity and network restructuring are discussed in the following section. One of the more appealing aspects of GCNs is that whole transcriptome data can be combined with other large scale networks such as metabolic or protein–protein interactions to give a wider view of the biological processes to which specific clusters of genes belong.
Interestingly, the transcriptomic data available is outstripping the computational capabilities of many researchers, creating a technological bottleneck rather than a biological one.

A major challenge in biology is to understand the genetic basis of morphological evolution. Evo-devo studies aim to understand the developmental mechanisms that are modulated over time to give diverse phenotypic outputs. Most evo-devo studies, even though pursued on a gene-by-gene level, have underscored the importance of gene expression regulation, suggesting that rewiring of developmental GRNs should be a crucial factor driving morphological evolution. Large-scale genomics tools can be used to investigate this rewiring of developmental GRNs. Studies determining GRNs within an evo-devo context help us determine how developmental GRNs are reorganized to generate morphological diversity. Recent interaction mapping studies have shown the ability of differential analysis to reveal massive rewiring in the architecture of an interactome during cellular or adaptive responses.

Our previous GCN analysis using cross-species and tissue-specific RNA-seq data revealed the modular structure of the GRN controlling leaf development in the domesticated tomato and its wild relatives. Comparisons of the networks among species with experimental data showed that changes in a module regulating the key KNOX1 TFs made a significant contribution to the variation in leaf complexity.