Membranes that separate compartments are rendered as gray bars, with both sides labelled, and transporters are shown as breaks in the gray bar with pairs of brown ovals on either side to suggest a channel. This new feature makes intracellular transport within pathways clearer and easier to visualize.
PMN 15 is an extensive and regularly updated database of compounds, pathways, reactions, and enzymes for 126 plant and green algae species and subspecies, as well as a pan-species reference database called PlantCyc. We examined the quality of the data contained in the databases by assessing the accuracy of pathway prediction via manual validation of a randomly selected subset of predicted pathways. Using two publicly available transcriptomics datasets, we demonstrated how PMN resources can be leveraged to characterize and gain insights from omics data. The present work demonstrates that the Plant Metabolic Network can be a useful tool for various analyses of plant metabolism across species.
PMN 15 differs from other metabolic pathway databases in several ways: the quantity of curated and computational information, its comprehensive set of tools, and its specific focus on plants. Other, comparable databases include KEGG, Plant Reactome, and WikiPathways.
Like PMN, these databases contain metabolic pathways along with their associated reactions, compounds, and enzymes. KEGG pathways represent broad metabolic reactions shared among many organisms, and it is common to map genes or compounds to KEGG pathways alongside Gene Ontology annotations for enrichment analyses. However, because KEGG pathways represent a generalized set of reactions leading to many possible compound classes, they lack the granularity needed to analyze metabolism at a species-specific level. For example, a recent study identified enriched KEGG pathways among genes belonging to gene families that were expanded in Senna tora compared with its relatives. Enrichment analysis of the same genes using PMN’s StoraCyc 1.0.0 identified individual phenylpropanoid biosynthetic pathways enriched among the gene set, such as coumarin biosynthesis. PMN and MetaCyc feature structured data that is both human- and machine-readable, making it possible for users to obtain pathway structure and other data for their own offline analysis and enabling features such as the Pathway Co-Expression Viewer to be easily incorporated. WikiPathways is another pathway-centric database, but it is not plant-focused and takes a crowd-sourced approach, in contrast with PMN’s focus on expert curation. Plant Reactome, another metabolism database, is specific to plants and green algae, as PMN is. However, Plant Reactome uses Oryza sativa as a reference species to project reactions and pathways onto the 106 other species currently in the database, and it uses gene orthology to predict the presence of a pathway: a pathway is predicted in a species if at least one rice ortholog for an enzyme in that pathway is present in that species. Pathway prediction in PMN, by contrast, is more stringent because it is implemented through the PathoLogic and SAVI pipelines.
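To make the orthology-based rule described above concrete, the following is a minimal sketch of a "one ortholog is enough" pathway projection. It is an illustration only, not Plant Reactome’s actual implementation; the function name, the pathway names, and the gene identifiers are all hypothetical.

```python
from typing import Dict, Set


def predict_pathways_by_orthology(
    pathway_enzymes: Dict[str, Set[str]],
    orthologs_in_species: Set[str],
) -> Set[str]:
    """Return the pathways for which at least one reference enzyme
    has an ortholog in the target species (hypothetical helper)."""
    return {
        pathway
        for pathway, enzymes in pathway_enzymes.items()
        if enzymes & orthologs_in_species  # non-empty overlap is sufficient
    }


# Hypothetical toy data: reference (rice) enzyme genes for each pathway.
reference_pathways = {
    "pathway_A": {"rice_gene_1", "rice_gene_2", "rice_gene_3"},
    "pathway_B": {"rice_gene_4", "rice_gene_5"},
}
# Reference genes for which the target species has an ortholog.
target_orthologs = {"rice_gene_2"}

predicted = predict_pathways_by_orthology(reference_pathways, target_orthologs)
print(sorted(predicted))
```

In this toy case, pathway_A is predicted even though two of its three enzymes have no ortholog in the target species, which shows why such a rule is permissive.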
The ability of PMN to enable research is dependent on the accuracy of its data. We therefore evaluated the performance of PMN’s metabolic reconstruction pipeline both in its entirety and using only computational prediction. The manual pathway validation revealed a number of pathways predicted to be present outside of their known taxonomic range, such as momilactone’s predicted presence across Poaceae despite being known to exist only in rice and a few other species, some outside of Poaceae. While some of these results may reflect compounds that are, in fact, more widely distributed than currently thought, many such cases likely result from inaccurate prediction of enzymatic function by E2P2. The performance of enzyme function prediction using a sequence similarity approach can suffer when dealing with highly similar enzymes of a shared family. In cases like momilactone, where the pipeline has predicted the pathway in species closely related to species known to possess it, it may be the case that the predicted species do have most of the enzymes necessary to catalyze the pathway, but that one or a few of the predicted enzymes actually have a different function in vivo. This may draw attention to cases where enzymes have gained new functions and allow for exploration of how enzymes evolve. Meanwhile, cases of universal plant pathways being predicted only in Brassicaceae may indicate the pitfalls of an overemphasis on Arabidopsis in curation and research, as key enzymes might be predicted less reliably outside of this clade. This might be the case if there are Brassicaceae-specific variations that result in a failure to reliably predict orthologs. A focus on curating information from diverse species may improve the accuracy of the computational prediction, requiring less semi-automated curation to fix such errors.
Additionally, incorporating evidence from recently published species-specific metabolomics reference datasets may help corroborate PMN’s prediction of metabolites, for which there is currently little experimental support. Pathway misannotation in the naïve prediction pipeline could also be the result of PathoLogic’s incorrect integration of enzyme annotation with reference reactions.
In addition to incorporating enzyme predictions, PathoLogic can infer pathways for a given species based on a number of additional considerations. For example, if a species contains an enzyme that catalyzes a reaction unique to one pathway in the PGDB, the pathway is likely to be predicted to be present. Additionally, if all reactions of a pathway are predicted to be present, the pathway is likely to be predicted as present. Using PathoLogic without taxonomic pruning thus provides increased prediction sensitivity while also increasing false positives. By design, SAVI removes false-positive pathways predicted by PathoLogic and adds false-negative pathways that PathoLogic failed to predict. Our analyses indicate that the predominant function of SAVI and PathoLogic’s taxonomic pruning currently is to remove false positives and consequently restrict the taxonomic range of predicted pathways, consistent with previous analyses of SAVI’s performance. Interestingly, our manual pathway assessment revealed that, in certain cases, SAVI should have increased the range of a predicted pathway and added it to more species than it was predicted for by PathoLogic. For example, the phytol salvage pathway is predicted to be present in all photosynthetic organisms. PathoLogic incorrectly restricted the predicted range of this pathway to angiosperms even without taxonomic pruning, and SAVI neither corrected this erroneous restriction nor assigned the pathway to the few angiosperm species that PathoLogic did not predict to contain it. Examples like this may represent errors in the manual curation decisions used by SAVI to make its corrections, or they may reflect new information added to the literature after those curation decisions were made. Both possibilities represent important information in accurately representing metabolism across species and highlight the need to regularly update the curation rules upon which SAVI operates.
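The two inference considerations and the subsequent correction step described above can be sketched as follows. This is a deliberately simplified illustration under stated assumptions: the text describes the two considerations as making a prediction "likely", whereas this sketch treats them as hard rules, and the real PathoLogic and SAVI pipelines apply further criteria such as taxonomic range. All function names, pathway names, and reaction identifiers are hypothetical.

```python
from collections import Counter
from typing import Dict, Set


def infer_pathways(
    pathway_reactions: Dict[str, Set[str]],
    species_reactions: Set[str],
) -> Set[str]:
    """Predict a pathway if the species has a reaction found in no other
    pathway, or if it has every reaction of the pathway (assumed hard rules)."""
    # Count how many pathways each reaction participates in.
    usage = Counter(r for rxns in pathway_reactions.values() for r in rxns)
    predicted = set()
    for pathway, reactions in pathway_reactions.items():
        present = reactions & species_reactions
        has_unique_reaction = any(usage[r] == 1 for r in present)
        all_reactions_present = bool(reactions) and present == reactions
        if has_unique_reaction or all_reactions_present:
            predicted.add(pathway)
    return predicted


def apply_curated_corrections(
    predicted: Set[str],
    curated_present: Set[str],
    curated_absent: Set[str],
) -> Set[str]:
    """Add curated false negatives and remove curated false positives
    for one species (a stand-in for the correction step, not SAVI itself)."""
    return (predicted | curated_present) - curated_absent


# Hypothetical toy data.
pathway_reactions = {
    "pathway_X": {"rxn1", "rxn2"},
    "pathway_Y": {"rxn2", "rxn3"},
    "pathway_Z": {"rxn4"},
}
species_reactions = {"rxn1", "rxn2"}

raw = infer_pathways(pathway_reactions, species_reactions)
corrected = apply_curated_corrections(
    raw, curated_present={"pathway_Z"}, curated_absent={"pathway_X"}
)
print(sorted(raw), sorted(corrected))
```

Here pathway_Y is not predicted because its only detected reaction, rxn2, is shared with pathway_X. The correction step can only be as good as the curated lists it is given, which mirrors the point above about keeping the curation rules up to date.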
We therefore reclassified the final pathway assignments in PMN 15 for each pathway whose classification after SAVI implementation was determined to be anything other than “Expected”. Through the continual process of introducing new species, and thus new pathways, into PMN, along with regular curation of those new pathway predictions, SAVI’s correction performance, and thus the overall value of data in PMN, should continue to improve over time. The results of the manual pathway validation suggest that additional systematic manual checks and validation may be productive. The manual validation reported here focused on the phylogenetic distribution of pathways, but this is only one aspect of the data found in PMN. Future reviews will focus on the previously curated data in plant-specific pathways, both to check for accuracy and to identify research published after the pathway was last updated that curators may have missed. Semi-automated curation could also play a role; nearly half of PMN compounds, for example, do not have ChEBI links, and scripts could be written to identify ChEBI and other external links that should be added, to be vetted by curators before inclusion in PMN.
PMN is organized primarily by species, and a significant component of the expansion over its history has been the addition of new species and subspecies. For this to be a worthwhile endeavor and useful to the plant biology research community, the species databases need to be meaningfully differentiated from one another in ways that accurately reflect their metabolic differences. Multiple correspondence analysis was therefore performed to determine whether related species would cluster together, an indication that underlying biology is driving the differences in their database contents.
The analysis revealed that some plant groups such as Brassicaceae, Poaceae, the green algae, and non-flowering plants each clustered together, showing that these major groups of plants can be readily differentiated based on their metabolic complements. Within the eudicots, however, there was little separation apart from the grouping of Brassicaceae. Other groups such as Rosaceae and Solanaceae did not separate from the other eudicots, even though both groups are known to have unique metabolism, suggesting that more research and curation on members of these groups is needed. This analysis also indicated that despite being represented by a number of PMN species, the unique metabolisms of these groups remain understudied. The separation of Brassicaceae from the other groups may reflect a more comprehensive body of knowledge about the metabolism of Arabidopsis due to its status as a model plant and, as a result, a larger number of Brassicaceae-specific pathways being known than pathways specific to other clades. This is reflected in the large percentage of pathways and enzymes in PMN that are curated to that species. The same might be true of the grasses, a clade that contains economically important crops such as maize, rice, wheat, and switchgrass. These results suggest that study of representative members of a group could help differentiate the group as a whole and that much of current knowledge is limited to common pathways. The focus on Arabidopsis in the database also carries a risk of biasing studies that use the PlantCyc database as a source, though because this reflects a similar bias in plant research in general, the issue may not be limited to PlantCyc and PMN. More detailed studies of the metabolism of other groups are needed to understand what makes them unique.
PMN has been used in a variety of ways by the plant research community.
One common use is to find metabolic information about a specific area of metabolism, such as finding sets of biosynthesis genes for a particular compound or sets of compounds under study, or finding pathways associated with a set of genes highlighted by an experiment. Clark and Verwoerd used AraCyc to determine different biosynthetic routes for anthocyanin pigments and to predict minimal sets of genes which could be mutated to eliminate pigment production. Pant et al. performed metabolite profiling on phosphorus-deprived Arabidopsis wild-type plants and phosphorus-signaling mutants. PMN was used to find genes in the biosynthetic pathways of metabolites which showed altered concentration in the mutants and P-deprived plants. Saptari and Susila examined the expression of hormone biosynthesis genes during somatic embryogenesis in Arabidopsis and rice. The authors used PMN to identify hormone biosynthetic genes and performed expression analysis on the identified gene set. Kooke et al. used AraCyc to identify genes involved in glucosinolate and flavonoid metabolism, and then examined the relationship between methylation of these genes and metabolic trait values. Uhrig et al. examined diurnal changes in protein phosphorylation and acetylation, and used PMN’s pathway enrichment feature to identify AraCyc pathways enriched for proteins associated with these protein modification events. A second common use of PMN is to study broader patterns in plant metabolism. Hanada et al. explored two rival hypotheses which attempt to explain the large number of Arabidopsis metabolic genes for which single mutants show weak or no phenotypes, and used data from PMN to determine the connectivity of different metabolites in the network. Chae et al. compared primary and specialized metabolism in plants and green algae and found that specialized metabolism genes have different evolutionary patterns from primary metabolism genes. Moore et al. 
used AraCyc in assembling lists of enzyme-coding genes involved in primary and specialized metabolism, and then explored associations between various qualities and metrics of the genes and their involvement in primary or specialized metabolism. The PlantClusterFinder software was also used in that analysis. Song et al. set out to test the hypothesis that stoichiometric balance imposes selection on gene copy number. AraCyc pathways were used as a source of functionally related gene groups to test for reciprocal retention.