Other insights into host range have come from plant immunological studies.

While there are three distinctive subspecies of X. fastidiosa, and it would be desirable to use those subspecies for management decisions, so far the subspecies have not been found to have sufficient resolution to define host range or to infer risk. Understanding the molecular basis of plant host specificity in X. fastidiosa is vital for predicting and acting upon host shifts, but these processes have yet to be described. Xylella fastidiosa is a member of the family Xanthomonadaceae and phylogenetically clusters sister to Xanthomonas albilineans, technically within the paraphyletic genus Xanthomonas, although Xylella is considered a separate genus. Xylella spp. and Xanthomonas albilineans are the only xylem-limited Xanthomonadaceae and have convergently reduced genomes compared to the rest of the genus. Xylella also lacks a Type III secretion system (T3SS), a loss compared to its higher-order taxonomic group. As the purpose of the T3SS in phytopathogens is to deliver effectors into living plant cells, the loss has been hypothesized to be due to X. fastidiosa primarily interacting with non-living tissue: insect cuticle and mature xylem vessels. While the molecular basis of host range is not understood, there are consistent patterns in the ability of particular X. fastidiosa isolates to infect specific plant hosts regardless of environmental conditions.

This implies that genetics, as opposed to environmental conditions alone, underlie the relationships between isolates and plant hosts that allow for colonization. Recurring pathogen specificity to a particular host can be explained either through phylogenetic signal, where members of a clade share traits that allow for pathogenesis in that host, or by pathological convergence, where more distantly related strains have separately acquired mechanisms for virulence. Both processes have underlying genetics, but each shows a different phylogenetic pattern. Experimentally, for example, removing the O-antigen from the exterior of X. fastidiosa cells allows the plant to quickly recognize X. fastidiosa and initiate immune responses, thus decreasing the likelihood that the bacterium colonizes the plant. O-antigens are highly variable and evolve rapidly, and they often show coevolutionary histories between symbiotic organisms because they are the first exposed part of any bacterium. Lastly, deletion of rpfF, which controls cell-cell signaling via a diffusible signal factor, can expand the host range of X. fastidiosa. In terms of phylogenetic methods, cophylogenies have shown no cospeciation between plant hosts and X. fastidiosa, nor any other congruence between the evolutionary histories of X. fastidiosa and its plant hosts. Based on the current data, it is not generally possible to tell whether X. fastidiosa is undergoing host jumps or range expansions. However, the data available so far suggest that both are occurring: in certain situations strains are able to infect multiple hosts, while in others multiple strains co-exist in nature with no cross-infection of hosts.

Using the influx of whole-genome data generated in the past several years, we searched the genomes of X. fastidiosa for correlations with plant host species. The first method we pursued was ancestral state reconstruction. Ancestral state reconstructions use genetic data, with a known phenotype for each taxon, to characterize the most likely state that each ancestral node of the tree would have possessed for the phenotype of interest. This approach has been used to understand host-pathogen interactions in fungal and trematode parasite systems. Ideally, we would be able to ask: what was the most likely host of the ancestor of all X. fastidiosa? Understanding patterns in the past can help us build better models to predict future hosts based on the genomic changes associated with historical host shifts. Following the ancestral state reconstructions, we looked further into the pan-genome by calculating correlations between plant host types and the presence/absence of each gene. This study aimed to compare the commonly used genetic datasets available for phylogenetic analyses of X. fastidiosa, in terms of both the phylogenetic topologies and the ancestral host states recovered from each dataset. We hypothesized that the pathogen phylogeny would be correlated with host history and that we could observe this trend through ancestral state reconstruction. If there is no relationship between host and phylogeny, the ancestral state reconstructions should not give conclusive results. We further hypothesized that by using the core-genome phylogenetic tree of X. fastidiosa, the pan-genome phylogenetic tree, or both, it would be possible to estimate the likelihood of hypothetical plant hosts for ancestral nodes of interest.
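As a rough illustration of what an ancestral state reconstruction computes, the sketch below applies an equal-rates model of host switching to a toy tree using Felsenstein's pruning algorithm. It is a minimal sketch under stated assumptions, not the pipeline used in this study: the tree, branch lengths, and fixed `rate` value are invented, the root prior is assumed flat, and in a real analysis the rate would be estimated by maximum likelihood rather than supplied by hand. The 0.95 cut-off in `call_state` is used here only as an illustrative threshold for reporting a node as undetermined.

```python
import math

def er_transition(k, rate, t):
    """Equal-rates model: probability of staying in the same state (p_same)
    or ending in one specific other state (p_diff) along a branch of length t."""
    decay = math.exp(-k * rate * t)
    p_same = 1.0 / k + (k - 1.0) / k * decay
    p_diff = 1.0 / k - 1.0 / k * decay
    return p_same, p_diff

def partial_likelihoods(node, states, rate):
    """Felsenstein pruning. A tip is a host-name string; an internal node is
    a list of (child, branch_length) pairs. Returns, for each candidate
    state, the likelihood of the tip hosts below `node`."""
    if isinstance(node, str):
        return [1.0 if s == node else 0.0 for s in states]
    k = len(states)
    result = [1.0] * k
    for child, length in node:
        child_like = partial_likelihoods(child, states, rate)
        p_same, p_diff = er_transition(k, rate, length)
        total = sum(child_like)
        for i in range(k):
            # Sum over child states j of P(i -> j) * L_child(j).
            result[i] *= p_same * child_like[i] + p_diff * (total - child_like[i])
    return result

def root_posterior(tree, states, rate):
    """Scaled likelihood of each host at the root (flat root prior assumed)."""
    like = partial_likelihoods(tree, states, rate)
    total = sum(like)
    return {s: v / total for s, v in zip(states, like)}

def call_state(posterior, threshold=0.95):
    """Report the best-supported host, or 'undetermined' below the threshold."""
    best = max(posterior, key=posterior.get)
    return best if posterior[best] >= threshold else "undetermined"

# Hypothetical toy data: four isolates, three candidate host genera.
hosts = ["Vitis", "Coffea", "Prunus"]
toy_tree = [
    ([("Vitis", 0.1), ("Vitis", 0.1)], 0.2),
    ([("Vitis", 0.1), ("Coffea", 0.1)], 0.2),
]
posterior = root_posterior(toy_tree, hosts, rate=0.5)
print(posterior, call_state(posterior))
```

A symmetric-rates model would replace the single shared rate with one rate per pair of hosts, which is why it carries more parameters than the equal-rates model sketched here.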

This would show that host is largely dependent on, and predictable from, the phylogeny of bacterial relationships, and would lead to further investigation of allelic differences in the core genome and/or gene gain and loss in the pan-genome, to estimate how either or both are correlated with plant host identity. Although MLST data are not biologically meaningful, they are still frequently used in X. fastidiosa management, so we included that data type in our analysis for comparison as well.

Phylogenetic reconstructions of disparate regions and sizes are topologically similar

The pan-genome of all sequences and the outgroup X. taiwanensis – Wufong1_PLS229 contained 17,024 genes. The alignment of MLST genes totaled 4,146 bp in length, while the core genome comprised 1,411 concatenated regions totaling 354,816 bp. Non-recombinant regions identified with ClonalFrameML comprised only 32% of the core genome, leaving an alignment of only 112,819 bp. This alignment contained 130 pairs of sequences that were completely identical to each other, greatly reducing the within-subspecies differentiation that is possible with this dataset and creating large polytomies of indistinguishable sequences within subsp. fastidiosa as well as within subsp. pauca. Because of the extensive data loss from removing recombinant regions and the resulting lack of within-subspecies resolution, the phylogeny with recombinant regions removed is suitable only for between-subspecies comparisons. The strains and locations in the alignment with recombination can be visualized in the supplemental materials. While between-subspecies topologies are similar among the four trees generated, they are not identical. The core-genome tree shows a consensus taxonomic division into three subspecies; however, subsp. sandyi and morus could either be part of subsp. fastidiosa or each be their own small subspecies without affecting the monophyly of subsp. fastidiosa.
The non-recombinant tree is similar except that subsp. morus is clustered within subsp. fastidiosa. The pan-genome tree splits the most basal of the three subspecies, subsp. pauca, into a paraphyletic cluster, but places multiplex, fastidiosa, morus, and sandyi similarly to the core phylogeny. The MLST tree shows subsp. morus as the outgroup to subsp. fastidiosa, while subsp. sandyi falls within subsp. fastidiosa. The other difference among the four topologically similar trees is variation in branch length. Since the pan-genome tree was built with gene presence/absence data, its branch lengths were calculated in gene changes per site. Phylogeny and alignment information is summarized in Table 1. A 16S rDNA phylogeny was also built as a comparison, but it provided very poor differentiation among strains.

Within subsp. fastidiosa, the core genome recovers the three PD clades that were found in Castillo et al. Within the clade defined as PD-III, sequence similarity in the core has led to extensive polytomies, with many sequences indistinguishable in the core. The three PD clades are also recovered in the non-recombinant phylogeny and the pan-genome phylogeny; however, the MLST tree does not differentiate these clades from one another. Rather than being poorly resolved, the MLST tree has high bootstrap support for clades that conflict with the core and pan-genome trees, suggesting that using MLST genes has the potential to subvert the analysis of relationships between taxa while still showing strong bootstrap support. Within subsp. multiplex, two groups have typically been recognized: the non-recombining “non-IHR” multiplex and the recombining outgroup “IHR” multiplex.

The core-genome tree and the MLST tree both recover these two groups: the clade “non-IHR” and the non-monophyletic recombining group “IHR”. The non-recombinant tree and the pan-genome tree do not re-create these groupings. All phylogenies except the pan-genome show a consistent split in subsp. pauca between the strains isolated from the Italian OQDS outbreak and the mixed-host strains from Brazil. Within the OQDS strains, as well as several very closely related strains from Costa Rica, there is no clear resolution at this genomic scale. Within the Brazilian clade, strain Hib4 is the outgroup in all phylogenies except the MLST.

The genus-level ancestral state reconstruction on the core-genome phylogeny shows undetermined hosts at the deepest nodes. However, the ancestral node of subsp. fastidiosa has a significant association with the plant genus Coffea, which persists throughout subsp. fastidiosa as the most likely ancestral host for all strains isolated from South and Central America. This changes for the Pierce’s disease of grapevines clade, where the ancestral host of all nodes except one is Vitis, the one exception being an ancestral Prunus node. Subspecies sandyi and morus have undetermined ancestral hosts. Subspecies multiplex has a more dynamic history: Vaccinium is the most likely ancestral host for the subspecies, and within the clade there is a switch to a large group of nodes whose most likely host is Prunus, as well as two nodes assigned to Platanus and Olea. Subspecies pauca does not have a determined ancestral host for the whole subspecies, and its internal nodes switch several times between Citrus and Coffea, and once to Olea. Across the genus-level reconstructions, while the deep nodes are often undetermined, there is more resolution within subspecies.
One result is consistent across the four reconstructions: a high likelihood that the genus Coffea was the ancestral host at the node representing the introduction of subsp. fastidiosa from Central to North America. The genus Vaccinium was predicted as the most likely ancestral host of subsp. multiplex in the core-genome phylogeny, whereas in the non-recombinant phylogeny the ancestor of all but one strain of subsp. multiplex is assigned to the genus Prunus. All four trees agree that the ancestor of the internal “non-IHR” multiplex clade was associated with Prunus. In terms of the transition models chosen for each reconstruction, most trees had lower AIC scores with the equal-rates model, which has fewer parameters than the symmetrical-rates model; the exception was the pan-genome super-order reconstruction, which had a lower AIC score with the symmetrical-rates model.

At the node representing the ancestor of the species X. fastidiosa, both the non-recombinant core and pan-genome phylogenies predict that the clade Rosid is the most likely ancestral host. The core and MLST phylogenies predict Asterid to be the ancestral host, but at lower likelihoods of 87% and 78%, respectively, which are visualized, along with all likelihoods under 95%, as undetermined. There is enough discordance between reconstructions that a consistent pattern at this host depth is unlikely.

A total of 30 X. fastidiosa genes are significantly correlated with isolation from the genera Coffea and Vitis or from the super-orders Asterid and Rosid. Ten of these 30 genes are significantly correlated with both Asterids and Rosids, with paired, opposite relationships. Some correlations are significant because of elevated presence of the gene among strains found in a particular host, while most are significant because of the absence of particular genes in the host of interest.
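To make the idea of a gene presence/absence correlation concrete, the sketch below builds a 2x2 table of gene presence against host for a toy pan-genome and tests it with a two-sided Fisher's exact test. This is a deliberately simplified stand-in, not the analysis reported here: it does not correct for phylogeny or for multiple testing, and the strain names, gene names, and host labels are all hypothetical.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum every table at least as extreme (as improbable) as the observed one.
    return sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))

def gene_host_table(carriers, strain_host, host):
    """Count strains by gene presence (rows) and isolation host (columns)."""
    a = sum(1 for s, h in strain_host.items() if s in carriers and h == host)
    b = sum(1 for s, h in strain_host.items() if s in carriers and h != host)
    c = sum(1 for s, h in strain_host.items() if s not in carriers and h == host)
    d = sum(1 for s, h in strain_host.items() if s not in carriers and h != host)
    return a, b, c, d

# Hypothetical toy data: eight strains, two accessory genes.
strain_host = {
    "s1": "Vitis", "s2": "Vitis", "s3": "Vitis", "s4": "Vitis",
    "s5": "Coffea", "s6": "Coffea", "s7": "Coffea", "s8": "Coffea",
}
gene_carriers = {
    "geneA": {"s5", "s6", "s7", "s8"},   # absent from every Vitis strain
    "geneB": {"s1", "s2", "s5", "s6"},   # unrelated to host
}

for gene, carriers in gene_carriers.items():
    table = gene_host_table(carriers, strain_host, "Vitis")
    a, b, c, d = table
    direction = "presence" if a * d > b * c else "absence"
    print(gene, table, fisher_exact_two_sided(*table), direction)
```

In this toy example `geneA` is associated with Vitis through its absence, mirroring the pattern noted above in which most significant genes are missing from strains of the host of interest. Because every strain is treated as independent here, closely related strains would inflate the signal; a phylogeny-aware method is needed to separate convergent gain and loss from shared ancestry.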
Since lineage-specific interdependencies are accounted for with the phylogeny, the correlated genes are representative of convergent processes, arising either through convergent evolution or via lateral gene transfer, rather than shared ancestry by descent. The significant genes mark repeated changes in the pan-genome that do not follow vertical descent and that recur in convergent patterns specific to the hosts of interest. While most identified genes encode hypothetical proteins, annotated genes correlated with host included the fitB_1 system, involved in in-host migration, vbhT, socA, and an HTH-type transcriptional regulator.

In this paper, we show that there is a genetic basis to the host range of X. fastidiosa. We demonstrate that both the phylogeny and gene gain and loss in the pan-genome are connected to the plant hosts of the diverse species X. fastidiosa, and that an Asterid of undetermined genus was the most likely ancestral plant host of X. fastidiosa.