Modular Evolution of Secretion Systems and Virulence Plasmids in Agrobacterium
2022
Lin Chou, Yu-Chen Lin, Mindia Haryono, Mary Nia M. Santos, Shu-Ting Cho, Alexandra J. Weisberg, Chih-Feng Wu, Jeff H. Chang, Erh-Min Lai, Chih-Horng Kuo
Summary
This research article investigates the modular evolution of secretion systems and virulence plasmids within the Agrobacterium tumefaciens species complex. The authors use 35 genome assemblies, 14 of them newly generated, to analyze the genetic diversity and evolution of the complex, focusing on the type VI secretion system and on the tumor-inducing plasmids with their type IV secretion system. The study identifies variation in effector genes and highlights the importance of phylogeny-guided sampling for robust evolutionary genomics analysis.
Chou et al. BMC Biology (2022) 20:16
https://doi.org/10.1186/s12915-021-01221-y

RESEARCH ARTICLE (Open Access)

Modular evolution of secretion systems and virulence plasmids in a bacterial species complex

Lin Chou¹†, Yu-Chen Lin¹†, Mindia Haryono¹, Mary Nia M. Santos¹,²,³, Shu-Ting Cho¹, Alexandra J. Weisberg⁴, Chih-Feng Wu¹,⁴, Jeff H. Chang⁴, Erh-Min Lai¹,²,⁵ and Chih-Horng Kuo¹,²,⁵*

* Correspondence: Chih-Horng Kuo
† Lin Chou and Yu-Chen Lin contributed equally to this work.
¹ Institute of Plant and Microbial Biology, Academia Sinica, Taipei, Taiwan
² Molecular and Biological Agricultural Sciences Program, Taiwan International Graduate Program, National Chung Hsing University and Academia Sinica, Taipei, Taiwan
Full list of author information is available at the end of the article.

© The Author(s). 2022 Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

Abstract

Background: Many named species as defined in current bacterial taxonomy correspond to species complexes. Uncertainties regarding the organization of their genetic diversity challenge research efforts. We utilized the Agrobacterium tumefaciens species complex (a.k.a. Agrobacterium biovar 1), a taxon known for its phytopathogenicity and applications in transformation, as a study system and devised strategies for investigating the genome diversity and evolution of species complexes.

Results: We utilized 35 genome assemblies, including 14 newly generated ones, to achieve a phylogenetically balanced sampling of A. tumefaciens. Our genomic analysis suggested that the 10 genomospecies described previously are distinct biological species and supported a quantitative guideline for species delineation. Furthermore, our inference of gene content and core-genome phylogeny allowed for investigations of genes critical in fitness and ecology. For the type VI secretion system (T6SS), which is involved in interbacterial competition and was thought to be conserved, we detected multiple losses and one horizontal gene transfer. For the tumor-inducing plasmids (pTi) and the pTi-encoded type IV secretion system (T4SS) that are essential for agrobacterial phytopathogenicity, we uncovered novel diversity and hypothesized their involvement in shaping this species complex. Intriguingly, for both the T6SS and the T4SS, genes encoding structural components are highly conserved, whereas extensive diversity exists for genes encoding effectors and other proteins.

Conclusions: We demonstrate that the combination of a phylogeny-guided sampling scheme and an emphasis on high-quality assemblies provides a cost-effective approach for robust analysis in evolutionary genomics. We show that the T6SS VgrG proteins involved in specific effector binding and delivery can be classified into distinct types based on domain organization.
The co-occurrence patterns of VgrG-associated domains and the neighboring genes that encode different chaperones/effectors can be used to infer possible interacting partners. Similarly, the associations between plant host preference and pTi type among these strains can be used to infer phenotype-genotype correspondence. Our strategies for multi-level investigations, at scales ranging from whole genomes to intragenic domains and at phylogenetic depths from between- to within-species, are applicable to other bacteria. Furthermore, the modularity observed in the molecular evolution of genes and domains is useful for inferring functional constraints and informing experimental work.

Keywords: Agrobacterium, Genome, Secretion system, Virulence, Plasmid, Molecular evolution

Background

Understanding bacterial biology, notably for purposes of tackling pathogenicity, requires the ability to identify biological entities [1, 2] at the species level and to infer their evolutionary relationships. However, many bacterial groups are currently unresolved and are classified as species complexes. These uncertainties regarding species boundaries hamper research, communication, and policy-making, such as healthcare guidelines, pathogen quarantine regulations, and biological resource management. Based on barriers to homologous recombination, an analysis of > 20,000 bacterial genome sequences from 91 species belonging to 13 phyla found that 21 of the previously recognized species comprise multiple biological species. These 21 groups include those that are important as pathogens (e.g., Mycobacterium tuberculosis, Pseudomonas aeruginosa, and Vibrio cholerae) or as beneficial microbes (e.g., Lactobacillus casei and Sinorhizobium meliloti). This finding highlights the ubiquity of species complexes across bacterial lineages, even for those that are extensively studied.

For such complexes, a comprehensive understanding of how genetic diversity is organized is required for robust species delineation, which in turn is essential for providing a reliable framework to interpret experimental findings and to gain insights into the biology. The use of genomic information has long been suggested as a powerful approach for defining species boundaries because comprehensive genetic information can provide definitive and potentially quantitative guidelines. However, several issues regarding genomic studies of bacterial species have remained unresolved. First, while genomospecies defined by overall genome divergence were suggested to represent distinct biological entities [5–9], the exact criteria for establishing species boundaries are disputed. Although 95% average nucleotide identity (ANI) across the conserved parts of genomes was proposed as a universal boundary for defining species in bacteria, this criterion has been challenged. Additionally, ANI values alone do not provide information such as gene content or phylogenetic relationships, which are critical in understanding biological entities. Second, phylogenetic relationships among closely related bacterial strains often cannot be resolved with confidence, yet such information is fundamental for evolutionary analysis. Third, comparative genomics studies are often limited by the taxon sampling and/or assembly quality of available genome sequences.

In this study, we utilized the Agrobacterium tumefaciens species complex, also known as Agrobacterium biovar 1, as the study system for developing strategies that provide appropriate sampling and utilize multifaceted genomic analysis to resolve species boundaries and to investigate the molecular evolution of key traits. These bacteria are known as the causative agents of crown gall disease, which affects over 90 plant families. More importantly, the development of Agrobacterium-mediated transformation has provided a critical tool for genetic manipulation in plant sciences and agricultural biotechnology [14, 15]. Due to their importance, this complex has been studied for over a century and was found to harbor extensive phenotypic and genetic diversity that continues to confound efforts to resolve its taxonomy [12–14]. Various methods, such as DNA-DNA hybridization, biochemical characteristics, and molecular markers, have been used to define 10 genomospecies (i.e., G1–G9 and G13), which have continually been associated with new nomenclature, a process that has caused more confusion than resolution [5, 6, 16–18]. For example, the reference strain C58 used in many A. tumefaciens studies [15, 19, 20] belongs to G8, for which the name Agrobacterium fabrum was proposed in 2011. This has resulted in mixed usage of two names with different meanings (i.e., A. tumefaciens for the entire complex and A. fabrum for G8) in databases and literature. Compounding the confusion, the name Agrobacterium radiobacter refers to A. tumefaciens G4 [18, 21] and is also a heterotypic synonym of A. tumefaciens. Hereafter, we use A. tumefaciens in reference to the entire species complex and specific designations (i.e., G1–G9 and G13) for the genomospecies.

Previous characterizations found that A. tumefaciens strains have multipartite genomes with one circular chromosome, one linear chromosome, and highly variable plasmids [9, 23–25]. Consistent with the high levels of genetic divergence inferred from DNA-DNA hybridization, cross-genomospecies comparisons typically found that < 80% of the genes are conserved [7, 26]. Strikingly, > 32,000 horizontal gene transfer (HGT) events have been inferred to have shaped the evolutionary history of A. tumefaciens. Because the HGT patterns indicated co-transfers of genes that encode coherent biochemical pathways, it was hypothesized that purifying selection on those acquired gene clusters and on overall gene content drove the ecological diversification among genomospecies. Moreover, the oncogenic plasmids that determine Agrobacterium phytopathogenicity exhibit complex modularity and transmission patterns, which further contributed to the diversification of these pathogens and their global spread. However, despite this progress, the better-characterized genomospecies (e.g., G1, G4, G7, and G8) and pathogenic strains were highly overrepresented in previous genomics studies [7–9], and such biases may affect our understanding of agrobacterial diversity and evolution.

To develop effective strategies for investigating bacterial species complexes such as A. tumefaciens, we started by performing targeted genome sequencing for strains in underrepresented lineages to achieve a balanced taxon sampling of the study system. The sampling scheme was based on information from two previous phylogenetic analyses of the A. tumefaciens species complex and its sister lineages, one based on recA and the other based on 24 conserved genes. We also limited analyses to high-quality assemblies, which enabled detailed examinations of replicon-level synteny and confident inferences of gene presence/absence. The global view of genomic diversity and the resolved phylogeny provided a robust framework for focused investigations of the genetic elements involved in key aspects of agrobacterial fitness and ecology, namely the type VI secretion system (T6SS) for interbacterial competition [7, 27] and the virulence plasmids for phytopathogenicity [13–15]. Taken together, these investigations, scaling from whole genomes, whole replicons, gene clusters, and individual genes down to intragenic protein domains, provided novel and detailed information on the evolution and genetic diversity of bacteria important in plant pathology and biotechnology. Moreover, the strategies developed in this work are applicable to the study of other bacterial species complexes.

Results

Genome sampling, molecular phylogeny, and divergence

Based on existing knowledge of A. tumefaciens diversity [5, 6, 16–18] and the availability of genomic resources [7–9], we identified 12 strains that represent six poorly characterized genomospecies (Table 1). Additionally, two Agrobacterium larrymoorei strains, which represent the most closely related sister lineage of A. tumefaciens [9, 18], were included as the outgroup. Whole-genome sequencing, with substantial efforts in iterative improvement of the assemblies based on experimental and bioinformatic approaches, was conducted for these 14 strains. Additionally, we selected 21 representatives from the 98 A. tumefaciens genome assemblies available from GenBank (Table 1) to yield a dataset with maximal genetic diversity, without emphasis on including pathogenic strains. To ensure balanced sampling, we selected between two and five strains for each of the 10 recognized A. tumefaciens genomospecies. Importantly, 19 of these 35 assemblies, including nine produced in this study, are complete, and most others are nearly complete (i.e., average N50 = 1.3 Mb; cf. the two chromosomes are ~ 2.9 and ~ 2.3 Mb, respectively).

Based on the homologous gene clustering results among these strains, we identified a core genome of 2093 single-copy genes, which correspond to ~ 40% of the genes annotated in each individual genome sequence. Compared to previous studies that conducted genome-based phylogenetic analysis for Agrobacterium or higher taxonomic ranks [8, 9, 29], the more focused sampling in this study yielded a core gene count one to two orders of magnitude higher. This increase in core gene count and the improvement in taxon sampling allowed for the inference of a well-resolved maximum likelihood phylogeny of the A. tumefaciens species complex (Fig. 1A). Each of the 10 currently recognized genomospecies forms a distinct monophyletic clade with > 80% bootstrap support. Additionally, we identified two novel genomospecies, G21 and G22, each represented by a single strain. The pattern of overall genome similarities exhibits a discrete multimodal distribution that supports the use of a 95% ANI cutoff for delineating bacterial species and quantifies the divergence of the A. tumefaciens complex from its most closely related sister lineage (Fig. 1B).

These 12 A. tumefaciens genomospecies were classified into seven groups based on 90% ANI and further assigned to three supergroups according to the phylogeny (Fig. 1A). Based on the time-calibrated phylogeny reported in Weisberg et al., the most recent common ancestor (MRCA) of A. tumefaciens emerged ~ 48 million years ago (Mya), with a 95% highest posterior density (HPD) interval of 38.5–58.0 Mya; the three supergroups diverged ~ 40 Mya (95% HPD interval 31.5–48.1 Mya), and most of the recognized genomospecies emerged ~ 2–7 Mya (95% HPD interval 1.2–9.7 Mya). The large 95% HPD intervals suggest uncertainty regarding these time estimates. Regardless of the exact divergence times, the inferred rapid radiation in the early
history of A. tumefaciens, as shown by the short branch lengths, likely prevented confident resolution of those deeper relationships in previous studies [8, 9, 29]. With the improvements in taxon sampling in this work, we observed ~ 70% bootstrap support for those early nodes (Fig. 1A). This organismal tree provides a strong framework for our downstream examination of gene and domain phylogenies.

The high levels of assembly completeness provided confident inference of gene content comparisons. The principal coordinate analysis and hierarchical clustering results indicated that all 12 A. tumefaciens genomospecies

Table 1. List of the genome sequences used in this study. These include 14 new genomes derived from this study and 21 additional representatives from GenBank. Two Agrobacterium larrymoorei strains are included as the outgroup. Species name abbreviations: At, Agrobacterium tumefaciens; Al, Agrobacterium larrymoorei.

| Species | Strain | Accession | Assembly | Coding sequences | Pseudogenes | Geographic origin | Isolation source |
|---|---|---|---|---|---|---|---|
| At G1 | 1D1108 | GCF_003666425 | Complete | 5312 | 167 | MD, USA | Euonymus sp. |
| At G1 | 5A | GCF_000236125 | 50 contigs | 5343 | 169 | MT, USA | Soil |
| At G1 | Ach5 | GCF_000971565 | Complete | 5184 | 145 | CA, USA | Achillea ptarmica |
| At G1 | N2/73 | GCF_001692195 | 29 scaffolds | 5345 | 157 | OR, USA | Cranberry |
| At G1 | S56 | GCF_900014385 | 6 scaffolds | 5414 | 218 | ? | Plant |
| At G2 | CFBP5494 | GCF_900013495 | 5 scaffolds | 5469 | 233 | France | Human |
| At G2 | CFBP5496 | GCF_005144405 | 9 contigs | 5137 | 223 | France | Human |
| At G2 | CFBP5875 | GCF_005221365 | Complete | 4570 | 146 | Belgium | Ditch water |
| At G3 | CFBP6623 | GCF_005221385 | Complete | 5081 | 180 | France | Antiseptic flask |
| At G3 | CFBP6624 | GCF_005221425 | Complete | 5157 | 148 | France | Human |
| At G4 | 183 | GCF_004023565 | Complete | 5051 | 199 | Tunisia | Prunus dulcis |
| At G4 | 186 | GCF_002591665 | Complete | 5298 | 207 | CA, USA | Juglans regia |
| At G4 | 12D1 | GCF_003667905 | Complete | 5005 | 167 | ? | ? |
| At G4 | 1D1460 | GCF_003666445 | Complete | 5290 | 269 | CA, USA | Rubus sp. |
| At G5 | CFBP6625 | GCF_005221465 | Complete | 5446 | 198 | France | Food |
| At G5 | CFBP6626 | GCF_005221445 | Complete | 5080 | 199 | France | Human |
| At G5 | F2 | GCF_000219665 | 8 contigs | 5085 | 135 | Harbin, China | Soil |
| At G6 | CFBP5499 | GCF_005221325 | Complete | 5654 | 240 | South Africa | Dahlia sp. |
| At G6 | CFBP5877 | GCF_005221345 | Complete | 5463 | 236 | Israel | Dahlia sp. |
| At G7 | 1D1609 | GCF_002943835 | Complete | 5539 | 254 | CA, USA | Medicago sativa |
| At G7 | CFBP4996 | GCF_005144435 | 10 contigs | 5741 | 213 | UK | Flacourtia ramontchi |
| At G7 | CFBP7129 | GCF_005221405 | Complete | 5927 | 285 | Tunisia | Pyrus communis |
| At G8 | 12D13 | GCF_003667945 | Complete | 5083 | 178 | ? | ? |
| At G8 | 1D132 | GCF_003667725 | Complete | 5176 | 169 | CA, USA | Cerasus pseudocerasus |
| At G8 | ATCC31749 | GCF_002916755 | 4 contigs | 4560 | 811 | China | Plant |
| At G8 | C58 | GCF_000092025 | Complete | 5355 | 29 | NY, USA | Cerasus pseudocerasus |
| At G9 | CFBP5506 | GCF_005144495 | 15 contigs | 4236 | 157 | Australia | Soil |
| At G9 | CFBP5507 | GCF_005144505 | 16 contigs | 5299 | 253 | Australia | Soil |
| At G9 | GBBC3283 | GCF_007002765 | 48 contigs | 4946 | 203 | Belgium | Solanum lycopersicum |
| At G13 | CFBP6927 | GCF_900012615 | 9 scaffolds | 4722 | 94 | France | Rhizospheric soil from Prunus persicae |
| At G13 | S2 | GCF_000723345 | 60 contigs | 5574 | 167 | ? | ? |
| At G21 | MAFF210266 | GCF_007002865 | 26 contigs | 5199 | 162 | Japan | Cucumis melo |
| At G22 | KCJK1736 | GCF_001641425 | 41 contigs | 4933 | 162 | FL, USA | Bos taurus feces |
| Al | CFBP5473 | GCF_005145045 | Complete | 4777 | 142 | FL, USA | Ficus benjamina |
| Al | CFBP5477 | GCF_005144425 | 24 contigs | 4657 | 95 | Italy | ? |

Fig. 1 Relationships among representatives of the Agrobacterium tumefaciens species complex. The sister species Agrobacterium larrymoorei (A. l.) is included as the outgroup. (A) Maximum likelihood phylogeny based on a concatenated alignment of 2093 single-copy genes shared by all 35 genomes (635,594 aligned amino acid sites). Bootstrap support values in the range of 60–80% are labeled. Strains with complete genome assemblies are highlighted with an asterisk (“*”).
The genomospecies assignments (i.e., G1–G9, G13, and G21–G22) are labeled to the right of strain names. The three A. tumefaciens supergroups are indicated by the colored background of the genomospecies assignments. Information to the right of the genomospecies assignments shows the grouping of genomes according to different cutoff values of genome-wide average nucleotide identity (ANI), the presence/absence of the type VI secretion system (T6SS)-encoding gene cluster (green: present; white: absent), the copy number of vgrG (white background: absent), the number of plasmids, and the tumor-inducing plasmid (pTi) type based on k-mer profile (white background: absent). (B) Pairwise genome similarities based on the percentages of genomic segments mapped and the ANI values.

are similar to one another while distinct from A. larrymoorei (Additional file 1: Figure S1A and S1C). Nonetheless, with the exception of G4 and G7, these genomospecies are distinguishable based on gene content (Additional file 1: Figure S1B and S1D). This finding suggests that despite the extensive HGT inferred within this complex, the genomospecies defined by 95% ANI likely represent distinct biological entities.

Diversity and evolution of the T6SS genes

The confident inference of organismal phylogeny and gene content afforded by high-quality genome assemblies provided a robust framework for evolutionary analysis of key traits, particularly those influenced by the absence or loss of genes. We first focus on genes encoding the T6SS, a phage tail-like contractile nanomachine commonly found in Proteobacteria and used to inject effectors into eukaryotic or bacterial cells. The T6SS has major roles in pathogenesis, symbiosis, and interbacterial competition [30–33]. For A. tumefaciens, the T6SS is a key weapon for in planta competition between different genomospecies and against other bacteria. Thus, investigating the diversity and evolution of T6SS genes may shed light on a trait that influences the ecology and evolution of A. tumefaciens.

In a previous study that examined four A. tumefaciens genomospecies, T6SS-mediated antibacterial activity was observed for all 11 strains sampled and was thought to be a conserved trait of this species complex. To our surprise, among the 33 A. tumefaciens strains examined in this work, a patchy distribution of the T6SS genes was observed (Fig. 1). Gene absences occur in strains corresponding to previously under-characterized genomospecies and were confirmed by examining syntenic regions and by using TBLASTN to search entire genome sequences. For strains encoding a T6SS, the corresponding genes are consistently located on the linear chromosome and mostly form a cluster of ~ 20 genes organized as two adjacent and oppositely oriented operons, imp and hcp [7, 35] (Fig. 2). Some strains harbor accessory loci containing vgrG (involved in effector delivery) and other T6SS genes located elsewhere on the linear chromosome [7, 33, 35] (Additional file 1: Figure S2).

The T6SS gene phylogeny is largely congruent with the species tree (Fig. 2). One notable exception is that the MRCA of G1 appears to have acquired the T6SS genes from a G8-related donor. Consistent with this inference, the T6SS genes in G1 strains are located in a different chromosomal location compared to strains of other genomospecies (Additional file 1: Figure S2). Based on these observations, it is likely that a T6SS gene cluster was present in the MRCA of G8-G6-G14-G2-G4-G7-G9-G3-G15 and that at least two independent losses have occurred, in G6 and in G14-G2. Regarding the ancestral state in the MRCA of the A. tumefaciens complex, presence of the T6SS genes appears to be the more parsimonious hypothesis, based on the presence of these genes in the outgroup (Fig. 2). However, the lack of synteny conservation between the linear chromosomes of A. tumefaciens and A. larrymoorei and the variable locations of vgrG homologs (Additional file 1: Figure S2) suggest that multiple independent origins are also possible. At broader scales, the T6SS genes have a patchy distribution among Rhizobiaceae [33, 36], indicating that these genes are not essential for these bacteria and have high rates of gain and loss. Consistent with this, multiple pseudogenes confirmed by manual curation of the annotation were found (e.g., tssH in S56 and tssA/vgrG in ATCC31749) (Fig. 2), suggesting that in some strains these gene clusters are in the process of degradation and will eventually be lost.

Examination of synteny revealed that the imp operon, which encodes the majority of T6SS structural components, is more conserved in gene composition and order than the hcp operon, which often has different genes downstream of vgrG (Fig. 2). This genetic diversity may play a key role in interbacterial competition because the genes downstream of vgrG include those that encode effector and immunity (EI) protein pairs. The agrobacterial T6SS effectors often correspond to different toxins, and the cognate immunity proteins provide protection against self-intoxication [7, 27, 33, 36]. The rapid evolution of EI gene pairs is illustrated by three examples. First, despite the low levels of sequence divergence among G1 strains (i.e., > 98% ANI in all pairwise comparisons), different genes are found downstream of their vgrG homologs, and this variation is consistent with neither the species phylogeny nor the T6SS core gene phylogeny. Second, strain CFBP4996 of G7 has homologs of the same EI gene pair as strains 12D1 and 183 of G4, rather than those of other members of G7. Third, in both G3 strains, vgrG and the associated EI genes are located elsewhere on the linear chromosome, rather than being part of the hcp operon (Fig. 2 and Additional file 1: Figure S2). These results suggest that recombination involving gene modules has contributed to the genetic diversity of these A. tumefaciens T6SS EI gene pairs.

Fig. 2 Phylogeny and organization of the T6SS gene cluster. The maximum likelihood phylogeny was inferred based on a concatenated alignment (5960 aligned amino acid sites) of 14 core T6SS genes: tagE, tagF, tssM, tssL, tssK, tagH, tssG, tssF, tssE, tagJ, tssC40, tssC41, tssB, and tssD. Two other core genes, tssA and tssH, are excluded because some homologs are pseudogenized. Genes downstream of tssD (e.g., tai, tae, and vgrG) are excluded due to variable presence. The species phylogeny on the right is based on Fig. 1. Genes are color-coded according to annotation, and syntenic regions are indicated by gray blocks.

Modularity of VgrG and its associated EI pair

The knowledge that vgrG homologs encode proteins with distinct C-terminal domains responsible for the binding specificities of different T6SS effectors for delivery suggested that each vgrG homolog and its downstream EI gene pair may evolve as a functional module [33, 36]. Here, we sought to investigate the patterns of gene co-occurrence and intra-module recombination to better understand the diversity and evolution of these genes. For an in-depth investigation of vgrG evolution, we began by examining the domain architecture of VgrG proteins and uncovered eight distinct domains (Additional file 1: Figure S3). Based on differences in domain composition, the 44 vgrG homologs, including 17 associated with the main T6SS gene cluster and 27 associated with accessory loci, were classified into six major types and nine subtypes (Fig. 3). Only three of the domains are present in all VgrG variants. The N-terminal domain 1 is the most conserved (Additional file 1: Figure S3) and the only one found in databases. This domain corresponds to TIGR03361, which accounts for ~ 66–77% of the protein length and the bulk of the structures that form a trimeric complex analogous to a phage tail spike [38, 39] (Additional file 1: Figure S4). It is worth noting that the C-terminal end of domain 1 was identified as a recombination hotspot in a related study on agrobacterial T6SS genes. For domain 5, which was found in all vgrG homologs belonging to subtypes A1–A3 and E1 (Fig. 3), the presence of this domain is perfectly correlated with the presence of a downstream DUF4123-domain-containing gene (Fig. 4). Because this DUF4123 domain acts as an adaptor/chaperone for effector loading onto VgrG in A. tumefaciens and Vibrio cholerae [41, 42], this strong co-occurrence suggests specific interactions between VgrG domain 5 and DUF4123. Thus, combining domain analysis with gene co-occurrence provides a new strategy for predicting the interacting domains of VgrG with other T6SS components.

Intriguingly, despite conservation of domain architecture within each subtype (Fig. 3), the phylogenies inferred from the three domains encoded by all vgrG homologs do not have the same topology (Additional file 1: Figure S5). For domain 1, sequences from the same subtype do not always form monophyletic groups. For domains 2 and 3, the short sequence lengths limited the phylogenetic resolution; nonetheless, low divergence within the same subtype and high divergence between subtypes were observed. These patterns suggest that each domain-encoding region evolved independently and can recombine between subtypes.

At the level of gene cluster organization, vgrG homologs within a subtype can have distinct downstream genes (e.g., A1 and B1), regardless of whether they are associated with the main T6SS gene cluster (Fig. 4 and Additional file 2: Dataset S1). These findings suggest that, in addition to domain shuffling among vgrG homologs, recombination also facilitated novel vgrG-effector pairings in the evolution of these T6SS genes.

When T6SS diversity was examined in a phylogenetic context, the numbers and types of vgrG homologs, as well as their linked EI genes, lacked strong correlations with species phylogeny (Fig. 5). Based on our manual curation of vgrG-associated genes, a total of 63 putative effector genes were identified (Additional file 2: Dataset S2). Among these, peptidoglycan-targeting toxins and nucleases are the two most commonly found categories, with 21 each. This finding is consistent with an investigation of > 1000 T6SS effectors sampled from 466 species in the phylum Proteobacteria, which also found these two as the dominant categories and suggested that they play complementary roles in T6SS-mediated competition. Based on our homologous gene clustering results, these 63 putative effector genes were classified into 20 families (Fig. 5). Among these, 11 were experimentally validated in previous studies, including EI1/6/15 in Ma et al., EI4/11 in Wu et al., EI7 in Santos et al., and EI2/5/8/9/10 in Wu et al.

Fig. 3 Domain organization and classification of vgrG homologs. Six major types, two of which contain subtypes (A1–3 and D1–2), were identified and are labeled on the right. For each homolog, the genomospecies assignment is provided in square brackets, followed by the locus tag. Gene names are provided in parentheses for functionally characterized homologs (i.e., vgrG1-2 for C58 homologs and vgrGa-d for 1D1609 homologs).

Fig. 4 Gene neighborhoods of vgrG homologs. The grouping and labeling of vgrG homologs follow the convention used in Fig. 3. For each vgrG homolog, three upstream genes are plotted to illustrate whether it is associated with the main T6SS gene cluster, and 10 downstream genes are plotted to illustrate putative effector/immunity genes. Two A1-type homologs from strain S56 (i.e., AGR1B_RS27940 and AGR1B_RS27960) are in close association with each other and are plotted together.

Fig. 5 Phylogenetic distribution of vgrG homologs and T6SS effector/immunity genes. The species tree is based on Fig. 1. Gene presence is illustrated with colored cells in the heatmap; gene copy numbers are labeled when applicable.

The virulence plasmids and associated genes

The tumor-inducing plasmids (pTi) are an important component of A. tumefaciens genomes. These large accessory replicons harbor the virulence (vir) regulon genes that encode the Vir proteins and the type IV secretion system (T4SS) for processing and delivering a transfer DNA (T-DNA) into plant cells, and they are essential for agrobacterial phytopathogenicity [13–15]. Among the 35 strains examined, we identified 15 pTi sequences (Table 2). Two novel putative pTi (i.e., pTiCFBP4996 and pTiCFBP5473) were found in the 14 newly sequenced strains. This efficiency of discovering novel pTi types is surprising, given that our previous study, which defined pTi types I–VI, was based on extensive sampling of diverse historical collections containing 162 Agrobacterium strains. This finding demonstrates the importance and usefulness of a phylogeny-guided approach for investigating genetic diversity. Our recent examination of > 4000 Rhizobiaceae plasmids sampled from 1251 strains representing 222 species-level taxa assigned these two novel putative pTi to types X and XI. Both types are rare: type X is found in CFBP5473 and only one other A. larrymoorei strain (AF3.44), and CFBP4996 is the only strain that harbors a type XI plasmid. These two novel pTi are distinctive in their large sizes (Table 2), gene organization (Fig. 6), gene content (Additional file 1: Figure S6), and sequence divergence of core genes (Additional file 1: Figure S7). Moreover, their T-DNA regions also differ from the typical sizes of ~ 18–26 kb observed in types I–III pTi (Fig. 6).

For the tumorigenic strain CFBP5473 (Additional file 1: Figure S8), the predicted T-DNA border sequences flank an exceptionally large (~ 93 kb) region that contains all of the vir regulon genes in addition to the typical T-DNA-associated genes (e.g., genes for opine and plant hormone synthesis) (Fig. 6). This reflects either a translocation of a T-DNA border sequence or reliance on a non-canonical T-DNA border sequence that we could not identify. For pTiCFBP4996, its 7-kb T-DNA is predicted to contain only four genes (i.e., two correspond to opine synthesis and two encode hypothetical proteins). Plant hormone synthesis genes, which are necessary to cause visible disease symptoms, were not identified in this predicted T-DNA region or elsewhere on this plasmid. Consistent with these predictions, strain CFBP4996 did not induce tumor formation when inoculated onto stems of tomato plants (Additional file 1: Figure S8). Considering that pTiCFBP4996 encodes all

among pTi types were observed (Fig. 7).

Table 2. List of the pTi sequences analyzed. These include 15 from the genome data set listed in Table 1 and five additional representatives downloaded from GenBank. The pTi type assignments are based on k-mer profiles. A type V pTi from Agrobacterium vitis is included as the outgroup. Species name abbreviations: At, Agrobacterium tumefaciens; Al, Agrobacterium larrymoorei; Av, Agrobacterium vitis.

| Species | Strain | pTi | pTi type | Opine type | Accession | Size (bp) | Coding sequences | Geographic origin | Isolation source |
|---|---|---|---|---|---|---|---|---|---|
| At G1 | 1D1108 | pTi1D1108 | I.b | ? | NZ_CP032925 | 176,213 | 159 | MD, USA | Euonymus sp. |
| At G1 | Ach5 | pTiAch5 | II | Octopine | NZ_CP011249 | 194,264 | 153 | CA, USA | Achillea ptarmica |
| At G4 | 183 | pTi183 | I.b | ? | NZ_CP029048 | 192,674 | 173 | Tunisia | Prunus dulcis |
| At G4 | 186 | pTi186 | I.b | ? | NZ_CP042277 | 177,577 | 159 | CA, USA | Juglans regia |
| At G4 | 1D1460 | pTi1D1460 | I.a | ? | NZ_CP032929 | 214,233 | 198 | CA, USA | Rubus sp. |
| At G4 | MAFF301001 | pTiSAKURA | I.b | Nopaline | NC_002147 | 206,479 | 195 | Japan | Cerasus pseudocerasus |
| At G6 | CFBP5499 | pTiCFBP5499 | III | ? | NZ_CP039893 | 220,025 | 178 | South Africa | Dahlia sp. |
| At G6 | CFBP5877 | pTiCFBP5877 | III | ? | NZ_CP039902 | 220,025 | 181 | Israel | Dahlia sp. |
| At G7 | 1D1609 | pTi1D1609 | II | Octopine | NZ_CP026926 | 166,117 | 138 | CA, USA | Medicago sativa |
| At G7 | CFBP4996 | pTiCFBP4996 | XI | ? | NZ_CM016551 | 605,495 | 521 | UK | Flacourtia ramontchi |
| At G7 | CFBP7129 | pTiCFBP7129 | I.b | ? | NZ_CP039927 | 189,955 | 176 | Tunisia | Pyrus communis |
| At G8 | 1D132 | pTi1D132 | I.b | ? | NZ_CP033026 | 177,577 | 160 | CA, USA | Cerasus pseudocerasus |
| At G8 | C58 | pTiC58 | I.a | Nopaline | NC_003065 | 214,233 | 197 | NY, USA | Cerasus pseudocerasus |
| At G9 | CFBP5507 | pTiCFBP5507 | I.a | ? | NZ_CM016546 | 248,634 | 225 | Australia | Soil |
| ? | Bo542 | pTiBo542 | III | Agropine | NC_010929 | 244,978 | 223 | Germany | Dahlia sp. |
| ? | Chry5 | pTiChry5 | III | Chrysopine | KX388536 | 197,268 | 210 | FL, USA | Chrysanthemum sp. |
| ? | EU6 | pTiEU6 | I.b | Succinamopine | KX388535 | 176,375 | 194 | CT, USA | Euonymus sp. |
| Al | CFBP5473 | pTiCFBP5473 | X | ? | NZ_CP039694 | 404,101 | 360 | FL, USA | Ficus benjamina |
| Al | CFBP5477 | pTiCFBP5477 | I.b | ? | NZ_CM016547 | 190,036 | 169 | Italy | ? |
| Av | S4 | pTiS4 | V | Vitopine | NC_011982 | 258,824 | 193 | Hungary | Vitis sp. |
For example, essential vir genes required for T-DNA processing and while all pTi harbor a conserved virE3a that facilitates transfer, CFBP4996 may serve as a naturally disarmed T-DNA protection and entry into host , type I pTi strain capable of T-DNA transfer without causing harbor one or two additional copies of virE3 that belong diseases. to different sequence types (i.e., sharing the same anno- For replicon-level comparisons, types II and III pTi are tation but classified as distinct homologs due to se- more similar to each other than to type I pTi based on quence divergence). Similarly, virF, which encodes an F- gene content (Additional file 1: Figure S6) and core gene box protein that is a putative host-range determinant phylogeny (Additional file 1: Figure S7). Within type I, , can be classified into three sequence types with dis- the two subtypes (i.e., I.a and I.b) are distinguishable by tinct distributions. Those less well-characterized vir gene content (Additional file 1: Figure S6) but do not genes, such as virD3 [46, 47], virJ , and virP, are also form mutually exclusive clades in the core-gene phyl- distributed differently among pTi types. Finally, in ogeny (Additional file 1: Figure S7). All putative pTi, in- addition to the presence/absence of individual genes, the cluding the two novel types and pTiS4 (i.e., type V from overall organization of vir regulons also differ among distantly-related Agrobacterium vitis), contain genes for these pTi (Additional file 1: Figure S9). All type I pTi are the T4SS that mediates T-DNA transfer into plant cells conserved in sharing a ~ 40-kb region that contains all (i.e., virB1-B11 and virD4) and the corresponding two- vir genes. In comparison, locations of vir genes are more component regulatory system (i.e., virA and virG) (Fig. variable among type II/III pTi; virF/P (and virQ/H if 7). 
Strain 1D1609 is a notable case because its virA and present) are located ~ 5–50 kb away from the main vir virJ are located on another plasmid, rather than the pTi gene cluster. Other than vir genes, the gene content and. For the other vir regulon genes, several differences organization of T-DNA are also different (Fig. 8). All Chou et al. BMC Biology (2022) 20:16 Page 12 of 20 Fig. 6 Molecular phylogeny and global alignment of pTi. The maximum likelihood phylogeny was inferred based on the concatenated alignment of 21 core genes and 8534 aligned amino acid sites (Supplementary Figure S6A). The species phylogeny on the right is based on Fig. 1. The pTi sequences derived from this study are highlighted in bold. For those with relevant information available, the genomospecies assignments are indicated in square brackets and the opine types are indicated in parentheses. For the alignment, all plasmids are visualized in linear form starting from the replication genes. The key gene clusters are color-coded according to functions, and the predicted T-DNA regions are indicated by the black horizontal bracket above each plasmid. Syntenic regions are indicated by gray blocks type I pTi have one single T-DNA region, while types II [8, 10]. Regardless of the exact mechanisms, these pat- and III have two and type V have four, respectively terns supported application of the biological species con- (Fig. 6). Within the T-DNA regions, the plant hormone cept, which is based on genetic barriers, to these A. synthesis genes (i.e., tms1/iaaM, tms2/iaaH, and ipt) are tumefaciens genomospecies and other bacteria. With the most conserved ones, while others are more variable the continuing drop in sequencing cost, ANI analysis (Fig. 8). 
Taken together, this genetic variation may con- can serve as a standard approach for accurate classifica- tribute to the host range differences observed among tion of additional strains , which in turn could facili- strains harboring different types of pTi. For example, tate research and communication, and ideally leads to based on a comparison among > 100 strains, there are improvements in bacterial taxonomy for basic works and strong associations between type I and III pTi with applications. woody and herbaceous plants, respectively. More- It is worth noting that despite our effort, the 33 strains over, previous tumorigenesis assays demonstrated that included in this study do not fully capture the diversity strains harboring types I and II pTi tend to have higher of the A. tumefaciens species complex. For example, the virulence against Brassicaceae and Asteraceae hosts, re- species Agrobacterium arsenijevicii , Agrobacterium spectively. nepotum (G14) , Agrobacterium viscosum (G15) , and two unnamed genomospecies G19/G20 are also Discussion members of this species complex. As more strains are Biological entities at the species level and above characterized in the future, it is likely that higher levels Based on the divergence of core gene sequences (Fig. of diversity within this group will continue to be discov- 1A) and gene content (Additional file 1: Figure S1), 95% ered. Furthermore, several potentially confusing issues ANI is a reliable approach for defining species within regarding Agrobacterium taxonomy remain to be re- the A. tumefaciens complex, as is the case for most other solved. For example, strain MAFF210266 that we re- bacteria. The discrete multimodal distribution of ferred to as a representative of G21 shares 98% ANI genome similarities (Fig. 1B) suggested that there are with an important strain K599 (=NCPPB2659). 
genetic barriers between different genomospecies, which However, K599 harbors a root-inducing plasmid (pRi) may be explained by neutral processes and/or selection associated with hairy root disease and was named as Chou et al. BMC Biology (2022) 20:16 Page 13 of 20 Fig. 7 Distribution of key vir genes among the putative pTi. The pTi sequences derived from this study are highlighted in bold. For those with relevant information available, the genomospecies assignments are indicated in square brackets and the opine types are indicated in parentheses. Gene presence and absence are indicated by filled and empty circles, respectively. The main vir genes include the components of the type IV secretion system (T4SS; virB1-B11 and virD4). For virE3, the three sequence types are listed separately. For virF and virD3, the sequence types are labeled inside the circles. For 1D1609, the genes virA and virJ are located on another plasmid and plotted as absent in this figure. The locus tags are provided in Supplementary Dataset S1B Agrobacterium rhizogenes (a.k.a. Agrobacterium biovar 2 could be due to biases in the sampling of available ge- or G10 , all deprecated synonyms for Rhizobium nomes, or the all-against-all pairwise comparisons in- rhizogenes), which is phylogenetically divergent from the cluded mostly distantly-related species. In our study A. tumefaciens species complex. Recently, during the re- that provided a detailed examination of closely related vision of this work, the name Agrobacterium G21 was species, the ~ 85–93% ANI among A. tumefaciens geno- independently proposed and strain K599 was reclassified mospecies (Fig. 1B) indicated that the A. tumefaciens to G21.This example highlights the dynamic nature complex is indeed a coherent entity with high divergence of pTi/pRi transmission within the agrobacteria-rhizobia from its closest sister lineage within the same genus (i.e., complex and advocates a classification scheme based A. larrymoorei). 
The driving forces for maintaining spe- on genome-wide ANI, rather than plasmid content or cies complexes and the prevalence of such above-species phenotype. level entities are interesting questions that require fur- At the above-species level, A. tumefaciens genomospe- ther investigations. For the A. tumefaciens complex, al- cies exhibited some intriguing patterns of genome diver- though the nomenclature originated from its gence. In a previous study that compared ~ 90,000 phytopathogenicity, it is well-established that this group prokaryotic genomes, it was extremely rare to find ANI contains both pathogenic and non-pathogenic strains values in the range of 82–96%. In other words, that differ in the possession of an oncogenic plasmid strains either belong to the same natural biological entity (i.e., pTi) or not. The promiscuous nature of their at the species level and have > 95% ANI, or belong to pTi [9, 56–58] (Fig. 6) suggested that lineages within this different species and have < 82% ANI. This observation complex may experience frequent transitions between Chou et al. BMC Biology (2022) 20:16 Page 14 of 20 Fig. 8 Organization of the transfer DNA (T-DNA) on tumor-inducing plasmids (pTi). Two unusual putative pTi sequences (i.e., pTiCFBP4996 and pTiCFBP5473) are excluded, and other pTi sequences derived from this study are highlighted in bold. For those with relevant information available, the genomospecies assignments are indicated in square brackets and the opine types are indicated in parentheses. Genes are color- coded according to annotation, syntenic regions are indicated by gray blocks pathogenic and non-pathogenic lifestyles, and such taxonomy. Such situations demonstrated the chal- shared ecological niches may be the force that maintains lenges of establishing a new standardized taxonomy even the coherence of this species complex. Compared to sis- when the practical issues of transitioning from the ter lineages (e.g., A. larrymoorei and A. 
rubi), the more current taxonomy are not considered, and perhaps is to diverse host range of A. tumefaciens [12, 13] may be be expected given the highly variable evolutionary rates linked to the higher diversity of pTi types , which across different lineages. Given these considerations, may have facilitated the divergence of this complex into other aspects of biology (e.g., physiology, ecology) may multiple genomospecies. To test this hypothesis, better play more important roles in defining those higher taxo- sampling of these sister lineages is required. nomic ranks. At the genus level and above, genome-based classifica- tion is more challenging. The ANI approach is expected Units and modularity of molecular evolution to have limited resolution when nucleotide sequence For evolutionary studies, the levels at which selection identities are below ~ 80%. Moreover, the fractions and other processes operate on have been a topic that of genome sequences alignable for ANI value calculation received much attention. For prokaryotes, levels are highly variable for genus-level comparisons , from the entire genome to individual functional domains which raises concerns on the robustness of applying the within genes are of particular interest. Based on our re- ANI method to higher taxonomic ranks. To resolve this sults, all of these levels must be considered to compre- challenge, analysis of protein sequence divergence hend the complex patterns. among core genes was proposed as a suitable approach At the whole-genome level, the clear species boundar-. However, while it may be desirable to establish a ies based on overall similarity (Fig. 1 and Additional file standardized taxonomy with a defined range of genomic 1: Figure S1) suggested that the entire genome largely divergence for each taxonomic rank, large variations in evolves as a single coherent unit. 
This result is consist- the divergence values at a given rank were observed ent with previous findings that at the global level HGT among different taxonomic groups in previous attempts has very little impact on the reconstruction of organis- [59–61]. These variations created situations where some mal phylogeny [64, 65], despite the extensive HGT in- families contain higher divergence levels than some or- ferred in bacterial evolution [65–68] and the importance ders or lower divergence levels than some genera, even of HGT in adaptation [69–71]. A possible explanation after normalization and a full revision of the current for these seemingly conflicting observations is that most Chou et al. BMC Biology (2022) 20:16 Page 15 of 20 of the acquired genes are lost quickly , presumably that may reflect functional constraints were observed at finer due to the strong mutational bias towards deletions ob- scales. For example, each vgrG is linked to its cognate ef- served in bacteria [72, 73]. Additionally, acquired genes fector/chaperone genes, and each effector is linked to its are subjected to the selection that drives species diversi- cognate immunity gene. Such modularity is expected to be fication , which is expected to act on all genes in a maintained by selection, similar to the observations regard- genome together. ing co-transfers of genes involved in associated biochemical At the level of individual replicons, chromosomes and pathways. However, the linkage could be broken down plasmids certainly have distinct evolutionary histories (Fig. by recombination at within- or between-species levels, as 6). Because novel chromosome/plasmid combinations may evident in the diversity of vgrG gene neighborhoods, even lead to speciation , and the spread of plasmids has im- for those homologs belonging to the same type (Fig. 4). 
portant implications on the evolution of virulence [9, 74] Fourth, at the level of individual genes, the within-genome and antimicrobial resistance , further investigations on diversity of vgrG homologs (Fig. 5) provides further support the evolution of plasmids and their compatibilities with to the hypothesis that HGT is more important than duplica- chromosomes are important [9, 76, 77]. Additionally, for tions in driving gene family expansions in bacteria. Fi- bacteria with multiple chromosomes, examining the evolu- nally, at the intra-gene level, within- or between-species tion of individual chromosomes may provide novel insights. recombination may be important in generating novel com- In the case of A. tumefaciens, the multipartite genome was binations of domains, thus promoting the diversification of hypothesized to originate from intragenomic gene transfer homologs (Fig. 3 and Additional file 1: Figure S6). from the ancestral circular chromosome to a plasmid, Taken together, these observations illustrated the followed by linearization of this plasmid to form the second- complexity of biological systems. While it is difficult to ary chromosome [23, 78]. The secondary chromosome is draw up generalized rules or to estimate the relative im- known to exhibit higher levels of divergence in overall portance of each evolutionary process at different levels, organization, gene content, and sequences [23, 26]. In this it is important to consider and examine these complex- regard, it is curious to note that the apparently rapid- ities to better understand organisms of interest. evolving T6SS genes are all located on the secondary chromosome, rather than the more conserved primary chromosome. For future studies, it may be interesting to Conclusions compare the molecular evolution of T6SS and other genes In summary, by using a group of important bacteria as the between species with mono- and multi-partite genomes. 
study system, this work utilized a strategy of phylogeny- At the levels of gene clusters and below, several interesting conscious genome sampling for systematic investigations observations were made based on the loci of the two secre- of a species complex. This approach requires prior know- tion systems investigated in this work. First, although these ledge regarding the extant phylogenetic diversity of the two systems may provide some fitness advantages (e.g., study system, which is important in the planning stage of T6SS for interbacterial competitions and T4SS for host ex- large-scale genome-sequencing projects. In addition to ploitation), complex patterns of gains and losses were ob- improving the cost-effectiveness, this strategy is also crit- served (Figs. 2 and 6). These patterns suggest that there is ical in obtaining an unbiased picture that does not over- not a strong selective pressure to maintain these genes and emphasize certain subgroups. Furthermore, the emphasis non-adaptive stochastic processes are important. Alterna- on generating and utilizing high-quality assemblies im- tively, these genes may be subjected to heterogeneous select- proves the confidence in gene content analysis. With the ive pressures. Regardless, there is a certain degree of continuing advancements in sequencing technologies and modularity regarding their evolution, such that the presence bioinformatic tools, this emphasis becomes increasingly patterns are all-or-none and no partial cluster was found for accessible. For this study system, our examination of bio- the chromosomal T6SS genes or the plasmid-encoded T4SS logical boundaries at the species level and above improves genes. This is particularly evident for the T6SS genes, as the understanding of how natural biodiversity is orga- when the main cluster was lost (e.g., G2 and G6), no nized. 
The targeted analysis of those secretion system accessory vgrG loci located elsewhere was found (Additional genes and oncogenic plasmids provides novel insights re- file 1: Figure S3). Second, for both systems, genes for the garding the key genetic variations involved in the fitness structural components are conserved, while those for the ef- and ecology of these soil-borne phytopathogens that need fectors and others are not (Figs. 2, 5, and 7). This is consist- to compete in complex microbiota and invade plant hosts. ent with the expectation that opposite selective forces may Moreover, the multi-level analysis of their genetic diversity act on these two categories of genes, with purifying selection from whole-genome to intra-genic domains highlights the against changes to preserve the apparatus of a functional se- complexity of these biological systems. The strategy and cretion system and positive selection for more diverse effec- findings of this work provide useful guides for future stud- tors and other accessory components. Third, modularity ies of other bacteria. Chou et al. BMC Biology (2022) 20:16 Page 16 of 20 Methods v2.6.0 with e-value cutoff set to 1e−15 and Genome sequencing OrthoMCL v1.3 were used to infer the homologous A total of 14 strains were acquired from the French gene clusters. The result was converted into a matrix of International Center for Microbial Resources (CIRM) 35 genomes by 17,058 clusters, with the value in each Collection for Plant-associated Bacteria (CFBP) (Table cell corresponding to the copy number. This matrix was 1). These include 12 strains that belong to the A. tume- further converted into a Jaccard distance matrix among faciens species complex and two A. larrymoorei strains genomes using the VEGAN package v2.5-6 in R, then as the outgroup. 
processed using the principal coordinates analysis func- The procedure for whole-genome shotgun sequencing tion in the APE package and visualized using was based on that described in our previous studies [7, ggplot2 v3.3.2. The hierarchical clustering analysis 80, 81]. All bioinformatics tools were used with the de- was performed using PVCLUST v3.4.4. fault settings unless stated otherwise. Briefly, total gen- For phylogenetic analysis, homologous sequences were omic DNA was prepared using the Wizard Genomic aligned using MUSCLE v3.8.31 for maximum likeli- DNA purification kit (Promega, USA). The Illumina hood inference by PhyML v.3.3.20180214. The pro- paired-end sequencing libraries were prepared using portion of invariable sites and the gamma distribution KAPA LTP Library Preparation Kits (Roche Sequencing, parameter were estimated from the data set, and the USA) with a targeted insert size of ~ 550 bp. The Illu- number of substitute rate categories was set to four. The mina MiSeq platform was used to generate 300 × 2 reads bootstrap supports were estimated based on 1000 with an average coverage of 306-fold per strain (range replicates. 141- to 443-fold). The raw reads were quality trimmed using a Q20 cutoff and used for de novo assembly based Analysis of the type VI secretion system genes on Velvet v1.2.10 with the settings “-exp_cov auto To identify the T6SS-associated genes, C58 [27, 36, 37] -min_contig_lgth 2000 -scaffolding no.” The contigs and other strains [7, 33] that have been characterized ex- were oriented by mapping to those complete genome as- perimentally were used as the references. Based on the semblies available (Table 1) using MAUVE v2015-02-13 known T6SS genes in these genomes, homologous genes. Due to the difficulties of identifying appropriate in other genomes were identified based on the reference genomes for several evolutionary branches, OrthoMCL result. 
To screen for novel T6SS effector, four strains (i.e., CFBP5473, CFBP5875, CFBP5877, and chaperone, and immunity genes, genes that are located CFBP6623) were selected for PacBio long-read sequen- near vgrG (i.e., three upstream and ten downstream) cing and PacBio HGAP v3 assembly. These PacBio- were examined manually by using the NCBI conserved based assemblies were used as a guide for scaffolding, ra- domain database (CDD) and the Phyre2 protein fold ther than the finalized results. recognition server ; the e-value cutoff was set to To improve the Illumina-based draft assemblies, an itera- 0.01. A few genes with a hit to the DUF4123 domain tive process was used to examine the raw reads mapping (i.e., a known T6SS chaperone) but have an e-value results and to incorporate gap-filling results based on PCR above the cutoff were manually added back to the list of and Sanger sequencing. This process was repeated until T6SS-associated genes (e.g., CFBP5477_RS20350). To the complete assembly was obtained or the draft assembly confirm the absence of specific T6SS genes, the genome could not be improved further. The finalized assemblies sequences were used as the subjects and the protein se- were submitted to the National Center for Biotechnology quences of known genes were used as the queries to run Information (NCBI) and annotated using the Prokaryotic TBLASTN searches. Genome Annotation Pipeline (PGAP). For classification of the vgrG homologs, we developed a domain-based scheme. The conserved N-terminal Comparative and evolutionary analysis TIGR03361 domain was first identified by the NCBI The genomes analyzed are listed in Table 1. The proce- CDD searches. A global alignment of all homologs was dures for genome comparisons were based on those de- used to determine the exact boundaries of this domain. scribed in our previous studies [7, 85–87]. 
Briefly, After this TIGR03361 domain was removed, the pairwise genome similarities were calculated using Fas- remaining C-terminal sequences were processed using tANI v1.1. For comparisons of plasmids, FastANI MEME v5.1.1 to identify conserved domains that was executed with the custom settings that reduced frag- meet these criteria: (1) present in at least two sequences, ment length to 1000 bp and minimum matched frag- (2) with zero or one occurrence per sequence, (3) with a ments to 25. For global alignments of chromosomes and size between 30 and 300 a.a., and (4) with an e-value plasmids, the syntenic regions were identified by lower than 0.05. The results were manually curated to BLASTN v2.6.0 and visualized using genoPlotR break down large domains that are composed of smaller v0.8.9. For gene content comparison, BLASTP domains. Pairwise BLASTP searches were conducted to Chou et al. BMC Biology (2022) 20:16 Page 17 of 20 verify that each domain is unique and no two domains week-old seedlings. Bacterial strains were transferred from have a BLASTP e-value of lower than 1e−05. For each stock to 5 mL 523 broth and cultured overnight at 28 °C domain, the consensus sequence was generated using in a shaker incubator (250 rpm), then sub-cultured for 4 h Jalview v2.10.5 and sequence conservation was visu- prior to inoculation. Bacterial cells were washed and re- alized using WebLogo server v3. For functional pre- suspended in 0.9% NaCl solution with a concentration of diction, the consensus sequence of each domain was OD600 0.2. The stem was punctured with a sterilized sew- used to query against NCBI CDD and Phyre2. Addition- ing needle, and 5 μL of bacterial suspension was added to ally, one representative from each subtype of vgrG ho- the wounding site. The plants were collected 3 weeks after mologs was used for structure modeling using Phyre2 inoculation and 1-cm stem segments centered at the with the “normal” mode. 
The chain D of PA0091 VgrG1 (PDB identifier: 4MTK) was selected as the template. The predicted structures were visualized using PyMOL v1.2r3pre (Schrödinger, USA).

For the EI gene pairs identified, EI1 through EI11 were named based on the nomenclature proposed previously, and novel pairs were named starting from EI12. When only the putative effector (E) or only the putative immunity (I) gene was found, that gene was classified in the format “E??” or “I??”, respectively. Some EI pairs had been identified previously based on adjacency to T6SS genes but lacked high-confidence annotation (i.e., EI2, EI3, EI5, and EI8). For these, we chose a more conservative approach and annotated the genes as hypothetical proteins.

Analysis of the tumor-inducing plasmids and type IV secretion system genes

The list of 20 putative pTi sequences analyzed is provided in Table 2. These included all 15 complete sequences determined in this study and five representatives from GenBank that are important in Agrobacterium research. Our definition of putative pTi was based on the presence of the main T4SS genes (virB1-B11 and virD4) and at least one predicted T-DNA region. The pTi typing was performed by k-mer profile clustering against a reference set of 143 oncogenic plasmids in Rhizobiaceae and a second set that contains > 4000 Rhizobiaceae plasmids. For T-DNA identification, putative T-DNA borders were identified based on the motif YGRCAGGATATATNNNNNKGTMAWN. Genes involved in opine metabolism and T4SS were identified based on the annotation and the homologous gene clustering results produced by OrthoMCL. Additionally, putative T4SS effectors were identified using the T4SEpre tool in EffectiveDB with the minimum score set to 0.8. All protein sequences of pTi-encoded genes were used as the queries.

Tumorigenesis assay

Tomato tumorigenesis assays were performed to evaluate the virulence of selected strains. The plants (cultivar Known-You 301) were maintained in growth chambers with a 16-/8-h light/dark regime and a constant temperature of 22 °C. Inoculation was performed on 3- […] wounding site were cut for weighing.

Supplementary Information

The online version contains supplementary material available at https://doi.org/10.1186/s12915-021-01221-y.

Additional file 1:
Figure S1. Gene content dissimilarity among the Agrobacterium genomes. (A) and (B): principal coordinate analysis with and without the outgroup A. larrymoorei, respectively. The % variance explained by each axis is provided in parentheses. (C) and (D): hierarchical clustering with and without the outgroup A. larrymoorei, respectively.
Figure S2. Global alignment of the linear chromosomes. Locations of T6SS-hcp operons and vgrG homologs are labeled.
Figure S3. Logo plots of the putative protein domains identified among vgrG homologs. For each domain, the length and the number of homologs with the domain are labeled. Domain 1 is the only domain with a corresponding database entry (TIGR03361).
Figure S4. Predicted structures of VgrG homologs. Regions are colored according to the scheme used in the domain analysis (Fig. 3). The chain D of PA0091 VgrG1 (PDB identifier: 4MTK) from Pseudomonas aeruginosa was selected as the template. The C-terminal parts that could not be confidently inferred are omitted. In all cases, the coverage (i.e., percentage of the sequence included in the structure prediction) is at least 75%, the sequence identity to the template is at least 30%, and the confidence score is 100%.
Figure S5. Maximum likelihood phylogenies of vgrG-associated domains. (A) Domain 1 (TIGR03361; VI_Rhs_Vgr super family), (B) Domain 2 (unknown function), and (C) Domain 3 (unknown function).
Figure S6. Principal coordinate analysis of gene content among the putative pTi analyzed.
Figure S7. Maximum likelihood phylogeny of pTi based on the concatenated alignment of shared single-copy genes. (A) All 20 pTi sequences analyzed; 21 core genes and 8534 aligned amino acid sites. (B) Excluding the two novel pTi; 40 core genes and 15,473 aligned amino acid sites; all branches received > 80% bootstrap support.
Figure S8. Tomato tumor assay of strains 12D1, CFBP4996, and CFBP5473. Mock was inoculated with sterilized water as a negative control, and strain C58 was included as a positive control. Strain 12D1 harbors a plasmid with opine transporter and catabolism genes but lacks vir regulon genes and identifiable T-DNA. CFBP4996 and CFBP5473 harbor novel types of putative tumor-inducing plasmids (pTi). (A) Tomato stems at three weeks after inoculation. Scale bar: 0.25 cm. (B) Weight distribution of five biological replicates (1-cm segments of the stem centered at the inoculation site). The letters indicate ANOVA results.
Figure S9. Gene organization of the vir regulons on pTi. Syntenic regions are indicated by grey blocks. The virulence (vir) genes are highlighted in red, the conjugation (tra) genes are highlighted in yellow, and other genes are plotted in white.

Additional file 2:
Dataset S1. List of vgrG-associated genes. Information including genomic location, RefSeq annotation, and domain prediction is included.
Dataset S2. Locus tags of the vir regulon genes on pTi. The virA/J of 1D1609 are located on another plasmid and are highlighted by “*”. The virB7 of pTiChry5 and pTiEU6 are unannotated in the GenBank RefSeq records, so no locus tag is available, but the gene presence was confirmed by BLASTN searches.

Acknowledgements

We thank Ai-Ping Chen, Hsin-Ying Chiang, Shu-Jen Chou, Mei-Jane Fang, Ya-Yi Huang, Wen-Sui Lo, and Javier F. Tabima for technical assistance. Sophien Kamoun provided helpful comments that improved the writing of this manuscript. The bacterial strains were imported under the permits 103-B-003 and 104-B-002 issued by the Council of Agriculture of Taiwan. The Sanger sequencing service and the Illumina sequencing library preparation service were provided by the Genomic Technology Core (Institute of Plant and Microbial Biology, Academia Sinica). The Illumina MiSeq sequencing service was provided by the Genomics Core (Institute of Molecular Biology, Academia Sinica). The PacBio sequencing and data processing service was provided by Genomics BioSci & Tech. Co. Ltd. (New Taipei City, Taiwan). The Institute of Plant and Microbial Biology (Academia Sinica) and the Department of Botany and Plant Pathology (Oregon State University) provided computing resources.

Authors’ contributions

Conceptualization: CHK. Funding acquisition: JHC, EML, CHK. Investigation: LC, YCL, MH, MNS, AJW, CFW. Methodology: LC, YCL, MNS, STC, CHK. Project administration: CHK. Supervision: JHC, EML, CHK. Validation: LC, YCL, MH, MNS, AJW. Visualization: LC, YCL. Writing – original draft: LC, YCL, CHK. Writing – review and editing: LC, YCL, AJW, CFW, JHC, EML, CHK. All authors read and approved the final manuscript.

Funding

Research in the Chang lab was supported by the National Institute of Food and Agriculture, US Department of Agriculture awards 2014-51181-22384 and 2020-51181-32154. Research in the Lai lab was supported by Academia Sinica and the Ministry of Science and Technology of Taiwan (MOST 104-2311-B-001-025-MY3 and 107-2311-B-001-019-MY3). Research in the Kuo lab was supported by Academia Sinica and the Ministry of Science and Technology of Taiwan (MOST 109-2628-B-001-012 and 110-2628-B-001-020). The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Availability of data and materials

The 14 new genome sequences are available in NCBI under BioProject accessions PRJNA534385-PRJNA534397 and PRJNA534399.

Declarations

Ethics approval and consent to participate
Not applicable.

Consent for publication
Not applicable.

Competing interests
The authors declare that they have no competing interests.

Author details

1 Institute of Plant and Microbial Biology, Academia Sinica, Taipei, Taiwan. 2 Molecular and Biological Agricultural Sciences Program, Taiwan International Graduate Program, National Chung Hsing University and Academia Sinica, Taipei, Taiwan. 3 Graduate Institute of Biotechnology, National Chung Hsing University, Taichung, Taiwan. 4 Department of Botany and Plant Pathology, Oregon State University, Corvallis, Oregon, USA. 5 Biotechnology Center, National Chung Hsing University, Taichung, Taiwan.

Received: 19 August 2021 Accepted: 23 December 2021