Vanilla Research Paper PDF

2022

Quentin Piet, Gaetan Droc, William Marande, Gautier Sarah, Stéphanie Bocs, Christophe Klopp, Mickael Bourge, Sonja Siljak-Yakovlev, Olivier Bouchez, Céline Lopez-Roques, Sandra Lepers-Andrzejewski

Summary

This research article details the chromosome-level assembly and annotation of a Vanilla planifolia genome. The study highlights the significant challenge of partial endoreplication in accurately assembling whole genomes. The research also explores the relationship between gene-rich regions and the endoreplication phenomenon, providing insights into the molecular regulation of this process in Vanilla.
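The full transcript below reports a 2C DNA content of 4.18 pg for CR0040 nodal tissue, converted to 4.09 Gb. It also derives the endoreplicated (p) and fixed (f) DNA amounts from the replicated fraction P = 30.5%. The Python sketch below reproduces that arithmetic under stated assumptions. It uses the 1 pg = 978 Mb conversion factor of Doležel et al. (2003) and the relation 2p = P × 2C given in the text. The split f = (2C − 2p)/2 is inferred because it matches the reported numbers; it is not quoted from the paper. The helper names `pg_to_gb` and `pe_amounts` are hypothetical and are not tools used by the authors.

```python
"""Flow-cytometry arithmetic for partial endoreplication (PE): a minimal sketch.

Assumptions: 1 pg of DNA = 978 Mb (Dolezel et al., 2003); 2C = 2p + 2f,
with 2p = P x 2C as stated in the text. The formula for f is inferred.
"""

# Assumed picogram-to-megabase conversion factor.
MB_PER_PG = 978.0


def pg_to_gb(mass_pg: float) -> float:
    """Convert a nuclear DNA mass in picograms to gigabases."""
    return mass_pg * MB_PER_PG / 1000.0


def pe_amounts(two_c_pg: float, p_fraction: float) -> dict[str, float]:
    """Split a 2C DNA content into endoreplicated (p) and fixed (f) amounts.

    two_c_pg   -- 2C nuclear DNA content, in pg
    p_fraction -- replicated fraction P, as a proportion between 0 and 1
    """
    if not 0.0 <= p_fraction <= 1.0:
        raise ValueError("p_fraction must lie between 0 and 1")
    two_p = p_fraction * two_c_pg  # 2p = P x 2C
    two_f = two_c_pg - two_p       # part of the 2C nucleus that never endoreplicates
    return {"2p": two_p, "p": two_p / 2.0, "f": two_f / 2.0}


if __name__ == "__main__":
    # Values reported for CR0040 nodal tissue: 2C = 4.18 pg, P = 30.5%.
    print(f"2C = {pg_to_gb(4.18):.2f} Gb")
    for name, value in pe_amounts(4.18, 0.305).items():
        print(f"{name} = {value:.3f} pg")
```

Run with these inputs, the sketch gives 4.09 Gb, 2p = 1.275 pg, p = 0.637 pg and f = 1.453 pg. These match the values stated in the Results, which suggests the inferred split is consistent with the authors' calculation.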

Full Transcript

Plant Communications
Research Article

A chromosome-level, haplotype-phased Vanilla planifolia genome highlights the challenge of partial endoreplication for accurate whole-genome assembly

Quentin Piet1,17, Gaetan Droc2,3,4,17,*, William Marande5,17, Gautier Sarah4,6,17, Stéphanie Bocs2,3,4,17, Christophe Klopp7,17, Mickael Bourge8, Sonja Siljak-Yakovlev9, Olivier Bouchez10, Céline Lopez-Roques10, Sandra Lepers-Andrzejewski11, Laurent Bourgois12, Joseph Zucca13, Michel Dron14, Pascale Besse15, Michel Grisoni16,*, Cyril Jourda1,* and Carine Charron1,17

1 CIRAD, UMR PVBMT, 97410 Saint-Pierre, La Réunion, France
2 CIRAD, UMR AGAP Institut, 34398 Montpellier, France
3 UMR AGAP Institut, Univ Montpellier, CIRAD, INRAE, Institut Agro, 34398 Montpellier, France
4 French Institute of Bioinformatics (IFB) - South Green Bioinformatics Platform, Bioversity, CIRAD, INRAE, IRD, 34398 Montpellier, France
5 INRAE, CNRGV, Genotoul, 31326 Castanet-Tolosan, France
6 AGAP, Univ. Montpellier, CIRAD, INRAE, Montpellier SupAgro, Montpellier, France
7 Plateforme Bioinformatique, Genotoul, BioinfoMics, UR875 Biométrie et Intelligence Artificielle, INRAE, Castanet-Tolosan, France
8 Cytometry Facility, Imagerie-Gif, Université Paris-Saclay, CEA, CNRS, Institute for Integrative Biology of the Cell (I2BC), 91198 Gif-sur-Yvette, France
9 Université Paris-Saclay, CNRS, AgroParisTech, Ecologie Systématique Evolution (ESE), 91190 Gif-sur-Yvette, France
10 INRAE, GeT-PlaGe, Genotoul, 31326 Castanet-Tolosan, France
11 Etablissement Vanille de Tahiti, Uturoa, French Polynesia, France
12 Eurovanille, Rue de Maresquel, 62870 Gouy Saint André, France
13 Département Biotechnologie, V. Mane Fils, 06620 Le Bar Sur Loup, France
14 Université Paris-Saclay, CNRS, INRAE, Univ. Evry, Institute of Plant Sciences Paris-Saclay (IPS2), 91405 Orsay, France
15 Université de la Réunion, UMR PVBMT, Saint-Pierre, La Réunion, France
16 CIRAD, UMR PVBMT, 501 Tamatave, Madagascar
17 These authors contributed equally to this article.
*Correspondence: Gaetan Droc ([email protected]), Michel Grisoni ([email protected]), Cyril Jourda ([email protected])
https://doi.org/10.1016/j.xplc.2022.100330

Published by the Plant Communications Shanghai Editorial Office in association with Cell Press, an imprint of Elsevier Inc., on behalf of CSPB and CEMPS, CAS.
Plant Communications 3, 100330, September 12 2022 © 2022 The Author(s).
This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).

ABSTRACT
Vanilla planifolia, the species cultivated to produce one of the world’s most popular flavors, is highly prone to partial genome endoreplication, which leads to highly unbalanced DNA content in cells. We report here the first molecular evidence of partial endoreplication at the chromosome scale by the assembly and annotation of an accurate haplotype-phased genome of V. planifolia. Cytogenetic data demonstrated that the diploid genome size is 4.09 Gb, with 16 chromosome pairs, although aneuploid cells are frequently observed. Using PacBio HiFi and optical mapping, we assembled and phased a diploid genome of 3.4 Gb with a scaffold N50 of 1.2 Mb and 59 128 predicted protein-coding genes. The atypical k-mer frequencies and the uneven sequencing depth observed agreed with our expectation of unbalanced genome representation. Sixty-seven percent of the genes were scattered over only 30% of the genome, putatively linking gene-rich regions and the endoreplication phenomenon. By contrast, low-coverage regions (non-endoreplicated) were rich in repeated elements but also contained 33% of the annotated genes. Furthermore, this
assembly showed distinct haplotype-specific sequencing depth variation patterns, suggesting complex molecular regulation of endoreplication along the chromosomes. This high-quality, anchored assembly represents 83% of the estimated V. planifolia genome. It provides a significant step toward the elucidation of this complex genome. To support post-genomics efforts, we developed the Vanilla Genome Hub, a user-friendly integrated web portal that enables centralized access to high-throughput genomic and other omics data and interoperable use of bioinformatics tools.

Keywords: vanilla, whole-genome sequencing, optical mapping, partial endoreplication, genome hub

Piet Q., Droc G., Marande W., Sarah G., Bocs S., Klopp C., Bourge M., Siljak-Yakovlev S., Bouchez O., Lopez-Roques C., Lepers-Andrzejewski S., Bourgois L., Zucca J., Dron M., Besse P., Grisoni M., Jourda C., and Charron C. (2022). A chromosome-level, haplotype-phased Vanilla planifolia genome highlights the challenge of partial endoreplication for accurate whole-genome assembly. Plant Comm. 3, 100330.

INTRODUCTION

Vanilla planifolia G. Jackson is an emblematic orchid cultivated for its fruit (pod) fragrance. Pods contain many aromatic compounds, particularly vanillin in high proportion (Perez-Silva et al., 2006).

Endoreplication, characterized by a series of DNA replications in the nucleus without mitotic cell division, is found in a large number of both animal and plant species (Lee et al., 2009). During regular endoreplication, each step of this mechanism leads to a two-fold increase in nuclear DNA content in somatic cells (2C, 4C, 8C, 16C, etc.), where 1C corresponds to the DNA content of the non-replicated holoploid chromosome set. Endoreplication is very common in plants and is related to various biological processes, such as plant development and growth, and occurs in response to biotic and abiotic stresses (Bourdon et al., 2012; Lang and Schnittger, 2020). This phenomenon depends on the type of tissue and its stage of development, suggesting involvement in cell differentiation and maintenance of the final stage of differentiation (Bhosale et al., 2018). The molecular mechanisms involved in regular endoreplication have been particularly well studied in Arabidopsis thaliana over the past few years. A downregulation of mitotic activity caused by mitotic cyclin-dependent kinase (CDK)–cyclin complexes has been shown to be directly involved in the control of endoreplication (Lang and Schnittger, 2020).

In many orchid species, measurements of genomic content by flow cytometry (FCM) have not agreed with the commonly accepted model of complete endoreplication. In this case, nuclear DNA content in endoreplicated cells was present at less than twice the 2C cell content. Because this ratio was constant for a given Vanilla species, whatever the cell ploidy level, it was suggested that the nuclear DNA could be categorized into two parts: the P fraction, subject to endoreplication, and the F fraction, not endoreplicated (Brown et al., 2017). These fractions are constant in all cells undergoing partial endoreplication (PE), which suggests the fine regulation of genome rearrangements. The fact that the gametes are haploid also suggests the presence of molecular mechanisms that enable the isolation of the holoploid genome. This type of endoreplication, which appears to be specific to the Orchidaceae lineage in plants, has been successively termed “progressively PE” (Bory et al., 2008; Trávníček et al., 2015; Hřibová et al., 2016), strict PE (Brown et al., 2017), and more recently, PE (Chumová et al., 2021; Trávníček et al., 2019). To be in line with the latest works and to harmonize the terminology for this phenomenon, the term PE will be used in this work. To date, PE has been observed in all species studied within the genus Vanilla (Bory et al., 2008; Brown et al., 2017; Lepers-Andrzejewski et al., 2011; Trávníček et al., 2015). In this species, diploid nuclei (2C) are found mainly in nodal tissues (with PE up to 32E), whereas the nuclei of mature leaf cells contain a low 2C fraction and show PE up to 64E (Brown et al., 2017). The F fraction was estimated to be 71.6% of the genome, whereas the P fraction (28.4%) could be duplicated up to 64E. In addition, the proportion of the non-endoreplicated (F) genome varies greatly from species to species. It is very high in Vanilla pompona (F = 81%) but rather low in Vanilla mexicana (F = 17%) (Brown et al., 2017). Several studies on orchids have also shown that species prone to PE have a larger genome than those prone to conventional endoreplication (Trávníček et al., 2015, 2019; Chumová et al., 2021). Nevertheless, the molecular mechanisms involved in PE are not yet elucidated.

A chromosome-scaled, phased V. planifolia genome (Daphna cultivar) was recently reported, highlighting haplotype differences and one ancestral whole-genome duplication shared by all sequenced orchids (Hasing et al., 2020). However, the 1.5 Gb size of the assembled genome was far from the V. planifolia genome size, estimated to be about 4 Gb using FCM measurement (Bory et al., 2008; Lepers-Andrzejewski et al., 2011), suggesting that the Daphna genome assembly may be highly incomplete. As mentioned by Hasing et al. (2020), the reason for the genome size discrepancy between FCM and assembly results remains to be elucidated. With about 65% of the V. planifolia genome missing in the Daphna assembly, we hypothesize that the missing part of the genome corresponds mainly to the F (non-endoreplicated) fraction (71.6%) (Brown et al., 2017), whose lower representation results in a lower sequencing depth.

Here, we address this issue by developing an approach that combines FCM, cytogenetics, and whole-genome sequencing using the most recently developed technologies (Supplemental Figure 1) with a tissue that is enriched in the 2C fraction (nodes), resulting in a reduced P/F ratio and therefore a greater proportion of the F fraction. We demonstrate that the genome size discrepancy was due to the occurrence of PE, for which further knowledge at the chromosome scale was gained from this study. We present the most complete version to date of a high-quality, chromosome-level phased genome of V. planifolia using a traditional vanilla cultivar from the Indian Ocean region (CR0040). Our results are shared through a web portal that facilitates data access, use, and analyses by a wide community.

RESULTS

Genome size, ploidy level, and chromosome content

The 2C genome size of V. planifolia CR0040, a traditional vanilla cultivar from La Réunion island (Supplemental Note 1), was estimated in nodal tissues to be 4.18 ± 0.08 pg by FCM (Supplemental Note 1), corresponding to 4.09 Gb (Doležel et al., 2003). To estimate PE levels, the fluorescence ratio of DNA content between consecutive peaks of endoreplication levels was estimated (Supplemental Note 1; Supplemental Figure 2; Supplemental Table 1). Results showed no significant differences (calculated t-values of 1.116, 1.900, 0.935, and 0.365 compared with Student table t-value [α = 0.05] of 2.131) between the PE pattern of CR0040 and those of other V. planifolia cultivars, such as CR1110 (2C = 4.16 ± 0.04 pg), studied by Brown et al. (2017). The replicated fraction P was also calculated (P = 30.5% ± 3.2%). The equivalent amount 2p was then P × 2C = 1.275 pg, which meant that the absolute quantity p was 0.637 pg, and the absolute quantity f of fixed amount was 1.453 pg (Figure 1A and 1B; Supplemental Note 1).

The karyotype of V. planifolia obtained by cytogenetic approaches (Supplemental Note 1) appeared to be of bimodal type, composed of 16 both large and small chromosome pairs (Figures 2A–2C), although aneuploid cells were frequently observed, such as those with only 28 chromosomes (Figure 2D and 2E). V. planifolia chromosomes possess important portions of telomeric and pericentromeric heterochromatin, which made the determination of their morphology difficult. In the interphase nuclei, this heterochromatin was present in the form of numerous chromocenters that were clearly visible after staining with both orcein (Figure 2F) and DAPI (Figure 2G). This type of heterochromatin is unspecific, whereas heterochromatin linked to rRNA genes is rich in G-C bases. Only one locus (two spots) of rDNA (18S-5.8S-26S) was present in the genome of CR0040 (Figure 2H, arrows), evidenced after chromomycin (CMA3) staining. After Hoechst 33258 staining, our results also revealed that AT-rich DNA regions were more common than GC-rich regions in the V. planifolia chromosomes and that some chromosomes were entirely or almost entirely heterochromatinized (Figure 2I).

Whole-genome assembly and k-mer analysis

CR0040 genome sequencing produced 69 Gb of Pacific Biosciences (PacBio) HiFi long reads, 147 Gb of Oxford Nanopore Technology (ONT) long reads, and 200 Gb of Illumina 10X Genomics short reads (Supplemental Note 1; Supplemental Table 2). These DNA sequencing (DNA-seq) reads were assembled using different bioinformatics pipelines. The best result was obtained using only high-quality HiFi long reads (Supplemental Note 2; Supplemental Tables 3 and 4). Contigs from the HiFi read assembly were scaffolded with optical maps to obtain a final phased assembly of 3.4 Gb (1.5 Gb for haplotype A and 1.9 Gb for haplotype B), representing around 83% of the expected genome size. One third of the assembly could be anchored onto 14 chromosomes using published Daphna chromosomes as references (Hasing et al., 2020). Unfortunately, no data could help to organize the remaining contigs into the two missing chromosomes. Therefore, the remaining two-thirds correspond to unanchored additional sequences that were compiled into two unknown random pseudomolecules, A0 and B0. The final assembly comprised 24 534 contigs with a contig N50 length of 924 kb. The lengths of the 14 chromosomes ranged from 73.5 Mb (Chr01) to 20 Mb (Chr14). Main genome assembly statistics are synthesized in Table 1.

In order to understand how PE affects the assembly, a k-mer analysis was produced. The results should reflect the sequencing coverage of the different genome fractions present in our raw data and assembly. In brief, the reads were split into overlapping k-mers (47-mers in our case). K-mers were then sorted and occurrences counted. These counts were then used to produce a histogram. A spectra-cn plot was used to compare the k-mers found in the reads versus the k-mers found in the assembly (Supplemental Figure 3). The x axis gives the number of times a given k-mer was found in all the reads, reflecting the coverage of the k-mer. The y axis gives a value representing the number of k-mers that were found a specified number of times (x axis value). Interestingly, two k-mer distributions were centered at 42× and 84×, representing a classical diploid distribution with heterozygous and homozygous k-mer content. We assumed that these peaks represented the k-mers of the endoreplicated fraction with a high sequencing depth due to higher representation. Remarkably, the graph also showed an additional k-mer distribution centered around 10× (Supplemental Figure 3, red arrow). This distribution could easily be mistaken for an erroneous k-mer distribution, but we assumed that it represented non-endoreplicated k-mers of the V. planifolia genome, with low sequencing depth due to lower representation.

To validate the assembly and compare it with the already published reference, we produced four k-mer spectra-cn plots showing k-mer distributions of Daphna Illumina reads and HiFi reads colored both with the Daphna and CR0040 V. planifolia assemblies (Figure 3). A spectra-cn plot enables comparison of the k-mers found in the reads versus the k-mers found in the assembly. The k-mer histogram from the reads is colored based on the number of times each k-mer is found in the assembly. For a heterozygous diploid assembly, we expected to find two distributions: on the left, the heterozygous distribution, which should be colored in red because each k-mer is only found once in the assembly, and on the right, the homozygous distribution, which is purple because the corresponding k-mers are found twice in the assembly. The black area at the far left of the diagram corresponds to k-mers that include sequencing errors; these are found a limited number of times in the reads and never in the assembly. Daphna Illumina sequencing, being deeper, resulted in better separation between the homozygous (80× sequencing depth) and heterozygous (160× depth) k-mer fractions in the spectra-cn graph compared with CR0040 (Figure 3A and 3B against 3C and 3D). The same pattern occurred for CR0040 HiFi data around 45× and 90× (Figure 3C). The differences between Figure 3A and 3C come from the sequencing depth and the type of tissue used: mature leaves with a higher proportion of the P fraction for Daphna and nodal tissues with a lower P/F ratio for CR0040. The k-mer distribution of the non-endoreplicated fraction (low coverage) was not found in the Daphna assembly (black area left of Figure 3B and 3D) but is mostly present in the CR0040 assembly. Regarding the completeness of the Daphna reference assembly, the spectra-cn plots (Figure 3B and 3D) showed that part of the heterozygous fraction was missing (orange arrows), and some k-mers were in overrepresented copies (>2×) in both heterozygous and homozygous fractions (Figure 3B, black arrows). The spectra-cn diagram also showed heterozygous content present two or more times instead of once in this assembly (Figure 3, black arrows), which could indicate spurious duplications. As a whole, our CR0040 genome assembly is close in size to the FCM estimate and has the expected k-mer diploid profile, with a well-represented non-endoreplicated fraction (Figure 1C and 1D).

Figure 1. Endoreplicated and non-endoreplicated fractions in the CR0040 Vanilla planifolia genome.
(A) The histogram represents the distribution of nuclei in V. planifolia nodal tissues according to the partial endoreplication state of cells, from 2C (green) to 4E (blue), 8E (yellow), 16E (orange), and 32E (gray). The disks below represent the endoreplicated (colored) and non-endoreplicated (black) DNA content for each class of nuclei, proportionally to their mass (pg). The lowercase f and p denote the respective DNA quantities of the F fraction (fixed proportion of the haploid genome that cannot endoreplicate) and the P fraction (part that participates in endoreplication). The mean and the standard deviation (SD) of the interpeak ratio have been indicated below the dotted arrows.
(B) F and P fractions and P/F ratio values obtained by flow cytometry and detailed for the P fraction for each nuclear class (2C, green; 4E, blue; 8E, yellow; 16E, orange; and 32E, gray).
(C) Theoretical F and P fractions expected from HiFi sequencing and from flow-cytometry data.
(D) Theoretical (dotted) and experimental k-mer coverages for F (black) and P (hatched) fractions.

Gene and transposable element annotation

The assembled genome supplemented with transcriptomic data from nine distinct tissues made it possible to identify 59 128 protein-coding genes (26 392 for haplotype A and 32 736 for haplotype B), 90.31% of which could be associated with a function (Supplemental Note 3; Supplemental Tables 5–10). Sixty-seven percent of the predicted genes were anchored onto the 14 chromosome pairs and the remaining 33% onto the two random mosaic chromosomes that were constructed from the unanchored scaffolds and contigs (Figure 4A, blue distributions). We estimated the annotation completeness at 93.2% with the Benchmarking Universal Single-Copy Orthologs approach using the Viridiplantae database. In total, 72% of the assembly consisted of repeats, including single-sequence repeats (15.4%), and 9.7% of other low-complexity regions (Supplemental Note 3; Supplemental Table 11). A high content of retrotransposons was found (41.5%), whereas the content of DNA transposons was low (1.4%). The long terminal repeat retrotransposon content was richer in Gypsy (9.7%; Figure 4A, purple distributions) than in Copia (6.1%; Figure 4A, orange distribution), although a number of annotated retrotransposons (12.5%) were not more precisely classified. The two random mosaic chromosomes were enriched in repeats and showed low gene density and low sequencing depth (Figure 4A, green distributions, and Supplemental Table 12). Indeed, compared with the 14 chromosome sequences, the unanchored regions showed higher proportions of long interspersed nuclear element sequences (8% and 14.05%), and this was true for both haplotypes. By contrast, DNA transposons (3.14% and 0.93%), short interspersed nuclear elements (0.12% and 0.05%), and long terminal repeats (21.67% and 15.57%) represented a larger part of the 14 chromosome sequences than of the unanchored regions. The biggest difference in unanchored regions was observed for unclassified retrotransposons, which represented 16.76% of the unanchored sequences versus 3.28% of the 14 chromosome sequences. Main genome annotation statistics are synthesized in Table 1.

Figure 2. Cytogenetic analysis of Vanilla planifolia CR0040.
(A–D) Orcein staining: (A and B) mitotic metaphases with 2n = 32 chromosomes; (C) karyotype corresponding to (B); (D) hypoaneuploid mitotic metaphase with 2n = 28 chromosomes; (E) karyotype corresponding to (D); (F) interphase nuclei showing heterochromatic chromocenters; (G) DAPI-stained interphase nucleus showing unspecific heterochromatin; (H) chromomycin fluorochrome staining with two CMA+ regions (arrows) corresponding to rDNA sites; (I) Hoechst-stained AT-rich DNA in metaphase and interphase nucleus (IN), with two fully heterochromatinized chromosomes (arrows). Scale bars represent 10 μm.

Table 1. HiFi assembly and annotation statistics of the diploid CR0040 genome
Total assembly size (Gb): 3.4
Total contig number: 24 534
Contig N50 length (Mb): 0.924
Maximum contig length (Mb): 31
GC content (%): 31.6
Number of protein-coding genes: 59 128
Benchmarking Universal Single-Copy Orthologs completeness (%): 93.2
Total of interspersed repeats (%): 47.0

V. planifolia pangenomics and whole-genome duplication

The comparison of the four mosaic haplotypes from the two V. planifolia cultivars, CR0040 and Daphna (Supplemental Tables 12 and 13), showed that the 14 pseudomolecules of CR0040 were shorter and contained fewer genes than those of Daphna and that a large number of regions in the CR0040 pseudomolecules (haplotype A or B) were not located in the Daphna pseudomolecules (Supplemental Figure 4). Pangenomic analysis of the orthogroups from proteomes derived from the 14 chromosomes only (Supplemental Figure 5; Supplemental Table 14; Supplemental Note 4) indicated that the core genome was composed of 14 210 families and 77 692 genes (35 972 CR0040 and 41 720 Daphna). The dispensable genome of CR0040 contains 1266 families and 3613 genes specific to CR0040. The dispensable genome of Daphna contains 3997 Daphna-specific families and 13 645 genes. Finally, we looked at the expansion or reduction of gene families in relation to six proteomes (CR0040, Daphna, Phalaenopsis equestris, Phalaenopsis aphrodite, A. thaliana, and Oryza sativa; Supplemental Figure 6). From an orchid perspective, the expansion number for the orchid node is rather low (+36), whereas the Daphna-specific number is rather high (A +1841 and B +1943) compared with CR0040 (A +418 and B +826).

To identify whole-genome duplications (WGDs), pairwise genome synteny analyses between CR0040, Daphna, and P. aphrodite and within themselves were carried out (Supplemental Figures 7 and 8; Supplemental Notes 4 and 6). The CR0040 haplotype A dot plot validated at least one pan-orchid WGD (α, the origin of the paleo-allotetraploid) previously found by Hasing et al. (2020). An additional dot plot diagonal and dS peak suggested a second WGD, possibly the tau of Monocots (τm).

Detection of non-endoreplicated regions

PE induces highly unbalanced DNA representation with a P/F DNA ratio ranging from 3 to 10, depending on the tissue. This was reflected in our assembly by highly variable sequencing depth (Figure 4A, green lane). The two random mosaic chromosomes that showed a low sequencing depth at most loci may therefore contain a large part of the non-endoreplicated F fraction of the genome. It is likely that a large number of unanchored sequences originate from the two fully heterochromatinized chromosomes observed in the interphase nuclei (Figure 2I), possibly chromosome pairs 15 or 16. The remaining unanchored sequences should correspond to missing fractions in the anchored chromosomes.

Interestingly, the sequencing reads that mapped to the CR0040 and Daphna assemblies also showed intra-chromosomal sequencing depth variations (Figure 4B). These patterns were consistent, regardless of the technology used. To observe this phenomenon globally on all chromosomes and genomes with all technologies (HiFi, ONT, and Illumina), sequencing depth analysis tools were used and manual validation performed (Supplemental Note 5; Supplemental Tables 15 and 16). Two patterns of sequencing depth variation were identified along all chromosomes. The first one (indicated with a dotted box labeled “1” in Figure 4B and Supplemental Figure 9) corresponded to a sharp decrease in sequencing depth for both cultivars, with all sequencing technologies, which dropped down from 45×–120× to less than 20×. Surprisingly, this pattern occurred independently on the two haplotypes. A total of 37 very low-coverage regions with this pattern (from 0.4 to 6 Mb in length) were identified along the chromosomes (24 in haplotype A and 13 in haplotype B) for a cumulative size of 60.1 Mb. In a large portion of these regions, we found low gene density and high repeat density. This pattern could correspond to non-endoreplicated regions present in both the Daphna and CR0040 genomes. The fact that these patterns are systematically located at junctions between super-scaffolds is consistent with the decrease in sequencing depth caused by non-endoreplication, which impaired the assembly of the endoreplicated regions located on either side. The second pattern (indicated with a dotted box labeled “2” in Figure 4B and Supplemental Figure 9) corresponded to 36 regions (from 1.2 Mb to 20 Mb in length; cumulative size of 207.2 Mb) with segmental sequencing depth variation present in CR0040 (with HiFi, ONT, and Illumina) but not in Daphna (with ONT and Illumina). Furthermore, these variations were syntenic along the two haplotypes, but the direction of variation was inverted between the two phases. Their respective levels of sequencing depth differ by a factor of about three in CR0040. The cause of these apparently coordinated sequencing depth inversions between CR0040 haplotypes remains unclear.

After analyzing the locations of these k-mers in the CR0040 assembly, it appeared that these low-depth k-mers (between 5× and 15×) were mostly present in the unanchored part of the genome compared to the chromosome sequences (Figure 5), with median ratio values equal to 0.27 and 0.036, respectively, showing a significant difference (Wilcoxon-Mann-Whitney test; p = 4e−13). However, chromosomes 7A and 6B were outliers, showing also a high proportion of low-depth k-mers. In addition, the distribution of these k-mers along the genome was globally consistent with the areas identified, except for some discrepancies (Supplemental Figure 10). Chromosomes 6B and 7A showed strong signals in terms of low-depth k-mer proportions, as already pointed out in Figure 5. Indeed, on chromosome 6B, k-mers of this type were positioned on nearly all the assembled sequence, whereas they were localized on approximately half of the assembled chromosome 7A sequence.

Figure 3. Assembly k-mer content comparison between CR0040 PacBio HiFi long reads and Daphna Illumina short reads using spectra-cn graph.
(A–D) The x axis represents k-mer multiplicity (counts), and the y axis indicates the number of distinct k-mers multiplied by their counts. Because of different sequencing depths between read sets, the y axis upper values are 10^9 for (A) and (B) and 10^8 for (C) and (D). The area colors indicate the number of k-mer copies found in the assembly (black: 0× or missing k-mers, red: 1×, purple: 2×, green: 3×, blue: 4×, and orange: 5×). Four spectra-cn plots are presented: (A) Daphna reads versus CR0040 assembly, (B) Daphna reads versus Daphna assembly, (C) CR0040 reads versus CR0040 assembly, and (D) CR0040 reads versus Daphna assembly. The red arrows point toward a low-coverage k-mer distribution not expected in a diploid genome assembly spectra-cn graph. The black arrows point toward the heterozygous (on the left) and homozygous (on the right) k-mer distributions expected in a diploid genome assembly. The orange arrows point toward missing k-mers in the heterozygous k-mer distribution. The lower the black distribution at this location, the fewer k-mers are missing in the assembly.

Orthologs of cell cycle regulator genes involved in A. thaliana endoreplication

A search for orthologs of the CDK and cyclin (Cyc) families of A. thaliana, involved in the regular endoreplication mechanism, showed that representatives of these two families were indeed found in the proteomes of CR0040 and P. aphrodite (Supplemental Table 17). However, the number of genes encoding CDKs and Cycs found via the orthogroups approach was lower for these two species. For example, the gene encoding Cyc-D3-1 in A. thaliana (At4g34160) was part of a species-specific orthogroup that contained some other D-type Cyc genes. Regarding genes encoding regulatory proteins of the CDK–Cyc complexes, all of them had orthologs in both orchids. However, it appeared that these multigenic families were slightly under-represented in the CR0040 gene annotation compared with those of A. thaliana and P. aphrodite. Finally, an imbalance between the A and B haplotypes was observed for Fizzy-related proteins and CDK inhibitor (Krp) orthologs.

Figure 4. Overview of the assembled vanilla genome.
(A) Circos plot of the genomic content along V. planifolia haplotypes A and B and the relationship between them. All tracks are divided into 500 kb genomic windows. From the outside to the inside of the circular representation, ideograms of 28 chromosomes and two random mosaic chromosomes that contain the unanchored scaffolds are shown. Gene density (blue) and interspersed repeat RepeatMasker hit density (black: retroelements; orange: long terminal repeat/Copia; purple: long terminal repeat/Gypsy) are shown. Sequencing depth obtained by mapping CR0040 PacBio HiFi reads onto the assembly (green) and N density (gray) are also shown. Syntenic blocks across haplotypes are connected by lines in the innermost part of the figure.
(B) Sequencing depth along the CR0040 A03 and B03 chromosomes (red rectangles) obtained by mapping Daphna Illumina (yellow) and ONT (pink) reads and CR0040 PacBio HiFi (blue), Nanopore (green), and Illumina (gray) reads onto the CR0040 assembly. Synteny between homologous chromosomes is represented by red boxes. Gaps (N stretches) that explain sudden drops in sequencing depth are shown with white blocks. (1) Low level of sequencing depth for all data is shown. (2) Inverted level of sequencing depth for CR0040 between haplotypes A and B and constant level of sequencing depth for both Daphna haplotypes are shown. Gene and retrotransposon distributions along the chromosomes are represented by a blue line chart and a stacked histogram (Copia: red; Gypsy: purple; other retrotransposons: black), respectively.

Vanilla Genome Hub

The Vanilla Genome Hub (VGH) (https://vanilla-genome-hub.cirad.fr) has been developed to support post-genomics efforts. It centralizes vanilla genomic information with a set of user-friendly interconnected modules and interfaces for the analysis and visualization of genomic data. From the main menu of the VGH (Supplemental Note 6; Figure 6A), the search for genes of interest to biologists is simplified using the interoperable system by the identification of paralogous genes using keywords and sequence homology (Figure 6B and 6C) and the production of an information report with gene name, gene localization, and polypeptide function (Figure 6D). The genome browser was built to offer tracks of supplemental information, such as GC content, gene structure, gene expression, DNA-seq depth, and repeat composition to support the identification of new genes of interest (Figure 6E and Supplemental Figure 11). A metabolic pathway reconstruction and visualization tool enables the identification of annotated genes involved in pathways (Figure 6F). A Gene Ontology enrichment tool enables testing and visualization of enrichment according to a Gene Ontology category of a gene group (Figure 6G). Finally, comparative analysis at the genome scale is supported by an interactive multiscale synteny visualization (Figure 6H).

Figure 5. Ratio of k-mers within unanchored and anchored CR0040 genome. This boxplot shows the ratio of k-mers with a depth less than 15 in our HiFi reads within unanchored sequences (blue) and within chromosomes (orange).

DISCUSSION

Flow cytometry and cytogenetic data validate genome size and chromosome content

Genome size, ploidy level, and chromosome content of the V. planifolia CR0040 cultivar were validated by FCM and cytogenetic analyses. The estimated size of 4.09 Gb indicated a ploidy level similar to those of other traditional diploid V.

genome published previously (Hasing et al., 2020). This difference could be explained by the fact that the CR0040 genome was assembled from HiFi reads that enabled the assembly of repetitive regions in different contigs despite their low sequencing depth. We were thus able to assemble a greater number of repeated sequences that may correspond to a large fraction of the non-endoreplicated genome and were missing from the Daphna genome assembly. The biological reality of this hypothesis is reinforced by the consistency of the k-mer depth profiles and sequencing depth patterns that resulted from the mapping of reads from different sequencing technologies (HiFi, 10X, and ONT) tested in this study for CR0040 (Supplemental Figure 9). A k-mer spectra-cn diagram is an efficient tool for visually comparing the k-mer compositions of reads and assemblies. Such diagrams are used to validate diploid or haploid assembly quality (Yen et al., 2020). The k-mer spectra-cn diagram clearly
planifolia cul- shows a general diploid pattern, with a heterozygous distribution tivars (Bory et al., 2008; Lepers-Andrzejewski et al., 2011). containing only k-mers present once in the assembly and a ho- Estimation of endoreplication levels confirmed PE, as mozygous distribution containing, as expected, only k-mers pre- previously described in V. planifolia (Brown et al., 2017). This sent twice in the assembly. Unexpectedly for a diploid genome, species was shown to exhibit diploidized meiotic chromosome this figure includes a third distribution that is located in the low- pairing with 16 bivalents (Bory, 2007). This result coverage area of the diagram. The color pattern shows clearly demonstrates the complete diploidization of this supposed that these k-mers present in low frequencies (5–15 times) are segmental paleo-allotetraploid (Ravindran, 1979; Nair and also present in our assembly. These k-mers represent the non- Ravindran, 1994). The same meiotic observation was also repeated fraction of low-coverage sections of the assembly, performed for Vanilla 3 tahitensis by Lepers-Andrzejewski which are mainly located in the unanchored sequences but are et al. (2011). Aneuploid chromosome numbers were frequently also present in low-coverage sections of other chromosomes. observed in mitotic metaphases of V. planifolia (Nair and Even if the unanchored sequences are mainly built of repeats, Ravindran, 1994; Bory et al., 2008), possibly owing to the they also harbor genes and other non-repeated blocks, and these observed mitotic associations that could lead to unequal portions are large enough in terms of k-mers to generate this un- anaphase separation. This may lead to errors in the evaluation expected k-mer distribution in the spectra-cn plot. These k-mers of basic chromosome number, as was the case in a recent are not present in the public V. 
planifolia Daphna assembly, and paper in which the authors considered that the basic number therefore, the corresponding distribution is black in Figure 3B. was x = 14 (Hasing et al., 2020). The phenomenon of aneuploidy apparently occurs only in somatic cells, whereas Molecular signatures of partial endoreplication meiosis appears to be regular, with a stable number of The abundance of interspersed repeats detected in CR0040 was chromosomes (Bory, 2007). Although the CR0040 assembly is consistent with already mentioned data in other orchids, such as more complete than that of Daphna, only 14 pseudomolecules P. equestris (Cai et al., 2015) and P. aphrodite (Chao et al., 2018), were obtained because CR0040 scaffolds were anchored on and in other lineages, like the Oryza genus (Stein et al., 2018). The the 14 Daphna pseudomolecules. Chromosomes 15 and 16 high content of retrotransposons and low content of DNA are probably non-endoreplicated and present in the transposons were in the range of what has been found for unanchored part (CR0040_A0 and CR0040_B0). different orchids (Cai et al., 2015; Chao et al., 2018). High repeat content was found in candidate non-endoreplicated re- PE hinders whole-genome assembly gions, which is in agreement with previous descriptions in other Given the CR0040 diploid genome size estimate of 4.09 Gb by orchids (Chumová et al., 2021). Furthermore, some types of FCM, our genome assembly represented around 83% of the ex- repeats may be preferentially found in non-endoreplicated re- pected genome size and was twice the size of the Daphna gions, as shown by differences in repeat proportions, particularly Plant Communications 3, 100330, September 12 2022 ª 2022 The Author(s). 9 Plant Communications Partial endoreplication in Vanilla planifolia challenges genome assembly Figure 6. Overview (screen shots) of some interoperable vanilla genome analysis tools integrated into the Vanilla Genome Hub. (A) Main menu. 
retrotransposon proportions, between assembled chromosomes and unanchored sequences. Thus, long interspersed nuclear elements, for example, occupy a larger portion of the unanchored regions than of the 14 chromosomes, even though these regions are overrepresented in the present assembly. However, the lack of a more detailed annotation of the retrotransposon class hampers the search for a potential preferential distribution of repeat families between endoreplicated and non-endoreplicated regions. It is therefore crucial to better annotate these repeats in order to determine exactly which kinds are found preferentially in each of the two fractions of the genome. On the other hand, the distribution of genes in the genome shows the opposite trend, with approximately two-thirds of the protein-coding sequences localized in the anchored region.

Figure 6 (continued). (B) Gene search (Tripal MegaSearch). (C) Sequence homology search (Blast). (D) Gene report (Tripal). (E) Genome browser (JBrowse). (F) Metabolic pathway visualization (Pathway Tools). (G) Gene Ontology enrichment (DIANE). (H) Comparison of genomic sequences (SynVisio).

The distinct sequencing depth profiles observed between CR0040 and Daphna probably reflected a tissue-specific endoreplication pattern. Indeed, the nodes used to sequence the CR0040 genome are growing and differentiating tissues, whereas the leaves used to sequence the Daphna genome are composed of fully differentiated cells. The irregular, haplotype-specific endoreplication pattern (segmental or not) observed in CR0040 could thus result from a particular physiological activity. Whatever the reason, this intriguing pattern suggests a complex and fine regulation of PE at the chromosome level, which deserves further study. Although no previous study has demonstrated the mechanisms underlying PE in orchids, many works have focused on the regulation of regular endoreplication, which is found in a large number of plant species and has been well analyzed in tomato and Arabidopsis (Lang and Schnittger, 2020). The common mechanism that triggers endoreplication is a downregulation of mitotic CDK activity to suppress mitosis, together with a fine regulation of this activity throughout the induced endocycle, with an alternation between high and low activity levels at specific checkpoints in order to maintain the replication process (De Veylder et al., 2011; Shimotohno et al., 2021). CDK controls cell-cycle progression and mitosis entry via its phosphorylation activity, which is activated by association with Cyc proteins. Recently, Inada et al. (2021) demonstrated the involvement of actin and actin-binding proteins in the regulation of A. thaliana endoreplication. The present whole-genome analysis made it possible to identify orthologous CDKs, Cycs, CDK activators and repressors, and actin depolymerizing factors in V. planifolia CR0040. A first step in understanding orchid PE would therefore be to further analyze these molecular regulators. Indeed, the recognition of orthologs and paralogs in large gene families, such as those of the CDK–Cyc complex, is challenging and requires deeper investigation by high-quality manual annotation of the genes of interest (Vaattovaara et al., 2019).

Although PE seems specific to orchids among plants (Trávníček et al., 2015), this phenomenon of under-represented genomic regions is well known in other eukaryotes. Ciliates, such as Paramecium tetraurelia and Tetrahymena thermophila, show programmed DNA elimination following endoreplication in their macronucleus (MAC), involving chromosome fragmentation and elimination of specific sequences called internal eliminated sequences (IESs) (Bracht et al., 2013; Sellis et al., 2021). However, FCM approaches in Ludisia discolor, an orchid subject to PE, have ruled out the possibility of such DNA elimination and favor the hypothesis of under-replication (Hribová et al., 2016). Under-replication has also been studied in several organisms, such as Drosophila, in which it has been proposed that a reduction in the expression of genes involved in DNA replication may lead to a slower S phase and an incomplete replication of genomic regions during late S phase (Lilly and Spradling, 1996). Molecular mechanisms described in Drosophila highlighted an inhibition of replication fork progression involving the Rif1 protein, which interacts with the SUUR protein (Munden et al., 2018; Armstrong et al., 2019).

Finally, cytogenetic studies using in situ hybridization techniques (fluorescence in situ hybridization [FISH] and genomic in situ hybridization) could also be used to increase our knowledge of the molecular signatures of PE (Younis et al., 2015). A recent advance in FISH is the development of probes based on synthetic oligonucleotides specific to repetitive sequences or to particular chromosome regions (Jiang, 2019). This new generation of FISH probes has been applied to plant species with sequenced genomes, such as Zea and Cucumis species (Han et al., 2015; Martins et al., 2019; Braz et al., 2020; Zhang et al., 2021a). Endoreplicated and non-endoreplicated genomic regions could be used to synthesize oligo-based FISH probes specific to each fraction in order to precisely locate these PE signatures on chromosomes. In contrast to FISH, the genomic in situ hybridization technique uses the total genomic DNA of a species. We hypothesize that hybridizing the total DNA of highly endoreplicated nuclei (16E and 32E) to CR0040 chromosomes would induce a more intense hybridization signal in endoreplicated regions, thus enabling us to identify non-endoreplicated areas as those showing little hybridization.

Impact of technologies on whole-genome evolution analysis

The strategy of combining optical mapping with HiFi long-read sequencing for the CR0040 genome assembly resulted in a haplotype A with 14 pseudomolecules of better quality and with fewer scaffolding errors than the Daphna haplotype A, which was built with Hi-C and ONT technologies (Hasing et al., 2020). Indeed, comparisons between the Daphna and CR0040 A haplotypes revealed a dual-haplotype conservation problem in the Daphna phased assembly, which is reflected in the Daphna Hi-C scaffolding. The use of HiFi long reads and optical maps enabled more accurate haplotype separation, as shown in previous works (Matthews et al., 2018; Du et al., 2020). In the case of CR0040, not only did HiFi enable better assembly of non-endoreplicated regions, but Hifiasm also allowed better separation of haplotypes. These improvements were therefore necessary to better resolve the complex vanilla genome, which has a high rate of heterozygosity (observed heterozygosity, Ho, in V. planifolia cultivars = 0.362; Favre et al., 2022) and is subject to PE (Brown et al., 2017). However, this dual-haplotype conservation problem, observed in Daphna but not in CR0040, affected comparative pan-genomics analyses and distorted the results obtained. Thus, the differences observed between the two V. planifolia genomes (number of paralogs, numbers of gene families with expansions and contractions, and complete and duplicated Benchmarking Universal Single-Copy Orthologs scores) could be explained by these mosaic assembly problems and therefore by an incorrect separation between haplotypes A and B.

Monocot genome evolution analyses were carried out using the high-quality haplotype A sequence of CR0040. Dot plot results were in agreement with the fact that V. planifolia is a diploidized paleo-polyploid species with a primary basic chromosome number x = 8 and a secondary basic number x = 16, as described for the whole Vanilla genus (Felix and Guerra, 2005). Moreover, only one locus (two spots) of rDNA (18S-5.8S-26S) was identified in the genome of V. planifolia by cytogenetic approaches, which provides additional evidence of an ancient diploidization of this supposed segmental paleo-allotetraploid. Finally, two WGDs, possibly corresponding to α and τ, were highlighted, as also described for the Dendrobium chrysotoxum chromosome-scale genome assembly (Zhang et al., 2021b).

Efficiency of an integrative approach combining cytogenetics with high-quality whole-genome sequencing

In this study, we confirmed the size and structure of the V. planifolia genome using both cytogenetics and nuclear DNA-seq methods. To our knowledge, this is the first time that the particular phenomenon of PE at play in many orchids has been explored at the chromosome level in plants. Our data showed that the non-endoreplicated sequences are predominantly made up of repeated sequences. This confirmed, at the genomic level, previous findings in orchids by Chumová et al. (2021), based on a phylogenetic generalized least squares model, and by Brown et al. (2017), who used nuclei imaging to demonstrate that in Vanilla, on the other hand, the endoreplicated part was transcribed. We nevertheless revealed that 33% of the 59,128 annotated protein-coding genes were present in the two random mosaic chromosomes, which correspond mainly to the putative non-endoreplicated part, as shown by their low sequencing depth. In addition, a thorough examination of sequencing depths along anchored chromosomes with three different technologies revealed 73 regions whose endoreplication levels vary with haploid phase, half of which may be linked to tissue type (leaves versus nodes). This last conclusion remains to be confirmed with DNA-seq from different tissues of the same cultivar. This work constitutes considerable progress in our understanding of V. planifolia genomics and sheds light on the most relevant methodologies for further deciphering this complex genome and the PE phenomenon. The VGH was built to help the community address major unresolved questions about vanilla, such as PE, biosynthesis of aromatic compounds, and resistance to pathogens. We are working on a new version of the vanilla nuclear genome sequence that will be improved in terms of haplotype separation, chromosome reconstruction, and gene and repeat element annotation in order to further investigate the molecular mechanisms of PE with appropriate plant material, biotechnologies, and bioinformatics tools.

METHODS

Cytometry, cytogenetics, and DNA sequencing

A traditional vanilla cultivar (CR0040) from Reunion Island was used in this study (Supplemental Note 1). FCM and cytogenetic studies were performed using protocols described in Supplemental Note 1. High-molecular-weight DNA and ultra-high-molecular-weight DNA were extracted from node tissues and sequenced using PacBio HiFi, ONT, and Illumina technologies (Supplemental Note 1). Optical genome maps were produced using the Bionano Genomics protocol and the Saphyr G1 System (Supplemental Note 1).

Genome assembly and analysis

HiFi reads were assembled into contigs using Hifiasm 0.13 with default parameters (Cheng et al., 2021). Hybrid scaffolding between the DNA contigs and the optical genome maps was performed using the Hybrid Scaffold pipeline of Bionano Genomics with default parameters. These scaffolds were phased into two haplotypes using in-house scripts, and the unscaffolded contigs were phased using purge_dups (https://github.com/dfguan/purge_dups). Then, pseudomolecules were reconstructed using alignments of the phased assembly on the Daphna chromosomes (Hasing et al., 2020; Supplemental Note 2). The assembly quality was estimated with QUAST 5.1.0 (Gurevich et al., 2013) and with the Benchmarking Universal Single-Copy Orthologs approach (version 5.0.0) (Simao et al., 2015; Supplemental Note 2). The k-mer analysis was performed with KAT 2.4.2 using the comp tool (Mapleson et al., 2017). The plot script was slightly modified to project on the y axis the number of distinct k-mers multiplied by the k-mer multiplicity instead of just the number of distinct k-mers. In parallel, k-mers of size 47 with a depth between 5 and 15 were extracted from the PacBio sequences using Jellyfish 2.3.0 (Marcais and Kingsford, 2011). These k-mers were repositioned on our reference using the tool "query_per_sequence" (https://github.com/gmarcais/Jellyfish/tree/master/examples/query_per_sequence), and the ratio of these k-mers was computed for each sequence of our genome. These sequences were split between chromosomes and unanchored sequences, and the distribution of the k-mer ratio was plotted using the Python seaborn library (https://seaborn.pydata.org/).

Structural and functional genome annotation

Automatic gene prediction was performed on CR0040 contigs with the EuGene Eukaryotic Pipeline (EGNEP version 1.5) (Sallet et al., 2019; Supplemental Note 3). Transcriptomic data from CR0040 were produced using RNA sequencing of nine organs with Illumina technology (Supplemental Note 3). In addition, gene expression profiles and putative novel isoforms were identified with StringTie v.2.0.3 (Kim et al., 2019; Supplemental Note 3). Transcriptomic data from V. planifolia cultivars (CR0040, Daphna [NCBI BioProjects: PRJNA668740 and PRJNA633886], and an unspecified cultivar [NCBI GEO: GSE134155]); proteomic data from V. planifolia Daphna (Hasing et al., 2020), P. equestris (NCBI BioProject: PRJNA382149), and the Liliopsida class (Swissprot: 2020_06); and a custom orchid-specific statistical model for splice-site detection were used for this analysis (Supplemental Note 3). Functions were assigned through InterProScan domain searches as well as similarity searches against the UniProt/Swissprot and UniProt/TrEMBL databases (BlastP). Gene Ontology terms were assigned from InterProScan (Jones et al., 2014) results, and enzyme classification numbers were predicted by combining the tools PRIAM (Claudel-Renard et al., 2003) and BlastKOALA (Kanehisa et al., 2016).

Repeats were first identified using RepeatModeler v.2.0.1 (Flynn et al., 2020), RepeatScout v.1.0.5, and transposable element genes predicted from the EGNEP annotation, and were then classified with REPET v.3.0 (Flutre et al., 2011) and PASTEC v.2.0 (Hoede et al., 2014) according to Wicker's transposable element classification (Wicker et al., 2007). After cleaning steps (see details in Supplemental Note 3), repeats were clustered using CD-HIT v.4.8.1 (Fu et al., 2012) to produce two banks of repeats. The CR0040 genome was then annotated for repeats using these banks, RepeatMasker v.4.1.1 (Tarailo-Graovac and Chen, 2009), and bedtools intersect v.2.29.2 (Quinlan and Hall, 2010).

Genomic comparisons and reconstruction of gene families

To compare the 14 haplotype A chromosomes of both vanilla cultivars, check the completeness of the Vanilla genome, and study the pan-orchid α WGD, a series of analyses were performed with the CoGe SynMap pipeline as described in Supplemental Note 4. Gene family reconstruction was performed using OrthoFinder2 (v.2.4.0) (Emms and Kelly, 2019). Genes known to be involved in cell cycle control in A. thaliana, such as Cycs, CDKs, and known regulators of these genes, were searched for in the CR0040 and P. aphrodite proteomes with a combination of BlastP searches and orthogroups. This analysis was applied to CDK-A and -B types as well as Cyc-A, -B, and -D types. Regulators of these genes included the CDK inhibitor (KRP), the transcriptional repressor ILP1, WEE1, actin depolymerizing factor, and Fizzy-related proteins.

Detection of non-endoreplicated genomic regions

Reads from each sequencing technology used in this study (HiFi, ONT, and Illumina reads from CR0040), as well as ONT and Illumina reads from Daphna, were mapped onto the CR0040 assembly. Illumina short reads and long reads (HiFi and ONT) were mapped onto the CR0040 assembly using BWA-MEM2 (Vasimuddin et al., 2019) and Minimap2 (Li, 2018), respectively. Sequencing depths were averaged over genomic windows of 20 kb. To detect sequencing depth bias and limit the risk of detecting false positives, the mean sequencing depth over every 20 successive 20-kb windows was computed using Illumina reads for Daphna and long reads (HiFi and ONT) for CR0040. Identified regions were manually validated and refined by visualizing sequencing depth drops for each CR0040 chromosome and for all available sequencing datasets (see details in Supplemental Note 5).

VGH

The VGH was constructed using the Tripal system, a toolkit for the construction of online community genomic databases, by integrating the GMOD Chado database schema and the Drupal open-source platform (https://www.drupal.org/). The VGH implements a set of interconnected modules and user-friendly interfaces (details in Supplemental Note 6).

Data availability

The chromosome assembly and accompanying data received the following identifiers in NCBI: BioProject (with SRA database) IDs PRJNA753216 (haplotype A) and PRJNA754028 (haplotype B); BioSample (node) SAMN20691751. RNA sequencing data are accessible on the NCBI portal: BioSamples SAMN20691786 (fruit), SAMN20691787 (leaf), SAMN20691788 (flower), SAMN20691789 (stem), SAMN20691790 (soil root), SAMN20691791 (aerial root), SAMN20691792 (bud), SAMN20691793 (flower bud), SAMN20691794 (ovary), and SAMN20691795 (mixed tissues); and SRA runs SRR15411867 (mixed tissues), SRR15411868 (ovary), SRR15411869 (flower bud), SRR15411870 (bud), SRR15411871 (aerial root), SRR15411872 (soil root), SRR15411873 (stem), SRR15411874 (flower), SRR15411875 (leaf), and SRR15411876 (fruit). In addition, these data and various exploration tools are accessible at the VGH (https://vanilla-genome-hub.cirad.fr/).

SUPPLEMENTAL INFORMATION

Supplemental information can be found online at Plant Communications Online.

FUNDING

This work was supported by grants from the Eurovanille and V. Mane Fils companies. The research was co-funded by the Centre de Coopération Internationale en Recherche Agronomique pour le Développement (CIRAD), the Université de La Réunion (UR), the Institut National de Recherche pour l'Agriculture, l'Alimentation et l'Environnement (INRAE), the Centre National de la Recherche Scientifique (CNRS), and the Etablissement Vanille de Tahiti (EVT). This work was also supported by grants from the European Regional Development Fund (ERDF), the Conseil Régional de la Réunion, and the Conseil Départemental de la Réunion. This work was supported by the France Génomique National infrastructure, funded as part of the "Investissement d'avenir" program managed by the Agence Nationale pour la Recherche (contrat ANR-10-INBS-09), and has also benefited from the Imagerie-Gif core facility supported by l'Agence Nationale de la Recherche (ANR-11-EQPX-0029/Morphoscope, ANR-10-INBS-04/FranceBioImaging; ANR-11-IDEX-0003-02/Saclay Plant Sciences).

AUTHOR CONTRIBUTIONS

C.J., M.D., M.G., and P.B. contributed to conceptualization of the study. C.C., C.J., G.S., M.B., M.D., O.B., S.B., and W.M. designed the experiments. M.B. and S.S.-Y. performed flow cytometry and cytogenetic experiments and analyses. L.B. and J.Z. contributed to the funding of the research, monitored the progress of the work, and supported the researchers throughout the project. C.C., C.L.-R., O.B., and W.M. performed nucleic acid preparation and sequencing. C.C., C.J., C.K., G.D., G.S., Q.P., S.B., and W.M. performed sequence analyses and assemblies. C.C., G.D., Q.P., S.B., and S.L.-A. performed genome annotation and built the genome hub. C.C., C.J., C.K., M.G., Q.P., and W.M. outlined the manuscript and wrote first drafts. C.C., C.J., C.K., C.L.-R., G.D., G.S., M.B., M.D., M.G., P.B., Q.P., S.B., S.S.-Y., and W.M. provided input and revisions to the manuscript.

ACKNOWLEDGMENTS

We are grateful to Jean Bernard Dijoux and Katia Jade for preparing the plant material and to the Plant Protection Platform (3P, IBISA) for lab facilities and access to plant resources (BRC Vatel). We acknowledge the SouthGreen Bioinformatics Platform (http://www.southgreen.fr/) for access to computational resources and the GeT-PlaGe platform (INRAE, Toulouse, France) for the use of sequencing facilities. Finally, the authors would like to thank the reviewers for their suggestions, which helped to improve the manuscript. No conflict of interest is declared.

Received: October 30, 2021
Revised: April 10, 2022
Accepted: April 27, 2022
Published: May 5, 2022

REFERENCES

Armstrong, R.L., Penke, T., Chao, S.K., Gentile, G.M., Strahl, B.D., Matera, A.G., McKay, D.J., and Duronio, R.J. (2019). H3K9 promotes under-replication of pericentromeric heterochromatin in Drosophila salivary gland polytene chromosomes. Genes 10:93. https://doi.org/10.3390/genes10020093.

Bhosale, R., Boudolf, V., Cuevas, F., Lu, R., Eekhout, T., Hu, Z.B., Van Isterdael, G., Lambert, G.M., Xu, F., Nowack, M.K., et al. (2018). A spatiotemporal DNA endoploidy map of the Arabidopsis root reveals roles for the endocycle in root development and stress adaptation. Plant Cell 30:2330–2351. https://doi.org/10.1105/tpc.17.00983.

Bory, S. (2007). Diversity of Vanilla planifolia in the Indian Ocean and its Related Species: Genetics, Cytogenetics and Epigenetics Aspect (France: Université de La Réunion).

Bory, S., Catrice, O., Brown, S., Leitch, I.J., Gigant, R., Chiroleu, F., Grisoni, M., Duval, M.F., and Besse, P. (2008). Natural polyploidy in Vanilla planifolia (Orchidaceae). Genome 51:816–826. https://doi.org/10.1139/G08-068.

Bourdon, M., Pirrello, J., Cheniclet, C., Coriton, O., Bourge, M., Brown, S., Moise, A., Peypelut, M., Rouyere, V., Renaudin, J.P., et al. (2012). Evidence for karyoplasmic homeostasis during endoreduplication and a ploidy-dependent increase in gene transcription during tomato fruit growth. Development 139:3817–3826. https://doi.org/10.1242/dev.084053.

and relationships with allied species of the clonally propagated crop Vanilla planifolia Jacks. Ex Andrews. Genet. Resour. Crop Ev. https://doi.org/10.1007/s10722-022-01362-1.

Felix, L.P., and Guerra, M. (2005). Basic chromosome numbers of terrestrial orchids. Plant Syst. Evol. 254:131–148. https://doi.org/10.1007/s00606-004-0200-9.

Flutre, T., Duprat, E., Feuillet, C., and Quesneville, H. (2011). Considering transposable element diversification in de novo annotation approaches. PLoS One 6:e16526. https://doi.org/10.1371/journal.pone.0016526.
Flynn, J.M., Hubley, R., Goubert, C., Rosen, J., Clark, A.G., Feschotte, C., and Smit, A.F. (2020). RepeatModeler2 for automated genomic discovery of transposable element families. Proc. Natl. Acad. Sci. U S A 117:9451–9457. https://doi.org/10.1073/pnas.1921046117.

Fu, L.M., Niu, B.F., Zhu, Z.W., Wu, S.T., and Li, W.Z. (2012). CD-HIT: accelerated for clustering the next-generation sequencing data. Bioinformatics 28:3150–3152. https://doi.org/10.1093/bioinformatics/bts565.

Gurevich, A., Saveliev, V., Vyahhi, N., and Tesler, G. (2013). QUAST: quality assessment tool for genome assemblies. Bioinformatics 29:1072–1075. https://doi.org/10.1093/bioinformatics/btt086.

Han, Y.H., Zhang, T., Thammapichai, P., Weng, Y.Q., and Jiang, J.M. (2015). Chromosome-specific painting in Cucumis species using bulked oligonucleotides. Genetics 200:771–779. https://doi.org/10.1534/genetics.115.177642.

Hasing, T., Tang, H.B., Brym, M., Khazi, F., Huang, T.F., and Chambers, A.H. (2020). A phased Vanilla planifolia genome enables genetic improvement of flavour and production. Nat. Food 1:811–819. https://doi.org/10.1038/s43016-020-00197-2.

Hoede, C., Arnoux, S., Moisset, M., Chaumier, T., Inizan, O., Jamilloux, V., and Quesneville, H. (2014). PASTEC: an automatic transposable element classification tool. PLoS One 9:e91929. https://doi.org/10.1371/journal.pone.0091929.

Hribová, E., Holušek, P., Petrovská, B., Ponert, J., Trávníčková, K., Šimková, H., Kubátová, B., Jersáková, J., Čurn, V., Suda, J., et al. (2016). The enigma of progressively partial endoreplication: new insights provided by flow cytometry and next-generation sequencing. Genome Biol. Evol. 8:1996–2005. https://doi.org/10.1093/gbe/evw141.

Inada, N., Takahashi, N., and Umeda, M. (2021). Arabidopsis thaliana subclass I ACTIN DEPOLYMERIZING FACTORs and vegetative ACTIN2/8 are novel regulators of endoreplication. J. Plant Res.

Bracht, J.R., Fang, W., Goldman, A.D., Dolzhenko, E., Stein, E.M., and Landweber, L.F. (2013). Genomes on the edge: programmed genome instability in ciliates. Cell 152:406–416. https://doi.org/10.1016/j.cell.2013.01.005.

Braz, G.T., do Vale Martins, L., Zhang, T., Albert, P.S., Birchler, J.A., and Jiang, J.M. (2020). A universal chromosome identification system for maize and wild Zea species. Chromosome Res. 28:183–194. https://doi.org/10.1007/s10577-020-09630-5.

Brown, S.C., Bourge, M., Maunoury, N., Wong, M., Wolfe Bianchi, M., Lepers-Andrzejewski, S., Besse, P., Siljak-Yakovlev, S., Dron, M., and Satiat-Jeunemaître, B. (2017). DNA remodeling by strict partial endoreplication in orchids, an original process in the plant kingdom. Genome Biol. Evol. 9:1051–1071. https://doi.org/10.1093/gbe/evx063.

Cai, J., Liu, X., Vanneste, K., Proost, S., Tsai, W.C., Liu, K.W., Chen, L.J., He, Y., Xu, Q., Bian, C., et al. (2015). The genome sequence of the orchid Phalaenopsis equestris. Nat. Genet. 47:65–72. https://doi.org/10.1038/ng.3149.

Chao, Y.T., Chen, W.C., Chen, C.Y., Ho, H.Y., Yeh, C.H., Kuo, Y.T., Su, C.L., Yen, S.H., Hsueh, H.Y., Yeh, J.H., et al. (2018). Chromosome-level assembly, genetic and physical mapping of Phalaenopsis aphrodite genome provides new insights into species adaptation and resources for orchid breeding. Plant Biotechnol. J. 16:2027–2041. https://doi.org/10.1111/pbi.12936.

Cheng, H., Concepcion, G.T., Feng, X., Zhang, H., and Li, H. (2021). Haplotype-resolved de novo assembly using phased assembly graphs with hifiasm. Nat. Methods 18:170–175. https://doi.org/10.1038/s41592-020-01056-5.

Chumová, Z., Záveská, E., Hloušková, P., Ponert, J., Schmidt, P.A., Čertner, M., Mandáková, T., and Trávníček, P. (2021). Repeat proliferation and partial endoreplication jointly shape the patterns of genome size evolution in orchids. Plant J. Cell Mol. Biol. 107:511–524. https://doi.org/10.1111/tpj.15306.

Claudel-Renard, C., Chevalet, C., Faraut, T., and Kahn, D.
(2003). 134:1291–1300. https://doi.org/10.1007/s10265-021-01333-0. Enzyme-specific profiles for genome annotation: PRIAM. Nucleic Jiang, J.M. (2019). Fluorescence in situ hybridization in plants: recent Acids Res. 31:6633–6639. https://doi.org/10.1093/nar/gkg847. developments and future applications. Chromosome Res. De Veylder, L., Larkin, J.C., and Schnittger, A. (2011). Molecular control 27:153–165. https://doi.org/10.1007/s10577-019-09607-z. and function of endoreplication in development and physiology. Trends Jones, P., Binns, D., Chang, H.Y., Fraser, M., Li, W., McAnulla, C., Plant Sci. 16:624–634. https://doi.org/10.1016/j.tplants.2011.07.001. McWilliam, H., Maslen, J., Mitchell, A., Nuka, G., et al. (2014). Dolez , J., Voglmayr, H., and Greilhuber, J. (2003). Letter to el, J., Bartos InterProScan 5: genome-scale protein function classification. the editor. Cytom. Part A 51A:127–128.. https://doi.org/10.1002/cyto. Bioinformatics (Oxford, England) 30:1236–1240. https://doi.org/10. a.10013. 1093/bioinformatics/btu031. Du, K., Stock, M., Kneitz, S., Klopp, C., Woltering, J.M., Adolfi, M.C., Kanehisa, M., Sato, Y., and Morishima, K. (2016). BlastKOALA and Feron, R., Prokopov, D., Makunin, A., Kichigin, I., et al. (2020). The GhostKOALA: KEGG tools for functional characterization of genome sterlet sturgeon genome sequence and the mechanisms of and metagenome sequences. J. Mol. Biol. 428:726–731. https://doi. segmental rediploidization. Nat. Ecol. Evol. 4:841–852. https://doi. org/10.1016/j.jmb.2015.11.006. org/10.1038/s41559-020-1166-x. Kim, D., Paggi, J.M., Park, C., Bennett, C., and Salzberg, S.L. (2019). Emms, D.M., and Kelly, S. (2019). OrthoFinder: phylogenetic orthology Graph-based genome alignment and genotyping with HISAT2 and inference for comparative genomics. Genome Biol. 20, 20Artn. HISAT-genotype. Nat. Biotechnol. 37:907–915. https://doi.org/10. https://doi.org/10.1186/S13059-019-1832-Y. 1038/s41587-019-0201-4. 
Favre, F., Jourda, C., Grisoni, M., Piet, Q., Rivallan, R., Dijoux, J.B., Lang, L., and Schnittger, A. (2020). Endoreplication - a means to an end Hascoat, J., Lepers-Andrzejewski, S., Besse, P., and Charron, C. in cell growth and stress response. Curr. Opin. Plant Biol. 54:85–92. (2022). A genome-wide assessment of the genetic diversity, evolution https://doi.org/10.1016/j.pbi.2020.02.006. 14 Plant Communications 3, 100330, September 12 2022 ª 2022 The Author(s). Partial endoreplication in Vanilla planifolia challenges genome assembly Plant Communications Lee, H.O., Davidson, J.M., and Duronio, R.J. (2009). Endoreplication: Shimotohno, A., Aki, S.S., Takahashi, N., and Umeda, M. (2021). polyploidy with purpose. Genes Dev. 23:2461–2477. https://doi.org/ Regulation of the plant cell cycle in response to hormones and the 10.1101/gad.1829209. environment. Annu. Rev. Plant Biol. 72:273–296. https://doi.org/10. Lepers-Andrzejewski, S., Siljak-Yakovlev, S., Brown, S.C., Wong, M., 1146/annurev-arplant-080720-103739. and Dron, M. (2011). Diversity and dynamics of plant genome size: an Simao, F.A., Waterhouse, R.M., Ioannidis, P., Kriventseva, E.V., and example of polysomaty from a cytogenetic study of Tahitian vanilla Zdobnov, E.M. (2015). BUSCO: assessing genome assembly and (Vanilla 3tahitensis, Orchidaceae). Am. J. Bot. 98:986–997. https:// annotation completeness with single-copy orthologs. Bioinformatics doi.org/10.3732/ajb.1000415. 31:3210–3212.. https://doi.org/10.1093/bioinformatics/btv351. Li, H. (2018). Minimap2: pairwise alignment for nucleotide sequences. Stein, J.C., Yu, Y., Copetti, D., Zwickl, D.J., Zhang, L., Zhang, C., Bioinformatics 34:3094–3100. https://doi.org/10.1093/bioinformatics/ Chougule, K., Gao, D., Iwata, A., Goicoechea, J.L., et al. (2018). bty191. Genomes of 13 domesticated and wild rice relatives highlight genetic Lilly, M.A., and Spradling, A.C. (1996). The Drosophila endocycle is conservation, turnover and innovation across the genus Oryza. 
Nat. controlled by Cyclin E and lacks a checkpoint ensuring S-phase Genet. 50:285–296. https://doi.org/10.1038/s41588-018-0040-0. completion. Genes Dev. 10:2514–2526. https://doi.org/10.1101/gad. 10.19.2514. Tarailo-Graovac, M., and Chen, N. (2009). Using RepeatMasker to iden- tify repetitive elements in genomic sequences. In Current Protocols in Mapleson, D., Garcia Accinelli, G., Kettleborough, G., Wright, J., and Bioinformatics, A.D. Baxevanis, ed. Clavijo, B.J. (2017). KAT: a K-mer analysis toolkit to quality control NGS datasets and genome assemblies. Bioinformatics 33, btw663– Trávnı́c   ek, P., Certner, M., Ponert, J., Chumová, Z., Jersáková, J., and 576. https://doi.org/10.1093/bioinformatics/btw663. Suda, J. (2019). Diversity in genome size and GC content shows adaptive potential in orchids and is closely linked to partial Marcais, G., and Kingsford, C. (2011). A fast, lock-free approach for endoreplication, plant life-history traits and climatic conditions. New efficient parallel counting of occurrences of k-mers. Bioinformatics Phytol. 224:1642–1656. https://doi.org/10.1111/nph.15996. 27:764–770. https://doi.org/10.1093/bioinformatics/btr011. do Vale Martins, L., Yu, F., Zhao, H.N., Dennison, T., Lauter, N., Wang,  ek, P., Ponert, J., Urfus, T., Jersáková, J., Vrána, J., Hribová, Trávnı́c H.Y., Deng, Z.H., Thompson, A., Semrau, K., Rouillard, J.M., et al. E., Dolez el, J., and Suda, J. (2015). Challenges of flow-cytometric (2019). Meiotic crossovers characterized by haplotype-specific estimation of nuclear genome size in orchids, a plant group with both chromosome painting in maize. Nat. Commun. 10, 10Artn. https:// whole-genome and progressively partial endoreplication. Cytometry doi.org/10.1038/s41467-019-12646-z. Part A. Journal Int. Soc. Anal. Cytol. 87:958–966. https://doi.org/10. 1002/cyto.a.22681. 
Matthews, B.J., Dudchenko, O., Kingan, S.B., Koren, S., Antoshechkin, I., Crawford, J.E., Glassford, W.J., Herre, M., Vaattovaara, A., Leppa€ la € , J., Saloja € rvi, J., and Wrzaczek, M. (2019). Redmond, S.N., Rose, N.H., et al. (2018). Improved reference High-throughput sequencing data and the impact of plant gene genome of Aedes aegypti informs arbovirus vector control. Nature annotation quality. J. Exp. Bot. 70:1069–1076. https://doi.org/10. 563:501–507. https://doi.org/10.1038/s41586-018-0692-z. 1093/jxb/ery434. Munden, A., Rong, Z., Sun, A., Gangula, R., Mallal, S., and Nordman, Vasimuddin, M., Misra, S., Li, H., and Aluru, S. (2019). Efficient J.T. (2018). Rif1 inhibits replication fork progression and controls architecture-aware acceleration of BWA-MEM for multicore systems. DNA copy number in Drosophila. eLife 7:e39140.. https://doi.org/10. Int. Parall Distrib P, 314–324. https://doi.org/10.1109/Ipdps.2019. 7554/eLife.39140. 00041. Nair, R.R., and Ravindran, P.N. (1994). Somatic association of Wicker, T., Sabot, F., Hua-Van, A., Bennetzen, J.L., Capy, P., chromosomes and other mitotic abnormalities in Vanilla planifolia Chalhoub, B., Flavell, A., Leroy, P., Morgante, M., Panaud, O., (andrews). Caryologia 47:65–73. https://doi.org/10.1080/00087114. et al. (2007). A unified classification system for eukaryotic 1994.10797284. transposable elements. Nat. Rev. Genet. 8:973–982, nrg2165 [pii]. Perez-Silva, A., Odoux, E., Brat, P., Ribeyre, F., Rodriguez-Jimenes, https://doi.org/10.1038/nrg2165. G., Robles-Olvera, V., Garcia-Alvarado, M.A., and Gunata, Z. (2006). GC-MS and GC-olfactometry analysis of aroma compounds Yen, E.C., McCarthy, S.A., Galarza, J.A., Generalovic, T.N., Pelan, S., in a representative organic aroma extract from cured vanilla (Vanilla Nguyen, P., Meier, J.I., Warren, I.A., Mappes, J., Durbin, R., et al. planifolia G. Jackson) beans. Food Chem. 99:728–735. https://doi. (2020). 
A haplotype-resolved, de novo genome assembly for the org/10.1016/j.foodchem.2005.08.050. wood tiger moth (Arctia plantaginis) through trio binning. GigaScience 9:giaa088, ARTN. https://doi.org/10.1093/gigascience/ Quinlan, A.R., and Hall, I.M. (2010). BEDTools: a flexible suite of utilities giaa088. for comparing genomic features. Bioinformatics 26:841–842.. https:// doi.org/10.1093/bioinformatics/btq033. Younis, A., Ramzan, F., Hwang, Y.J., and Lim, K.B. (2015). FISH and GISH: molecular cytogenetic tools and their applications in Ravindran, P.N. (1979). Nuclear behavior in the sterile pollen of Vanilla ornamental plants. Plant Cell Rep. 34:1477–1488. https://doi.org/10. planifolia (andrews). Cytologia 44:391–396. https://doi.org/10.1508/ 1007/s00299-015-1828-3. cytologia.44.391. Sallet, E., Gouzy, J., and Schiex, T. (2019). EuGene: an automated Zhang, T., Liu, G.Q., Zhao, H.N., Braz, G.T., and Jiang, J.M. (2021a). integrative gene finder for eukaryotes and prokaryotes. Gene Chorus2: design of genome-scale oligonucleotide-based probes for Prediction: Methods Protoc. 1962:97–120. https://doi.org/10.1007/ fluorescence in situ hybridization. Plant Biotechnol. J. 19:1967–1978. 978-1-4939-9173-0_6. https://doi.org/10.1111/pbi.13610. Sellis, D., Guérin, F., Arnaiz, O., Pett, W., Lerat, E., Boggetto, N., Zhang, Y.X., Zhang, G.Q., Zhang, D.Y., Liu, X.D., Xu, X.Y., Sun, W.H., Krenek, S., Berendonk, T., Couloux, A., Aury, J.M., et al. (2021). Yu, X., Zhu, X.E., Wang, Z.W., Zhao, X., et al. (2021b). Massive colonization of protein-coding exons by selfish genetic Chromosome-scale assembly of the Dendrobium chrysotoxum elements in Paramecium germline genomes. PLoS Biol. genome enhances the understanding of orchid evolution. Hortic. 19:e3001309. https://doi.org/10.1371/journal.pbio.3001309. Res-england 8, 8 artn. https://doi.org/10.1038/s41438-021-00621-z. Plant Communications 3, 100330, September 12 2022 ª 2022 The Author(s). 15