
The genome sequences of Arachis duranensis and Arachis ipaensis, the diploid ancestors of cultivated peanut.pdf



Articles OPEN

The genome sequences of Arachis duranensis and Arachis ipaensis, the diploid ancestors of cultivated peanut

David John Bertioli1,2,27, Steven B Cannon3,27, Lutz Froenicke4,5,27, Guodong Huang6,27, Andrew D Farmer7, Ethalinda K S Cannon8, Xin Liu6, Dongying Gao2, Josh Clevenger9, Sudhansu Dash7, Longhui Ren10, Márcio C Moretzsohn11, Kenta Shirasawa12, Wei Huang13, Bruna Vidigal1,11, Brian Abernathy2, Ye Chu14, Chad E Niederhuth15, Pooja Umale7, Ana Cláudia G Araújo11, Alexander Kozik4, Kyung Do Kim2, Mark D Burow16,17, Rajeev K Varshney18, Xingjun Wang19, Xinyou Zhang20, Noelle Barkley21,22, Patrícia M Guimarães11, Sachiko Isobe12, Baozhu Guo23, Boshou Liao24, H Thomas Stalker25, Robert J Schmitz15, Brian E Scheffler26, Soraya C M Leal-Bertioli2,11, Xu Xun6, Scott A Jackson2, Richard Michelmore4,5 & Peggy Ozias-Akins9,14

Cultivated peanut (Arachis hypogaea) is an allotetraploid with closely related subgenomes of a total size of ~2.7 Gb. This makes the assembly of chromosomal pseudomolecules very challenging. As a foundation to understanding the genome of cultivated peanut, we report the genome sequences of its diploid ancestors (Arachis duranensis and Arachis ipaensis). We show that these genomes are similar to cultivated peanut’s A and B subgenomes and use them to identify candidate disease resistance genes, to guide tetraploid transcript assemblies and to detect genetic exchange between cultivated peanut’s subgenomes. On the basis of remarkably high DNA identity of the A. ipaensis genome and the B subgenome of cultivated peanut and biogeographic evidence, we conclude that A. ipaensis may be a direct descendant of the same population that contributed the B subgenome to cultivated peanut.

1Institute of Biological Sciences, University of Brasília, Brasília, Brazil. 2Center for Applied Genetic Technologies, University of Georgia, Athens, Georgia, USA. 3Corn Insects and Crop Genetics Research Unit, US Department of Agriculture–Agricultural Research Service, Ames, Iowa, USA. 4Genome Center, University of California, Davis, Davis, California, USA. 5Department of Plant Sciences, University of California, Davis, Davis, California, USA. 6BGI-Shenzhen, Shenzhen, China. 7National Center for Genome Resources, Santa Fe, New Mexico, USA. 8Department of Computer Science, Iowa State University, Ames, Iowa, USA. 9Institute of Plant Breeding, Genetics and Genomics, University of Georgia, Tifton, Georgia, USA. 10Interdepartmental Genetics Graduate Program, Iowa State University, Ames, Iowa, USA. 11Embrapa Genetic Resources and Biotechnology, Brasília, Brazil. 12Kazusa DNA Research Institute, Department of Frontier Research, Kisarazu, Japan. 13Department of Agronomy, Iowa State University, Ames, Iowa, USA. 14Department of Horticulture, University of Georgia, Tifton, Georgia, USA. 15Department of Genetics, University of Georgia, Athens, Georgia, USA. 16Texas A&M AgriLife Research, Lubbock, Texas, USA. 17Department of Plant and Soil Science, Texas Tech University, Lubbock, Texas, USA. 18International Crops Research Institute for the Semi-Arid Tropics (ICRISAT), Hyderabad, India. 19Shandong Academy of Agricultural Sciences, Biotechnology Research Center, Jinan, China. 20Henan Academy of Agricultural Sciences, Zhengzhou, China. 21Plant Genetic Resources Conservation Unit, US Department of Agriculture–Agricultural Research Service, Griffin, Georgia, USA. 22International Potato Center, Lima, Peru. 23Crop Protection and Management Research Unit, US Department of Agriculture–Agricultural Research Service, Tifton, Georgia, USA. 24Chinese Academy of Agricultural Sciences, Oil Crops Research Institute, Wuhan, China. 25Department of Crop Science, North Carolina State University, Raleigh, North Carolina, USA. 26Middle Southern Area Genomics Laboratory, US Department of Agriculture–Agricultural Research Service, Stoneville, Mississippi, USA. 27These authors contributed equally to this work. Correspondence should be addressed to D.J.B. ([email protected]).

Received 30 July 2015; accepted 29 January 2016; published online 22 February 2016; doi:10.1038/ng.3517

Nature Genetics | VOLUME 48 | NUMBER 4 | APRIL 2016
© 2016 Nature America, Inc. All rights reserved.

Peanut (also called groundnut; A. hypogaea L.) is a grain legume and oilseed, which is widely cultivated in tropical and subtropical regions (annual production of ~46 million tons). It has a key role in human nutrition. In Africa and Asia, more peanut is grown than any other grain legume (including soy bean) (FAOSTAT 2015; see URLs). The Arachis genus is endemic to South America and is composed mostly of diploid species (2n = 2x = 20). A. hypogaea is an allotetraploid (AABB-type genome; 2n = 4x = 40), probably derived from a single recent hybridization event between two diploid species and polyploidization (refs. 1–6). Chromosomes are of mostly similar size and are metacentric, but strong chromosomal centromeric banding and one pair of small chromosomes distinguish the A from the B subgenome. Cytogenetic, phylogeographic and molecular evidence indicate A. duranensis Krapov. & W.C. Greg. and A. ipaensis Krapov. & W.C. Greg. as the donors of the A and B subgenomes, respectively (refs. 3,5,7–11).

The peanut subgenomes are closely related (refs. 5,12). This, together with a total genome size of ~2.7 Gb and an estimated repetitive content of 64% (ref. 13), makes the assembly of the peanut genome sequence very challenging. However, the A and B subgenomes appear to have undergone relatively few changes since polyploidization: genomic in situ hybridization (GISH), using genomic DNA from the diploid species as probes, clearly distinguishes A and B chromosomes and does not show large A-B mosaics (refs. 7,8). Also, the genome size of A. hypogaea is close to the sum of those for A. duranensis and A. ipaensis (1.25 and 1.56 Gb, respectively; ref. 14), indicating that there has been no large change in genome size since polyploidy. Most notably, observations of progeny derived from crosses between cultivated peanut and an artificially induced allotetraploid A. ipaensis K30076 × A. duranensis V14167 (2n = 4x = 40) (ref. 15) strongly support the close relationships between the diploid genomes and the corresponding subgenomes of A. hypogaea. Progeny are vigorous, phenotypically normal and fertile and showed lower segregation distortion (refs. 16,17) than has been observed for some populations derived from A. hypogaea intraspecific crosses (refs. 18–21). Therefore, as a first step to characterizing the genome of cultivated peanut, we sequenced and analyzed the genomes of the two diploid ancestors of cultivated peanut.

RESULTS

Sequencing and assembly of the diploid A and B genomes

Considering that A. duranensis V14167 and A. ipaensis K30076 are likely good representatives of the ancestral species of A. hypogaea, we sequenced their genomes. After filtering, the data generated from the seven paired-end libraries corresponded to an estimated 154× and 163× base-pair coverage for A. duranensis and A. ipaensis, respectively (Supplementary Tables 1–6). The total assembly sizes were 1,211 and 1,512 Mb for A. duranensis and A. ipaensis, respectively, of which 1,081 and 1,371 Mb were represented in scaffolds of 10 kb or greater in size (Supplementary Table 7). Ultradense genetic maps were generated through genotyping by sequencing (GBS) of two diploid recombinant inbred line (RIL) populations (Supplementary Data Set 1). SNPs within scaffolds were used to validate the assemblies and confirmed their high quality; 190 of 1,297 initial scaffolds of A. duranensis and 49 of 353 initial scaffolds of A. ipaensis were identified as chimeric, on the basis of the presence of diagnostic population-wide switches in genotype calls occurring at the point of misjoin. Chimeric scaffolds were split, and their components were remapped. Thus, approximate chromosomal placements were obtained for 1,692 and 459 genetically verified scaffolds, respectively. Conventional molecular marker maps (Supplementary Data Set 2) and syntenic inferences were then used to refine the ordering of scaffolds within the initial genetic bins. Generally, agreement was good for maps in euchromatic arms and poorer in pericentromeric regions (although one map (ref. 22) showed large inversions in two linkage groups in comparison to the other maps; Supplementary Data Set 2).

Overall, 96.0% and 99.2% of the sequence in contigs ≥10,000 bp in length, represented by 1,692 and 459 scaffolds, could be ordered into 10 chromosomal pseudomolecules per genome of 1,025 and 1,338 Mb for A. duranensis and A. ipaensis, respectively (Aradu.A01–Aradu.A10 and Araip.B01–Araip.B10; GenBank, assembly accessions GCA_000817695.1 and GCA_000816755.1; Supplementary Table 8). The pseudomolecules mostly showed one-to-one equivalence between the A and B genomes (Figs. 1 and 2, and Supplementary Figs. 1–12) and were numbered according to previously published linkage maps (refs. 17,19,23,24). They represent 82% and 86% of the genomes, respectively, when considering genome size estimates based on flow cytometry (refs. 14,25), or 95% and 98% of the genomes when using estimates derived from k-mer frequencies with k = 17 (Supplementary Figs. 13 and 14). Comparisons of the chromosomal pseudomolecules with 14 BAC sequences from A. duranensis and 6 BAC sequences from A. ipaensis showed collinearity of contigs and high sequence identity (≥99%) (Supplementary Fig. 15a–l and Supplementary Table 9).

Figure 1 Structural overview and comparison of chromosomal pseudomolecules A01 and B01. The distributions of genes and mobile elements are represented as stacked areas. High frequency of genetic recombination (represented by red on a white-red heat scale) is confined to distal regions. In the dot-plot comparison, note how inverted chromosome regions form arcs, indicating that, over the evolutionary time since the divergence of the two species, accumulation of DNA has been greater in more central regions of the chromosomes and elimination of DNA has been more frequent in distal regions. Genes, DNA transposable elements (TEs) and Ty1-copia elements are more frequent in more distal regions. Ty3-gypsy elements are more frequent in pericentromeric regions. [Legend: DNA TE, LINE, gene, Ty1-copia, Ty3-gypsy and non-autonomous elements.]

Figure 2 Circos diagram depicting the relationships of the chromosomal pseudomolecules of A. duranensis and A. ipaensis. Blue color represents the density of genes, and brown color represents the density of Ty3-gypsy elements and non-autonomous LTR retrotransposons. The scale for the gray bars is in megabases.

Characterization of transposons

We identified transposable elements that contributed 61.7% and 68.5% of the A. duranensis and A. ipaensis genomes, respectively (Supplementary Tables 10 and 11; PeanutBase). These values are compatible with the 64% repetitive content estimated for cultivated peanut using renaturation kinetics (ref. 13). Most transposon families were shared by the two genomes, and, for abundant families, macroscale positioning in the two genomes seemed similar. However, because of transposon activity since the divergence of the two genomes, microscale positioning and relative abundance differed (data not shown). A few Ty3-gypsy and non-autonomous retrotransposon families were very abundant, forming dense accumulations in pericentromeres (Fig. 1 and Supplementary Figs. 16 and 17). These included the previously described autonomous/non-autonomous pairs FIDEL/Feral and Pipoka/Pipa, the non-autonomous Gordo (refs. 26,27), and the newly observed Apolo and Polo. Overall, long terminal repeat (LTR) retrotransposons comprised more than half of each genome. In contrast, DNA transposons constituted about 10%. Notably, 7.8% and 11.7% of the genomes could be attributed to long interspersed nuclear elements (LINEs) for A. duranensis and A. ipaensis, respectively. These are the highest coverages for LINEs thus far reported for plant genomes.

Gene annotation and analysis of gene duplications

Transcript assemblies were constructed using sequences expressed in diverse tissues of A. duranensis V14167, A. ipaensis K30076 and A. hypogaea cv. Tifrunner (ref. 28) (16,439,433, 21,406,315 and 2,064,268,316 paired-end reads for each species, respectively; details below and in Supplementary Tables 12 and 13). Using these assemblies and representative characterized transposon sequences, MAKER2 (ref. 29) delineated 36,734 and 41,840 high-quality non–transposable element genes for A. duranensis and A. ipaensis, respectively (PeanutBase). The elevated gene numbers in A. ipaensis appear to originate from more local duplications, which can be seen in counts of genomically ‘close’ paralogous genes. Considering similar genes within a ten-gene window, there were 25% more in A. ipaensis than in A. duranensis (7,825 versus 6,241).
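As a quick arithmetic check on the percentages quoted so far, the sketch below recomputes them from figures given in the text: pseudomolecule lengths of 1,025 and 1,338 Mb, flow-cytometry genome sizes of 1.25 and 1.56 Gb, and 6,241 versus 7,825 genomically close paralogs. The function and variable names are illustrative assumptions and are not part of the authors' pipeline.

```python
# Values reported in the text (lengths in Mb, paralog counts as integers).
PSEUDOMOLECULE_MB = {"A. duranensis": 1025, "A. ipaensis": 1338}
FLOW_CYTOMETRY_MB = {"A. duranensis": 1250, "A. ipaensis": 1560}
CLOSE_PARALOGS = {"A. duranensis": 6241, "A. ipaensis": 7825}


def percent_of_genome(species: str) -> float:
    """Share of the flow-cytometry genome size covered by pseudomolecules."""
    return 100.0 * PSEUDOMOLECULE_MB[species] / FLOW_CYTOMETRY_MB[species]


def percent_excess(larger: int, smaller: int) -> float:
    """How much bigger `larger` is than `smaller`, as a percentage of `smaller`."""
    return 100.0 * (larger - smaller) / smaller


if __name__ == "__main__":
    for species in PSEUDOMOLECULE_MB:
        print(f"{species}: {percent_of_genome(species):.1f}% of genome in pseudomolecules")
    excess = percent_excess(
        CLOSE_PARALOGS["A. ipaensis"], CLOSE_PARALOGS["A. duranensis"]
    )
    print(f"Close paralogs: {excess:.1f}% more in A. ipaensis")
```

Rounded to whole percentages, the results agree with the 82%, 86% and 25% stated in the text.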
Gene families known to occur in clusters such as those encoding NB-ARC, leucine-rich repeat (LRR), pentatricopeptide-repeat, kinase, WD40-repeat and kinesin proteins had large differential counts between the two genomes. These differences were also apparent with wider inspection. In a set of 9,236 gene families with members in A. ipaensis or A. duranensis, or both, 2,879 families had more members in A. ipaensis, 1,983 had more members in A. duranensis and 4,374 had the same number of members in both species (Supplementary Data Sets 3–5).

DNA methylation

Analysis of DNA methylation by whole-genome bisulfite sequencing using MethylC-seq (ref. 30) generated 189,653,337 and 277,101,705 uniquely aligned reads, giving ~8.6× and 10.0× coverage per strand for A. duranensis and A. ipaensis, respectively. Genome-wide methylation per cytosine content (ref. 31) was similar for A. duranensis and A. ipaensis, with 73% and 75% methylation at CG sites, 57% and 60% methylation at CHG sites (where H is an A, T or C), and 8% and 6% methylation at CHH sites, respectively. The genic methylation patterns were typical for plants and provide independent verification of gene annotation (refs. 32,33) (Supplementary Figs. 18 and 19; Gene Expression Omnibus (GEO), GSE71357).

Disease resistances and NB-LRR–encoding genes

Nucleotide-binding–leucine-rich repeat (NB-LRR)-encoding genes are of particular interest because they confer resistance against pests and diseases. We identified 345 and 397 of these genes in the A. duranensis and A. ipaensis genotypes, respectively (Supplementary Data Set 6). The largest clusters were on distal regions of chromosomal pseudomolecule 02, the lower arms of chromosomal pseudomolecule 04 and the upper arms of chromosomal pseudomolecule 09 (Supplementary Fig. 20). The genome assemblies allowed us to associate quantitative trait loci (QTLs) with candidate genes (Supplementary Data Set 7). A strong, consistent QTL for resistance to root-knot nematode (Meloidogyne arenaria (Neal.) Chitwood) identified on A02 of Arachis stenosperma V10309 Krapov. & W.C. Greg. (ref. 34) resides in a cluster of 38 NB-LRR–encoding genes covering 6.1 Mb. Another source of nematode resistance already widely used in the United States originates from an introgression of the A-genome species Arachis cardenasii Krapov. & W.C. Greg. (ref. 35). This segment resides in the upper distal 7.6 Mb of chromosome A09 (ref. 36) and contains many NB-LRR–encoding genes. A major QTL conferring reduction in lesion number, size and sporulation of rust was identified in Arachis magna K30097 Krapov., W.C. Greg. & C.E. Simpson (ref. 37). The closest linked marker (Ah280; Araip.B08, 126,645,511) maps close to an NB-LRR–encoding gene (Araip.RV63R). Another QTL for rust resistance has previously been identified in peanut varieties that have the wild A-genome species A. cardenasii in their pedigree (ref. 38). Markers mapped this QTL to an introgressed chromosome segment at the lower end of A03 (Aradu.A03, 131,305,113–133,690,542) where an NB-LRR–encoding gene resides in A. duranensis (Aradu.Z87JB). The genes harbored on these genome segments from A. stenosperma, A. magna and A. cardenasii provide good pest and disease resistance and warrant further investigation.

Gene evolution in A. ipaensis and A. duranensis and species divergence

Analyses suggest that the Arachis lineages have been accumulating mutations relatively quickly since the divergence of the Dalbergioid clade ~58 million years ago. Modal Ks paralog values (synonymous substitutions per synonymous site) are approximately 0.95 for A. ipaensis and 0.90 for A. duranensis, more similar to that for Medicago (paralogous Ks value of ~0.95) than to those of Lotus (~0.65), Glycine (~0.65) or Phaseolus (~0.80). Average rates of change for Arachis genes were estimated at 8.12 × 10^−9 Ks/year. On the basis of this and the peak in the frequency of Ks values between A. duranensis and A. ipaensis being 0.035, the divergence of the two species was estimated as occurring ~2.16 million years ago (Fig. 3 and Supplementary Figs. 21 and 22).

Figure 3 Mutations and genome duplications. Frequency distributions are shown of values of synonymous substitutions (Ks) for paralogous and orthologous genes in comparisons of A. duranensis (Ad), A. ipaensis (Ai) and Glycine max (Gm). Peaks in the G. max–G. max comparison represent the Glycine whole-genome duplication (WGD) at Ks = 0.10 (~10 million years ago) and the early papilionoid WGD at Ks = 0.65 (58 million years ago). The same early papilionoid WGD also affected the Arachis lineage, so the shift in the A. duranensis–A. duranensis and A. ipaensis–A. ipaensis peaks (at Ks = 0.90 and 0.95, respectively) indicates that Arachis has accumulated silent changes at a rate ~1.4 times faster than that in G. max. On the basis of average rates of change for Arachis of 8.12 × 10^−9 Ks/year, we estimate that A. duranensis and A. ipaensis diverged ~2.16 million years ago. [Axes: proportion of Ks counts in bin against Ks. Series: Ad-Ad, Ai-Ai, Ad-Ai, Gm-Gm, Ai-Gm, Ad-Gm; annotated maxima: Ad-Ai Ks = 0.035, Gm-Gm Ks = 0.1.]

Analysis of chromosomal structure and synteny

In accordance with cytogenetic observations (refs. 9,10), most pseudomolecules had symmetrically positioned pericentromeres. Most pseudomolecules showed a one-to-one correspondence between the two species: pairs 02, 03, 04 and 10 were collinear; pairs 05, 06 and 09 were each differentiated by a large inversion in one arm of one of the pseudomolecules; and the pseudomolecules in pair 01 were differentiated by large inversions of both arms (Figs. 1 and 2, and Supplementary Figs. 1–12). In contrast, chromosomes 07 and 08 have undergone complex rearrangements that transported repeat-rich DNA to A07 and gene-rich DNA to A08. As a result, A07 has only one normal (upper) euchromatic arm and A08 is abnormally small, with low repetitive content (Fig. 4 and Supplementary Table 11). In accordance with cytogenetic observations (refs. 8,26), A08 could be assigned as the characteristic small ‘A chromosome’ (cytogenetic chromosome A09; Supplementary Fig. 23).

All A. ipaensis pseudomolecules were larger than their A. duranensis counterparts (Supplementary Table 8). This is partly because of a greater frequency of local duplications and higher transposon content in A. ipaensis. In dot plots of collinear chromosomes, slopes formed by orthologous genes were similar in both euchromatic and pericentromeric regions, with A. duranensis regions being ~80–90% the length of the corresponding regions in A. ipaensis (Supplementary Figs. 2–4 and 12). In contrast, in the dot plots, chromosomal regions differentiated by inversions showed distinct arcs (Fig. 1 and Supplementary Figs. 1, 5, 6 and 11). These arcs are due to changes in rates of DNA loss and gain (refs. 39,40) in regions that switch from distal to pericentromeric contexts, or vice versa, when inverted (Fig. 5).

In chromosomes without inversions, there were characteristic density gradients for genes, repetitive DNA and methylation (with gene densities increasing and densities of repetitive DNA and methylation decreasing toward chromosome ends). However, in regions that had undergone large rearrangements, in A. duranensis, these gradients were disrupted (Supplementary Figs. 16, 17 and 24–27). From these observations, we concluded that the major rearrangements have all occurred in the A-genome lineage. Size differences between homeologous chromosomes that were differentiated by large rearrangements tended to be greater than those between collinear ones (r(6) = 0.65, P < 0.05; Supplementary Table 14). Because the A. duranensis chromosomes that have undergone inversions are smaller than expected, it is evident that, in this dynamic, on balance, the elimination of DNA has predominated over its accumulation.

Figure 4 Schematic showing the rearrangements between chromosomes 7 and 8. These rearrangements gave rise to the small, repeat-poor chromosome, represented by pseudomolecule Aradu.A08 (equivalent to cytogenetic A09), which is characteristic of A genomes, and Aradu.A07, which has only one normal euchromatic arm (the upper one). Syntenous chromosomal segments are represented by blocks of the same color. The Ty3-gypsy and non-autonomous retroelement distributions are represented in gray. Note the low repetitive content of Aradu.A08 and the ‘knob’ of repeat-rich DNA in the upper distal region. This unusual composition seems likely to account for the distinct chromatin condensation of this chromosome pair (Supplementary Fig. 23). [Panels: A07, B07, A08, B08.]

Figure 5 Model for the formation of the arcs in dot plots of genome regions that have been inverted since the divergence of the A and B genomes. Gene densities are shown in gray. (a) The inversion transports repeat-rich, gene-poor DNA to the distal chromosomal region and repeat-poor, gene-rich DNA to the more central region. (b) In the distal region, the inverted segment then loses DNA by recombination-driven deletion, and the more central region gains DNA.
(c) Thus, the characteristic arc and atypical gene, repetitive DNA and methylation density patterns are formed. The presence of these atypical patterns indicates that all major genome rearrangements occurred in the A-genome lineage (Supplementary Figs. 16, 24 and 26). (d) An example dot plot comparing A05 and B05 that shows the characteristic arc. [Panel labels: A. duranensis progenitor, A. ipaensis progenitor, A. duranensis A05, A. ipaensis B05.]

Comparisons with Phaseolus vulgaris L., which shared a common ancestor with Arachis about 58 million years ago, showed syntenous chromosomal segments. In some cases, although the dot plots were highly distorted, there was almost a one-to-one correspondence between chromosomes (for example, B01 and Pv03, B05 and Pv02, B06 and Pv01, and B08 and Pv05; Supplementary Figs. 28–31).

Sequence comparisons to tetraploid cultivated peanut

Comparisons showed fundamentally one-to-one correspondences between the diploid chromosomal pseudomolecules and cultivated peanut linkage groups. Of the marker sequences from three maps (refs. 21,41), 83%, 83% and 94% were assigned by sequence similarity searches to the expected diploid chromosomal pseudomolecules (Supplementary Table 15a–c and Supplementary Data Set 8). For more detailed genome-wide comparisons, we produced 5.74 Gb (2× coverage) of long-sequence Moleculo reads from A. hypogaea cv. Tifrunner and mapped the reads to the combined diploid pseudomolecules. The corrected median identities between the A. hypogaea Moleculo reads and the pseudomolecules of A. duranensis and A. ipaensis were 98.36% and 99.96%, respectively (Supplementary Data Set 6). When visualized as plots along the chromosomal pseudomolecules, the diploid A-genome chromosomes were distinctly less similar to A. hypogaea sequences than the B-genome chromosomes (Fig. 6, Supplementary Fig. 32a–t and Supplementary Data Set 9).

Figure 6 Example graphs comparing DNA sequences from cultivated peanut with chromosomal pseudomolecules of A. duranensis and A. ipaensis. (a,b) Graphs show mapping of Moleculo DNA sequence reads from the tetraploid A. hypogaea cv. Tifrunner along the diploid chromosomal pseudomolecules Aradu.A05 (a) and Araip.B05 (b). Dark blue dots represent percentage identity of reads in tiling paths, and red dots represent density of Moleculo bases mapping in windows of 0.5 Mb (normalized to a value of 1 for the expected number). Note how the percentage identities of mapped reads for Aradu.A05 are, contrary to expectation, more consistent in the pericentromeric regions than in distal ones. This may reflect that sequence similarity between A. duranensis and the A subgenome of A. hypogaea has been eroded by recombination between the A and B subgenomes in the tetraploid by, for example, gene conversion. In contrast, mapping on Araip.B05 is much more consistent and generally very high identity, except for the upper distal 6.1 Mb, where identities fall dramatically (blue arrow). Also note how deviations in expected mapping density (indicated by red arrows) show that this region, in the tetraploid genome of A. hypogaea cv. Tifrunner, has undergone tetrasomic recombination and has changed from the expected genome formula of AABB to AAAA. [Panel titles: (a) A. hypogaea reads mapped onto A. duranensis A05; (b) A. hypogaea reads mapped onto A. ipaensis B05. Axes: identity of mapped reads (%) and normalized relative mapping density against position (Mb).]

Figure 7 Identification of genetic exchange between subgenomes in cultivated peanut. The top graph depicts the result of recombination between A04 and B04 in RIL028, and, for comparison, the bottom graph shows RIL025, a typical line where this type of recombination has not occurred (lines described in Zhou et al. (ref. 41)). The y axis shows log2-transformed ratios of densities of mapping for restriction site–associated sequence reads along the diploid chromosomal pseudomolecules divided by the mapping densities of a parental line. The x axis shows the positions of mapping, in 1-Mb windows, along Araip.B04 (red dots) and Aradu.A04 (blue dots), the latter with distances scaled so the homeologous chromosomal pseudomolecules are directly comparable. In RIL028, the relative dosage of the subgenomes has changed greatly in the lower chromosome arms. This indicates that a new event of genetic exchange between the A and B subgenomes occurred. [Axes: mapping density as log2(RIL/Parent01) against position in chromosomal pseudomolecules, A. ipaensis B04 (Mb) and A. duranensis A04 (×1.08/Mb). Panels: RIL028; RIL025, a typical line.]

We found distinct signals of genetic recombination between the A and B subgenomes of A. hypogaea, and, as expected, these signals were more frequent in regions of the homeologous chromosome pairs that were collinear. This recombination erodes the similarities between the tetraploid subgenomes and their corresponding diploid genomes. We observed a significant tendency for A. hypogaea Moleculo reads that mapped to collinear A-genome pseudomolecules to have, on average, lower sequence identity than reads that mapped to pseudomolecules with inversions (Kruskal-Wallis test, P < 0.0001; Supplementary Tables 16 and 17, and Supplementary Data Set 9). This trend was much weaker for the B subgenome on a whole-chromosome scale but was clearly visible at the ends of some of the collinear B-subgenome chromosome arms, where percentage identities dropped dramatically.
An example of this is indicated by the blue arrow in Figure 6b. In this case, the A. hypogaea B subgenome has become nullisomic and the A subgenome has become tetrasomic (producing a genome composition of AAAA instead of the expected AABB). This was confirmed by an inverse symmetry in the total number of bases mapping to the A and B genomes (Fig. 6a,b, red arrows). This phenomenon, a degree of tetrasomic genetic behavior, has been observed in the progeny of interploidy crosses involving wild species (ref. 42) and recently in the cultivated × induced allotetraploid RILs used in this study (ref. 43), but this is the first time, to our knowledge, that it has been observed in pure cultivated peanut. The event depicted in Figure 6 affects approximately the top 6 Mb, about half the euchromatic arms of A. hypogaea cv. Tifrunner chromosomes 05. Smaller similar events covered the bottom ~1 Mb of A02 and B02, the bottom ~0.4 Mb of A03 and B03, the bottom ~2 Mb of A06 and B06, and the top ~0.5 Mb of A09 and B09. Because of lower sequence identities, A-subgenome nullisomes were more difficult to detect; nevertheless, the bottom ~3 Mb of chromosomes 04 appeared to be nullisomic for the A subgenome and tetrasomic for the B subgenome (Supplementary Fig. 32 and Supplementary Data Set 9).

Although we recognize that genetic exchange between the A and B subgenomes will tend to inflate the values calculated (especially for the A genomes), we estimated the dates of evolutionary divergence of the sequenced diploid genomes and the corresponding subgenomes of A. hypogaea. To estimate the genome-wide Arachis mutation rate, we mapped A. ipaensis Moleculo reads against the A. duranensis pseudomolecules. This gave a corrected median DNA identity of 93.11% (a value compatible with previous comparisons using BAC sequences (ref. 27)). Considering the date of divergence of the A and B genomes as 2.16 million years ago (from Ks values), this gives an Arachis genome-wide mutation rate of 1.6 × 10^−8 mutations per base per year (within the range of 1–2 × 10^−8 calculated for other plants (ref. 44)). This mutation rate and the divergence of the most conserved chromosomes (presumably the ones that have undergone the least recombination between subgenomes; A01 and B07) gives an estimate of the divergence time of A. duranensis V14167 from the A subgenome of A. hypogaea as ~247,000 years and for the divergence time of A. ipaensis from the B subgenome of A. hypogaea as a remarkably recent ~9,400 years.

We used the chromosomal pseudomolecules to investigate the frequency of recombination between A and B subgenomes in 166 cultivated peanut RILs described in a previous study (ref. 41). To do this, we calculated the mapping densities of restriction site–associated sequence reads from these RILs and their parental lines along the chromosomal pseudomolecules. Mostly, the relative dosage of mapping on the A and B genomes was equal and the same as in the parents, but for one line (RIL028) the relative dosage was dramatically altered for two homeologous chromosomal regions (Fig. 7): 104–112 Mb on Aradu.A04 and 112–126 Mb on Araip.B04. In these regions, mapping to Araip.B04 almost disappeared and mapping to Aradu.A04 dramatically increased in density. This clear signal indicates that genetic exchange occurred between the A and B subgenomes in regions of the cultivated peanut genome that had balanced dosage in the parental lines. This seems most likely to have occurred by tetrasomic recombination, but gene conversion after the formation of an unresolved Holliday junction is also possible.

Diploid genome–guided tetraploid transcriptome assembly

Assemblies of transcribed sequences from tetraploid cultivated peanut are challenging because reads from genes on the A and B subgenomes are erroneously assembled together, resulting in chimeric sequences. We used the diploid genomes to minimize this collapse and produced tetraploid transcript assemblies. We assessed four assembly software approaches in three different ways: de novo assembly; parsing into A- and B-genome sets followed by separate assembly; and parsing followed by genome-guided assembly using the combined pseudomolecules. Results were compared by measuring the percentage of assembled transcripts that mapped back to the pseudomolecules without mismatches. Higher percentages indicate less [...]

Figure 8 The approximate known distributions of A. duranensis and A. magna, the location of the single known occurrence of A. ipaensis [...]

[...] 90% amino acid identity and >90% coverage of the protein model.
