
Article
An atlas of dynamic chromatin landscapes in mouse fetal development
https://doi.org/10.1038/s41586-020-2093-3

Received: 8 August 2017
Accepted: 11 June 2019
Published online: 29 July 2020
Open access

David U. Gorkin1,2,21, Iros Barozzi3,4,21, Yuan Zhao1,5,21, Yanxiao Zhang1,21, Hui Huang1,6,21, Ah Young Lee1, Bin Li1, Joshua Chiou6,7, Andre Wildberg8, Bo Ding8, Bo Zhang9, Mengchi Wang8, J. Seth Strattan10, Jean M. Davidson10, Yunjiang Qiu1,5, Veena Afzal3, Jennifer A. Akiyama3, Ingrid Plajzer-Frick3, Catherine S. Novak3, Momoe Kato3, Tyler H. Garvin3, Quan T. Pham3, Anne N. Harrington3, Brandon J. Mannion3, Elizabeth A. Lee3, Yoko Fukuda-Yuzawa3, Yupeng He5,11, Sebastian Preissl1,2, Sora Chee1, Jee Yun Han2, Brian A. Williams12, Diane Trout12, Henry Amrhein12, Hongbo Yang9, J. Michael Cherry10, Wei Wang8, Kyle Gaulton7, Joseph R. Ecker11,13, Yin Shen14,15, Diane E. Dickel3, Axel Visel3,16,17 ✉, Len A. Pennacchio3,16,18 ✉ & Bing Ren1,2,8,19,20 ✉

The Encyclopedia of DNA Elements (ENCODE) project has established a genomic resource for mammalian development, profiling a diverse panel of mouse tissues at 8 developmental stages from 10.5 days after conception until birth, including transcriptomes, methylomes and chromatin states. Here we systematically examined the state and accessibility of chromatin in the developing mouse fetus. In total we performed 1,128 chromatin immunoprecipitation with sequencing (ChIP–seq) assays for histone modifications and 132 assay for transposase-accessible chromatin using sequencing (ATAC–seq) assays for chromatin accessibility across 72 distinct tissue-stages. We used integrative analysis to develop a unified set of chromatin state annotations, infer the identities of dynamic enhancers and key transcriptional regulators, and characterize the relationship between chromatin state and accessibility during developmental gene regulation.
We also leveraged these data to link enhancers to putative target genes and demonstrate tissue-specific enrichments of sequence variants associated with disease in humans. The mouse ENCODE data sets provide a compendium of resources for biomedical researchers and achieve, to our knowledge, the most comprehensive view of chromatin dynamics during mammalian fetal development to date.

1 Ludwig Institute for Cancer Research, La Jolla, CA, USA.
2 Center for Epigenomics, University of California, San Diego School of Medicine, La Jolla, CA, USA.
3 Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA, USA.
4 Department of Surgery and Cancer, Imperial College London, London, UK.
5 Bioinformatics and Systems Biology Graduate Program, University of California, San Diego, La Jolla, CA, USA.
6 Biomedical Sciences Graduate Program, University of California, San Diego School of Medicine, La Jolla, CA, USA.
7 Department of Pediatrics, University of California, San Diego School of Medicine, La Jolla, CA, USA.
8 Department of Cellular and Molecular Medicine, University of California, San Diego School of Medicine, La Jolla, CA, USA.
9 Department of Biochemistry and Molecular Biology, Penn State School of Medicine, Hershey, PA, USA.
10 Stanford University School of Medicine, Department of Genetics, Stanford, CA, USA.
11 Genomic Analysis Laboratory, Salk Institute for Biological Studies, La Jolla, CA, USA.
12 Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, CA, USA.
13 Howard Hughes Medical Institute, Salk Institute for Biological Studies, La Jolla, CA, USA.
14 Institute for Human Genetics and University of California, San Francisco, San Francisco, CA, USA.
15 Department of Neurology, University of California, San Francisco, San Francisco, CA, USA.
16 US Department of Energy Joint Genome Institute, Berkeley, CA, USA.
17 School of Natural Sciences, University of California, Merced, Merced, CA, USA.
18 Comparative Biochemistry Program, University of California, Berkeley, Berkeley, CA, USA.
19 Institute of Genomic Medicine, University of California, San Diego School of Medicine, La Jolla, CA, USA.
20 Moores Cancer Center, University of California, San Diego School of Medicine, La Jolla, CA, USA.
21 These authors contributed equally: David U. Gorkin, Iros Barozzi, Yuan Zhao, Yanxiao Zhang, Hui Huang.
✉e-mail: [email protected]; [email protected]; [email protected]

744 | Nature | Vol 583 | 30 July 2020
Content courtesy of Springer Nature, terms of use apply. Rights reserved

Developmental gene regulation relies on a complex interplay between genetic and epigenetic factors. Whereas genetic information encoded in the DNA sequence provides the instructions for an embryo to develop, epigenetic information is required for each cell in an embryo to obtain its specialized function from this single set of instructions. Chromatin encodes epigenetic information in the form of post-translational histone modifications and accessibility to DNA binding factors1,2. Developmental programs of gene expression are orchestrated, at least in part, by cis-regulatory sequences that direct the expression of genes in response to specific developmental and environmental cues3,4. Active regulatory sequences show characteristic patterns of histone modification and accessible chromatin that make them amenable to the binding of transcription factors (TFs), which can in turn recruit co-factors and stimulate transcription. These epigenomic properties have proven valuable for genome annotation, because histone modifications and accessibility at a given genome region can reflect the activity of the underlying sequence5,6.

In previous phases of the ENCODE project, epigenomic and transcriptomic data sets were generated from mouse tissues at a single prenatal time point (embryonic day (E)14.5) and two postnatal time points (8 and 24 weeks after birth)5. In the most recent phase of ENCODE, we made a coordinated effort to create resources for the study of mammalian fetal development by generating epigenomic and transcriptomic data sets from 7 additional stages of fetal development covering a window from E10.5 until birth at approximately one-day intervals. At each stage, we collected a diverse panel of 8–12 tissues to make a total of 72 tissue-stages, with 2 biological replicates per tissue-stage, and each replicate containing tissue pooled from multiple embryos. This common tissue resource was used as input for RNA sequencing (RNA-seq)98, whole-genome bisulfite sequencing7, ATAC–seq, and ChIP–seq for eight histone modifications (ATAC–seq and ChIP–seq described here). Data from this and all phases of ENCODE are publicly available through the ENCODE portal (https://www.encodeproject.org/).

To map chromatin states during mouse fetal development, we performed ChIP–seq for a set of eight histone modifications that can distinguish between functional elements and activity levels. To assay chromatin accessibility, we used a version of ATAC–seq8 optimized for use on frozen tissues (Methods). Chromatin accessibility can also be mapped by DNase I hypersensitive sites sequencing (DNase-seq), which has been integral to the identification of millions of candidate regulatory sequences in mammalian genomes9,10, but we chose ATAC–seq here because it offers a more streamlined workflow. The resulting maps of chromatin accessibility, together with those of histone modifications, provide deep insight into the genomic regions and processes that drive mouse fetal development.

- We systematically map chromatin state and accessibility across 72 distinct tissue-stages of mouse development, and carry out integrative analyses incorporating additional epigenomic and transcriptomic data sets from the same tissue-stages.
- We derive a chromatin state model from combinatorial patterns of histone modifications, encompassing 15 distinct states grouped in 4 broad functional classes: promoter, enhancer, transcriptional, and heterochromatin states.
- We characterize the spatial and temporal dynamics of chromatin states, finding that approximately 1–4% of the genome differs in chromatin state between tissues at the same stage, and 0.03–3% differs between adjacent stages of the same tissue; enhancer chromatin states show the largest differences in both cases.
- We show that Polycomb-mediated repression is pervasive during fetal development at genes that encode transcriptional regulators and enriched at those with human orthologues linked to Mendelian diseases.
- We identify more than 500,000 developmental regions of transposase-accessible chromatin marked by accessible chromatin during mouse fetal development, including approximately 140,000 with dynamic temporal activity in at least one tissue.
- We show that human orthologues of mouse fetal accessible chromatin regions are enriched for human disease-associated sequence variation, with apparent tissue-restricted patterns of enrichment.
- We show that temporal changes in chromatin accessibility often coincide with changes in enhancer chromatin states, and tend to precede changes in nearby H3K27ac levels.
- We predict 21,142 enhancer–promoter interactions by measuring the correlation between enhancer-associated chromatin signals and gene expression across tissue-stages.
- We show that candidate enhancers with stronger enrichment for marks of regulatory activity such as H3K27ac show a higher validation rate in reporter assays in vivo.

Profiling chromatin states in vivo
Despite the importance of chromatin states and accessibility in determining the functional output of the genome, a comprehensive survey of chromatin dynamics during mammalian fetal development has been lacking aside from very early stages of embryogenesis11,12. To address this gap, we collected mouse tissues at closely spaced intervals from E11.5 until birth. At each stage, we dissected a diverse panel of tissues from multiple litters of embryos and performed two replicates of ATAC–seq and ChIP–seq for each of eight histone modifications chosen to distinguish between different types of functional elements (for example, promoters, enhancers and gene bodies), and activity levels (for example, active, poised and repressed)13,14 (Fig. 1a, b, Extended Data Fig. 1a, b). We also profiled 6 tissues at E10.5, using a micro-ChIP–seq procedure designed for smaller cell numbers and restricting our scope to 6 histone modifications15. All ChIP–seq and ATAC–seq data sets were processed with a uniform pipeline and subjected to quality standards (Methods; Fig. 1c, Extended Data Figs. 1c–f, 2, 3). Whole-genome bisulfite sequencing and RNA-seq from other groups are reported in companion manuscripts7,98 and used in select analyses below.

We observed several notable high-level features of the data series. As expected, the landscape of histone modifications and chromatin accessibility varies between tissues, particularly for marks of activity such as H3K27ac (acetylation at the 27th lysine residue of histone H3) (Fig. 1d, Extended Data Fig. 4). Within each tissue, chromatin landscapes change progressively across stages (Fig. 1e, Extended Data Fig. 5a–c). These developmental dynamics are likely to reflect at least two underlying biological processes: changes in the epigenetic landscape of individual cells within a tissue as they undergo differentiation, and shifts in the relative abundance of different cell types that compose a tissue. Although in most cases we cannot separate the relative contributions of these two factors, many of the changes we observe reflect known hallmarks of cellular differentiation. For example, in the developing forebrain, neuronal markers acquire active chromatin signatures during development, whereas genes that encode cell cycle factors show the opposite trend (Fig. 1b, Extended Data Fig. 5d–f).

[Figure 1 image not reproduced. a, Experimental design: 8 developmental stages (E10.5, E11.5, E12.5, E13.5, E14.5, E15.5, E16.5, P0) and 12 tissues (forebrain, fb; midbrain, mb; hindbrain, hb; heart, ht; liver, lv; intestine, in; kidney, kd; lung, lu; stomach, st; neural tube, nt; limb, lm; craniofacial prominence, cf); 8 histone modifications and ATAC–seq (1,056 ChIP–seq and 132 ATAC–seq); micro-ChIP–seq at E10.5, 6 histone modifications per tissue (72 micro-ChIP–seq assays). b, Browser views at Neurod2. c, TSS-distal and TSS-proximal ATAC–seq peaks per tissue (stages merged). d, All H3K27ac peaks, log2(ChIP/input). e, Spearman's correlation versus developmental stages separating data sets within a tissue, for each mark and ATAC–seq.]
Fig. 1 | Profiling histone modifications during mouse fetal development. a, Experimental design. b, Three major axes of the data series: data types, tissues, and developmental stages (chr11: 98,318,134–98,336,928; mm10). Horizontal scale 0–30 for narrow marks (H3K4me3, H3K4me2, H3K27ac, H3K9ac), 0–10 for broad marks (H3K27me3, H3K4me1, H3K9me3, H3K36me3) and ATAC–seq. c, Number of TSS-distal (top, >1 kb) and TSS-proximal (bottom) ATAC–seq peaks for each tissue. d, k-means clustering of H3K27ac peaks (n = 333,097) across tissue-stages (k = 8). Cluster sizes, top to bottom: 20,497, 50,790, 31,043, 36,849, 38,670, 31,168, 36,822 and 87,258. e, Spearman's correlations of peak strength between replicates from the same stage (that is, developmental stages separating data sets is 0), or from different stages separated by one to six intervening stages, as indicated. Number of points per comparison: 0 stages, 66; 1 stage, 108; 2 stages, 84; 3 stages, 60; 4 stages, 36; 5 stages, 20; 6 stages, 10. For all boxplots in this paper: horizontal line, median; box, interquartile range (IQR); whiskers, most extreme value within ±1.5 × IQR.

The developmental chromatin landscape
To leverage the chromatin state information captured by combinatorial patterns of histone modifications, we used ChromHMM16, which derived a 15-state model that shows near-perfect consistency between biological replicates and general agreement with previously published models10,13,16 (Fig. 2a, Extended Data Fig. 6; Methods). We segmented the genome for each tissue-stage with the full complement of eight histone modifications (n = 66 tissue-stages), excluding E10.5 to ensure a consistent approach (Extended Data Fig. 7). Each state was assigned a descriptive label based on its similarity to known chromatin signatures5,13,17 and on its genomic distribution (Extended Data Fig. 6i). The resulting chromatin state maps allow the visualization of multiple functional predictions across a range of tissues and stages (Fig. 2b).

The 15 chromatin states fit into four broad functional classes: promoter, enhancer, transcriptional, and heterochromatin states. As expected, promoter states show the highest average levels of chromatin accessibility, followed by enhancer, transcriptional, and heterochromatin states (Fig. 2c). In total, about 33% of the genome shows a reproducible chromatin signature characteristic of one of these four functional classes in at least one tissue-stage. In this calculation we required that a region be called in the same state in both biological replicates, and we excluded states 15 ('no signal') and 11 ('permissive'), which covered large swaths of the genome (Fig. 2d, Extended Data Fig. 8a). This does not necessarily imply that 33% of the genome sequence is functional during development, but rather that 33% of the genome sequence is mappable and packaged in chromatin with a reproducible signature in at least one tissue-stage profiled here. These chromatin signatures often reflect transcriptional and/or regulatory activity, but the underlying sequences may not be under negative selection18.

The breadth of data collected here enabled us to characterize the spatial and temporal dynamics of chromatin states. On average, about 1.2% of the genome differs in chromatin state between tissues at the same stage (mean 1.2%, 31.3 Mb; range 1.0–4.0%, 26.8–109.1 Mb). Enhancer states are most variable between tissues, consistent with the role of enhancers in defining tissue and cell identity (Fig. 2e, Extended Data Fig. 8b–e). Indeed, hierarchical clustering based on strong enhancers alone (that is, state 5) distinguished tissues and identified similarities in developmental origin (Extended Data Fig. 8b, c). Within a given tissue, about 1.3% of the genome differs in chromatin state between adjacent developmental stages (mean 1.3%, 36.6 Mb; range 0.03–3.01%, 9.4–82.1 Mb). Enhancer states are again the most variable, and poised or weak enhancer states are more variable than strong enhancer states (Fig. 2e). Nonetheless, temporal changes in strong enhancer states can capture important developmental processes such as the transition of fetal liver function from haematopoiesis to metabolism (Extended Data Fig. 9a).

We found that the Polycomb-associated heterochromatin state (Hc-P, state 13) is prevalent at well-characterized regulators of tissue development19–23 (Fig. 2b, Extended Data Fig. 9b), while another heterochromatic state characterized by H3K9me3 is found mainly in repetitive sequence, as previously described24–28 (Extended Data Fig. 10). To more systematically examine the role of Polycomb-group (PcG) proteins during mouse development, we assembled a list of 6,501 putative PcG target genes with transcription start sites (TSSs) marked by Hc-P in at least one tissue-stage (Extended Data Figs. 9c, 11, Supplementary Tables 1, 2), many of which overlapped with DNA methylation valleys (DMVs) in the same tissue-stage7 (Extended Data Fig. 11e). Most of these genes are previously described targets of PcG (Extended Data Fig. 11a–d), but roughly one quarter (n = 1,786) have not been described as PcG targets in mouse29–32, and 400 have not been described in human or mouse13. Consistent with previous reports29–31, TFs are highly enriched among PcG targets (Extended Data Fig. 12a). Furthermore, we find that TFs with known human Mendelian phenotypes (Mendelian disease genes, MDGs) are even more likely than other TFs to be PcG targets (1.42-fold, P = 2 × 10^−7 considering all TFs; 1.23-fold, P = 1.3 × 10^−4 excluding zinc finger TFs; Fig. 2f, g, Extended Data Fig. 12b–d). These data suggest that PcG-mediated repression has an essential and pervasive role in silencing key regulators outside their normal expression domains and point to failed repression as a potential disease mechanism for further exploration.
[Figure 2 image not reproduced. a, ChromHMM emission probabilities (0–1) for H3K27me3, H3K36me3, H3K4me3, H3K4me2, H3K4me1, H3K9me3, H3K27ac and H3K9ac in 15 states. Promoter: 1 Pr-A, active; 2 Pr-W, weak/inactive; 3 Pr-B, bivalent; 4 Pr-F, flanking. Enhancer: 5 En-Sd, strong, TSS-distal; 6 En-Sp, strong, TSS-proximal; 7 En-W, weak, TSS-distal; 8 En-Pd, poised, TSS-distal; 9 En-Pp, poised, TSS-proximal. Transcription: 10 Tr-S, strong; 11 Tr-P, permissive; 12 Tr-I, initiation. Heterochromatin: 13 Hc-P, Polycomb-associated; 14 Hc-H, H3K9me3-associated. 15 Ns, no signal. b, State tracks at Gad1 and Gata4 across tissues and stages. c, Chromatin accessibility within ±10 kb of each state. d, Genome coverage of each state (log scale, 10 kb to 10 Gb). e, Fraction of variable bases per state across stages (reference: fb, E11.5) and across tissues (reference: fb, E15.5). f, Cumulative fraction of gene sets marked by Hc-P versus number of tissue-stages. g, Fraction of TF genes marked by Hc-P in at least one tissue-stage, −MDG versus +MDG (P = 2 × 10^−7).]
Fig. 2 | A 15-state model characterizes the mouse developmental chromatin landscape. a, Emission probabilities for histone modifications in 15 ChromHMM states, with descriptive title of each state. b, Chromatin state landscapes at Gad1 (chr2: 70,541,017–70,641,016; mm10) and Gata4 (chr14: 63,181,234–63,288,624; mm10). Pr, promoter; En, enhancer; Tr, transcription; Hc, heterochromatin. c, Average chromatin accessibility at different chromatin states in E15.5 forebrain. d, Genome coverage of chromatin states in each tissue-stage (n = 66). e, Fraction of bases for each state that vary in forebrain between E11.5 and other stages (top), or between E15.5 forebrain and other tissues at E15.5 (bottom). f, Fraction of indicated gene sets that show evidence of PcG repression: for all protein-coding genes (0.313, black line); TF protein-coding genes (0.515, light blue line); and MDG TF protein-coding genes (0.667, dark blue line). Cumulative fractions plotted by the number of tissue-stages at which a gene shows PcG repression (from one to 66, x-axis). g, MDG TFs are more likely to show evidence of PcG repression (MDG+, 150/225; MDG−, 349/744). χ2 test of independence between PcG repression and MDG involvement.

Catalogue of regulatory sequences
To build a catalogue of candidate regulatory sequences in mouse fetal development, we identified a non-overlapping set of 523,159 regions that were accessible in at least one tissue-stage, referred to below as developmental regions of transposase-accessible chromatin (d-TACs) (Fig. 3a, Supplementary Table 3). We note that this d-TAC catalogue is based only on the mouse tissue ATAC–seq data reported here, and is thus distinct from the ENCODE Registry of Candidate cis-Regulatory Elements (ccREs) (http://screen.encodeproject.org/), which incorporates data from other samples and assays53. Approximately 22% of d-TACs overlap with peaks from a single-cell ATAC–seq atlas of adult mouse tissues published while this manuscript was in revision33 (Extended Data Fig. 13a). We find that d-TACs are enriched in promoter and enhancer states, but generally depleted in states that characterize gene bodies, heterochromatin, and regions with no chromatin signature (Fig. 3b). Most d-TACs are distal to annotated TSSs, representing putative enhancers and other TSS-distal elements (90% of d-TACs are more than 1 kb from a TSS). Comparison with the VISTA database34 shows that about 20% of d-TACs tested show in vivo reporter activity in the corresponding tissue (Extended Data Fig. 13b), and 76–94% of in vivo validated enhancers are d-TACs in the corresponding tissue at E11.5 (VISTA reporter expression measured in E11.5 embryos; Fig. 3c, d).

To more directly assess the temporal dynamics of chromatin accessibility during development, we identified 139,894 dynamic d-TACs that exhibit a significant change in accessibility in at least one stage transition within a tissue (27% of all d-TACs; Fig. 3f, g, Extended Data Fig. 13c). Most dynamic d-TACs show a significant change at only one stage transition in this developmental window (Extended Data Fig. 13d, e), suggesting that these changes reflect enduring shifts in cell fate and/or composition rather than rapid on–off switches. Gain or loss of accessibility often corresponds to gain or loss of enhancer chromatin states, respectively (Fig. 3h, Extended Data Fig. 13f, g). In addition, d-TACs close to each other in the genome are more likely to have correlated activity across tissue-stages (Fig. 3e, Supplementary Table 3), particularly when located in the same topologically associating domain (TAD)35.
[Figure 3 image not reproduced. a, Total d-TACs (523,159): TSS-distal 473,332 (90.5%), TSS-proximal 49,827 (9.5%). b, log2 fold enrichment of d-TACs in each chromatin state. c, Sensitivity and specificity estimates (actively transcribed TSS, VISTA enhancers, DNase-inaccessible TSS). d, VISTA enhancer enrichment by tissue (fb, mb, hb, nt, lm, ht). e, Mean correlation across tissue-stages versus distance between d-TACs (0.01–2.0 Mb), same TAD versus different TAD. f, Dynamic d-TACs: TSS-distal 127,235 (91%), TSS-proximal 12,659 (9%). g, Dynamic d-TACs at Hopx in lung, E14.5 to P0. h, Chromatin state changes at d-TACs that gain or lose accessibility. i, Overlap of GWAS SNPs with d-TAC human orthologues versus matched background SNPs (all, P = 2.8 × 10^−100; novel, P = 3.1 × 10^−11). j, k, Enrichment of GWAS signal for complex traits by tissue (j) and by forebrain single-nucleus ATAC–seq cell type (k).]
Fig. 3 | An expansive catalogue of regulatory sequences in mouse fetal development. a, Number of TSS-proximal and TSS-distal d-TACs. b, Enrichment of accessible chromatin within different chromatin states (n = 66 tissue-stages). c, Estimates of d-TAC catalogue sensitivity (left) and specificity (right). Six tissue-stages plotted for enhancers based on VISTA data availability (E11.5 forebrain, midbrain, hindbrain, limb, heart and neural tube). Eighteen tissue-stages plotted for DNase-inaccessible TSS based on matched DNase data available through the ENCODE portal. pr, promoter; enh, enhancer. d, Enrichment for elements that direct tissue-restricted reporter expression within d-TACs accessible in the corresponding tissue. e, Correlation of ATAC–seq signal across tissue-stages plotted as a function of genomic distance between d-TACs (n = 523,159). d-TACs are divided according to whether they are in the same TAD (red line) or not (blue line). Two-sided Wilcoxon signed rank test, left to right: P = 0.04, 0.02, 2 × 10^−3, 2 × 10^−4, 1 × 10^−4, 1 × 10^−4, 1 × 10^−4, 1 × 10^−4. f, Number of dynamic TSS-proximal and TSS-distal d-TACs. g, Dynamic d-TACs in lung at Hopx (chr5: 77,084,370–77,116,768; mm10), a marker of mature alveolar type I cells59. h, Chromatin state changes at dynamic d-TACs that gain (left) or lose (right) accessibility. Enrichment relative to coverage of each state in total d-TAC catalogue. i, Enrichment of genome-wide association study (GWAS) single nucleotide polymorphisms (SNPs) in d-TAC human orthologues compared to background set generated with SNPsnap60. Hypergeometric test (all n = 190,462; novel n = 20,891, not described in catalogues of accessible chromatin regions in human). j, Enrichment of GWAS signal for complex traits and diseases (y-axis) within human orthologues of TSS-distal d-TACs from specific tissues (x-axis) with polyTest61. For GWAS sample sizes, see Supplementary Table 11. Enrichment values plotted are −log10[polyTest P values] z-score normalized within studies. k, As in j, but with TSS-distal accessible chromatin regions from published forebrain single-nucleus ATAC–seq38. EX, excitatory neurons (sub-clusters 1, 2, 3); IN, inhibitory neurons (sub-clusters 1, 2); AC, astrocytes; OC, oligodendrocytes; MG, microglia.

Catalogues of candidate regulatory sequences can provide valuable resources for the interpretation of non-coding genetic variation linked to disease33,36,37. Thus, we investigated whether our d-TAC catalogue could provide insights into the genetics of human disease. We first identified putative human orthologues of our mouse d-TACs (Supplementary Table 4). Approximately 89% (169,571 of 190,462) of these human sequences have been annotated as accessible chromatin in human cells9,36, suggesting that they have conserved function. We

Developmental enhancer dynamics
Given the important role of enhancers in directing gene expression, we focused on dynamic enhancers as a window into the developmental processes and regulatory factors in each tissue. We identified a high-confidence set of candidate enhancers marked by the strong TSS-distal enhancer state (Extended Data Fig. 14a, Supplementary Table 5), and
