Summary

This document covers genome analysis: what a genome is, the C-value paradox, and a table of representative genomes listing the total DNA, chromosome number, and gene count of several species. It also describes Cot1/2 reassociation analysis, the classes of repeated DNA in the human genome, and the organization of the E. coli genome.

Full Transcript

Genome Analysis

What is a genome?
- the complete sequence of an organism’s genetic information
- for example, the human genome is the complete sequence of all 46 human chromosomes: about 3 x 10^9 base pairs
- a huge technological feat (Oct. 1990 - April 2003)
- a great source of knowledge about the proteins an organism can synthesize
- not easy to interpret; works best when compared to other genomes or to prior knowledge
- many complete genome sequences are now available, along with sophisticated computational techniques

The C-value Paradox

C (Constant/Characteristic)-value: amount of DNA (in pg) in a haploid genome (1 pg ~ 1 gigabase, i.e. 10^9 or one billion bases)
- Paris japonica: 150,000 Mbp
- Protopterus aethiopicus: 13,000 Mbp

Representative Genomes

TABLE 9-3: DNA, Gene, and Chromosome Content in Some Genomes

Species                               Total DNA (bp)   Chromosomes*   Genes
Escherichia coli (bacterium)          4,600,000        1              ∼4,300
Saccharomyces cerevisiae (yeast)      12,068,000       16†            ∼5,800
Caenorhabditis elegans (nematode)     97,000,000       12‡            ∼19,000
Drosophila melanogaster (fruit fly)   180,000,000      18             ∼13,600
Arabidopsis thaliana (plant)          125,000,000      10             ∼25,500
Oryza sativa (plant)                  480,000,000      24             ∼57,000
Mus musculus (mouse)                  2,500,000,000    40             ∼55,000
Homo sapiens (human)                  3,200,000,000    46             ∼20,000

*Diploid chromosome number for all eukaryotes except yeast.
†Haploid chromosome number; wild yeast strains generally have eight (octoploid) or more sets of chromosomes.
‡Number for females, with two X chromosomes; males have an X but no Y, for 11 total.

For comparison: Paris japonica (plant, Japan): 149,000,000,000 bp

Technique of the Day: Cot1/2 Analysis
(actually, it’s the technique of yesterday)

Usage: measures the "complexity" of a DNA sample; determines the amount and type of repeated DNA present

Method:
1. randomly shear double-stranded DNA into fragments
2. denature to single strands by heating to 95°C
3. allow to reassociate by slow cooling under a series of different conditions:
   - use different DNA concentrations (Co)
   - use different lengths of time (t1/2)
4. measure "percent reassociated" to double strands by hyperchromicity: single-stranded DNA has a higher UV absorbance than double-stranded DNA, so UV absorbance drops as the DNA sample reassociates (the DNA concentration does not change!)

DNA Can Be Denatured By Heating
[Figure: melting curve]

DNA Fragments Can Be Denatured/Renatured: Hybridization

Complementary DNA strands can bind each other (hybridize) when slowly cooled (annealing), even in a complex mixture of other sequences.

A simple duplex of homopolymers, polyU:polyA, reassociates right away.
MS2 is a small phage: its genome is all unique sequence and takes longer to reassociate.
Phage T4 and E. coli have larger genomes. Their DNA is all unique sequence, but it takes longer for each fragment to find its partner.
What would the reassociation of human DNA look like?

Reassociation of Human DNA is very different!

[Figure: reassociation curve for human DNA. Vertical axis: % of DNA double stranded (up to 100%). Horizontal axis: Cot1/2 (concentration x time to 1/2 completion). Labeled components: very fast reassociation: simple sequence repeats; fast reassociation: moderately repeated DNA; slow reassociation: unique sequence DNA.]

While the other DNA samples reassociated with simple kinetics, human DNA has multiple separate components that reassociate independently.

[Figure: genomic DNA carrying repeated sequences (numbered 1-3) and unique sequences (lettered a-d); primes mark the complementary strand.]
1. break genomic DNA into fragments
2. heat to denature
3. slow cool to renature
4. Repeated sequences (numbered) reanneal faster because they can pair with the opposite strand from ANY copy of the repeat. Unique sequences have to find their original opposite strand.

Technique of the Day: Cot1/2 Analysis

Analysis:
Low Cot1/2 indicates sequences that easily find a pairing partner.
If a DNA sequence is repeated in the genome, one strand can reassociate with the opposite strand from any other copy of the repeat. This does not take long, and indicates low-complexity DNA: “REPEATED DNA”.

High Cot1/2 indicates sequences that do not easily find a partner. For example, a DNA sequence that is unique in the genome, like a single-copy gene, can only reassociate with its original opposite strand. This takes a long time, and indicates high-complexity DNA: “UNIQUE SEQUENCE DNA”.

Cot1/2 analysis can distinguish different classes of DNA, from highly repeated to moderately repeated to unique sequence DNA.

What kinds of sequences are in the human genome?

Prokaryotic and viral genomes are very efficiently organized: wall-to-wall genes.

Eukaryotic genomes are dramatically different: little of the DNA is devoted to “genes”; most of the genome consists of repeated DNA of various classes:
– Simple Sequences (2-12 base pairs) repeated millions of times
– SINES (short interspersed sequences): 100-700 bp, 10^5 copies
  Alu elements: 300 bp in size, a fragment of a normal gene, can insert in new places in the DNA (>10% of the human genome, ~1 million copies)
– LINES (long interspersed sequences): 6000 bp, 10^4 copies
  remnants of retroviral integrations; some produce active reverse transcriptase
– Highly Expressed Genes present in several hundred copies
  the cell needs a lot of ribosomes, so it has 200 copies of rRNA genes

The E. coli genome

E. coli has one circular chromosome and one origin of DNA replication.
The sequence was completed in 1997: a very high-quality sequence, with good coverage and few mistakes.
It contains 4.6 Mbp with 4,288 genes.

Features:
- wall-to-wall genes with little intergenic space
- genes organized into operons
- few repeated genes
- no introns
- some genes recently arrived by horizontal gene transfer

[Figure: E. coli genome map; an annotation marks where replication starts.] Blattner et al. (1997) Science 277:1453-1462

The overall structure of the E. coli genome

The origin and terminus of replication are shown as green lines, with blue arrows indicating replichores 1 and 2. A scale indicates the coordinates both in base pairs and in minutes (actually centisomes, or 100 equal intervals of the DNA). The distribution of genes is depicted on two outer rings: the orange boxes are genes located on the presented strand, and the yellow boxes are genes on the opposite strand. Red arrows show the location and direction of transcription of rRNA genes, and tRNA genes are shown as green arrows. The central sunburst is a histogram of inverse CAI (codon adaptation index, long yellow rays represent clusters of low
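
Worked example: Cot1/2 reassociation curves

The Cot1/2 slides earlier in this transcript describe human DNA as several components that reassociate independently. The sketch below tries to make that concrete. It assumes ideal second-order reassociation kinetics, in which the fraction of a component that is double stranded at a given Cot is Cot / (Cot + Cot1/2). That formula is the usual textbook model and is not stated in the slides. The function names are illustrative, and the component fractions and Cot1/2 values in HUMAN_LIKE are hypothetical numbers chosen only to give a three-step curve, not measured data.

```python
def fraction_reassociated(cot, cot_half):
    """Fraction of one kinetic component that is double stranded at `cot`.

    Assumes ideal second-order kinetics (an assumption, not from the slides):
    single-stranded fraction = 1 / (1 + cot / cot_half).
    """
    return cot / (cot + cot_half)


def mixture_reassociated(cot, components):
    """Double-stranded fraction for a mixture of independent components.

    `components` is a list of (fraction_of_genome, cot_half) pairs.
    """
    return sum(frac * fraction_reassociated(cot, ch) for frac, ch in components)


# Hypothetical three-component genome (illustrative values only).
HUMAN_LIKE = [
    (0.10, 1e-3),  # very fast: simple sequence repeats
    (0.30, 1.0),   # fast: moderately repeated DNA
    (0.60, 1e3),   # slow: unique sequence DNA
]

if __name__ == "__main__":
    # Tabulate the curve on a log scale, as Cot plots are usually drawn.
    for exp in range(-5, 7):
        cot = 10.0 ** exp
        pct = 100 * mixture_reassociated(cot, HUMAN_LIKE)
        print(f"Cot = 1e{exp:+d}   {pct:5.1f}% double stranded")
```

With a single component, the curve is one smooth step centered on its Cot1/2. With three well-separated Cot1/2 values, the printed table rises in three stages, which is the qualitative shape the slides attribute to human DNA.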

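Worked example: gene density and the C-value paradox

The C-value paradox described earlier in this transcript is easier to see as arithmetic: genome size and gene count do not scale together. The sketch below computes genes per Mbp using only the figures from Table 9-3 as transcribed above. The dictionary layout and the function name genes_per_mbp are illustrative choices, not part of the source.

```python
# (total DNA in bp, approximate gene count), copied from Table 9-3 above.
GENOMES = {
    "Escherichia coli": (4_600_000, 4_300),
    "Saccharomyces cerevisiae": (12_068_000, 5_800),
    "Caenorhabditis elegans": (97_000_000, 19_000),
    "Arabidopsis thaliana": (125_000_000, 25_500),
    "Drosophila melanogaster": (180_000_000, 13_600),
    "Oryza sativa": (480_000_000, 57_000),
    "Mus musculus": (2_500_000_000, 55_000),
    "Homo sapiens": (3_200_000_000, 20_000),
}


def genes_per_mbp(total_bp, genes):
    """Gene density: number of genes per million base pairs."""
    return genes / (total_bp / 1e6)


if __name__ == "__main__":
    # Sort by genome size so the falling gene density is easy to read.
    for species, (bp, genes) in sorted(GENOMES.items(), key=lambda kv: kv[1][0]):
        density = genes_per_mbp(bp, genes)
        print(f"{species:26s} {bp / 1e6:8.1f} Mbp  {density:7.1f} genes/Mbp")
```

On these figures, E. coli carries far more genes per Mbp than human, and rice lists more genes than human in a genome less than a sixth of the size. That fits the slides' contrast between wall-to-wall prokaryotic genomes and repeat-rich eukaryotic ones.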