Microbial Genomics Pt1 Lecture Notes PDF

Summary

These lecture notes provide an overview of microbial genomics. The document details the different domains of life, including bacteria, archaea, and eukaryotes, their characteristics, and their relatedness. It also discusses genome structure, including prokaryotic and eukaryotic genomes, and the processes of DNA replication and transcription.

Full Transcript

Lecture 1: Microbial diversity
● Three domains of life: Bacteria, Archaea, Eukarya
● Diversity is greater in prokaryotes (bacteria and archaea) because they have had a longer evolutionary time
● 16S or 18S rRNA genes are sequenced to build phylogenetic trees
● Microbes = dominant life form
● Best studied = Proteobacteria (e.g. E. coli)
● Examples of microbes
  ○ Proteobacteria
  ○ Cyanobacteria (generate O2)
  ○ Firmicutes (spores)
  ○ Actinobacteria (antibiotics)
  ○ Planctomycetes (bacteria with membrane-bound organelles)
● Euryarchaeota (archaea example) can be
  ○ Methanogens
  ○ Halophiles
  ○ Extremophiles
  ○ May contain small genomes
● Eukarya developed chloroplasts and mitochondria from bacteria (via evolution)
● Examples of eukarya
  ○ Fungi
  ○ Amoebozoa
  ○ Protists
  ○ Some algae
● Viruses = microbes but not living (debatable)
● Important vocab for phylogenetic classifications
  ○ Domain (Bacteria, Archaea, Eukaryota)
    ■ Kingdom for Eukaryota
  ○ Phylum (e.g. Proteobacteria, Planctomycetes)
  ○ Class
  ○ Order
  ○ Family
  ○ Genus (e.g. Homo as in Homo sapiens)
  ○ Species (e.g. sapiens)
● Tree is based on genome sequence
● Ribosomes are complexes of proteins and ribosomal RNAs found in all cells; they bind mRNA and are used to make proteins
● Central dogma = replication via DNA polymerase, transcription via RNA polymerase, translation via ribosomes
● 16S rRNA has variable and conserved regions, is vertically inherited, and is present in all organisms, which makes it well suited for phylogenetic analysis
  ○ Variable regions will be closely similar in related organisms
● In prokaryotes the ribosome binds the mRNA at the Shine-Dalgarno sequence (ribosome binding site), which is recognized by the 16S rRNA; the small ribosomal subunit rRNA is 16S in prokaryotes and 18S in eukaryotes
● Mitochondria have a 16S rRNA gene that is similar to that of bacteria
● 16S sequences can be used for making phylogenetic trees, making taxonomic classifications, measuring bacterial abundances in a community of microbes, and visualizing the locations of different bacteria in their environment
● PCR amplifies the 16S gene using universal primers (designed from conserved regions)
● If the 16S sequences of two bacteria are >97% identical, they are considered to be the same species

Lecture 2: Genome Structure
● Typical features of prokaryotic genomes
  ○ Horizontally acquired genes
  ○ Transposons = jumping genes, segments of DNA that can move around the genome
  ○ One circular chromosome
● Typical features of eukaryotic genomes
  ○ Diploid
  ○ Horizontally acquired genes
  ○ Can undergo meiosis
  ○ Transposons
  ○ Larger than 10 Mbp
● Organisms with larger genomes tend to have more genes
● Bacteria with very small genomes tend to rely on another organism for their basic needs
● DNA replication must start at a specific location on the chromosome
● All known DNA and RNA polymerases add nucleotides at the 3’ end only (the new strand is built 5’ to 3’)
● Single-stranded DNA is more commonly found in the lagging strand template during DNA replication
● DNA replication and transcription can happen at the same time
● Genome = the complete genetic material of an organism, including chromosome(s) and plasmid(s)
● Eukaryotic genome features
  ○ Linear DNA form
  ○ Multiple genomes
  ○ Diploid ploidy
  ○ Located in the nucleus (eukaryotes also have mitochondrial DNA)
● Prokaryotic genome features
  ○ Usually circular DNA form, sometimes with a plasmid or two
  ○ Singular genome
  ○ Haploid ploidy
  ○ Located in the cytoplasm (nucleoid = where the DNA is compacted)
● Components of a genome
  ○ Mobile genetic elements are materials that can be transferred between genomes or to different locations within a genome
    ■ Plasmids
    ■ Viral genes and genomes
    ■ Transposons and insertion elements (pieces of DNA that can move from one location to another)
  ○ Mobile elements contribute to horizontal gene transfer and genome rearrangements
● Modes of inheritance include
  ○ Vertical = parent to offspring
    ■ Variation through mutation and selection
  ○ Horizontal = from another organism
    ■ Major cause of variability in prokaryotic genomes
● Mobile elements benefit organisms by
  ○ Carrying beneficial genes such as
    antibiotic resistance genes (which let bacteria survive antibiotics and are carried on plasmids) and genes that aid in metabolism (e.g. carried by a virus that infects bacteria, which can allow photosynthesis)
  ○ Allowing bacteria to colonize a niche (e.g. symbiosis or virulence plasmids, or toxins)
  ○ Conferring immunity to phage infection via prophages
● Mobile elements can harm organisms by
  ○ Draining energy, because extra DNA must be replicated
  ○ Inducing prophages (viral genomes inside the host genome that can enter the lytic cycle) that kill the cell
  ○ Inactivating important genes via transposons
  ○ Introducing selfish genes (genes that the genome cannot get rid of once they are in it; can be toxins carried on plasmids)
● Typical protein-coding gene is 1 kb in prokaryotes and 10 kb in eukaryotes
● Gene size in bacteria and archaea is about 1 kb per protein-coding gene
● Gene size in viruses is < 1 kb
● Gene size in eukaryotes is several kb
● E. coli’s genome is 4.6 Mbp
● Human genome is 3 Gbp
● Genome and cell size of eukaryotes
  ○ Most are in the Gbp range
  ○ A small percentage of the genome is protein-coding genes
  ○ Cell size or particle size is around 10 um
● Genome and cell size of prokaryotes
  ○ Mbp range
  ○ More than 90% of the genome is protein-coding genes
  ○ Cell size is around 1 um
● Genome and cell size of viruses
  ○ 10 kb range
  ○ More than 90% of the genome is protein-coding genes
  ○ Particle size is [...]
● [...] cytokinesis -> G1 phase = cellular contents excluding the chromosomes are duplicated -> S phase = each of the 46 chromosomes is duplicated by the cell -> G2 = the cell “double checks” the duplicated chromosomes for errors, making any needed repairs -> back to mitosis
  ○ Bacteria have a single origin while eukaryotes and archaea have multiple origins
● DNA replication has four steps: initiation, elongation, termination, untangling
● Initiation of DNA replication
  ○ Replication begins at a specific site in the chromosome
    ■ “Origin of replication” or oriC
  ○ DNA replication is initiated by the binding of
    specific proteins to oriC
    ■ oriC is a cis-acting site (a DNA sequence that something acts upon; if the sequence moves, so do the proteins)
    ■ Proteins that bind at cis-acting sites are referred to as “trans-acting factors” (the proteins find the DNA wherever it is)
  ○ oriC is composed of a number of repeated sequences that are bound by DnaA initiator subunits
  ○ Replication is regulated at the level of initiation (once the cell commits to starting replication, it commits to the whole process)
● DnaA protein binds to oriC, the bacterial chromosomal origin of replication, to initiate replication
● Elongation = bidirectional replication of the chromosome
  ○ Has a constant rate
  ○ Machinery (the enzymes that do the process) is highly conserved
● Termination of DNA replication occurs at ter sequences (only some organisms have ter sequences) or wherever replication ends
● Untangling the chromosomes: several proteins are needed
  ○ Topoisomerases nick (break a single strand of) and unwind supercoiled DNA, or make a double-strand break to separate chromosomes
  ○ Recombination proteins resolve Holliday junctions to separate chromosomes
● Introduction to GC skew
  ○ Cytosine bases on single-stranded DNA are vulnerable to deamination (which causes mutation)
  ○ Deamination of C converts it to uracil
  ○ U can base pair with adenine, resulting in a CG → TA mutation as time goes on
  ○ To avoid that mutation, there tend to be fewer C’s in single-stranded DNA
● All known DNA and RNA polymerases add nucleotides at the 3’ end
● C’s in ssDNA are prone to mutation, so organisms have evolved to have more G’s in the lagging strand template
● In the lecture figure, blue is the ssDNA in the lagging strand template
● Blue on top is the leading strand and black on bottom is the lagging strand
● Genomes have fewer C’s on the template of the lagging strand (+ strand)
● More G’s on the right side of the + strand and fewer G’s on the left side of the + strand (reading the graph from left to right)
● DNA replication and transcription occur simultaneously
  using the same DNA template
● Processes occurring simultaneously on DNA
  ○ DNA replication complex (DNA polymerase, helicase, etc.) unwinds and replicates DNA
  ○ Hundreds of genes are being transcribed by RNA polymerase at the same time
  ○ Co-directional, because head-on collisions between the replication machinery and RNA polymerase disrupt DNA replication; they may cause fork stalling, which causes cell death
  ○ Adaptation: organisms have genes facing the same direction as the replication fork
  ○ In the lecture figure, the outer circle is where the coding sequence of the gene is on the plus strand
  ○ The second inner circle is ORFs facing the same direction
  ○ To prevent replication-transcription conflicts, many organisms have evolved to transcribe most genes in the direction of DNA replication
● Base of the arrow is the start codon on the plus strand, and the coding sequence is on the plus strand (gene is going left to right)
● Where the GC skew flips, you can say with confidence that it is the origin (from low to high) or the terminus (from high to low)

Problem Set 1
● Structure of deoxythymidine monophosphate
  ○ A = phosphate group
  ○ B = thymine
  ○ C = 3’ hydroxyl group
  ○ D = 5’ carbon
  ○ E = 3’ carbon
● The phosphate group and the hydroxyl group are covalently bound to adjacent nucleotides
● Thymine (the nucleic base) participates in base pairing with a different DNA strand in a double-stranded DNA molecule
● A new deoxynucleotide is added to the 3’ hydroxyl group during DNA replication
● Mitochondrial genomes are more similar in size to chloroplast genomes than they are to bacterial genomes
● Eukaryotic genomes have a wider range in size than bacterial genomes
● 1000 kbp = 1 Mbp
● 1,000,000 kbp = 1 Gbp
● Most prokaryotic genomes are composed of a single circular chromosome and sometimes a plasmid or two
● The 5’ end is the top part of the DNA sugar-phosphate backbone
● The 3’ end is the bottom part of the DNA sugar-phosphate backbone
● The complementary DNA strand is located on the right side
● The DNA strand complementary to the one above has
  the 5’ end on the bottom side
● A and C are the 3’ end
● B and D are the 5’ end
● i is the leading strand and iii is the lagging strand
● The replication fork is moving from right to left
● Deamination of cytosine converts it to uracil
● C bases on single-stranded DNA are vulnerable to deamination
● When GC skew is less than 0, there is less G than average on the (+) strand
● Single-stranded DNA is more commonly found on the template for the lagging strand
● For the origin, the GC skew graph will go from a low number of G’s to a high number of G’s
● For the terminus, the GC skew graph will go from a high number of G’s to a low number of G’s
● Replication and transcription both produce a new strand of nucleic acid
● Replication replicates the genome for cell division
● Transcription is performed by RNA polymerase
● Transcription is used to express genes for metabolic and biochemical functions
● Transcription has an RNA product
● Replication is performed by DNA polymerase
● Replication has a DNA product
● Bacterial genes tend to be transcribed in the same direction as DNA replication
● The coding sequence is found on the leading strand to allow co-directional replication and transcription that avoids collisions

Lecture 3: Comparative genomics
● In order to compare the metabolic functions of two different organisms using comparative genomics
  ○ Both genomes must be sequenced
  ○ Both genomes must be annotated
● The 16S rRNA gene must be only vertically inherited to construct a phylogenetic tree
● Illumina is able to sequence more DNA for the cost
● All cellular organisms have some genes in common
● The energy metabolism of a microbe can be predicted by genome analysis
● All prokaryotes have some genes in common
● Some gene annotations are incorrect
● An organism's genome is defined as its complete set of genetic material (all chromosomes, all plasmids, everything)
● Genomics = the study of the complete genome of an organism
● Comparative genomics is the comparison of full genome sequences of two or more
  organisms
● Analysis of genomes includes sequencing, assembly, annotation, analysis, and comparison
● Sequencing methods
  ○ Sanger sequencing (since 1977) = one sequencing reaction at a time; very high accuracy, low throughput, up to 1 kb reads
  ○ Next generation sequencing (NGS)
    ■ Illumina (short-read) sequencing (since 2006) = multiple sequencing reactions in parallel; high accuracy, high throughput, ~200 bp reads
      ● Similar to Sanger, but many reactions happen in parallel
      ● Reads are much shorter than Sanger
    ■ Third generation/long read sequencing (since 2009) = variety of methods with much lower accuracy, high throughput, >10 kb reads
      ● Longer reads but tend to be less accurate
● Sanger is sequencing by synthesis; its huge limitation is one sequencing reaction at a time
  ○ Must have the DNA to sequence and a primer
  ○ DNA polymerase makes new DNA, which enables sequence reading
  ○ Must have substrates
  ○ A smaller amount of dideoxyNTPs is included; dideoxyNTPs lack the 3’ hydroxyl group needed to add the next nucleotide, so incorporating one stops the reaction
  ○ Pieces of DNA are stopped at each point of the sequence
  ○ Uses fluorescent analogs of the nucleotides, so the color can be read to find the nucleotide bases
● Next-generation sequencing has multiple platforms in use
  ○ Illumina sequencing
  ○ PacBio sequencing
  ○ Nanopore
● Illumina: sequencing by synthesis with multiple sequencing reactions at a time; has 6 steps (may be called short read sequencing)
  ○ Purify DNA
  ○ Shear to a specific size (e.g.
    250 bp)
  ○ Ligate adapters
  ○ Attach DNA to primer-coated surface
  ○ Amplify DNA
  ○ Sequence from either one direction (single read) or both directions (paired ends)
● Long Read Sequencing
  ○ PacBio
  ○ Nanopore
  ○ Reads are multiple kbp, so you do not need to piece them together, but they are not very accurate
● Features of Illumina
  ○ Attach DNA to flow cell
  ○ Bridge amplification
  ○ Cluster generation
  ○ Clonal single molecule array
  ○ Sequencing by synthesis
  ○ Base calling from images over time
● Sanger method = sequencing by synthesis with one sequence per reaction; overall pricey and slow but accurate
  ○ Read length of 750 bp
  ○ 10^5 base pairs per run
  ○ Mostly substitution errors, 0.001%
  ○ 4 dollars per reaction (pricey)
● Illumina method: sequencing by synthesis that is “massively parallel”; cheap and somewhat accurate
  ○ Read length 75-300 bp
  ○ 10^9 base pairs per run
  ○ 10 kbp theoretically infinite base pairs per run
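
As a worked illustration of the Lecture 1 rule that two bacteria with >97% identical 16S sequences are considered the same species, here is a minimal Python sketch. It assumes the two 16S sequences have already been aligned to the same length, with "-" marking gaps. The function names and the toy sequences are made up for illustration and are not from the lecture; real analyses use dedicated alignment tools.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent of compared alignment columns where the two bases match.

    Assumes both sequences are already aligned (same length, '-' = gap).
    Columns where both sequences have a gap are skipped.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == "-" and y == "-":
            continue  # nothing to compare in this column
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable positions")
    return 100.0 * matches / compared


def same_species_by_16s(aligned_a, aligned_b, threshold=97.0):
    """Apply the >97% identity rule of thumb from the notes."""
    return percent_identity(aligned_a, aligned_b) > threshold


# Toy 100-base "16S" sequences (hypothetical, for illustration only).
reference = "ACGT" * 25
close_relative = "NN" + reference[2:]       # 2 mismatches out of 100
distant_relative = "NNNNN" + reference[5:]  # 5 mismatches out of 100

print(percent_identity(reference, close_relative))      # 98.0
print(same_species_by_16s(reference, close_relative))   # True
print(same_species_by_16s(reference, distant_relative)) # False
```

The threshold is a parameter because the 97% cutoff is a convention, not a property of the sequences themselves.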

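The GC skew idea from Lecture 2 and Problem Set 1 can also be sketched in Python. This assumes the common definition of GC skew, (G - C) / (G + C), computed in fixed windows along the (+) strand; the notes do not give a formula, so that definition, the window size, the function names, and the toy sequence are all assumptions for illustration. A flip from low to high suggests an origin and a flip from high to low suggests a terminus, as stated in the notes.

```python
def gc_skew(seq, window=1000):
    """GC skew, (G - C) / (G + C), for consecutive windows of one strand."""
    seq = seq.upper()
    skews = []
    for start in range(0, len(seq), window):
        chunk = seq[start:start + window]
        g = chunk.count("G")
        c = chunk.count("C")
        # A window with no G or C has no defined skew; report 0.0 here.
        skews.append((g - c) / (g + c) if (g + c) else 0.0)
    return skews


def skew_flips(skews):
    """Find window indices where the skew changes sign.

    'low_to_high' marks a candidate origin, 'high_to_low' a candidate
    terminus. Windows with a skew of exactly 0 are not treated as flips.
    """
    flips = []
    for i in range(1, len(skews)):
        prev, cur = skews[i - 1], skews[i]
        if prev < 0 < cur:
            flips.append((i, "low_to_high"))
        elif prev > 0 > cur:
            flips.append((i, "high_to_low"))
    return flips


# Toy sequence (hypothetical): C-rich first half, G-rich second half.
toy = "ACCC" * 5 + "AGGG" * 5
skews = gc_skew(toy, window=4)
print(skews)              # five windows of -1.0, then five windows of 1.0
print(skew_flips(skews))  # [(5, 'low_to_high')]
```

This sketch treats the sequence as linear. On a circular chromosome the last window is adjacent to the first, so a flip at the wrap-around point would need an extra check.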