Computational Molecular Microbiology (MBIO 4700) Lecture Notes
University of Manitoba
Abdullah Zubaer
Summary
These lecture notes cover Computational Molecular Microbiology (MBIO 4700) at the University of Manitoba. Topics include working with biological sequences, using NCBI databases like GenBank and BLAST, understanding Open Reading Frames (ORFs), and analyzing coding sequences (CDS). The notes are organized systematically to provide a concise breakdown of the relevant concepts.
Full Transcript
Computational Molecular Microbiology (MBIO 4700)
Abdullah Zubaer
University of Manitoba

Working with sequences
• Sequence databases
• Sequence comparison
• Pairwise alignment
• Multiple sequence alignment
• Phylogenetic tree

Sequence databases in NCBI
➢ GenBank
➢ Nucleotide
➢ Genome
➢ Sequence Read Archive (SRA)
➢ Protein database
➢ Non-redundant (nr) database
Let’s explore with Saccharomyces cerevisiae and its mitochondrial genome. Use accession number MW122508.1 to explore GenBank.

BLAST
• BLAST is a family of programs that take a query sequence and compare it against the sequences in a database
• Used for DNA/RNA and protein sequences
• Purpose of using BLAST:
  • Identify homologs (e.g., orthologs and paralogs)
  • Identify a species
  • Discover a new gene
  • Determine gene variants

BLAST usage
▪ Step 1: Select a BLAST program (e.g., blastn, blastp)
▪ Step 2: Input the query sequence (NCBI accession or FASTA sequence)
▪ Step 3: Select a database (e.g., for blastn, the nucleotide collection (nr/nt), which draws on nucleotide sequences from GenBank, EMBL, DDBJ, PDB, and RefSeq; for blastp, the non-redundant protein database, nr)
▪ Step 4: Set algorithm parameters (the defaults can be kept)
▪ Step 5: Set the scoring matrix (e.g., for blastp, use BLOSUM62)

BLAST principle
▪ The query (the unknown, or “your sequence”) is broken down into a library of short sequences, because short matches are easier to find. This collection of short sequences is then used to search through a database.

Practice BLAST
o Use the sequence in the “practice_BLAST” file as a query
o Find out:
  o What does this sequence encode?
  o What is its most likely origin?
  o What are the similar sequences?
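
The word-library idea from the BLAST principle slide can be sketched in a few lines of Python. This is a minimal illustration of the seeding step only, not BLAST itself: real BLAST also scores near-matching words for proteins, extends seeds into alignments, and evaluates them statistically. The function names, the toy sequences, and the word size of 4 are assumptions chosen for readability.

```python
from collections import defaultdict

def words(seq, k):
    """Return every overlapping k-letter word in seq with its start position."""
    return [(seq[i:i + k], i) for i in range(len(seq) - k + 1)]

def index_database(db, k):
    """Map each word to the (sequence name, position) pairs where it occurs."""
    index = defaultdict(list)
    for name, seq in db.items():
        for word, pos in words(seq, k):
            index[word].append((name, pos))
    return index

def seed_hits(query, index, k):
    """List exact word matches as (query position, sequence name, database position)."""
    hits = []
    for word, qpos in words(query, k):
        for name, dpos in index.get(word, []):
            hits.append((qpos, name, dpos))
    return hits

# Toy data, invented for illustration only (not from the lecture files)
database = {"seqA": "TTGACCGTAAGC", "seqB": "GGGCCCAAATTT"}
query = "ACCGTAAG"
K = 4  # illustrative word size; far shorter than real nucleotide searches use

hits = seed_hits(query, index_database(database, K), K)
print(hits)
# [(0, 'seqA', 3), (1, 'seqA', 4), (2, 'seqA', 5), (3, 'seqA', 6), (4, 'seqA', 7)]
```

All five hits fall on the same diagonal (database position minus query position = 3), which is the kind of signal a BLAST-like program would go on to extend into a longer alignment.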
BLAST output
▪ Coverage, identities, bit score, E-value
▪ E-value (expect value): the number of matches with an equal or better score expected by chance alone; for small values it approximates the probability that the match occurred by chance
▪ The lower the E-value, the more significant the “hit” is

ORF finder

Open Reading Frame (ORF)
• Functional ORF: an open reading frame that encodes a protein
• Computer algorithms are used to search for ORFs: they look for start/stop codons [and Shine–Dalgarno sequences (ribosome binding sites, RBS)]
• ORFs can be compared to ORFs in other genomes

Coding Sequence (CDS)
• The CDS is the actual region of DNA that is translated to make proteins.
• The ORF runs from start codon to stop codon and may contain introns (frequent in eukaryotic ORFs, rare in prokaryotic ORFs).
  ATG ... [introns] ... TAA = ORF
  o In eukaryotes: spliceosomal introns
  o In Archaea or Bacteria, we sometimes find group I or group II introns (“self-splicing”)
  ATG ... TAA = CDS
• Alternative start codons can occur, BUT these usually still “code” for methionine!

Practice ORF finder
o Use the sequence in the “practice_BLAST” file as a query
o Find out:
  o Are there any Open Reading Frames (ORFs) in it?

ORF finder output
Var1 = rps3 (ribosomal protein S3)
ONE ORF, but two functions are encoded: ribosomal protein and endonuclease activity (fusion ORF, hybrid ORF, or fused gene)
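
The start/stop codon search that an ORF-finding program performs can be sketched as follows. This is a simplified illustration, not the NCBI ORFfinder algorithm: it scans the three reading frames on each strand for an ATG followed in frame by a stop codon, ignores ribosome binding sites and introns, and assumes the standard genetic code by default. The function names, the `min_codons` threshold, and the test sequence are hypothetical.

```python
def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string (A, C, G, T only)."""
    comp = {"A": "T", "T": "A", "G": "C", "C": "G"}
    return "".join(comp[base] for base in reversed(seq))

def find_orfs(seq, min_codons=3, starts=("ATG",), stops=("TAA", "TAG", "TGA")):
    """Scan all six reading frames for start-to-stop ORFs.

    Returns tuples of (strand, frame, start, end, orf_sequence). Coordinates are
    0-based, end-exclusive, and relative to the strand being scanned. The length
    threshold min_codons counts the stop codon.
    """
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None  # position of the first start codon of the current ORF
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon in starts:
                    start = i
                elif start is not None and codon in stops:
                    if (i + 3 - start) // 3 >= min_codons:
                        orfs.append((strand, frame, start, i + 3, s[start:i + 3]))
                    start = None  # look for the next ORF in this frame
    return orfs

# Invented test sequence: one short ORF (ATG AAA TTT TAA) in forward frame 2
print(find_orfs("CCATGAAATTTTAAGG"))
# [('+', 2, 2, 14, 'ATGAAATTTTAA')]
```

The stop codons are a parameter because the right set depends on the genetic code. For a mitochondrial sequence such as the yeast genome explored earlier, TGA encodes tryptophan rather than stop, so a call like `find_orfs(seq, stops=("TAA", "TAG"))` would be closer to the appropriate setting.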