2023_Fall_Zubaer_L7_DNA_part1.pdf
University of Manitoba
2023
Computational Molecular Microbiology (MBIO 4700)
ABDULLAH ZUBAER
UNIVERSITY OF MANITOBA

Working with sequences
• Sequence databases
• Sequence comparison
• Pairwise alignment
• Multiple sequence alignment
• Phylogenetic tree

Sequence databases in NCBI
➢ GenBank
➢ Nucleotide
➢ Genome
➢ Sequence Read Archive (SRA)
➢ Protein database
➢ Non-redundant (nr) database
Let’s explore with Saccharomyces cerevisiae and its mitochondrial genome. Use accession number MW122508.1 to explore GenBank.

BLAST
• BLAST is a family of programs that take a query sequence and compare it against the sequences in a database
• Used for DNA/RNA and protein sequences
• Purposes of using BLAST:
  • Identify homologs (orthologs, paralogs, etc.)
  • Identify a species
  • Discover a new gene
  • Determine gene variants

BLAST usage
▪ Step 1: Select a BLAST program (e.g., blastn, blastp)
▪ Step 2: Input the query sequence as an NCBI accession or a FASTA sequence
▪ Step 3: Select a database, e.g., the nr/nt nucleotide collection (nucleotide sequences from GenBank, EMBL, DDBJ, PDB, and RefSeq)
▪ Step 4: Set algorithm parameters (the defaults can be kept)
▪ Step 5: Set the scoring matrix (e.g., BLOSUM62 for blastp)

BLAST principle
▪ The query (the unknown, or “your sequence”) is broken down into a library of short sequences, because short matches are easier to find. This collection of short sequences is then used to search the database.

Practice BLAST
o Use the sequence in the “practice_BLAST” file as a query
o Find out:
  o What does this sequence encode?
  o What is its most likely origin?
  o What are the similar sequences?
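
The word-splitting idea under “BLAST principle” can be illustrated with a small Python sketch. It is not the actual BLAST implementation: it only finds exact word matches ("seeds") between a query and one subject sequence, whereas real BLAST also extends and scores those seeds. The function names, the toy sequences, and the word size of 11 are assumptions made for illustration.

```python
from collections import defaultdict


def split_into_words(query, word_size=11):
    """Return (offset, word) for every overlapping word of length word_size."""
    return [(i, query[i:i + word_size]) for i in range(len(query) - word_size + 1)]


def index_subject(subject, word_size=11):
    """Map each word in the subject sequence to the positions where it occurs."""
    index = defaultdict(list)
    for pos in range(len(subject) - word_size + 1):
        index[subject[pos:pos + word_size]].append(pos)
    return index


def find_seeds(query, subject, word_size=11):
    """Return (query_offset, subject_offset) pairs where a query word matches exactly."""
    index = index_subject(subject, word_size)
    seeds = []
    for q_pos, word in split_into_words(query, word_size):
        for s_pos in index.get(word, []):
            seeds.append((q_pos, s_pos))
    return seeds


# Toy sequences, made up for illustration (not from the practice_BLAST file)
query = "ATGGCGTACGTTAGC"
subject = "CCATGGCGTACGTTAGCAA"
for q_pos, s_pos in find_seeds(query, subject):
    print(f"query offset {q_pos} matches subject offset {s_pos}")
```

Seeds that share the same difference between subject and query offsets lie on one diagonal, which is where an alignment would be extended from.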
BLAST output
▪ Coverage, identities, bit score, E-value
▪ E-value (mathematical expectation): the number of hits with an equal or better score expected by chance alone in a database of this size; for small values it approximates the probability that the match occurred by chance
▪ The lower the E-value, the more significant the “hit” is

ORF finder

Open Reading Frame (ORF)
• Functional ORF: an open reading frame that encodes a protein
• Computer algorithms are used to search for ORFs: they look for start/stop codons [and Shine–Dalgarno sequences (ribosome binding sites, RBS)]
• ORFs can be compared to ORFs in other genomes

Coding Sequence (CDS) vs. ORF
• The coding sequence (CDS) is the actual region of DNA that is translated to make proteins
• ORF: runs from start codon to stop codon and may contain introns (frequent in eukaryotic ORFs, rare in prokaryotic ORFs)
  ATG to TAA, introns included = ORF
• Introns: in Eukaryotes, spliceosomal introns; in Archaea or Bacteria, we sometimes find group I or group II introns (“self-splicing”)
  ATG to TAA, introns removed = CDS
• Alternative start codons can be used, BUT these usually still “code” for methionine!

Practice ORF finder
o Use the sequence in the “practice_BLAST” file as a query
o Find out:
  o Are there any open reading frames (ORFs) in it?

ORF finder output
• Var1 = rps3 (ribosomal protein S3)
• ONE ORF, but two functions are encoded: ribosomal protein and endonuclease activity (fusion ORF, hybrid ORF, or fused gene)
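
The start/stop codon search described under “Open Reading Frame (ORF)” can be sketched in Python as follows. This is a minimal illustration, not the algorithm used by NCBI ORFfinder: it scans all six reading frames for ATG followed by an in-frame stop codon, ignores ribosome binding sites and introns, and assumes the standard genetic code. The function names, the `min_codons` threshold, and the toy sequence are assumptions made for illustration.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code (an assumption, see note below)
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def orfs_in_strand(seq, min_codons=3, start="ATG", stops=STOP_CODONS):
    """Yield (frame, begin, end, orf_sequence) for the three frames of one strand.

    An ORF here runs from the first start codon in a frame to the next
    in-frame stop codon, stop included. Indices are 0-based on `seq`.
    """
    for frame in range(3):
        orf_start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if orf_start is None and codon == start:
                orf_start = i
            elif orf_start is not None and codon in stops:
                if (i + 3 - orf_start) // 3 >= min_codons:
                    yield frame, orf_start, i + 3, seq[orf_start:i + 3]
                orf_start = None


def find_orfs(seq, **kwargs):
    """Scan all six reading frames (three per strand)."""
    seq = seq.upper()
    hits = [("+",) + hit for hit in orfs_in_strand(seq, **kwargs)]
    # Coordinates of "-" hits refer to the reverse-complemented sequence.
    hits += [("-",) + hit for hit in orfs_in_strand(reverse_complement(seq), **kwargs)]
    return hits


# Toy sequence, made up for illustration (not from the practice_BLAST file)
for strand, frame, begin, end, orf in find_orfs("CCATGAAATTTTAAGG"):
    print(strand, frame, begin, end, orf)
```

The stop codons are a parameter because genetic codes differ. In the yeast mitochondrial code (NCBI translation table 3), for example, TGA encodes tryptophan rather than a stop, so a scan of that genome would pass `stops={"TAA", "TAG"}`.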