
2023_Fall_Zubaer_L7_DNA_part1.pdf


Document Details

Uploaded by ArticulateBowenite6305

University of Manitoba

2023

Tags

molecular biology computational biology genomics

Full Transcript

Computational Molecular Microbiology (MBIO 4700)
ABDULLAH ZUBAER
UNIVERSITY OF MANITOBA

Working with sequences
• Sequence databases
• Sequence comparison
• Pairwise alignment
• Multiple sequence alignment
• Phylogenetic tree

Sequence databases in NCBI
➢ GenBank
➢ Nucleotide
➢ Genome
➢ Sequence Read Archive (SRA)
➢ Protein database
➢ Non-redundant (nr) database
Let’s explore Saccharomyces cerevisiae and its mitochondrial genome. Use accession number MW122508.1 to explore GenBank.

BLAST
• BLAST is a family of programs that take a query sequence as input and compare it against the sequences in a database.
• Used for DNA/RNA and protein sequences.
• Purposes of using BLAST:
  • Determine homologs, paralogs, etc.
  • Identify a species
  • Discover a new gene
  • Determine gene variants

BLAST usage
▪ Step 1: Select a BLAST program (e.g., blastn, blastp).
▪ Step 2: Input the query sequence (NCBI accession number or FASTA sequence).
▪ Step 3: Select a database, e.g., the nr database (consists of the nucleotide sequences from GenBank, EMBL, DDBJ, PDB, and RefSeq).
▪ Step 4: Set the algorithm parameters (these can be kept at their defaults).
▪ Step 5: Set the scoring matrix (e.g., for blastp, use BLOSUM62).

BLAST principle
▪ The query (the unknown, or “your sequence”) is broken down into a library of short sequences (“words”), because short matches are easier to find. This collection of short sequences is then used to search through the database.

Practice BLAST
o Use the sequence in the “practice_BLAST” file as a query.
o Find out:
  o What does this sequence encode?
  o What is its most likely origin?
  o What are the similar sequences?
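The word-based search described under BLAST principle can be sketched in a few lines of Python. This is a simplified illustration, not BLAST’s implementation: it uses exact word matches only (protein BLAST also accepts similar “neighborhood” words that score above a threshold), and it stops at the seed hits, whereas BLAST goes on to extend each seed into a longer alignment and score it. The function names and the word size are hypothetical choices for the example.

```python
# Simplified sketch of BLAST-style seeding (hypothetical helpers, not NCBI code):
# 1) break the query into overlapping short words of length k,
# 2) report every place a query word occurs exactly in a database sequence.


def query_words(query: str, k: int) -> dict[str, list[int]]:
    """Map every length-k word of the query to its start positions."""
    words: dict[str, list[int]] = {}
    for i in range(len(query) - k + 1):
        words.setdefault(query[i:i + k], []).append(i)
    return words


def find_seeds(query: str, subject: str, k: int = 3) -> list[tuple[int, int]]:
    """Return (query_pos, subject_pos) pairs where a query word matches the subject exactly."""
    words = query_words(query, k)
    seeds = []
    for j in range(len(subject) - k + 1):
        # Look up each subject word in the query's word library.
        for i in words.get(subject[j:j + k], []):
            seeds.append((i, j))
    return seeds


if __name__ == "__main__":
    print(find_seeds("ATGCGT", "TTATGCGA", k=4))
```

With a word size of 4, the query words ATGC and TGCG are found in the subject, giving the seed pairs (0, 2) and (1, 3). Each seed marks a spot where a full alignment would then be attempted.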

BLAST output
▪ Coverage, identities, bit score, E-value
▪ E-value (expect value): the number of hits with at least this score that would be expected to occur by chance alone in a database of this size
▪ The lower the E-value, the more significant the “hit”

ORF finder

Open Reading Frame (ORF)
• Functional ORF: an open reading frame that encodes a protein
• Computer algorithms search for ORFs by looking for start/stop codons [and Shine–Dalgarno sequences (ribosome binding sites, RBS)]
• ORFs can be compared to ORFs in other genomes

Coding Sequence (CDS): the actual region of DNA that is translated to make proteins.
ORF: from start codon to stop codon; may contain introns (frequent in eukaryotic ORFs, rare in prokaryotic ORFs).
[Diagram: ATG … TAA = ORF (with introns); ATG … TAA = CDS]
Introns: in eukaryotes, spliceosomal introns; in Archaea or Bacteria we sometimes find group I or group II introns (“self-splicing”).
ORFs can have alternative start codons, BUT these usually still “code” for methionine!

Practice ORF finder
o Use the sequence in the “practice_BLAST” file as a query.
o Find out:
  o Are there any open reading frames (ORFs) in it?

ORF finder output
Var1 = rps3 (ribosomal protein S3): ONE ORF, but two functions are encoded, ribosomal protein and endonuclease activity (a fusion ORF, hybrid ORF, or fused gene).
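The start-to-stop-codon search that ORF finders perform can be sketched as a six-frame scan: three reading frames on each strand, opening an ORF at ATG and closing it at the first in-frame stop codon. This is a minimal sketch under stated assumptions, not NCBI ORFfinder’s algorithm. The helper names are hypothetical, only ATG is accepted as a start (alternative start codons are ignored), introns are not handled (so the result is an ORF, not a spliced CDS), and the tiny minimum length exists only to make the toy examples work.

```python
# Minimal six-frame ORF scan (hypothetical helper names, not NCBI ORFfinder's code).

STANDARD_STOPS = frozenset({"TAA", "TAG", "TGA"})  # standard genetic code (table 1)
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def scan_strand(seq, min_codons, stop_codons):
    """Return (frame, start, end) for each ATG...stop ORF on one strand.

    start is the 0-based position of the A in ATG; end is exclusive and
    includes the stop codon. Only the first ATG before a stop opens an ORF,
    so each ORF is the longest one ending at that stop.
    """
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in stop_codons:
                # Length in codons, counting the stop codon.
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((frame, start, i + 3))
                start = None
    return orfs


def find_orfs(seq, min_codons=2, stop_codons=STANDARD_STOPS):
    """Scan both strands; minus-strand coordinates refer to the reverse complement."""
    seq = seq.upper()
    plus = [("+",) + orf for orf in scan_strand(seq, min_codons, stop_codons)]
    minus = [("-",) + orf for orf in scan_strand(reverse_complement(seq), min_codons, stop_codons)]
    return plus + minus


if __name__ == "__main__":
    print(find_orfs("ATGTGAAAATAA"))
    print(find_orfs("ATGTGAAAATAA", stop_codons={"TAA", "TAG"}))
```

The set of stop codons is a parameter because it depends on the genetic code. For a yeast mitochondrial sequence such as Var1, TGA is read as tryptophan (NCBI genetic code table 3), so passing `stop_codons={"TAA", "TAG"}` changes the result: the toy sequence gives `("+", 0, 0, 6)` with the standard code but `("+", 0, 0, 12)` with only TAA and TAG as stops.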

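The E-value in BLAST output can be related to the bit score with the approximation used in NCBI’s BLAST statistics material, E ≈ m · n · 2^(−S′), where m is the query length, n is the total length of the database, and S′ is the bit score. The sketch below is an assumption-laden simplification: real BLAST uses effective (edge-corrected) lengths, so its reported E-values will differ. It also shows why an E-value is an expected count rather than a probability: the probability of at least one chance hit is 1 − e^(−E), which is close to E only when E is small. The function names are hypothetical.

```python
import math


def expect_from_bit_score(bit_score: float, query_len: int, db_len: int) -> float:
    """Approximate E-value from a bit score: E = m * n * 2**(-S').

    Simplification: uses raw lengths instead of BLAST's effective lengths.
    """
    return query_len * db_len * 2.0 ** (-bit_score)


def prob_at_least_one(e_value: float) -> float:
    """Probability of at least one chance hit with this score or better: 1 - exp(-E)."""
    return -math.expm1(-e_value)  # expm1 keeps precision for small E


if __name__ == "__main__":
    for bits in (30, 40, 50):
        e = expect_from_bit_score(bits, 1000, 10**9)
        print(bits, e, prob_at_least_one(e))
```

Each additional bit of score halves the E-value, which is why higher bit scores and lower E-values both indicate a more significant hit.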