Lecture 3 ORF Analysis - Dr. Asmaa Atef Ibrahim, PDF

Document Details

Uploaded by ProdigiousHorseChestnut

Faculty of Pharmacy, Ain Shams University

Asmaa Atef Ibrahim

Tags

ORF analysis gene prediction reading frames molecular biology

Summary

This presentation discusses ORF (Open Reading Frame) analysis and computational gene prediction. It also covers the concept of reading frames in nucleic acids.

Full Transcript

Lecture 3
ORF ANALYSIS
Asmaa Atef Ibrahim, PhD
Assistant Professor of Microbiology and Immunology
Faculty of Pharmacy, ASU
[email protected]

INTRODUCTION
Once the genome is sequenced and assembled, the first step is to locate all the protein-coding genes within the genome. This helps in understanding the functional content of the genome.
Computational gene prediction aims to predict all the genes with near 100% accuracy, which could reduce the amount of experimental work required.
A sequence by itself is raw data and doesn't mean anything unless annotated. Annotation means adding specific structural or functional information about specific regions of sequences or genomes.

READING FRAMES
▪ A reading frame is one of the possible ways of reading a nucleotide sequence (imagine DNA as a sentence; there are six possible ways to read it).
▪ The sequence of nucleotides in a nucleic acid (DNA or RNA) molecule is divided into a set of consecutive, non-overlapping triplets called codons, which correspond to amino acids or stop signals during translation.
▪ In silico prediction of genes involves determining the correct Open Reading Frame (ORF), which starts with a start codon (ATG) and ends with one of the stop codons (TAG, TAA, TGA) in most species.

READING FRAMES
Each strand has 3 possible reading frames, depending on the nucleotide at which reading starts. Because a codon is 3 nucleotides long, there are 6 possible reading frames for double-stranded DNA and 3 for single-stranded RNA.
*Any DNA sequence has six possible reading frames:
▪ three on the forward strand (plus (+) strand)
▪ three on the reverse strand (minus (-) strand)

OPEN READING FRAME (ORF)
*The open reading frame (ORF) is the continuous nucleotide sequence that begins with a start codon (ATG), followed by a number of codons, and ends with a stop codon (TAG, TAA, TGA). No stop codon interrupts the ORF in between.
ORFs are like secret passages in our DNA.
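The definitions so far (six reading frames, and an ORF running from an ATG to the first in-frame stop codon) can be illustrated with a short Python sketch. This is only a minimal illustration under stated assumptions, not how ORFfinder or FramePlot work internally: the function names `reverse_complement` and `find_orfs` are hypothetical, only ATG is treated as a start codon, the standard stop codons are used, and minus-strand frames are numbered from the first base of the reverse complement, which may differ from the numbering a given tool reports.

```python
# Illustrative ORF scan over the six reading frames (a sketch, not a tool's method).
STOP_CODONS = {"TAA", "TAG", "TGA"}
START_CODON = "ATG"
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement, i.e. the minus strand read 5'->3'."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_orfs(seq, min_nt=0):
    """Yield (frame, start, end, nt_length, aa_length) for each ORF found.

    start and end are 1-based positions on the strand being read.
    nt_length includes the stop codon; aa_length does not count it.
    Frame labels (+1..+3, -1..-3) use the convention assumed above.
    """
    seq = seq.upper()
    strands = (("+", seq), ("-", reverse_complement(seq)))
    for sign, strand in strands:
        for offset in range(3):
            start = None  # index of the first ATG seen since the last stop
            for i in range(offset, len(strand) - 2, 3):
                codon = strand[i:i + 3]
                if start is None and codon == START_CODON:
                    start = i
                elif start is not None and codon in STOP_CODONS:
                    nt_length = i + 3 - start
                    if nt_length >= min_nt:
                        yield (f"{sign}{offset + 1}", start + 1, i + 3,
                               nt_length, nt_length // 3 - 1)
                    start = None  # look for the next ORF in this frame


if __name__ == "__main__":
    # Toy sequence: ATG AAA TAG sits in the second frame of the plus strand.
    print(list(find_orfs("CATGAAATAGG")))  # [('+2', 2, 10, 9, 2)]
```

In the toy sequence the ORF spans 9 nucleotides (three codons), so it encodes 2 amino acids once the stop codon is excluded. The `min_nt` argument plays the role of a minimal ORF length filter.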
*ORFs are stretches of nucleotides that can be translated into proteins. The ORF is presumed to carry the coding sequence of a gene.
ORFs: These are stretches of DNA that could potentially be translated into proteins.
Functional Clues: Long ORFs help predict protein-coding regions, but their presence doesn't guarantee translation.

*CDS (Coding Sequence) and ORF (Open Reading Frame)
CDS (Coding Sequence):
▪ CDS is the part of the gene that is actually translated into a protein
▪ It gets translated into a protein
ORF (Open Reading Frame):
▪ ORF is the segment of DNA between the start codon and the stop codon
▪ Not all ORFs get translated into proteins
**All CDS are ORFs, but not all ORFs are CDS. CDS are the “official” recipes that code for proteins.

If the sequence has more than one ORF, these ORFs must be on the same strand (same direction) and share the same regulatory set (promoter) to be part of an operon.

Frameshift Error/Mutation
A frameshift mutation is the insertion or deletion of nucleotide bases in numbers that are not multiples of three. This shifts the reading frame so that the downstream codons are read incorrectly, and the gene then codes for completely different amino acids.
There is some resistance to this error: the mutated gene may still code for a valid protein, the mutation may increase the odds of a stop codon occurring early, or the gene may even still code for the correct protein because more than one codon may correspond to an amino acid.

ORF ANALYSIS
▪ ORFfinder
▪ FramePlot 2.3.2

ORFfinder
*It is a graphical analysis tool, available at NCBI, which finds all open reading frames in a user-provided sequence or in a sequence already found in a database.

In each of the upcoming examples you are provided with an accession number and need to find the following:
Q1- Find the number of ORFs for this accession code.
Q2- ORFs present on which strand? And which frame?
Q3- What is the number of nucleotides and amino acids present in these ORFs?
Q4- Using the BLAST search tool, what is the putative function of these ORFs?
N.B. Strand → (plus strand or minus strand)
Frame → (+1, +2, +3, -1, -2, -3)

Example 1 Using ORFfinder
Using ORFfinder, predict ORFs for the following accession number: NM_001185098.2
Q1- Find the number of ORFs for this accession code.
Q2- ORFs present on which strand? And which frame?
Q3- What is the number of nucleotides and amino acids present in these ORFs?
Q4- Using the BLAST search tool, what is the putative function of these ORFs?

▪ Go to ORFfinder by using Google search or NCBI Resources
▪ Google search:
▪ NCBI:
1- Click on “Resource List”
2- Scroll down to the letter “O”

You can insert an accession code or a nucleotide sequence:
1- Insert accession code
2- Select minimal length of the predicted ORF (*300 in case of Homo sapiens)
3- Genetic code: select standard in case of eukaryotes
4- ATG only
5- This excludes shorter ORFs that are non-coding for protein
6- Submit

Reading the results:
▪ No. of ORFs found based on selected search parameters
▪ Selected ORF is on the + strand
▪ Starting from nucleotide no. 28 to nucleotide no. 378
▪ No. of nucleotides (nt) = 351
▪ No. of amino acids (aa) = 116
▪ You can display the ORF as: Protein sequence, Nucleotide sequence, CDS translation

Answers to Example 1
Q1- Find the number of ORFs for this accession code. One ORF
Q2- ORFs present on which strand? And which frame? ORF present on plus strand, the +2 reading frame.
Q3- What is the number of nucleotides and amino acids present in these ORFs? 333 bp. 110 amino acids in this ORF.
Q4- Using the BLAST search tool, what is the putative function of these ORFs? Putative function: insulin B chain

Example 2
Q1- In the shown ORFfinder report, how many ORFs were found?
Q2- ORF2 is present on which strand? And which frame?
Q3- What is the length of ORF 1?
Q4- Might these ORFs be a part of an operon? Why?
Q5- Arrange the detected ORFs in ascending order of length.

Answers to Example 2
Q1- In the shown ORFfinder report, how many ORFs were found? 3 ORFs
Q2- ORF2 is present on which strand? And which frame? Minus strand, -2 frame
Q3- What is the length of ORF 1? 1443 bp, 480 amino acids
Q4- Might these ORFs be a part of an operon? Why? Yes, because they are on the same strand (i.e. same direction)
Q5- Arrange the detected ORFs in ascending order of length. ORF3 < ORF1 < ORF2

FramePlot 2.3.2
*A web-based tool for predicting protein-coding regions in DNA with a high G+C content. The graphical output provides for easy distinction of protein-coding regions from non-coding regions.

Example 3 Using FramePlot 2.3.2
Predict ORFs for the following accession number: NM_001185098.2
Q1- Find the number of ORFs for this accession code.
Q2- ORFs present on which strand? And which frame?
Q3- What is the number of nucleotides and amino acids present in these ORFs?
Q4- Using the BLAST search tool, what is the putative function of these ORFs?

1- Paste your FASTA sequence here
2- Select desired image color: “Color” or “Black & white”
3- Click on Cookin’

6 possible reading frames:
▪ 3 frames on + strand (+1, +2, +3); notice the direction of the arrows
▪ 3 frames on – strand (-1, -2, -3); notice the direction of the arrows

▪ Rule 1: the coding ORF has the highest GC%.
→ In this graph, the highest GC% is represented by the dashed line. Therefore, the coding ORF is either on reading frame +2 or reading frame -3. But which one is it? This takes us to Rule 2.
▪ Rule 2 (M-L-H) medium-low-high GC%: used to determine whether the coding ORF is on the + or - strand.
→ M-L-H in the graph: M (dotted line), L (solid line), H (dashed line, highest GC%)
On + strand: dotted is followed by solid, then dashed
On – strand: dotted is followed by dashed, then solid
The correct sequence is on the + strand. Therefore, the coding ORF is on the +2 reading frame.
Click on the ORF line in the area after the first start codon to view nucleotide and amino acid sequences.

Answers to Example 3
Q1- Find the number of ORFs for this accession code. One ORF
Q2- ORFs present on which strand? And which frame? Plus strand, +2 frame
Q3- What is the number of nucleotides and amino acids present in these ORFs? 333 bp, 110 amino acids
Q4- Using the BLAST search tool, what is the putative function of these ORFs? Putative function: insulin isoform

[Figure: FramePlot output shown in “Black & white” and in “Color”, with the highest GC% line marked]
▪ The coding ORF is on + strand, +3 frame
▪ Stop codon is represented by the | symbol
▪ Start codon is represented by the > symbol
▪ Click on the ORF line in the area after the first start codon to view nucleotide and amino acid sequences

How many coding ORFs? 3 (ORF 1, ORF 2, ORF 3)
Can you predict the coding ORFs? Apply the Rules:
▪ Rule 1: the coding ORF has the highest GC%.
▪ Rule 2 (M-L-H) medium-low-high GC%: used to determine whether the ORF is on the + or - strand.

THANK YOU