Lecture 10 Students Assignments BIO454 PDF

Summary

This document covers topics in biological sequence analysis: sequence identity, alignment score calculation, E-values and P-values, thresholds, the BLAST tool, and phylogenetic trees. It is presentation material for an undergraduate biology course.

Full Transcript

LECTURE 10 STUDENTS ASSIGNMENTS BIO454

OUTLINE

Identity. Score. E-value and P-value. Phylogenetic tree. Threshold. BLAST.

IDENTITY

Sequence identity is the number of characters that match exactly between two aligned sequences, usually expressed as a percentage of the sequence length.

Example: if we have two sequences of 720 nucleotides each, when will the identity be 90%? Identity will be 90% when 648 nucleotides are identical between the two sequences (648/720 = 0.90).

When identity is 100%, what will the gap value be? When identity equals 100%, the number of gaps equals zero.

SCORE

A score is a numerical value that describes the overall quality of an alignment. Higher numbers correspond to higher similarity. The score scale depends on the scoring system used (substitution, gap).

Substitution: the presence of a non-identical nucleotide or amino acid at a given position in an alignment.

Gap: a space introduced into an alignment to compensate for insertions and deletions in one sequence relative to another.

Maximum score: the highest possible score is obtained when a sequence is compared with itself.

HOW TO CALCULATE SCORE

The score of an alignment is calculated as the sum of the substitution scores and the gap scores. Substitution scores are given by a look-up table.

Gap scores are typically calculated from two penalties: G, the gap opening penalty, and L, the gap extension penalty. For a gap of length n, the gap cost is G + Ln. We give a high value for G (10-15) and a low value for L (1-2).

[Figure: gap opening and gap extension]

HOW TO CALCULATE SCORE: EXAMPLE

[Figure: worked score example]

E-VALUE

The E-value (short for expect value) is the number of sequences in the database that are expected, by chance in a random search, to align to the query equally well or more significantly than the hit that was found. The E-value threshold can be changed in order to limit the number of hits to the most significant ones. The lower the E-value, the better the hit. The E-value depends on the length of the query sequence and the size of the database.
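
The dependence of the E-value on query length and database size can be made concrete with the Karlin-Altschul relation that is commonly used for BLAST statistics, E = K * m * n * exp(-lambda * S), where m is the query length, n is the database length, and S is the raw alignment score. The sketch below is only an illustration: the lecture does not give this formula, the function names are hypothetical, and the values chosen for K and lambda are placeholder assumptions rather than parameters of any real scoring system. It also shows the usual conversion from an E-value to the probability of at least one chance hit, P = 1 - exp(-E).

```python
import math


def expect_value(raw_score, query_len, db_len, k, lam):
    """Karlin-Altschul estimate: E = K * m * n * exp(-lambda * S)."""
    return k * query_len * db_len * math.exp(-lam * raw_score)


def p_value_from_e(e_value):
    """Probability of at least one chance hit: P = 1 - exp(-E)."""
    return -math.expm1(-e_value)


# Placeholder statistical parameters (assumed for illustration only).
K, LAMBDA = 0.1, 0.3

# Same query and score, searched against a database and one twice its size.
small_db = expect_value(raw_score=60, query_len=300, db_len=1_000_000, k=K, lam=LAMBDA)
large_db = expect_value(raw_score=60, query_len=300, db_len=2_000_000, k=K, lam=LAMBDA)

# A higher-scoring hit in the smaller database.
better_hit = expect_value(raw_score=80, query_len=300, db_len=1_000_000, k=K, lam=LAMBDA)

print("E (small database):", small_db)
print("E (large database):", large_db)
print("E (higher score):  ", better_hit)
print("P for E = 0.05:    ", p_value_from_e(0.05))
```

Under these assumptions, doubling the database size doubles the E-value for the same score, and a higher score gives a lower E-value. For small E-values the probability P is very close to E itself.
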
For example, an alignment with an E-value of 0.05 means that 0.05 hits of that quality are expected by chance alone, which for such a small value corresponds to roughly a 5 in 100 chance of seeing the hit by chance. A large E-value threshold returns many hits, some of them of low quality.

P-VALUE

The P-value is the probability of a chance alignment occurring with a particular score or a better score in a database search. It is calculated by relating the observed alignment score, S, to the expected distribution of scores from comparisons of random sequences of the same length and composition as the query to the database. The most highly significant P-values are those close to 0. P-values and E-values are different ways of representing the significance of the alignment.

P-VALUE AND E-VALUE

What is the difference between the E-value and the P-value? The E-value is a frequency metric, whereas the P-value is a probability metric. The E-value represents the number of equally good or better alignments that are expected to occur by chance, while the P-value represents the likelihood that the match in question occurred by chance. In statistical terms, the E-value is a multiple-testing correction of the P-value.

THRESHOLD

The threshold percentage determines how much similarity is required when DotMatcher plots the two sequences against each other.

BLAST: BASIC LOCAL ALIGNMENT SEARCH TOOL

What is BLAST? BLAST is an algorithm for comparing primary biological sequence information, such as the amino-acid sequences of proteins and the nucleotides of DNA and/or RNA sequences.

BLAST score: a score is a numerical value that describes the overall quality of an alignment. Higher numbers correspond to higher similarity. The score scale depends on the scoring system used (substitution, gap).

PHYLOGENETIC TREE

What is a phylogenetic tree? A phylogenetic tree is a diagram (also called a dendrogram) that represents evolutionary relationships among organisms, based upon similarities and differences in their physical or genetic characteristics.
Each node with descendants represents the inferred most recent common ancestor of those descendants. In a tree, two species are more closely related if they share a more recent common ancestor and less closely related if their common ancestor is less recent.

WHY MIGHT WE CARE ABOUT PHYLOGENETIC TREES? THE PURPOSES

Understanding human origins.
Understanding biogeography, e.g. what is the relative importance of dispersal versus vicariance?
Learning about the tempo of evolution, e.g. was the Cambrian explosion really an explosion? Did mammals and birds wait until the dinosaurs went extinct to inherit the earth, or had they already started before the asteroid hit?
Understanding the origin of particular traits.
Understanding the processes of molecular evolution.
Origin of disease, e.g. where did humans get AIDS from?

PHYLOGENETIC TREE, FULL EXAMPLE

[Figures: step-by-step tree construction example] There are only two clusters, so this completes the calculation!
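
Returning to the IDENTITY and HOW TO CALCULATE SCORE sections, the arithmetic described there can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the lecture's own worked example: the function names are hypothetical, the look-up table is a simple match/mismatch scheme (+5/-4) chosen for illustration, identity is taken relative to the alignment length, and the gap penalties G = 10 and L = 1 are picked from the ranges quoted in the lecture.

```python
def build_table(alphabet="ACGT", match=5, mismatch=-4):
    """Build a toy substitution look-up table (assumed values)."""
    return {(x, y): (match if x == y else mismatch)
            for x in alphabet for y in alphabet}


def percent_identity(aligned_a, aligned_b):
    """Percentage of alignment columns holding the same residue."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    matches = sum(1 for x, y in zip(aligned_a, aligned_b)
                  if x == y and x != "-")
    return 100.0 * matches / len(aligned_a)


def alignment_score(aligned_a, aligned_b, table, gap_open=10, gap_extend=1):
    """Sum of substitution scores minus gap costs of G + L*n per gap."""
    score = 0
    current_gap = None  # which sequence the current gap run is in
    for x, y in zip(aligned_a, aligned_b):
        if x == "-" or y == "-":
            gapped = "a" if x == "-" else "b"
            if gapped != current_gap:
                score -= gap_open      # G, charged once per gap
                current_gap = gapped
            score -= gap_extend        # L, charged for every gap position
        else:
            score += table[(x, y)]     # substitution score from the table
            current_gap = None
    return score


table = build_table()
seq_a = "ACGTTGCA"
seq_b = "ACG--GCT"

# 5 matches, 1 mismatch, one gap of length 2: 5*5 - 4 - (10 + 1*2)
print(alignment_score(seq_a, seq_b, table))
print(percent_identity(seq_a, seq_b))

# Comparing a sequence with itself: no gaps, 100% identity, maximum score.
print(alignment_score(seq_a, seq_a, table), percent_identity(seq_a, seq_a))
```

The same identity function reproduces the lecture's 648/720 example: two 720-character sequences that agree at 648 positions give 90%.
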
