Bioinformatics Sequence Alignment PDF
Summary
This document provides an overview of sequence alignment methods, including global and local alignment. It also explains the various applications of these techniques in bioinformatics including gene finding, function prediction and protein secondary structure prediction.
Full Transcript
SEQUENCE ALIGNMENT
Session 2
BIOINFORMATICS

Sequence alignment
Sequence alignment is the process of comparing two or more sequences by looking for a series of individual characters or character patterns that occur in the same order in the sequences. The aim is to identify regions of similarity that may be a consequence of functional, structural, or evolutionary relationships among the sequences.

Applications
1. Identifying species
An unknown sequence can be matched to a species, and homologous sequences in other species can be found. This is useful, for example, when working with a DNA sequence from an unknown species or source.
2. Establishing phylogeny
A phylogenetic tree can be built to determine the evolutionary relationship between different sequences.
3. Detecting mutations and variations
Two or more sequences can be compared to look for mutations and variations.
4. Gene finding
Once all open reading frames (ORFs) of a genome have been identified, their lengths can be used to estimate the probability that they are protein-coding genes. If a gene from a reference genome or another organism aligns well with an ORF, there is strong evidence that this ORF is a gene.
5. Function prediction
The function of a gene can be predicted from sequence similarity. Similar sequences may have similar functions.
6. Predicting protein secondary structure
Experimental procedures such as X-ray crystallography and NMR cost considerable time and effort, so there is much interest in predicting protein structure from sequence.

Types of Sequence Alignment

Global sequence alignment
✓ Alignment of each character (nucleotide or amino acid) over the entire length of the sequences, from end to end.
✓ Very useful in analyzing minor differences or polymorphisms such as single-nucleotide polymorphisms (SNPs).
✓ Needleman–Wunsch algorithm

Local sequence alignment
✓ Aligning the regions of the sequences with the highest density of matches (similarity) while ignoring the rest of the sequence. In other words, "it finds local regions with the highest level of similarity between the two sequences".
✓ Can be used with dissimilar sequences that may have regions of similarity (such as motifs or domains).
✓ Smith–Waterman algorithm

Global alignment: more amino acids along the protein sequence can be matched.
Local alignment: stops at the end of the regions of strong similarity or identity.

Pairwise Alignment vs Multiple Sequence Alignment (MSA)

Pairwise alignment
✓ Comparing two biological sequences.

Multiple sequence alignment
✓ Comparing three or more biological sequences.

Methods of sequence comparison

Dot-matrix diagrams
Dot-matrix diagrams provide a graphical method for comparing two sequences. One sequence is written horizontally across the top of the graph and the other along the left-hand side. A dot is placed at each intersection where the same letter appears in both sequences. Runs of dots along a diagonal indicate regions where the two sequences align.

Optimal alignment
An alignment can be represented by writing the sequences on separate lines across the page. Matching characters are placed in the same column. Unmatched characters are either placed in the same column as a mismatch or placed opposite a gap, which represents an insertion in one sequence (or a deletion in the other).

Gap
A break in the alignment of two sequences caused by either an insertion in one sequence or a deletion in the other.

BLAST (Basic Local Alignment Search Tool)
BLAST is a sequence similarity search program that compares biological sequences, such as the amino acid sequences of proteins or nucleotide (DNA, RNA) sequences, against sequence databases.

Find the appropriate BLAST program

Retrieving a nucleotide or a protein sequence from NCBI

Query sequence
A DNA or protein sequence that is submitted to a computerized database for alignment.
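The difference between the global (Needleman–Wunsch) and local (Smith–Waterman) alignment types introduced earlier in this session can be illustrated with a minimal Python sketch. The function name alignment_score, the scoring values (match +1, mismatch -1, gap -2) and the example sequences are assumptions chosen for illustration and do not come from the slides. The sketch uses a simple linear gap penalty and returns only the optimal score, not the aligned strings.

```python
def alignment_score(a, b, match=1, mismatch=-1, gap=-2, local=False):
    """Return the optimal alignment score of sequences a and b.

    local=False: global scoring in the style of Needleman-Wunsch.
    local=True:  local scoring in the style of Smith-Waterman.
    Scoring values are illustrative assumptions (linear gap penalty).
    """
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]

    if not local:
        # Global alignment: gaps at the start of either sequence are penalised.
        for i in range(1, rows):
            score[i][0] = i * gap
        for j in range(1, cols):
            score[0][j] = j * gap

    best = 0  # highest cell seen so far (used only for local alignment)
    for i in range(1, rows):
        for j in range(1, cols):
            same = a[i - 1] == b[j - 1]
            diag = score[i - 1][j - 1] + (match if same else mismatch)
            up = score[i - 1][j] + gap      # gap in b
            left = score[i][j - 1] + gap    # gap in a
            cell = max(diag, up, left)
            if local:
                # Local alignment: a negative score restarts the alignment at 0.
                cell = max(cell, 0)
                best = max(best, cell)
            score[i][j] = cell

    # Global: score of aligning both sequences end to end.
    # Local: best score of any pair of subsequences.
    return best if local else score[-1][-1]


if __name__ == "__main__":
    a, b = "TTTTACGTGGGG", "CCACGTAA"  # hypothetical sequences sharing "ACGT"
    print("global:", alignment_score(a, b))
    print("local: ", alignment_score(a, b, local=True))
```

With these two sequences the local score reflects only the shared ACGT region, while the global score is pulled down by the mismatches and gaps needed to align the sequences end to end.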
BLAST database content

Non-redundant databases
Databases that list only a single representative of each set of identical sequences, which decreases the number of redundant matches.

The search can be restricted by using search fields, for example: protease NOT hiv1[organism], 1000:10000[slen], ...

[Figure: How does the algorithm work? Panels: Nucleotide BLAST, Protein BLAST]

Results

Alignment score
An algorithmically computed score based on the number of matches, substitutions, insertions, and deletions (gaps) within an alignment. "These scores are derived from the scoring matrices".

Max score
The highest alignment score of any single aligned segment between the query sequence and the database sequence.

Total score
The sum of the alignment scores for all the sequence segments (local alignments) between the query and the database sequence. The higher the score, the better the alignment. When the max and total scores are the same, the query and its match in the database are joined by a single aligned segment.

Query coverage
The percentage of the query length that is included in the aligned segments.

Expect value (E)
A parameter that describes the number of hits one can "expect" to see by chance when searching a database of a particular size. It decreases exponentially as the score (S) of the match increases.

[Figure: Results, Nucleotide BLAST]
[Figure: Results, Protein BLAST]
[Figure: Nucleotide BLAST alignment with a gap and a mismatch labelled]
[Figure: Protein BLAST alignment]

Clustal Omega
[Figure: Clustal Omega submission form, steps 1 to 4; at least 2 sequences are required]

Retrieving multiple sequences for alignment

The FASTA (text)
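The BLAST result terms defined in this session (max score, total score, query coverage and Expect value) can be made concrete with a small Python sketch. The function names, the tuple layout of the aligned segments and the example numbers are hypothetical. The E-value uses the standard Karlin-Altschul form E = K * m * n * exp(-lambda * S); the values of K and lambda below are placeholders for illustration, since the real values depend on the scoring system, and BLAST itself applies further corrections such as effective lengths.

```python
import math


def summarize_hit(segments, query_length):
    """Summarise one database hit from its aligned segments.

    segments: list of (score, query_start, query_end) tuples, with
    1-based inclusive query coordinates (a hypothetical layout).
    Returns (max_score, total_score, query_coverage_percent).
    """
    max_score = max(score for score, _, _ in segments)
    total_score = sum(score for score, _, _ in segments)

    # Count each query position once, even if segments overlap.
    covered = set()
    for _, start, end in segments:
        covered.update(range(start, end + 1))
    query_coverage = 100.0 * len(covered) / query_length

    return max_score, total_score, query_coverage


def expect_value(score, query_length, db_length, k=0.1, lam=0.3):
    """Karlin-Altschul style E-value: E = K * m * n * exp(-lambda * S).

    k and lam are illustrative placeholders, not real BLAST parameters.
    """
    return k * query_length * db_length * math.exp(-lam * score)


if __name__ == "__main__":
    # Hypothetical hit: two aligned segments on a 100-residue query.
    segments = [(50, 1, 40), (30, 61, 80)]
    print(summarize_hit(segments, query_length=100))

    # E falls exponentially as the score rises (same query and database).
    for s in (40, 50, 60):
        print(s, expect_value(s, query_length=100, db_length=1_000_000))
```

In this example the max score and total score differ because the hit has two aligned segments; with a single segment the two values would be equal.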