Lecture 3. Sequence Comparison and Alignment PDF

Document Details

Uploaded by EnviousSanctuary

Alexandria Faculty of Medicine

2024

Iman Diab, Doaa Abdelmonsif

Tags

sequence alignment molecular biology bioinformatics DNA analysis

Summary

This document is a lecture on sequence comparison and alignment. It covers basic concepts of sequence alignment and its uses in DNA and protein analysis. The lecture also discusses sequence identity, similarity, and homology, as well as the different types of sequence alignments and algorithms for performing these types of analyses. The lecture notes are for the BME 454 course (2024-2025).

Full Transcript

BME 454 2024-2025
Sequence Alignment & Comparison
Prof. Iman Diab, Prof. Doaa Abdelmonsif
Professor of Medical Biochemistry & Molecular Biology
Alexandria Faculty of Medicine

What is Sequence Alignment?

– Sequence alignment means arranging two or more sequences (nucleotides in DNA or RNA, or amino acids in proteins) to identify regions of similar or different character patterns.
– Sequence similarity could be a result of structural, functional, or evolutionary relationships between the sequences.
– The procedure involves searching for series of identical or similar characters/patterns in the same order between the sequences.

Definition of Sequence Alignment

Sequence alignment is a process that enables the comparative analysis of similarities and differences at the nucleotide (gene) or amino acid (protein) level, with the aim of inferring structural, functional and evolutionary relationships between or among the sequences being studied.

Alignments are mostly made with the aid of computer programs, using algorithms that assess the quality of an alignment with a scoring scheme that rewards matches and penalizes mismatches and gaps.

Alignment (matching):

DNA-sequence-1  tcctctgcctctgccatcat---caaccccaaagt
                |||| ||| ||||| |||||   ||||||||||||
DNA-sequence-2  tcctgtgcatctgcaatcatgggcaaccccaaagt

When a sequence does not have a corresponding nucleotide, a gap (dash) is placed in the alignment column for that sequence.

Uses of Sequence Alignment

Why Sequence Alignment?
Uses (1)

Useful in DNA and protein sequence analysis for:
– Predicting the function of a gene or protein
– Predicting molecular structure
– Discovering evolutionary/phylogenetic relationships

Sequences that are very alike (highly similar) probably have:
– The same function (this should be treated as hypothetical until experimentally tested)
– Similar secondary and 3-D structure (if proteins)
– A shared ancestral sequence (though not always)

Uses (2)

Sequence alignment also enables the following:
– Annotation of new sequences (characterizing their features)
– Fragment/genome assembly (reassembling the fragments in their correct order to reconstitute the genome or RNA sequence)
– Detection of gene mutations

Evolutionary basis of Sequence Alignment

One goal of sequence alignment is to enable inference of homology (origin from a common ancestor) through observed shared sequence similarity.

Changes that occur during sequence divergence from a common ancestor include:
– Substitutions (point mutations)
– Deletions
– Insertions

Causes of sequence dissimilarity in sequence alignment:
– Point mutation (substitution): a nucleotide at a certain location is replaced by another nucleotide (e.g. ATA → AGA). It leads to a mismatch in the alignment.
– Indel: insertion and/or deletion of nucleotides in genomic DNA. It leads to gaps in the alignment.
  Insertion: at a certain location, one new nucleotide is inserted between two existing nucleotides (e.g. AA → AGA).
  Deletion: at a certain location, one existing nucleotide is deleted (e.g. ACTG → AC-G).

Mismatches can be interpreted as point mutations, and gaps as indels (that is, insertion and/or deletion mutations).

Sequence Identity & Similarity

Sequence identity means the same residues being present at corresponding positions in the two sequences being compared.
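As a rough illustration of mismatches, gaps and identity, the sketch below classifies each column of the lecture's DNA alignment and reports a percent identity. The function name `alignment_stats` is hypothetical, and dividing by the full alignment length (gap columns included) is an assumption; other conventions for the denominator exist.

```python
def alignment_stats(row1, row2):
    """Count match, mismatch and gap columns in a pairwise alignment.

    Both rows must already be aligned (same length, '-' marks a gap).
    """
    if len(row1) != len(row2):
        raise ValueError("aligned rows must have the same length")
    matches = mismatches = gaps = 0
    for a, b in zip(row1, row2):
        if a == "-" or b == "-":
            gaps += 1          # indel column
        elif a == b:
            matches += 1       # identical residue
        else:
            mismatches += 1    # substitution (point mutation)
    return matches, mismatches, gaps


# The two aligned DNA rows from the lecture example.
seq1 = "tcctctgcctctgccatcat---caaccccaaagt"
seq2 = "tcctgtgcatctgcaatcatgggcaaccccaaagt"

matches, mismatches, gaps = alignment_stats(seq1, seq2)
# Assumption: identity is taken over all alignment columns, gaps included.
percent_identity = 100.0 * matches / len(seq1)
print(matches, mismatches, gaps, round(percent_identity, 1))
```

For this pair the count is 29 matches, 3 mismatches and 3 gap columns over 35 columns, which gives about 82.9% identity under the stated convention.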
For proteins, identity means the same amino acids; for nucleic acids, it means the same bases.

Sequence similarity means similar residues (characters) being present at corresponding positions in the two sequences being compared. For nucleic acids, sequence similarity and sequence identity are the same. For proteins, however, sequence similarity involves amino acids with similar physicochemical and functional properties. For example, substitution of lysine and arginine for one another is regarded as a similar substitution because both are positively charged hydrophilic amino acids.

Similarity is a quantifiable property: two sequences are similar if the order of sequence characters is recognizably the same and they can be aligned.

Similarity/Identity: Nucleotides

Similarity/Identity: Amino Acids
❖ % identity and % similarity are not synonymous for amino acid sequences.

Sequence Homology

Sequence homology is an evolutionary term.

Homologous sequences (related by descent): two or more sequences that are readily aligned, i.e. so similar that they are inferred to share ancestry.

Sequences are called homologous if they have a common evolutionary origin, that is, if they are derived from a common ancestral sequence. So sequences are either homologous or not homologous, and there is no quantitation of homology.

Homologous sequences will possess similarity, but the converse is not necessarily true.

Types of Sequence Homology
– Orthologous sequences: quite similar sequences found in different species (i.e. due to a speciation event) that carry out a similar biological function.
– Paralogous sequences: sequences related through gene duplication events.
  They can have variable biological functions within the same species.
– Analogous sequences: similar through convergence [similar genetic changes arising independently in unrelated species by parallel evolution] rather than through common descent, so they are not homologous in the strict sense.

Types of Sequence Alignment

I. According to the algorithm of alignment:
– Global alignment
– Local alignment
II. According to the number of sequences to be aligned:
– Pairwise alignment
– Multiple sequence alignment

I. According to the algorithm of alignment

1. Global sequence alignment

A global sequence alignment method aligns and compares two sequences along their entire length and comes up with the best alignment, the one that displays the maximum number of nucleotides or amino acids aligned.

A global alignment algorithm starts at the beginning of the two sequences and adds gaps to each until the end of one is reached (end-to-end alignment of the sequences).

The tool for global alignment is based on the Needleman-Wunsch algorithm.

Global alignment works best when the sequences are similar in character and length. Because global alignment displays the best alignment between two sequences using the entire sequence, it may miss a small region of biological importance.

Global Alignment: find the global best fit between two sequences.

Example: the sequences s = VIVALASVEGAS and t = VIVADAVIS align like:

         V I V A L A S V E G A S
A(s,t) = | | | |   |   |       |
         V I V A D A - V - - I S

The dashes in the second row are indels.

2. Local sequence alignment

Local sequence alignment is intended to find the most similar regions in the two sequences being aligned.

A local alignment algorithm finds the region or regions (one or more) of highest similarity between two sequences and builds the alignment outward from this highest-scoring region.

The tool for local alignment is based on the Smith-Waterman algorithm.

If there are multiple regions of very high similarity, the same principle applies.
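The local alignment idea above can be sketched with a small Smith-Waterman style implementation. This is a minimal sketch, not a reference implementation: the function name, the scoring values (match +2, mismatch -1, linear gap penalty -2) and the two test sequences are assumptions chosen for illustration, and only one best-scoring region is traced back.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Return (best score, aligned part of a, aligned part of b).

    Sketch of local alignment with a linear gap penalty (an assumption).
    """
    rows, cols = len(a) + 1, len(b) + 1
    # H[i][j] = best score of a local alignment ending at a[i-1], b[j-1];
    # scores are never allowed to drop below zero.
    H = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(0,
                          H[i - 1][j - 1] + sub,   # align two residues
                          H[i - 1][j] + gap,       # gap in b
                          H[i][j - 1] + gap)       # gap in a
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the highest-scoring cell until a zero is reached.
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        sub = match if a[i - 1] == b[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + sub:
            out_a.append(a[i - 1]); out_b.append(b[j - 1])
            i -= 1; j -= 1
        elif H[i][j] == H[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-")
            i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


# Hypothetical sequences that differ in length and share one short motif.
score, part_a, part_b = smith_waterman("AAAAGATTACATTTT", "CCGATTACAGG")
print(score, part_a, part_b)
```

With these made-up inputs the shared motif GATTACA is recovered with a score of 14, while the dissimilar flanking regions are left out of the alignment.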
Local alignment is useful for sequences that are not similar in character and length, yet are suspected to contain small regions of similarity, such as biologically important motifs.

Global alignment is suitable for:
– Sequences that are quite similar (more closely related)
– Sequences of approximately the same length
Global alignment is made possible by including gaps either within the alignment or at the ends of the sequences.

Local alignment is suitable for:
– Sequences that differ in length
Gaps are less tolerated within a local alignment.

II. According to the number of sequences to be aligned

1. Pairwise sequence alignment

Pairwise sequence alignment maps and compares residues between two sequences. Aligning two sequences has many distinct possible alignments. The overall goal is to find the alignment that provides the best (optimal) pairing between the two sequences (i.e. maximum residue/character matches, gaps inclusive).

Optimal pairwise sequence alignment: alignments have to be scored to identify the best one or ones among them. The scoring system can be a simple match/mismatch scheme (DNA) or, for protein comparisons, a more sensitive scheme based on a substitution matrix. Often there is more than one solution with the same score.

2. Multiple sequence alignment

Multiple sequence alignment determines the best alignment between multiple (more than two) sequences. It is an extension of pairwise alignment that incorporates more than two sequences into one alignment.
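Returning to the scoring of pairwise alignments described above, the following sketch applies a simple match/mismatch/gap scheme to two different alignments of the lecture's sequences s = VIVALASVEGAS and t = VIVADAVIS. The function name and the score values (+1, -1, -1) are assumptions, and the second alignment is a hypothetical alternative added for comparison.

```python
def score_alignment(row1, row2, match=1, mismatch=-1, gap=-1):
    """Score an existing pairwise alignment column by column."""
    if len(row1) != len(row2):
        raise ValueError("aligned rows must have the same length")
    total = 0
    for a, b in zip(row1, row2):
        if a == "-" or b == "-":
            total += gap        # penalize an indel column
        elif a == b:
            total += match      # reward an identical pair
        else:
            total += mismatch   # penalize a substitution
    return total


s_row = "VIVALASVEGAS"
t_row_lecture = "VIVADA-V--IS"   # alignment shown in the lecture
t_row_other = "VIVADA-VI--S"     # hypothetical alternative placement of I

print(score_alignment(s_row, t_row_lecture))
print(score_alignment(s_row, t_row_other))
```

Both alignments contain 7 matches, 2 mismatches and 3 gap columns, so both score 2 under this scheme, which illustrates that more than one alignment can share the same score.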
Sequence Alignment Algorithms

An algorithm is a sequence of actions to be performed to arrive at a solution.

Rigorous algorithms (optimal alignments), based on dynamic programming:
– Needleman-Wunsch, used for global alignments
– Smith-Waterman, used for local alignments; provides one or more alignments of the sequences

Heuristic algorithms (faster, but they give only approximate alignments):
– BLAST
– FASTA
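To make the dynamic programming approach concrete, here is a minimal Needleman-Wunsch style sketch for global alignment, run on the lecture's sequences s and t. The function name, the scoring values (+1, -1) and the linear gap penalty of -1 are assumptions for illustration. When several alignments are equally good, the traceback returns just one of them, so the printed rows may differ from the alignment shown in the lecture.

```python
def needleman_wunsch(s, t, match=1, mismatch=-1, gap=-1):
    """Return (optimal global score, aligned s, aligned t).

    Sketch with a linear gap penalty (an assumption, not the lecture's scheme).
    """
    n, m = len(s), len(t)
    # F[i][j] = best score for aligning s[:i] with t[:j] end to end.
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0] = i * gap
    for j in range(1, m + 1):
        F[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if s[i - 1] == t[j - 1] else mismatch
            F[i][j] = max(F[i - 1][j - 1] + sub,   # align two residues
                          F[i - 1][j] + gap,       # gap in t
                          F[i][j - 1] + gap)       # gap in s

    # Trace back from the bottom-right corner to the top-left corner.
    i, j = n, m
    row_s, row_t = [], []
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            sub = match if s[i - 1] == t[j - 1] else mismatch
            if F[i][j] == F[i - 1][j - 1] + sub:
                row_s.append(s[i - 1]); row_t.append(t[j - 1])
                i -= 1; j -= 1
                continue
        if i > 0 and F[i][j] == F[i - 1][j] + gap:
            row_s.append(s[i - 1]); row_t.append("-")
            i -= 1
        else:
            row_s.append("-"); row_t.append(t[j - 1])
            j -= 1
    return F[n][m], "".join(reversed(row_s)), "".join(reversed(row_t))


score, aligned_s, aligned_t = needleman_wunsch("VIVALASVEGAS", "VIVADAVIS")
print(score)
print(aligned_s)
print(aligned_t)
```

Under this scoring the optimal global score is 2, the same value that the lecture's alignment of s and t obtains (7 matches, 2 mismatches, 3 gaps). Smith-Waterman uses the same table-filling idea but resets negative scores to zero and starts the traceback from the highest-scoring cell.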
