
Lecture 4- Alignments.pdf


Full Transcript


Lecture 4: Alignments
BIOC 3265 - Principles of Bioinformatics
Dr. A. T. Alleyne - UWI Cave Hill

LEARNING OUTCOMES
At the end of this lecture, you should be able to:
1. Distinguish between a local and a global alignment
2. Construct a pairwise alignment given two sequence strings
3. Distinguish between a pairwise and a multiple sequence alignment
4. Distinguish among homology, similarity and identity
5. Distinguish between an ortholog and a paralog
6. Critically assess the importance of sequence alignments in bioinformatics

Sequence Alignment
The process of comparing two or more DNA or protein sequences, character by character, to determine their similarity or dissimilarity.

Comparisons are made using:
1. The order of the sequences, or
2. Sequence patterns

Computing a sequence alignment
Input: two sequences using the same alphabet code
Output: an alignment or match of the two sequences

Example: Input
Sequence 1: GCGCATGGATTGAGCGA
Sequence 2: TGCGCCATTGATGACCA

Example: Output
Seq1: -GCGC-ATGGATTGAGCGA
Seq2: TGCGCCATTGAT-GACC-A

Alignments have three main components:
1. Perfect matches (A/A)
2. Mismatches (A/G)
3. Gaps: insertions or deletions (indels)

-GCGC-ATGGATTGAGCGA
TGCGCCATTGAT-GACC-A
1. Black letters = perfect matches
2. Red letters = mismatches
3. Dash or space = indel

Gaps
- Positions at which a letter is paired with a dash are called gaps.
- Gaps may occur in the middle or at the ends of an alignment.
- Gap scores are usually negative.
- A gap is treated as a single mutational event: the insertion or deletion of one or more residues.
- When scoring an alignment, the presence of a gap is given more weight than its length.

Local Alignments
The best or exact matching segments between two sequences. They form the basis of database searching, e.g. BLAST.

----------FGKI----------
          |||
----------FGKP----------

By aligning the most similar stretches within two sequences, a local alignment makes it possible to identify protein domains and motifs (e.g., ATP binding sites, DNA binding domains, N-glycosylation sites).
Image taken from Applied Bioinformatics, Paul M. Selzer, Richard J. Marhöfer, and Andreas Rohwer, 2019.

Global Alignment
The best match over the total length of two sequences of similar length. Allows for the introduction of gaps.

LGPSTKDFGKISESREFDN
|  |   |||     |
LNQSERSFGKINMRLEDA-

Image taken from Applied Bioinformatics, Paul M. Selzer, Richard J. Marhöfer, and Andreas Rohwer, 2019.

Pairwise alignment
- A character match between the elements of any two sequences.
- The process of lining up two (DNA or protein) sequences to achieve maximum levels of identity.

Pairwise alignment: Uses
- Decide if two genes or proteins are structurally or functionally related
- Identify shared domains or motifs between proteins
- The basis of BLAST searching
- The analysis of genomes

Alignment assumptions
- Aligned nucleotides have a common ancestor.
- Alignments therefore suggest an evolutionary process.
- Mismatches and indels reflect evolutionary events, e.g. genetic changes.
- A good alignment matches as many nucleotide positions as possible.

Similarity
The extent to which nucleotide or protein sequences are related after alignment. It is based upon identity and conservation.
- Identity: the extent to which two sequences are invariant or unchanged. Identity is quantitative.
- Conservation: changes at a specific position of an amino acid (or, less commonly, DNA) sequence that preserve the physical/chemical properties of the original residue.

Similarity matrices specify the probability that a sequence will transform into another sequence over time.

      A1    A2    A3    A4
A1    1.0   0.32  0.15  0.08
A2    0.32  1.0   0.36  0.18
A3    0.15  0.36  1.0   0.28
A4    0.08  0.18  0.28  1.0

Identity
A quantitative measure:
- The extent to which two sequences (nucleotide or amino acid) are identical at each position.
- Identity is the ratio of the number of identical amino acids or nucleotides in a sequence to the total number of amino acids or nucleotides.
- The matches below can be quantified or given as a percentage.
- Low identity does not rule out homology: human beta globin and human neuroglobin share only 22% identity, but they are homologous proteins.

RBP:        26 RVKENFDKARFSGTWYAMAKKDPEGLFLQDNIVA 59
               + K++ +  ++ GTW++MA      + L   + A
glycodelin: 23 QTKQDLELPKLAGTWHSMAMA-TNNISLMATLKA 55

Sequences of two homologous proteins can have a similarity of 60% and an identity of 40%. They do not show a 60% or 40% homology.

Homologous proteins from different species that possess the same function (e.g., corresponding kinases in a signal transduction pathway in humans and mice) are called orthologs. In contrast, homologous proteins that have different functions in the same species (e.g., two kinases in different signal transduction pathways of humans) are termed paralogs.

Homology
Descent from a common ancestor; evolutionary inferences. Homology is not a measure of similarity.
- Ortholog: homologs in different species descended from a common ancestral gene, e.g. human myoglobin and rat myoglobin genes. They may have similar biological functions, but may differ from the ancestral gene.
- Paralog: homologs within the same species that arose by gene duplication, e.g. human α1 and α2 globin genes. Gene function is distinct but related in these gene pairs.

Globin gene homologies
Member of a protein family in a species = paralog
Taken from: http://images.slideplayer.com/13/4168936/slides/slide_10.jpg
Orthologs: members of a gene (protein) family in various organisms.
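The identity ratio and the gap-scoring rule described in this lecture can be sketched in a few lines of Python. This is a minimal illustration, not the lecture's own method: the function names `alignment_stats` and `affine_score` are hypothetical, percent identity is taken over all alignment columns (other conventions divide by the shorter sequence length), and the scores (match +1, mismatch -1, gap opening -5, gap extension -1) are assumed values chosen only to show that opening a gap costs more than lengthening it.

```python
def alignment_stats(row1, row2, gap="-"):
    """Tally matches, mismatches and gap columns for two aligned rows."""
    if len(row1) != len(row2):
        raise ValueError("aligned rows must have the same length")
    matches = mismatches = gap_columns = 0
    for x, y in zip(row1, row2):
        if x == gap or y == gap:
            gap_columns += 1
        elif x == y:
            matches += 1
        else:
            mismatches += 1
    columns = len(row1)
    return {
        "matches": matches,
        "mismatches": mismatches,
        "gap_columns": gap_columns,
        "columns": columns,
        # Assumed convention: identical positions / all alignment columns.
        "percent_identity": 100.0 * matches / columns if columns else 0.0,
    }


def affine_score(row1, row2, match=1, mismatch=-1,
                 gap_open=-5, gap_extend=-1, gap="-"):
    """Score an alignment so that a gap's presence outweighs its length."""
    total = 0
    prev = None  # row (1 or 2) that held a gap in the previous column
    for x, y in zip(row1, row2):
        if x == gap or y == gap:
            where = 1 if x == gap else 2
            # First column of a gap run pays gap_open, later ones gap_extend.
            total += gap_extend if where == prev else gap_open
            prev = where
        else:
            total += match if x == y else mismatch
            prev = None
    return total


# DNA example from the lecture
dna1 = "-GCGC-ATGGATTGAGCGA"
dna2 = "TGCGCCATTGAT-GACC-A"
print(alignment_stats(dna1, dna2))   # 13 matches, 2 mismatches, 4 gap columns, 19 columns
print(affine_score(dna1, dna2))      # 13 - 2 - 4*5 = -9

# RBP / glycodelin example from the lecture
rbp = "RVKENFDKARFSGTWYAMAKKDPEGLFLQDNIVA"
gly = "QTKQDLELPKLAGTWHSMAMA-TNNISLMATLKA"
print(round(alignment_stats(rbp, gly)["percent_identity"], 1))  # 8 of 34 columns, 23.5

# One gap of length 3 costs -5 - 1 - 1 = -7, far less than three separate gaps (-15)
print(affine_score("ACGTTTAC", "ACG---AC"))  # 5 - 7 = -2
```

Under these assumptions the RBP and glycodelin fragment comes out at roughly 23.5% identity, which illustrates the lecture's point that homologous proteins can share quite low identity.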
Multiple sequence alignment (MSA)
A collection of three or more protein (or nucleic acid) sequences that are partially or completely aligned:
- homologous residues are aligned in columns across the length of the sequences
- residues are homologous in an evolutionary sense or structural sense

Multiple sequence alignment of human lipocalin paralogs

~~~~~EIQDVSGTWYAMTVDREFPEMNLESVTPMTLTTL.GGNLEAKVTM  lipocalin 1
LSFTLEEEDITGTWYAMVVDKDFPEDRRRKVSPVKVTALGGGNLEATFTF  odorant-binding protein 2a
TKQDLELPKLAGTWHSMAMATNNISLMATLKAPLRVHITSEDNLEIVLHR  progestagen-assoc. endo.
VQENFDVNKYLGRWYEIEKIPTTFENGRCIQANYSLMENGNQELRADGTV  apolipoprotein D
VKENFDKARFSGTWYAMAKDPEGLFLQDNIVAEFSVDETGNWDVCADGTF  retinol-binding protein
LQQNFQDNQFQGKWYVVGLAGNAI.LREDKDPQKMYATIDKSYNVTSVLF  neutrophil gelatinase-ass.
VQPNFQQDKFLGRWFSAGLASNSSWLREKKAALSMCKSVDGGLNLTSTFL  prostaglandin D2 synthase
VQENFNISRIYGKWYNLAIGSTCPWMDRMTVSTLVLGEGEAEISMTSTRW  alpha-1-microglobulin
PKANFDAQQFAGTWLLVAVGSACRFLQRAEATTLHVAPQGSTFRKLD...  complement component 8

DNA Alignments: Many Uses
Alignments help determine the structure, function or evolutionary patterns of genes and proteins:
- Gene finding in a database
- Prediction of gene function
- Genome assembly
- Identifying cDNA
- Studying coding or non-coding regions
- Finding conserved regions between organisms
- Finding mutations or polymorphisms between two strings of sequences
[Figure: retinol-binding protein 4 (NP_006735) and β-lactoglobulin (P02754)]

Sequence alignment significance
- Measures sequence similarity or variation
- Observes the degree of conservation among or between sequenced organisms
- Allows for genome annotation in databanks
- Infers evolutionary relationships

Example: A Sequence Homology and Bioinformatic Approach Can Predict Candidate Targets for Immune Responses to SARS-CoV-2 (readcube.com)
- Both homology and prediction models independently identify the same regions as promising targets for immune recognition of SARS-CoV-2.
- Analysis showed that certain SARS-CoV regions were dominant for B cell responses and that those regions were well conserved in terms of sequence with SARS-CoV-2.

References:
- Krane, D. E. and Raymer, M. L. (2003). Fundamental Concepts of Bioinformatics. Benjamin Cummings, CA, USA.
- Pevsner, Jonathan (2015). Bioinformatics and Functional Genomics, 3rd ed. Wiley Blackwell.
- Kraulis, Per (2001). Some definitions for sequence alignments.
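Constructing a pairwise alignment, and the difference between a global and a local one, can be illustrated with dynamic programming. The lecture does not name an algorithm, so the sketch below swaps in the standard Needleman-Wunsch (global) and Smith-Waterman (local) recurrences. The function name `align` and the scoring values (match +1, mismatch -1, linear gap penalty -2) are assumptions made for illustration; real tools use substitution matrices and affine gap penalties, and several equally good alignments may exist.

```python
def align(a, b, match=1, mismatch=-1, gap=-2, local=False):
    """Return (score, aligned_a, aligned_b) for sequences a and b.

    local=False: global alignment over the full length of both sequences.
    local=True:  best-matching segments only (scores are floored at 0).
    """
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    if not local:
        # Global alignment: leading gaps are penalised.
        for i in range(1, n + 1):
            score[i][0] = i * gap
        for j in range(1, m + 1):
            score[0][j] = j * gap

    best, best_pos = 0, (0, 0)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cell = max(diag, score[i - 1][j] + gap, score[i][j - 1] + gap)
            if local:
                cell = max(cell, 0)
            score[i][j] = cell
            if local and cell > best:
                best, best_pos = cell, (i, j)

    # Trace back from the end (global) or from the best cell (local).
    i, j = best_pos if local else (n, m)
    final = score[i][j]
    out_a, out_b = [], []
    while (i > 0 or j > 0) and not (local and score[i][j] == 0):
        sub = match if (i > 0 and j > 0 and a[i - 1] == b[j - 1]) else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return final, "".join(reversed(out_a)), "".join(reversed(out_b))


# Global alignment of the two DNA input strings from the lecture
g_score, g1, g2 = align("GCGCATGGATTGAGCGA", "TGCGCCATTGATGACCA")
print(g_score)
print(g1)
print(g2)

# Local alignment of the two protein strings shown on the global alignment slide
l_score, l1, l2 = align("LGPSTKDFGKISESREFDN", "LNQSERSFGKINMRLEDA", local=True)
print(l_score, l1, l2)
```

The global call returns gapped strings that cover both inputs end to end, while the local call returns only the highest-scoring shared segment, which is the behaviour that makes local alignment suitable for finding motifs and for database searching.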
