Alignments and Phylogenetic Trees PDF
Document Details
Uploaded by PoignantZither
Tags
Related
- Computational Molecular Microbiology (MBIO 4700) Lecture Notes PDF
- Week 6 DNA Data Processing, Repository and Seq Alignment PDF
- Essential Bioinformatics PDF
- BIO 405 Bioinformatics Practice Questions PDF
- Lecture 3. Sequence Comparison and Alignment PDF
- Methods in Analysis of Simple Sequencing Lecture 5 PDF
Summary
This document presents an overview of sequence alignments and phylogenetic trees, outlining key concepts and algorithms. It covers different types of alignments, including global and local alignments. The discussion also includes examples and the related computational complexities of the algorithms like Needleman-Wunsch and Smith-Waterman.
Full Transcript
Alignments and Phylogenetic Trees Chapter 4: 123-153 1 Topics Sequence alignment overview Needleman-Wunsch (skip forward based on HW timing) Recursion Come back for: Dot plots Alignment matrices (PAM and Blosum) Blast...
Alignments and Phylogenetic Trees Chapter 4: 123-153 1 Topics Sequence alignment overview Needleman-Wunsch (skip forward based on HW timing) Recursion Come back for: Dot plots Alignment matrices (PAM and Blosum) Blast 3 Introduction: Why align 2 or more sequences 4 Sequence Alignment Measure their similarity Determine residue correspondence Observe patterns of conservation and variability Infer evolutionary relationships Determine protein function of newly discovered gene 5 Scoring We can "reward" an alignment for a match: match_score = 3 A | A We can "punish" an alignment for a mismatch: mis_match_score = -1 A T We can also score for insertion, or extension of a "gap": gap = -2 AA-TT || | AAGCT Total score: 3 + 3 + -2 + -1 + 3 == 6 6 abcde- || | ab-dcf 7 8 Global Alignment Sidenote: the Lesk book could have found more interesting example phrases to align… 9 Local Alignment 10 11 Order complexity: O (L ^ N) L = length of sequences N = number of sequences "NP hard" 12 Global vs Local Alignment Global -- attempts to align the entire length of 2 sequences Local -- attempts to identify the most similar regions within 2 sequences Note: Needleman-Wunsch is a Global Alignment algorithm Smith-Waterman is a relatively simple variation of NW 13 Semi-global Can also do semi-global (not described by text) 14 Semi-Global Alignment Gaps not penalized at the start or end of sequence 1 Gaps not penalized at the start or end of sequence 2 Any combination 15 Why Semi-global Alignment? Semi-global alignments – example match: 1: mismatch: -1: gap -2 s: CAGCACTTGGATTCTCGG t: CAGCGTGG Align1: CAGCACTTGGATTCTCGG CAGC-----G-T----GG score = -12 Align2: CAGCA-CTTGGATTCTCGG ---CAGCGTGG-------- score = -19 Which alignment is better? 16 Semi-Global Alignment Don't penalize for end spaces match: 1; mismatch: -1; gap -2 Align2: CAGCA-CTTGGATTCTCGG ---CAGCGTGG-------- score = +6-1-24 = -19 Align3: CAGCA-CTTGGATTCTCGG CAGCGTGG score = +6-1-2 = 3 17 Smith-Waterman Example in the book leaves a lot to be desired.... BLAST is a heuristic It is fast (at the expense of being sub-optimal) It is also local alignment only Can we do an optimal local alignment? Can we do optimal global alignment? 20 Sequence Alignment Material 1. Key elements of sequence alignment algorithms 2. Global pairwise alignment: Needleman-Wunsch algorithm 3. Local pairwise alignment: Smith-Waterman algorithm 21 Lecture Outline 1. Key elements of sequence alignment algorithms 2. Global pairwise alignment: Needleman-Wunsch algorithm 3. Local pairwise alignment: Smith-Waterman algorithm 22 Key Issues of Sequence Alignment 1. What types of alignment - pairwise vs. multiple - global vs. local 2. Scoring systems to rank alignments - scoring matrices, gap penalty. 3. Algorithms to find optimal scoring alignments. 4. Statistical methods to evaluate significance of alignment scores 23 Global vs. Local Alignment ✓Global Alignment --T--CC-C-AGT--TATGT-CAGGGGACACG-A-GCATGCAGA-GAC | || | || | | | ||| || | | | | |||| AATTGCCGCC-GTCGT-T-TTCAG----CA-GTTATG-T-CAGAT--C ✓Local Alignment tccCAGTTATGTCAGgggacacgagcatgcagagac |||||||||||| aattgccgccgtcgttttcagCAGTTATGTCAGatc 24 Why Local Alignment? Only certain regions of DNA/proteins (domains) are often conserved, not the entire sequences. SWIRM is an evolutionarily conserved, eukaryotic domain found in proteins implicated in chromatin remodeling and gene expression. atomic level interactions DNA binding 25 Cross-species Genome Similarity Islands of Similarity 28 Scoring Matrices Identity Matrix Match A to A =>1 otherwise 0 Similarity Matrix Match A to A => 1 Match A to R => -1.5 Match A to L => -1 Examples PAM matrices BLOSUM matrices BLOSUM 62 Matrix (for protein sequence alignment) 29 Pairwise Sequence Algorithms: Key Ideas An optimal alignment can be built up using previous solutions of smaller subsequences The approach is based on dynamic programming! Therefore, the best alignment that ends at a given pair of positions (i,j) in the 2 sequences is the score of the best alignment at previous position PLUS the score for aligning current two positions. 30 Lecture Outline Key elements of sequence alignment algorithms Global pairwise alignment: Needleman-Wunsch algorithm Local pairwise alignment: Smith-Waterman algorithm 34 Needleman-Wunsch Algorithm Produce Optimal Global alignment Uses a 2D matrix of partially computed alignment scores (i.e., partial solutions). Must select a scoring function for matches, mismatches, and gaps. Two phases of the algorithm: Construction of alignment score matrix using dynamic programming. Recursive traversal (or iterative) of the matrix to construct the actual alignment (trace back) 35 NW Dynamic Programming Gap penalty: -d ? C Mis-match score: m (xi, yj) j-1 j Match score: s(xi, yj) * F(i-1,j-1) F(i-1,j) i-1 +s(i-1,j-1) -d Select the "best" (max) score from C F(i,j-1) F(i,j) the 3 choices, and that becomes i -d F(i,j) Guide to notation: Assumes we already have value F(i,j) = Score at position (i,j) for F(i-1,j-1), F(i-1,j) and F(i,j-1) (row, column) We don't care what the previous characters (?/*) were. 37 NW Dynamic Programming Gap penalty: -d ? G Mis-match score: m (xi, yj) j-1 j Match score: s(xi, yj) * F(i-1,j-1) F(i-1,j) i-1 +m(i-1,j-1) -d Select the "best" (max) score from C F(i,j-1) F(i,j) the 3 choices, and that becomes i -d F(i,j) Guide to notation: Assumes we already have value F(i,j) = Score at position (i,j) for F(i-1,j-1), F(i-1,j) and F(i,j-1) (row, column) We don't care what the previous characters (?/*) were. 38 Three Simple Rules: If we use the Diagonal, Left, and/or UP cells... 1) If we use "Diagonal" cell F[r-1,c-1], add "Diagonal" cell and MATCH (or MISMATCH) score to F[r,c] 2) If we use the "Left" cell F[r, c-1], add "Left" cell and GAP score to F[r, c] 3) If we use the "Up" cell F[r-1, c], add "Up" cell and GAP score to F[r,c] 39 Needleman-Wunsch Algorithm Initialization: F(0, j) = -j*d d: gap penalty F(i, 0) = -i*d Iteration: F(i, j) = max F(i – 1, j) – d F(i, j – 1) – d F(i – 1, j – 1) + s(xi, yj) F(i – 1, j – 1) + m(xi, yj) 40 NW Dynamic Programming (Possible Alignment Moves) Notice three possible cases at each internal cell of the DP matrix: s, if xi = yj F(i,j) = F(i-1, j-1) + 1. xi aligns to yj m, if not x1……xi-1 xi S, match score y1……yj-1 yj m, mismatch score (either can be negative) 2. xi aligns to a gap (gap in y) F(i,j) = F(i, j-1) - d x1……xi-1 xi y1……yj - 3. yj aligns to a gap (gap in x) F(i,j) = F(i-1, j) - d x1……xi - y1……yj-1 yj 41 Algorithm Basics Create Grid How big? Match: 1 Mismatch: -1 Gap: -2 43 Algorithm Basics A G C 0 1 2 3 2 Sequences 0 0 -2 -4 -6 First row and col are Match: 1 initialized with gap Mismatch: -1 score (penalty) A 1 -2 Gap: -2 Each subsequent cell is the previous cell 2 -4 plus the gap score. A Note: extra row and column A 3 -6 C 4 -8 44 Algorithm Basics A G C Fill in Gap arrows 0 1 2 3 If traversal gets to 0 0 -2 -4 -6 either edge, you are Match: 1 forced into inserting gaps in the "other" 1 -2 Mismatch: -1 A sequence. Gap: -2 A 2 -4 A 3 -6 C 4 -8 45 Algorithm Basics A G C Match Scores 0 1 2 3 0 0 -2 -4 -6 Match: 1 1 -2 1 Mismatch: -1 A Gap: -2 1 A 2 -4 1 A 3 -6 1 C 4 -8 46 Algorithm Basics A G C Mis-match scores 0 1 2 3 0 0 -2 -4 -6 Match: 1 1 -1 -1 Mismatch: -1 A 1 -2 Gap: -2 1 -1 -1 A 2 -4 1 -1 -1 A 3 -6 -1 -1 1 C 4 -8 47 Algorithm Basics A G C Now, iterate thru all cells 0 1 2 3 to find scores. 0 0 -2 -4 -6 Evaluate 3 directions… Match: 1 1 -1 -1 Mismatch: -1 A 1 -2 Gap: -2 1 -1 -1 A 2 -4 1 -1 -1 A 3 -6 -1 -1 1 C 4 -8 48 Algorithm Basics A G C Now, iterate thru all cells 0 1 2 3 to find scores. 0 0 -2 -4 -6 Evaluate 3 directions… Match: 1 1 -1 -1 Mismatch: -1 A 1 -2 Score if we traverse LEFT -4 Gap: -2 1 -1 -1 A 2 -4 1 -1 -1 A 3 -6 -1 -1 1 C 4 -8 49 Algorithm Basics A G C Now, iterate thru all cells 0 1 2 3 to find scores. 0 0 -2 -4 -6 Evaluate 3 directions… Match: 1 1 -1 -1 Mismatch: -1 A 1 -2 Score if we traverse -4 1 Gap: -2 LEFT, DIAGONAL 1 -1 -1 A 2 -4 1 -1 -1 A 3 -6 -1 -1 1 C 4 -8 50 Algorithm Basics A G C Now, iterate thru all cells 0 1 2 3 to find scores. 0 0 -2 -4 -6 Evaluate 3 directions… Match: 1 1 -1 -1 Mismatch: -1 A 1 -2 Score if we traverse -4 1 -4 Gap: -2 LEFT, DIAGONAL, or VERTICAL 1 -1 -1 A 2 -4 1 -1 -1 A 3 -6 -1 -1 1 C 4 -8 51 Algorithm Basics A G C Now, iterate thru all cells 0 1 2 3 to find scores. 0 0 -2 -4 -6 Evaluate 3 directions… Match: 1 1 -1 -1 Mismatch: -1 A 1 -2 Score if we traverse Gap: -2 LEFT, DIAGONAL, or VERTICAL 1 -1 -1 A 2 -4 1 -1 -1 A 3 -6 -1 -1 1 C 4 -8 52 Algorithm Basics A G C Now, iterate thru all cells 0 1 2 3 to find scores. 0 0 -2 -4 -6 Evaluate 3 directions… Match: 1 1 -1 -1 Mismatch: -1 A 1 -2 1 Score if we traverse -4 1 -4 Gap: -2 LEFT, DIAGONAL, or VERTICAL 1 -1 -1 A 2 -4 Score in the cell becomes the MAXIMUM 1 -1 -1 value (best score). A 3 -6 -1 -1 1 C 4 -8 53 Algorithm Basics A G C Now, iterate thru all cells 0 1 2 3 to find scores. 0 0 -2 -4 -6 Evaluate 3 directions… Match: 1 1 -1 -1 Mismatch: -1 A 1 -2 1 Score if we traverse -4 1 -4 Gap: -2 LEFT, DIAGONAL, or VERTICAL 1 -1 -1 A 2 -4 Score in the cell becomes the MAXIMUM 1 -1 -1 value (best score). A 3 -6 -1 -1 1 KEEP all arrows that 4 -8 could produce the C maximum value. 54 Algorithm Basics A G C Next cell 0 1 2 3 0 0 -2 -4 -6 Match: 1 1 -1 -1 Mismatch: -1 A 1 -2 1 -4 1 -4 -1 -3 -6 Gap: -2 1 -1 -1 A 2 -4 1 -1 -1 A 3 -6 -1 -1 1 C 4 -8 55 Algorithm Basics A G C Next cell 0 1 2 3 0 0 -2 -4 -6 Match: 1 1 -1 -1 Mismatch: -1 A 1 -2 1 -1 -4 1 -4 -1 -3 -6 Gap: -2 1 -1 -1 A 2 -4 1 -1 -1 A 3 -6 -1 -1 1 C 4 -8 56 Algorithm Basics A G C Next cell 0 1 2 3 0 0 -2 -4 -6 Match: 1 1 -1 -1 Mismatch: -1 A 1 -2 1 -1 -4 1 -4 -1 -3 -6 -3 -5 -8 Gap: -2 1 -1 -1 A 2 -4 1 -1 -1 A 3 -6 -1 -1 1 C 4 -8 57 Algorithm Basics A G C Next cell 0 1 2 3 0 0 -2 -4 -6 Match: 1 1 -1 -1 Mismatch: -1 A 1 -2 1 -1 -3 -4 1 -4 -1 -3 -6 -3 -5 -8 Gap: -2 1 -1 -1 A 2 -4 1 -1 -1 A 3 -6 -1 -1 1 C 4 -8 58 Algorithm Basics A G C Next cell 0 1 2 3 Note, Diagonal and 0 0 -2 -4 -6 Vertical give max score. Match: 1 1 -1 -1 Mismatch: -1 Retain both directions A 1 -2 1 -1 -3 -4 1 -4 -1 -3 -6 -3 -5 -8 Gap: -2 1 -1 -1 A 2 -4 -6 -1 -1 1 -1 -1 A 3 -6 -1 -1 1 C 4 -8 59 Algorithm Basics A G C Next cell 0 1 2 3 Note, Diagonal and 0 0 -2 -4 -6 Vertical give max score. Match: 1 1 -1 -1 Mismatch: -1 Retain both directions A 1 -2 1 -1 -3 -4 1 -4 -1 -3 -6 -3 -5 -8 Gap: -2 1 -1 -1 A 2 -4 -1 -6 -1 -1 1 -1 -1 A 3 -6 -1 -1 1 C 4 -8 60 Algorithm Basics A G C Final matrix 0 1 2 3 Note, a mistake early, 0 0 -2 -4 -6 will propagate throughout. Match: 1 1 -1 -1 Mismatch: -1 A 1 -2 1 -1 -3 -4 1 -4 -1 -3 -6 -3 -5 -8 Gap: -2 1 -1 -1 A 2 -4 -1 0 -2 -6 -1 -1 -3 0 -3 -2 -2 -5 1 -1 -1 A 3 -6 -3 -2 -1 -8 -3 -3 -5 -2 -2 -4 -1 -4 Optimal Alignment Score: -1 -1 -1 1 C 4 -8 -5 -4 -1 -10 -7 -5 -7 -4 -4 -6 -1 -3 61 Algorithm Basics A G C A matrix: Scores 0 1 2 3 0 0 -2 -4 -6 B matrix: Match and Mismatch 1 -1 -1 scores A 1 -2 1 -1 -3 -4 1 -4 -1 -3 -6 -3 -5 -8 1 -1 -1 C (or "D") matrix: A 2 -4 -1 0 -2 -6 -1 -1 -3 0 -3 -2 -2 -5 Directions 1 -1 -1 A 3 -6 -3 -2 -1 Note -- you would need -8 -3 -3 -5 -2 -2 -4 -1 -4 to encode directions to -1 -1 1 store as a single C 4 -8 -5 -4 -1 numerical value -10 -7 -5 -7 -4 -4 -6 -1 -3 62 Algorithm Basics Checks 1) First, check all SCORES for MAX 2) Then check all blue numbers A G C 3) If you change a blue number, have to recheck all scores (and arrows) 0 1 2 3 0 0 -2 -4 -6 Match: 1 1 -1 -1 Mismatch: -1 A 1 -2 1 -1 -3 -4 1 -4 -1 -3 -6 -3 -5 -8 To obtain alignment, Gap: -2 traverse the matrix to 1 -1 -1 cell 0,0. A 2 -4 -1 0 -2 -6 -1 -1 -3 0 -3 -2 -2 -5 Any path may be 1 -1 -1 followed, and will A 3 -6 -3 -2 -1 produce an optimal -8 -3 -3 -5 -2 -2 -4 -1 -4 alignment. -1 -1 1 There may be multiple C 4 -8 -5 -4 -1 -10 -7 -5 -7 -4 -4 -6 -1 -3 paths. 63 Algorithm Basics Optimal Alignment: A G C C | 0 1 2 3 C 0 0 -2 -4 -6 Match: 1 1 -1 -1 Mismatch: -1 A 1 -2 1 -1 -3 -4 1 -4 -1 -3 -6 -3 -5 -8 To obtain alignment, Gap: -2 traverse the matrix to 1 -1 -1 cell 0,0. A 2 -4 -1 0 -2 -6 -1 -1 -3 0 -3 -2 -2 -5 Any path may be 1 -1 -1 followed, and will A 3 -6 -3 -2 -1 produce an optimal -8 -3 -3 -5 -2 -2 -4 -1 -4 alignment. -1 -1 1 There may be multiple C 4 -8 -5 -4 -1 -10 -7 -5 -7 -4 -4 -6 -1 -3 paths. 64 Algorithm Basics Optimal Alignment: A G C GC | 0 1 2 3 AC 0 0 -2 -4 -6 Note, we are building Match: 1 the alignment string 1 -1 -1 Mismatch: -1 A 1 -2 1 -1 -3 right to left. -4 1 -4 -1 -3 -6 -3 -5 -8 Gap: -2 1 -1 -1 There may be multiple A 2 -4 -1 0 -2 -6 -1 -1 -3 0 -3 -2 -2 -5 paths. 1 -1 -1 A 3 -6 -3 -2 -1 -8 -3 -3 -5 -2 -2 -4 -1 -4 -1 -1 1 C 4 -8 -5 -4 -1 -10 -7 -5 -7 -4 -4 -6 -1 -3 65 Algorithm Basics Optimal Alignment: A G C AGC | | 0 1 2 3 AAC 0 0 -2 -4 -6 Match: 1 1 -1 -1 Mismatch: -1 A 1 -2 1 -1 -3 -4 1 -4 -1 -3 -6 -3 -5 -8 To obtain alignment, Gap: -2 traverse the matrix to 1 -1 -1 cell 0,0. A 2 -4 -1 0 -2 -6 -1 -1 -3 0 -3 -2 -2 -5 Any path may be 1 -1 -1 followed, and will A 3 -6 -3 -2 -1 produce an optimal -8 -3 -3 -5 -2 -2 -4 -1 -4 alignment. -1 -1 1 There may be multiple C 4 -8 -5 -4 -1 -10 -7 -5 -7 -4 -4 -6 -1 -3 paths. 66 Algorithm Basics Optimal Alignment: A G C -AGC | | 0 1 2 3 AAAC 0 0 -2 -4 -6 Match: 1 1 -1 -1 Mismatch: -1 A 1 -2 1 -1 -3 -4 1 -4 -1 -3 -6 -3 -5 -8 The vertical arrow places Gap: -2 a gap in the top 1 -1 -1 sequence. A 2 -4 -1 0 -2 -6 -1 -1 -3 0 -3 -2 -2 -5 A horizontal arrow 1 -1 -1 would put a gap in the A 3 -6 -3 -2 -1 vertical sequence -8 -3 -3 -5 -2 -2 -4 -1 -4 -1 -1 1 C 4 -8 -5 -4 -1 -10 -7 -5 -7 -4 -4 -6 -1 -3 67 Algorithm Basics Optimal Alignments: A G C -AGC | | 0 1 2 3 AAAC 0 0 -2 -4 -6 AG-C Match: 1 | | -1 AAAC 1 -2 1 1 -1 -1 -3 Mismatch: -1 A -4 1 -4 -1 -3 -6 -3 -5 -8 Gap: -2 A-GC | | 1 -1 -1 A 2 -4 -1 0 -2 AAAC -6 -1 -1 -3 0 -3 -2 -2 -5 1 -1 -1 A 3 -6 -3 -2 -1 -8 -3 -3 -5 -2 -2 -4 -1 -4 -1 -1 1 C 4 -8 -5 -4 -1 -10 -7 -5 -7 -4 -4 -6 -1 -3 68 Algorithm Basics Optimal Alignments: A G C -AGC | | 0 1 2 3 AAAC 0 0 -2 -4 -6 AG-C Match: 1 | | -1 AAAC 1 -2 1 1 -1 -1 -3 Mismatch: -1 A -4 1 -4 -1 -3 -6 -3 -5 -8 Gap: -2 A-GC | | 1 -1 -1 A 2 -4 -1 0 -2 AAAC -6 -1 -1 -3 0 -3 -2 -2 -5 1 -1 -1 Add up the score: A 3 -6 -3 -2 -1 -8 -3 -3 -5 -2 -2 -4 -1 -4 1-1-2+1=-1 -1 -1 1 C 4 -8 -5 -4 -1 -10 -7 -5 -7 -4 -4 -6 -1 -3 69 Algorithm Basics Optimal Alignments: A G C -AGC | | 0 1 2 3 AAAC 0 0 -2 -4 -6 AG-C Match: 1 | | -1 AAAC 1 -2 1 1 -1 -1 -3 Mismatch: -1 A -4 1 -4 -1 -3 -6 -3 -5 -8 Gap: -2 A-GC | | 1 -1 -1 A 2 -4 -1 0 -2 Observations: AAAC -6 -1 -1 -3 0 -3 -2 -2 -5 All cells must have 1, 2 or 3 1 -1 -1 Add up the score: A 3 -6 -3 -2 -1 arrows. -8 -3 -3 -5 -2 -2 -4 -1 -4 1-1-2+1=-1 -1 -1 1 C 4 -8 -5 -4 -1 -10 -7 -5 -7 -4 -4 -6 -1 -3 70 Extracting the Alignment Starting with the lower right element, traverse the arrows back to the (0,0). The alignment is constructed on the reverse return path. 71 Simple Rules for Traversal 1) As you leave a cell (diagonally), you add both corresponding nucleotides (row/column) to the alignment. Match OR mismatch 2) As you leave a cell vertically, you add the ROW nucleotide, and put the gap in the COLUMN sequence 3) As you leave a cell horizontally, you add the COLUMN nucleotide, and put the gap in the ROW sequence Memory trick -- horizontal and vertical arrows POINT to where the gap goes. 72 Algorithm Basics: recursion A G C 0 1 2 3 The idea behind a recursive implementation is: 0 0 -2 -4 -6 1) Fill out the cells 2) Recover the alignment 1 -1 -1 A 1 -2 1 -1 -3 -4 1 -4 -1 -3 -6 -3 -5 -8 1 -1 -1 A 2 -4 -1 0 -2 -6 -1 -1 -3 0 -3 -2 -2 -5 For the cells, you call the function on S(4,3). If it has 1 -1 -1 no score, then you call the A 3 -6 -3 -2 -1 function on S(4-1,3), S(4- -8 -3 -3 -5 -2 -2 -4 -1 -4 1,3-1) and S(4,3-1). -1 -1 1 The same "traversal" works C 4 -8 -5 -4 -1 -10 -7 -5 -7 -4 -4 -6 -1 -3 for recovering the "path", and thus the alignment. 73 An Example Traceback A G C Optimal Alignment A A A C 0 1 2 3 A G - C 0 0 -2 -4 -6 Match: 1 Mismatch: -1 A 1 -2 1 -1 -3 Gap: -2 A 2 -4 -1 0 -2 A 3 -6 -3 -2 -1 Optimal Alignment Score C 4 -8 -5 -4 -1 74 Alternative Tracebacks A G C Optimal Alignment A A A C 0 1 2 3 A G - C 0 0 -2 -4 -6 Match: 1 Another optimal Mismatch: -1 alignment A 1 -2 1 -1 -3 Gap: -2 A A A C A - G C A 2 -4 -1 0 -2 A 3 -6 -3 -2 -1 Optimal Alignment Score C 4 -8 -5 -4 -1 75 Alternative Tracebacks A G C Optimal Alignment A A A C 0 1 2 3 A G - C 0 0 -2 -4 -6 Match: 1 Another optimal Mismatch: -1 alignment A 1 -2 1 -1 -3 Gap: -2 A A A C A - G C A 2 -4 -1 0 -2 Another optimal alignment A 3 -6 -3 -2 -1 Optimal A A A C Alignment - A G C Score C 4 -8 -5 -4 -1 76 Example Question, why not start at "highest" value in last row/column? 77 Computational Complexity of NW Algo. Basic NW algo. is O(n2) in speed and memory N MxN cells to fill Brute force requires O(3n) By caching the intermediate results, the algo is really O(n2) Improvements can reduce the M memory need to O(n) (linear M’ space algo.) N’ 78 Lecture Outline Key elements of sequence alignment algorithms Global pairwise alignment: Needleman-Wunsch algorithm Local pairwise alignment: Smith-Waterman algorithm 79 Global Alignment Local Alignment 80 Smith-Waterman Algorithm Idea: Ignore poorly aligned regions. Modifications to Needleman-Wunsch: Initialization: F(0, j) = F(i, 0) = 0 0 Iteration: F(i, j) = max F(i – 1, j) – d F(i, j – 1) – d F(i – 1, j – 1) + s(xi, yj) 81 Smith-Waterman Algorithm Termination and traceback: 1. If we want the optimal local alignment… FOPT = maxi,j F(i, j) Find FOPT and trace back 2. If we want all local alignments scoring > t For all i, j find F(i, j) > t, and trace back. 82 Smith-Waterman Local optimal alignment Negative scoring matrix cells set to 0 Start at highest scoring cell, and proceed until 0 is encountered OR Start at cells scoring above a threshold and proceed until 0 (for population of local alignments above a threshold) 83 HW3 Fill out the matrix for NW. Recover the alignment(s) Make a "B" matrix in Python 84 Recursion example Fibonacci sequence Starting with 2 digits, 0 and 1, add last 2 digits to get next: 0 1 0+1=1 0 1 1 1+1=2 0 1 1 2 1+2=3 0 1 1 2 3 2+3=5 0 1 1 2 3 5 etc... Fibonacci number, at index 4, is: 3 87 #!/usr/bin/env python # import modules used here -- sys is a very standard one import sys # starting at 0 and 1, ADD previous 2 digits to get the next # 0 1 2 3 4 5 6 7 (index values) # 0 1 1 2 3 5 8 13 # Fib(index=7)=13 def fibonacci( number ): ## Find the nth fibonacci term ## these two if statements are BASE CASE if(number < 1): return 0 if(number