Sequence Editing and BLAST Search PDF

Universitas Indonesia

Summary

This document covers sequence editing techniques and BLAST search methods widely used in bioinformatics. BLAST (Basic Local Alignment Search Tool) is an algorithm for comparing primary biological sequence information, such as the amino-acid sequences of different proteins or the nucleotides of DNA sequences. The document is an educational resource for students.

Full Transcript

Sequence Editing and BLAST
MK Bioinformatika (Bioinformatics course)
Wellyzar Sjamsuridzal, Ph.D.

Sequence Editing

Biosystematics relies on sequence editing and multiple sequence alignment to classify and identify organisms accurately.

Definition: Sequence editing is the process of reviewing, trimming, and curating genetic sequences obtained from organisms.

Definition: Multiple sequence alignment is the process of aligning multiple genetic sequences to identify conserved regions, genetic variations, and evolutionary relationships.

Through these processes, researchers can uncover the evolutionary relationships and genetic variation within populations, supporting work in fields such as medicine, agriculture, and environmental science.

How DNA Sequence Data Is Obtained for Genetic Research

Sample: Organism -> Extract DNA from Cells -> Genetic Data: DNA Sequence (…TTCACCAACAGGCCCACA…) -> Compare DNA Sequences to One Another:
TTCAACAACAGGCCCAC
TTCACCAACAGGCCCAC
TTCATCAACAGGCCCAC

GOALS:
- Identify the organism from which the DNA was obtained.
- Compare DNA sequences to each other.

Steps in Sequence Editing
1. Quality assessment: Assess the quality of the obtained sequence data, checking for high-quality reads and for any anomalies or errors.
2. Trimming: Remove low-quality bases, adapter sequences, and any remaining artifacts from the sequencing process.
3. Alignment and consensus generation: Align the edited sequences to a reference database and generate a consensus sequence, taking into account genetic variation within populations.

Overview of DNA Sequencing

DNA Sample -> Mix with primers -> Perform sequencing reaction
DNA Sequence: …TTCACCAACTGGCCCACA…
Chromatogram

Sequencing Result in Electropherogram

[Figure: electropherogram of file B2-Tiwi-8b-cpg24-3F.ab1 (Model 310, run dated 8/19/2009), displayed in BioEdit version 7.0.9.0 (6/27/07). Signal G:2052 A:1969 T:1803 C:1848. Base calls are shown with position ticks from 10 to 420.]

2/27/24 Biosistematika 2012

Software Used to Open .ab1 Files (Chromatogram Editing Software)
- Chromas
- BioEdit (download: http://www.mbio.ncsu.edu/BioEdit/bioedit.html)
- FinchTV
- SnapGene Viewer

Steps in Sequence Analysis
- Examination of the signal-to-noise ratio
- Trimming
- Merging of forward and reverse reads
- BLAST search
- Sequence alignment
- Phylogeny

Examination of the Signal-to-Noise Ratio
- The sequence should be long enough for phylogenetic analysis: about 400 nt.
- Look at the sequence as a whole and check the sequence quality.

Sequence Overview

Read the sequence quality:
- Good sequence: without noise
- Bad sequence: with significant noise
- Ugly sequence: with little noise

Causes of Failed DNA Sequencing Reactions
- Poor-quality DNA
- Too much or too little template DNA
- More than one template
- Blocked capillary

Trimming: Eliminating Poor-Quality Peaks

Good sequence
- Peaks should be independent of each other; they should not overlap or form double peaks.
- There are no ambiguous codes.

Dye blobs: peaks of excess dye
Causes:
- Incorrect estimation of template concentration (insufficient template used)
- Poor removal of unincorporated dye terminators after the sequencing reaction

Editing Sequences: Ambiguous Codes

The more ambiguous codes a chromatogram contains, the less reliable it is and the lower its overall quality.

Bad sequence
- No nucleotides are detected.
- Residual background noise, probably from the sequencer
- Residual nucleotides left over from the sequencing reaction
- Repeat sequence

Ugly sequence
- Clearly defined peaks cannot be seen.
- Competing peaks appear at the same position.
- This is caused by contamination or faults in the sequencing reaction.

Editing Sequence
- Inspect the sequence manually and look for positions with ambiguous codes.
- Search for each individual ambiguous code.
- Remove all ambiguous data. During identification, the algorithm would otherwise use these ambiguous codes in its calculations, making the result less accurate and less reliable.
- Replace a changed nucleotide with a lowercase letter, so that when you return to the sequence later you will know which positions you changed.
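The editing steps above (find ambiguous codes, trim poor-quality ends, mark manual changes in lowercase) can be sketched in Python. This is only an illustration under stated assumptions, not what BioEdit or Chromas do internally: it works on the exported text sequence rather than on chromatogram peaks, it treats any IUPAC code other than A, C, G, T as ambiguous, and the function names and the `window` parameter are hypothetical.

```python
# IUPAC ambiguity codes: every nucleotide code other than A, C, G, T.
AMBIGUOUS = set("RYSWKMBDHVN")


def ambiguous_positions(seq):
    """Return the 0-based positions whose base call is an ambiguity code."""
    return [i for i, base in enumerate(seq) if base.upper() in AMBIGUOUS]


def _first_clean_window(seq, window):
    """Index of the first run of `window` unambiguous bases, or len(seq)."""
    for i in range(len(seq) - window + 1):
        if not any(b.upper() in AMBIGUOUS for b in seq[i:i + window]):
            return i
    return len(seq)


def trim_ambiguous_ends(seq, window=10):
    """Trim both ends up to the first window of unambiguous bases.

    `window` is a hypothetical threshold chosen for illustration.
    Ambiguous codes inside the kept region are left for manual review.
    """
    start = _first_clean_window(seq, window)
    if start == len(seq):
        return ""  # no clean stretch of the requested length
    end = len(seq) - _first_clean_window(seq[::-1], window)
    return seq[start:end]


def mark_edit(seq, pos, new_base):
    """Replace the base at `pos`, written in lowercase to flag the edit."""
    return seq[:pos] + new_base.lower() + seq[pos + 1:]


if __name__ == "__main__":
    read = "NNRACGTACGTACGTNACGTACGTYN"  # made-up read, not real data
    trimmed = trim_ambiguous_ends(read, window=6)
    print(trimmed, ambiguous_positions(trimmed))
```

In practice the decision to replace or remove a base should come from looking at the peaks in the chromatogram; the sketch only helps locate the positions that need attention.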
- Continue in the same way through the rest of the sequence.

Sequence Replication

Assembling Sequences
- Sequencing with multiple primers
- PCR and primer-walking sequencing
- Assembling a cloned sequence

PCR of the 16S rRNA gene (1510 nt) -> Sequencing -> Assembling the sequence

Merging of Forward and Reverse Reads

Export Nucleotide to FASTA File: From Graphic to Text Format

DNA sequence (before editing)
ATTGTAAGGGAAGAGTGCGTTAATGTGGTGGGCTTTGGGGTGTAAACTCTTTTCTCGGGATATAATGGTGATGGTCCTGAGGAAGAAGCATCGGCTA
ACTCCGTGCCAGCAGCCGCGGTACTACGGAGGATGCAAGCGTTATCCGGAATGATTGGGCGTAAAGCGTCCGCAGGTGGCGATGTAAGTCTGCTGTT
AAAGAGCAAAGCTTAACTTTGTAAAAGCAGTGGAAACTACATAGCTAGAGTACGTTCGGGGCAGAGGGAATTCCTGGTGTAGCGGTGAAATGCGTA
GAGATCAGGAAGAACACCGGTGGCGAAGGCGCTCTGCTAGGCCGTAACTGACACTGAGGGACGAAAGCTAGGGGAGCGAATGGGTAGTCCCCCC
CCTTTAAACAAGACCAGTGTTGTCGCTTTGCAATGCT

DNA sequence (after editing)
GGCTCTTGGGGTGTAAACCTCTTTTCTCAGGGAATAATAATAGTGAAGGTACCTGAGGAAGAAGCATCGGCTAACTCCGTGCCAGCAGCCGCGGTAA
TACGGAGGATGCAAGCGTTATCCGGAATGATTGGGCGTAAAGCGTCCGCAGGTGGCGATGTAAGTCTGCTGTTAAAGAGCAAAGCTTAACTTTGTAA
AAGCAGTGGAAACTACATAGCTAGAGTACGTTCGGGGCAGAGGGAATTCCTGGTGTAGCGGTGAAATGCGTAGAGATCAGGAAGAACACCGGTGG
CGAAGGCGCTCTGCTAGGCCGTAACTGACACTGAGGGACGAAAGCTAGGGGAGCGAATGGGTAGTCCCCCCCCTTTAAACAAGACCAGTGTTGTCG
CTTTGCAATGCT

DNA sequence (contig sequence)
GCGAAAGCCTGACGGAGCAATACCGCGTGAGGGAGGAAGGCTCTTGGGTTGTAAACCTCTTTTCTCAGGGAATAAGAAAGTGAAGGTACCTGAGG
AATAAGCATCGGCTAACTCCGTGCCAGCAGCCGCGGTAATACGGAGGATGCAAGCGTTATCCGGAATGATTGGGCGTAAAGCGTCCGCAGGTGGCG
ATGTAAGTCTGCTGTTAAAGAGCAAAGCTTAACTTTGTAAAAGCAGTGGAAACTACATAGCTAGAGTACGTTCGGGGCAGAGGGAATTCCTGGTGTA
GCGGTGAAATGCGTAGAGATCAGGAAGAACACCGGTGGCGAAGGCGCTCTGCTAGGCCGTAACTGACACTGAGGGACGAAAGCTAGGGGAGCGA
ATGGGT

Confirmation of the Sequence: BLAST at NCBI
- The identity of a sequence can be confirmed by running a BLAST search at NCBI.
- BLAST compares the query with the sequences in the database.
- Link: https://blast.ncbi.nlm.nih.gov/Blast.cgi?PROGRAM=blastn&PAGE_TYPE=BlastSearch&LINK_LOC=blasthome

Find Similarity and Identity of Sequence Data by BLAST Search

IDENTIFICATION VIA DNA DATABASE

Basic Local Alignment Search Tool (BLAST)

BLAST is a search tool. By aligning a query sequence against all sequences in a database, alignment can be used to search the database for similar sequences.

Edited Sequence in FASTA

>18S rRNA partial sequence
AAAGATTAAGCCATGCATGTCTAAGTATAAACAATTATACAGTGAAACTGCGAATGGCTCATTAAATCAG
TTATAGTTTATTTGATGATACCTTACTACATGGATAACTGTGGTAATTCTAGAGCTAATACATGCCGAGA
CAGCCCCAACCTTTGGAAGGGGTGCATTTATTAGATAAAAAACCAATGGCTTTCGGGTCTCTTTGGTGAT
TCATAATAACTTTTCAAATCGTATAGCCTTGTGCTGACGATGCTTCATTCAAATATCTGCCCTATCAACT
TTCGATGGTAGGATAGAGGCCTACCATGGTGATGACGGGTAACGGGGAATAAGGGTTCGATTCCGGAGAG
AGGGCCTGAGAAACGGCCCTCAAATCTAAGGATTGCAGCAGGCGCGCAAATTACCCAATCCCGACATGGG
GAGGTAGTGACAATAAATAACAATGTATGGCTCTTTAGGGTCTTACAATTGGAATGAGTACAATTTAAAT
CTCTTAACGAGGATCAATTGGAGGGCAAGTCTGGTGCCAGCAGCCGCGGTAATTCCAGCTCCAATAGCGT
ATATTAAAATTGTTGACGTTAAAAAGCTCGTAGTCGAACTTCGGCCTCTGATGATTGGTCTGCCTTTTGG
TGTGTACTGGTTTATTGGAGGCTTACCTCTTGGTGAACTTCAATGCACTTTACTGGGTGTTGAAGGGAAC
CAGGACTTTTACTTTGAAAAAATTAGAGTGTTCAAAGCAGGCTTTTGCCTGAATACATTAGCATGGAATA
ATAAAATAGGATGTGTGGTCCTATTTTGTTGGTTTCTAGGATCACCGTAATGATTAATAGGGTCAGTTGG
GGGCATTTGTATTACATCGTCAGAGGTGAAATTCTTGGATTGATGTAAGACAAACTACTGCGAAAGCATC
TGCCAAGGATGACTTCATTGATCAAGAACGAAGGTTAAGGGTTCAAAAACGATCAGATACCGTTGTAGTC
TTAACAGTAAACTATGCCGACTGGGGATCAGACAAGGATTTATAATGACTTGTTTGGCACCTAAAGGGAA
ACCTGAAGTTTAGGTTCGTGGGGGAGTACGGTCACAAGGCTGAAACTTAAAGGAATTGACGGAAGGGCAC
CACCAGGTGTGGAGCCTGCGGCTTAATTTGACTCAACACGGGAAAACTCACCAGGTCCAGACACAGTAAG
GATTGACAGATTGATAGCTTTTTCTTGATTTTGTGGTTGGTGGTGCATGGCCGTTCTTAGTTGGTGGAGT
GATTTGTCTGGTTAATTCCGATAACGAACGAGACCTTCTCCTGCTAAATAGTCTGGCTGGCTTCGGCTGG
CTATTGGCTTCTTAGAGGGACTATCAACGTTTAGTTGATGGAAGTTGGAGGCAATAACAGGTCTGTGATG
CCCTTAGATGTTCTGGGCCGCACGCGCGCTACACTGACCAAGCCAGCGAGTGTATAACCTTATCCGAAAG
GATTGGGTAATCTTGTGAAACTTGGTCGTGATGGGGATAGAGCATTGCAATTATTGCTCTTCAACGAGGA
ATACCTAGTAAGCGTATGTCATCAGCATGCGTTGATTACGTCCCTGCCCTTTGTACACACCGCCCGTCGC
TACTACCGATTGAATGGCTTAGTGAGGCGTTCGGAGAGCCTATAAAGAGCTGGCAACAGCACTTTACTGG
TTCAAAGTTCTACGAACTTGGTCATTTAGAGGAAGTAAAAGTCGTAACAAGGTTTCCGTA

NCBI Database (www.ncbi.nlm.nih.gov)

BLAST Search
- Paste your sequence (the query) into the BLAST search form.

BLAST Search Result
- The result page shows the top hit and the related sequences.
- Top hit: Aecidium kalanchoes 18S rRNA gene sequence, percent identity 96.69%.

Merging of Forward and Reverse Reads
- Forward and reverse reads are merged to obtain a consensus sequence.
- Contig = forward read + reverse complement of the reverse read.
- A contig is a set of overlapping DNA segments that together represent a consensus region of DNA.

Merging tools:
1. Chromas, BioEdit
2. EMBOSS merger: an online tool housed at EMBL. Link: https://www.bioinformatics.nl/cgi-bin/emboss/merger

Identity of Query Sequence Based on BLAST Search Result
- Query (name of sequence): 18S rRNA partial sequence
- Result: Aecidium kalanchoes, percent identity 96.69%

Phylogenetic Tree
- Get related sequences of interest
- Perform multiple sequence alignments
- Estimate phylogenetic relationships
- Plot and edit the tree
- Interpret results correctly

Steps in Multiple Sequence Alignment
1. Sequence retrieval: Obtain a set of genetic sequences representing the target microbial taxa or genes of interest.
2. Pre-alignment processing: Remove any gaps, ambiguous characters, or poorly conserved regions that may hinder accurate alignment.
3. Alignment algorithms: Apply an alignment algorithm, such as ClustalW, ClustalX, MUSCLE, or MAFFT, to align the sequences.
4. Consensus generation: Analyze the aligned sequences and generate a consensus alignment that represents the features the sequences have in common.
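Step 4 above (consensus generation) can be illustrated with a column-wise majority vote over sequences that are already aligned. This is a minimal sketch under stated assumptions, not the method used by ClustalW, MUSCLE, MAFFT, or MEGA: the function name `column_consensus`, the gap character `-`, and the choice to report ties and all-gap columns as `N` are all assumptions made for illustration.

```python
from collections import Counter


def column_consensus(aligned, gap="-", ambiguous="N"):
    """Majority-vote consensus over equal-length, already aligned sequences.

    Gaps are ignored when counting. A column with no bases, or with a tie
    for the most common base, is reported as `ambiguous`.
    """
    if not aligned:
        raise ValueError("need at least one sequence")
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("aligned sequences must have equal length")

    consensus = []
    for column in zip(*aligned):
        counts = Counter(base.upper() for base in column if base != gap)
        if not counts:
            consensus.append(ambiguous)  # column contains only gaps
            continue
        ranked = counts.most_common(2)
        if len(ranked) == 2 and ranked[0][1] == ranked[1][1]:
            consensus.append(ambiguous)  # tie between the top two bases
        else:
            consensus.append(ranked[0][0])
    return "".join(consensus)


if __name__ == "__main__":
    # The three short sequences compared earlier in this transcript.
    example = [
        "TTCAACAACAGGCCCAC",
        "TTCACCAACAGGCCCAC",
        "TTCATCAACAGGCCCAC",
    ]
    print(column_consensus(example))
```

With those three sequences every column agrees except the fifth, where A, C, and T each occur once, so that position comes out as the ambiguous code. Real alignment software may apply different thresholds or the full set of IUPAC codes instead of a single `N`.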
Make Sequences in FASTA Format

Multi-FASTA Format in Notepad

>M83548_Aqu_pyr
TTCCCTGAAGAGTTTGATCCTGGCTCAGCGCGAACGCTGGCGGCGTGCCTAACACATAGGTGGTGCATGGCCGTCGTCAGCTCGTGTCGTGAGATGTTG
GGTTAAGTCCCGCAACGAGCGCAACCCCTGCCCCTAGTTGCTACCCCGAGAGGGGAGCACTCTAGGGGGACCGCCGGCGATAAGCCGGAGGAAGGG
GGGGATGACGTCAGGTCAGTATGCCCTTTATGCCCGGGGCCACACAGGCGCTACAGTGGCCGGGACAATGGGAAGCGACCCCGCAAGGGGGAGCTAA
TCCCAGAAACCCGGTCATGGTGCGGATTGGGGGCTGAAACTCGCCCCCATGAAGCCGGAATCGGTAGTAACGGGGTATCAGCGATGTCCCCGTGAATA
CGTTCTCGGGCCTTGCACACACCGCCCGTCACGCCACGGAAGTCGGTCCGGCCGGAAGTCCCCGATGC
>X74066_Ace_ace
GAGTTTGATTCTGGCTCAGAGCGAACGCTGGCGGCATGCTTAACACATGCAAGTCGCACGAAGGCTTCGGCCTTAGTGGCGGACGGGTGAGTAACGC
GTAGGAATCTATCCATGGGTGGGGGATAACTCCGGGAAACTGGAGCTAATACCGCATGATACCTGAGGGTCAAAGGCGCAAGTCGCCTGTGGAGGAGT
CTGCGTTTGATTAGCTTGTTGGTGGGGTAAAGGCCTACCAAGGCGATGATCAATAGCTGGTCTGAGAGGATGATCAGCCACA
>AJ276351_Bac_sub
CCTGGCTCAGGACGAACGCTGGCGGCGTGCCTAATACATGCAAGTCGAGCGGACAGATGGGAGCTTGCTCCCTGATGTTAGCGGCGGACGGGTGAGT
AACACGTGGGTAACCTGCCTGTAAGACTGGGATAACTCCGGGAAACCGGGGCTAATACCGGATGGTTGTTTGAACCGCATGGTTCAAACATAAAAGGT
GGCTTCGGCTACCACTTACAGATGGACCCGCGGCGCATTAGCTAGTTGGTGAGGTAACGGCTCACCAAGGCAACGATGCGT
>D86187_Bif_pse
GTTTCGATTCTGGCTCAGGATGAACGCTGGCGGCGTGCTTAACACATGCAAGTCGAACGGGATCCATCAGGCTTTGCTTGGTGGTGAGAGTGGCGAAC
GGGTGAGTAATGCGTGACCGACCTGCCCCATACACCGGAATAGCTCCTGGAAACGGGTGGTAATGCCGGATGCTCCGACTCCTCGCATGGGGTGTCGGG
AAAGATTTCATCGGTATGGGATGGGGTCGCGTCCTATCAGGTAGTCGGCGGGGTAACGGCCCACCGAGCCTACGACGGGTAGCCGGCCTGAGAGGGC
GACCGGCCACATTGGGACTGAGATACGGCCCAGACTCCTACGGGAGGCAGCAGTGGGGAATATTGCACAATGGGCGCAAGCCTGATGCAGCGACGCC
GCGTGCGGGATGACGGCCTTCGGGTTGTAAACCGCTTTTGATCGGGAGCAAGCCTTCGGGTGAGTGCTAAGTCGTAACAAGGTAGCCGTACCGGAAG
GTGCGGCTGGATCACC

Continue to MEGA

Challenges and Considerations in Microbial Systematics
A. Intra-species and inter-species genetic variation: Microbial species often exhibit substantial genetic diversity, which makes accurate sequence editing and alignment challenging.
B. Phylogenetic inference: Sequence editing and alignment are crucial steps for accurate phylogenetic analysis, which helps determine evolutionary relationships among microorganisms.
C. Database selection and validation: The choice of reference databases for sequence alignment and editing is critical to ensure accurate classification and identification of microorganisms.

Source of Information
- https://www.youtube.com/watch?v=XN05FkrRGi8 (PHYLOGENETICS 3: DNA Chromatogram Analysis (Software, Quality Assessment, Editing and Export))
- https://www.youtube.com/watch?v=kP_S5cRqiSg (Sequence Editing using BioEdit)