Genome Browsers Overview Quiz
149 Questions

Questions and Answers

What does ClinVar primarily focus on?

  • Nucleotide database management
  • Publicly submitted variations and observations of diseases (correct)
  • Gene expression studies
  • Automated data extraction tools

Which of the following is a database that contains information on single nucleotide polymorphisms?

  • GEO
  • Genomic Data Commons
  • dbSNP (correct)
  • Ensembl

Which resource is known for containing hundreds of thousands of expression experiments?

  • GEO (correct)
  • PubMed
  • NCBI
  • ClinVar

    What is the significance of the star rating in ClinVar results?

    It indicates the level of confidence in the results

    What is one potential drawback of using the NCBI extraction tools?

    They can be overly complicated

    Which of the following links to a faster European mirror site for genomic data?

    http://genome-euro.ucsc.edu

    What does the acronym GEO stand for?

    Gene Expression Omnibus

    What is a feature that UCSC Genome Browser offers for exploring gene data?

    Track Viewing

    What feature makes Ensembl particularly useful for data extraction?

    Automated extraction tools

    What does the 'zoom in by factor of 3X' feature enable a user to do?

    Increase the detail of visible genetic data

    Which database is specifically known for providing standardized clinical significance of genetic variants?

    ClinVar

    What does the direction of the arrows in a gene representation indicate?

    The gene is on the reverse (3' to 5') strand

    What distinguishes GRCh38 from GRCh37 in the UCSC Genome Browser?

    GRCh38 has fewer gaps and newer annotations.

    What tool does Ensembl provide for data extraction that is not covered in UCSC?

    Table Browser

    How can a user highlight a specific region for zooming?

    By dragging the mouse over the scale or chromosome track

    What type of analysis can gnomAD provide when looking up variants?

    Minor allele frequency and clinical significance

    What information can be accessed by clicking on a transcript?

    A page describing the transcript and links to resources

    Which browser would be best for database queries related to genomic data?

    NCBI

    What is the purpose of the UCSC tracks feature?

    To organize data by types such as Mapping and Genes

    What functionality does the 'Zoom in/Zoom out' feature provide in the UCSC browser?

    Focusing on a specific genomic region

    What can be observed when zoomed in to the level of DNA and amino acids?

    All six coding frames, three forward and three reverse

    Which is NOT a part of the anatomy of the UCSC Genome Browser?

    Variant classifications

    What do the coordinates chr11:110,045,605-110,767,437 represent?

    The position of a protein coding gene on chromosome 11

    What happens when a user mouses over an exon?

    The corresponding intron number is displayed

    What programming languages are mentioned as having a REST API for data extraction?

    Perl and Java

    What type of data structure is indicated by the term 'lookup'?

    Dictionary

    What is the primary biotype of the BRCA2 gene as described in the lookup?

    protein_coding

    Which gene is associated with Bardet-Biedl syndrome according to the database lookup?

    BBS2

    What assembly name is indicated for the genes mentioned?

    GRCh38

    In the example provided, what is used to retrieve information about multiple gene symbols?

    symbol_post

    What is the display name for the BRCA2 gene as mentioned in the lookup output?

    BRCA2

    What does the strand value of BRCA2 indicate in the lookup result?

    1 (positive)

    Which of the following is NOT a gene symbol mentioned in the content?

    BBS5

    What is the description output for the BRCA2 gene?

    BRCA2 DNA repair associated

    What type of biotype is BBS4 classified as?

    protein_coding

    What is contained in the 'description' field for BBS4?

    Bardet-Biedl syndrome 4 [Source: HGNC Symbol; Acc: HGNC:969]

    Which server is used to obtain the cDNA for the transcript_id?

    http://rest.ensembl.org

    What does the variable 'transcript_id' contain after processing from the lookup?

    ENST00000380152

    What is the value of the 'display_name' field for BBS4?

    BBS4

    Which of the following fields indicates the genomic range of BBS4?

    start and end

    What type of molecule is represented in the sequence obtained for the transcript_id?

    dna

    How is the response from the REST API request formatted for the sequence?

    text/x-fasta

    What was the conclusion of the 2017 review regarding first-generation GM crop animal feed?

    There is no clear evidence of adverse effects on animal health.

    What effect would switching to non-GMO animal feed have according to research by Iowa State University?

    Increase greenhouse gas emissions and food prices.

    What modern agricultural technique has been used for the last 100 years to create new crop varieties?

    Mutation breeding using chemicals or radiation.

    How many studies have been conducted examining the health and environmental safety of GMO crops?

    Over 3,000 studies.

    What is a potential consequence of continued use of GMO crops without proper assessment?

    Emergence of resistant pest species.

    How many processed pseudogenes are mentioned in the database?

    Approximately 10,668

    What is the main goal of the Earth BioGenome Project?

    To sequence the genomes of all known eukaryotic species

    Which of the following is a characteristic of non-processed pseudogenes?

    They are not translated into proteins.

    What is the approximate size of the Y chromosome in the human genome?

    52 Mbp

    Which term describes genes that have diverged from a common ancestor?

    Paralogues

    Which of the following is NOT an application of metagenomics?

    Enhancing data extraction methods

    What is a potential purpose of pseudogenes in the genome?

    To serve as a genetic reservoir for evolution

    Which gene is known as a phosphatase that removes phosphates from proteins?

    PTEN

    What is a common feature of pseudogenes?

    They resemble existing gene sequences but are non-functional

    What is one potential health implication associated with the human gut biome?

    Disruptions in the gut biome can lead to worse cancer treatment outcomes

    Which chromosomes contain the globin gene clusters mentioned?

    Chromosomes 11 and 16

    Which statement best describes orthologues?

    Genes that perform the same function in different species

    What percentage of the human genome is made up of repeated sequences?

    50%

    What is the estimated cost of the Earth BioGenome Project?

    $4.7 billion

    Which pseudogene is recognized as a non-processed pseudogene?

    HBBP1

    What can a genomic tree of life help to understand?

    Processes of speciation, adaptation, and organism dependencies in ecosystems

    What defines a protein as hypothetical?

    It is inferred from a genome sequence.

    What is the main role of post-translational modifications?

    They add phosphates and sugars to proteins.

    What vitamin deficiency is highlighted as a major issue among children between ages 0-5?

    Vitamin A deficiency

    What is Golden Rice genetically modified to contain?

    Beta-carotene

    What has been the primary resistance to the introduction of Golden Rice in some countries?

    Concerns over GMO seeds

    How many children become blind each year due to vitamin A deficiency?

    250,000 to 500,000

    What type of modification has been specifically mentioned as an addition to the COVID-19 spike protein?

    Glycosylation

    What does the addition of a phosphate by a kinase typically influence in proteins?

    It modifies their behavior.

    What is the consequence of cells undergoing senescence?

    They irreversibly withdraw from the cell cycle.

    Which of the following accurately describes Long-non-coding RNAs (lncRNAs)?

    Long sequences greater than 200 nucleotides.

    How frequently do Single Nucleotide Polymorphisms (SNPs) occur in unrelated individuals?

    Approximately 0.1% of sites.

    What privacy concern is raised by the use of consumer DNA sites by law enforcement?

    Potential access to private genetic information without consent.

    What common feature do microRNAs possess?

    They are approximately 22 nucleotides in length.

    What legal implication arose from the use of GEDmatch by law enforcement?

    A judge approved a warrant to search the database.

    Who was found guilty in the 1984 murder case mentioned, which relied on genetic data?

    Thomas Garner.

    How many differences are estimated between any two unrelated individuals in terms of SNPs?

    Approximately 3 million (0.1% of about 3 × 10^9 sites).

    What is the score when traversing left from a specific cell that contains a score of -4?

    -6

    Which of the following directions is evaluated for scoring in this algorithm?

    Left

    If a cell has a gap score of -2, what is inferred about the corresponding alignment?

    An extended gap

    What is the score associated with a mismatch in this context?

    -1

    What is the total score for the direction denoted by cell C 4 within the scoring matrix?

    -8

    What is the score when traversing the LEFT direction from cell A1?

    -4

    Which of the following scores would result from traversing vertically from cell A2?

    -4

    What direction gives a score of 1 when starting from cell A0?

    Vertical

    What is the gap score used when evaluating all directions from a starting cell?

    -2

    What score would be obtained if traversing diagonally from cell A3?

    -6

    What does the Needleman-Wunsch algorithm primarily accomplish?

    Produces optimal global alignment

    What is a key feature of scoring matrices used in sequence alignment?

    They help quantify similarity or identity between sequences.

    In the context of pairwise sequence algorithms, what is dynamic programming utilized for?

    To build optimal alignments from smaller subsequences.

    How does the Smith-Waterman algorithm differ from the Needleman-Wunsch algorithm?

    It focuses on local alignment.

    What is the purpose of including a gap penalty in alignment algorithms?

    To account for insertions and deletions in sequences.

    Which method is used to trace back through an alignment score matrix in the Needleman-Wunsch algorithm?

    Recursive traversal

    What type of scoring technique does an Identity Matrix use for aligning identical bases?

    A score of 1 for matches and 0 otherwise

    What is a primary advantage of the BLOSUM matrices over PAM matrices?

    They provide broader evolutionary coverage.

    What score is assigned when a match occurs during traversal of the algorithm?

    1

    Which direction provides the maximum score during the evaluation process?

    VERTICAL

    What should be kept to potentially produce the maximum value in the algorithm?

    All arrows

    What is assigned to the cell for a mismatch during the traversal?

    -1

    What happens to the score in a cell when the maximum value is reached?

    It becomes the maximum value

    Which of the following scores indicates a gap in the evaluation?

    -2

    During evaluation, scoring for cell values should prioritize which direction?

    The direction yielding the best score

    What is the effect of selecting only the arrows that contribute to the maximum score?

    Maximizes efficiency

    What value is assigned to the cell when traversing from diagonal to the next cell?

    1

    What score does the cell contain if no valid path is discovered?

    -4

    What is the time complexity of the basic Needleman-Wunsch algorithm?

    O(n^2)

    Which modification of the Needleman-Wunsch algorithm is central to the Smith-Waterman algorithm?

    Setting the initialization scores to zero

    What is the purpose of tracing back in the Smith-Waterman algorithm?

    To find the maximum score path

    In the Smith-Waterman algorithm, what is done if F(i, j) is found to be less than a threshold t?

    The value is replaced with zero

    Which of the following statements about the memory requirements of algorithms is true?

    Memory requirements for the Needleman-Wunsch algorithm can be improved to O(n).

    What is the scoring method used in the Smith-Waterman algorithm for calculating F(i, j)?

    max(F(i - 1, j - 1) + s(xi, yj), 0)

    What is the result of the FOPT calculation in the Smith-Waterman algorithm?

    It finds the highest score in the matrix.

    Which of the following methods can be used to initiate a local alignment in the Smith-Waterman algorithm?

    Starting at cells scoring above a threshold

    What is the correct syntax for defining an empty dictionary in Python?

    dict = {}

    Which of the following correctly handles a situation when looking up a value that does not exist in a dictionary?

    Using 'in' before accessing the dictionary will prevent a KeyError.

    Which types of objects are valid keys in a Python dictionary?

    Strings and numbers

    What is the primary function of a dictionary in Python compared to a traditional array?

    It retrieves values based on custom keys.

    What would happen if you try to look up a key that does not exist in a dictionary without handling it properly?

    It will raise a KeyError.

    Which method is used to obtain a list of keys from a dictionary?

    dict.keys()

    What syntax allows you to iterate through both keys and values in a dictionary?

    for k, v in dict.items():

    What is the expected output of print(dict.items()) given a dictionary where keys are 'a', 'o', 'g'?

    [('a', 'alpha'), ('o', 'omega'), ('g', 'gamma')]

    What is one recommended strategy for building a Python program effectively?

    Identify and tackle milestones incrementally.

    What performance advantage does a dictionary provide in programming?

    It organizes data through key-value pairs for quick access.

    What will be the result of trying to print the value of a key that does not exist in a dictionary?

    It raises a KeyError.

    Which method can be used to safely access a value in a dictionary without raising an error if the key is not present?

    dict.get(key)

    What happens if you try to delete a key that is not present in the dictionary using the del statement?

    It raises a KeyError.

    How are keys stored in a Python dictionary when items are inserted?

    They maintain the order of insertion.

    Which of the following statements about dictionaries is incorrect?

    The order of items is not preserved.

    What method should be used to remove an item from a dictionary while avoiding exceptions?

    Check for existence before using del or pop.

    When iterating over a dictionary, what is the default behavior concerning the elements accessed?

    It iterates over the keys.

    What does the statement matrix[1,2] = 5 signify in the context of dictionaries?

    Inserts key (1, 2) with value 5 into matrix.

    What does the function pick_a_codon return?

    A codon based on a specific amino acid

    What does the variable 'stop_test' represent in the context of the code?

    The position of the first stop codon detected

    What will happen if the count of codons is equal to two in the select_random calculation?

    It will select between 0 and 1

    Which statement is true about seq1 in the code?

    seq1 is not an open reading frame.

    What is the purpose of the print statement after checking seq1?

    To indicate whether seq1 contains a stop codon

    How is the codon_list created in the pick_a_codon function?

    By matching amino acids with their corresponding codons

    What should be modified in the approach to selecting a new codon from the codon_list?

    Use select_random without adjusting for the index

    What is the role of the make_aminos function indicated in the sample code?

    To create a dictionary of amino acids

    What is the purpose of the 'make_aminos' function in the provided code?

    To create a dictionary mapping codons to amino acids.

    In Part 2 of the process, what is primarily stored in the 'protein1' string?

    The growing sequence of amino acids translated from seq1.

    What programming structure is suggested for pulling codons from seq1?

    A while loop that pulls three nucleotides at a time.

    When developing the dictionary to be used in the program, which of the following is NOT a characteristic of the entries?

    A codon can correspond to multiple amino acids.

    How is the 'new_seq1' string expected to be modified during the translation process?

    It accumulates new codons based on the translated amino acid sequence.

    What type of data structure is used to store the codon-to-amino acid mappings?

    A dictionary.

    Which sequence is confirmed to be composed of multiples of 3 for processing?

    seq1.

    What should the initial value of the loop variable be to start translating seq1?

    0

    Study Notes

    Genome Browsers

    • UCSC, NCBI, and Ensembl are the “Big 3” genome browsers
    • NCBI includes ClinVar, a database with publicly submitted variations, and dbSNP, a database of SNPs
    • gnomAD (the Genome Aggregation Database) is also a good source for variant information

    UCSC

    • UCSC Genome Browser is available at http://genome.ucsc.edu/
    • UCSC provides the Track View and Table Browser
    • UCSC offers two assemblies: GRCh38 and GRCh37 with different levels of annotation
    • GRCh38 has fewer gaps and is newer, while GRCh37 is older and has more annotations

    NCBI

    • NCBI is available at https://www.ncbi.nlm.nih.gov/
    • NCBI focuses on database queries.
    • ClinVar displays submitted variations and observed disease associations
    • dbSNP (a database of SNPs) is also housed within NCBI
    • GEO (Gene Expression Omnibus) offers over 100,000 expression experiments
    • PubMed lists publications

    Ensembl

    gnomAD

    • gnomAD is available at https://gnomad.broadinstitute.org/
    • Provides information on position, change, minor allele frequency (MAF), consequence, and clinical significance
    • gnomAD is frequently updated, so its current content may differ from what older screenshots show.

    UCSC Anatomy

    • A location can be entered as a gene symbol or as genomic coordinates, for example chr11:110,045,605-110,767,437
    • Track View displays various data types
    • You can zoom in/out and pan left/right
    • You can access DNA view and analyze at the amino acid level
    • The direction of the arrows in a gene track shows which strand the gene is on; arrows pointing left indicate the reverse (3' to 5') strand
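To make the coordinate format concrete, here is a minimal Python sketch that parses a position string of the kind typed into the browser's search box. The names `Region`, `parse_position`, and `zoom_in` are hypothetical helpers for illustration only; they are not part of any UCSC tool, and the zoom arithmetic only approximates what a "zoom in 3x" button might do.

```python
import re
from typing import NamedTuple


class Region(NamedTuple):
    chrom: str
    start: int
    end: int


def parse_position(text: str) -> Region:
    """Parse a string such as 'chr11:110,045,605-110,767,437'."""
    match = re.fullmatch(r"\s*(chr\w+):([\d,]+)-([\d,]+)\s*", text)
    if match is None:
        raise ValueError(f"not a chrom:start-end position: {text!r}")
    chrom, start, end = match.groups()
    # Thousands separators are for readability only, so drop them.
    region = Region(chrom, int(start.replace(",", "")), int(end.replace(",", "")))
    if region.start > region.end:
        raise ValueError("start must not exceed end")
    return region


def zoom_in(region: Region, factor: float = 3.0) -> Region:
    """Shrink the window around its midpoint (an assumed zoom behaviour)."""
    mid = (region.start + region.end) // 2
    half = max(1, int((region.end - region.start + 1) / factor) // 2)
    return Region(region.chrom, mid - half, mid + half)
```

For example, `parse_position("chr11:110,045,605-110,767,437")` gives a region on `chr11`, and `zoom_in` of that region returns a narrower window centred on the same midpoint.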

    UCSC Tracks

    • The Track View is organized by type, such as mapping, genes, phenotype, etc.
    • Users can turn on/off different tracks of data
    • Tracks provide links to additional information

    NCBI

    • NCBI's data extraction tools are complicated

    Ensembl Automation

    • Ensembl provides automated extraction tools
    • It offers a Perl API for programmatic data extraction and a REST (representational state transfer) interface that can be called from Perl, Python, Java, and other languages
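The quiz above refers to a lookup by gene symbol, a multi-symbol `symbol_post` request, and a cDNA sequence request returned as `text/x-fasta` from `http://rest.ensembl.org`. The following is a minimal Python sketch of those three calls using only the standard library. The helper names (`lookup_url`, `sequence_url`, `fetch`, `lookup_symbols`, `demo`) are assumptions made for illustration, and nothing runs against the network unless `demo()` is called explicitly.

```python
import json
import urllib.request

SERVER = "http://rest.ensembl.org"  # server named in the quiz above


def lookup_url(symbol: str, species: str = "homo_sapiens") -> str:
    """URL for the GET lookup/symbol/:species/:symbol endpoint."""
    return f"{SERVER}/lookup/symbol/{species}/{symbol}"


def sequence_url(stable_id: str, seq_type: str = "cdna") -> str:
    """URL for the GET sequence/id/:id endpoint."""
    return f"{SERVER}/sequence/id/{stable_id}?type={seq_type}"


def fetch(url: str, content_type: str, data=None) -> str:
    """Send a GET (or a POST when data is given) and return the body as text."""
    headers = {"Content-Type": content_type, "Accept": content_type}
    body = json.dumps(data).encode() if data is not None else None
    request = urllib.request.Request(url, data=body, headers=headers)
    with urllib.request.urlopen(request, timeout=30) as response:
        return response.read().decode()


def lookup_symbols(symbols, species: str = "homo_sapiens") -> dict:
    """POST several gene symbols at once; returns a dict keyed by symbol."""
    url = f"{SERVER}/lookup/symbol/{species}"
    return json.loads(fetch(url, "application/json", {"symbols": list(symbols)}))


def demo() -> None:
    """Needs network access; the fields printed are those named in the quiz."""
    gene = json.loads(fetch(lookup_url("BRCA2"), "application/json"))
    print(gene["display_name"], gene["biotype"], gene["assembly_name"])
    print(fetch(sequence_url("ENST00000380152"), "text/x-fasta")[:200])
```

Calling `demo()` with a working connection should print the BRCA2 summary fields followed by the start of a FASTA record; the exact output depends on the current Ensembl release.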

    Proteomics

    • Proteins need to be experimentally verified to exist and for their properties to be measured
    • Post-translational modifications change protein behavior from what the genetic sequence alone indicates
    • Addition of a phosphate by a kinase is a post-translational modification
    • Addition of sugar to a protein is a post-translational modification, like in the COVID-19 spike protein

    Vitamin A Deficiency

    • 250,000 to 500,000 children become blind every year due to vitamin A deficiency
    • Worldwide data on vitamin A deficiency were collected from 1991 to 2013

    Golden Rice

    • Genetically modified to contain beta-carotene, a precursor to vitamin A
    • Anti-GMO groups fought against the introduction of golden rice in Bangladesh

    Concerns about GMOs

    • A 2017 review found no evidence that GMO animal feed has adverse effects on animal health
    • Trillions of animals have been fed with GMO feed without negative health consequences
    • Switching to non-GMO feed has negative consequences
      • Increases greenhouse gas emissions by 7%
      • Increases land use
      • Raises food prices

    Mutation Breeding

    • Modern agriculture has practiced mutation breeding for the last 100 years
    • Chemicals or radiation are used to increase mutations, resulting in new varieties
    • This process is not considered GMO
    • A 2017 PNAS report found no evidence that foods from GE crops are less safe than foods from non-GE crops

    Earth BioGenome Project

    • Proposed to sequence the genomes of all named extant eukaryotes, about 2 million species
    • Aims to create a digital library of life on earth
    • Projected to cost $4.7 billion

    Metagenomics

    • Analysis of all the genomes present in a single environmental sample
    • Examples of environments studied: 1 ml of ocean water, 1 gram of soil, and human gut

    Applications of Metagenomics

    • Studies human gut biome and its relation to health
    • Applies to agriculture, environmental remediation, anthropology, and biotech

    Gene Terminology

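    • Paralogues are genes that diverged from a common ancestral gene through duplication within a genome
    • Orthologues are genes in different species that descend from a common ancestral gene through speciation and usually perform the same function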

    Pseudogenes

    • Are typically non-functional genes
    • Appear like existing gene sequences
    • Likely arise from DNA duplication or retro-transposition

    Pseudogene.org

    • Comprehensive database and comparison platform for pseudogene annotation
    • Estimates approximately 10,668 processed pseudogenes and 14,000 non-processed pseudogenes

    PTEN: Phosphatase and Tensin Homolog

    • Is a phosphatase, which removes phosphates from a protein

    Gene Names

    • Ambiguous, have aliases, synonyms, and historical names
    • Example: Hemoglobin Subunit Gamma 1

    Globin Gene Clusters

    • Located on chromosomes 16 and 11

    Human Genome

    • Contains 3.2 × 10^9 nucleotides (haploid)
    • Chromosome sizes range from 49 Mbp to 279 Mbp
    • Repeated sequences make up 50% of the genome

    Cell Senescence

    • Can result from the normal shortening of telomeric DNA
    • Cells irreversibly withdraw from the cell cycle and fail to respond to proliferation-inducing stimuli

    RNA

    • Thousands of transcripts (alternatively spliced) in a human genome
    • LncRNAs: Long-non-coding RNAs (length > 200 nucleotides)
    • MicroRNAs: ~22 nucleotides in length whose function is RNA silencing

    SNPs

    • Single Nucleotide Polymorphisms
    • Any two unrelated individuals differ at approximately 0.1% of sites (1 in 1,000).
    • 3 × 10^9 × 0.001 = 3 × 10^6 = 3,000,000 differences

    Privacy Concerns

    • 20 million people have uploaded their genetic profiles to consumer DNA sites
    • Concerns that police can access this genetic data to solve crimes
    • In the first case of its kind, a judge approved a warrant allowing police to search the GEDmatch database

    Case Example

    • In a case that relied on genetic data, a Florida jury found Thomas Garner guilty of first-degree murder for the 1984 death of Navy recruit Pamela Cahanes

    Atomic Level Interactions

    • DNA binding is a significant process at the atomic level.

    Cross-Species Genome Similarity

    • "Islands of Similarity" indicate areas of shared genetic material between different species.

    Scoring Matrices

    • Identity matrices assign a score of 1 for identical characters and 0 for non-identical ones.
    • Similarity matrices consider the similarity between characters, allowing for scores that reflect the evolutionary relationship between them.
    • PAM and BLOSUM matrices are examples of commonly used scoring matrices.
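As a small illustration of the difference between an identity score and a match/mismatch score, the sketch below scores two equal-length sequences position by position, without gaps. The function names are hypothetical, and the match = 1 / mismatch = -1 values are taken from the quiz above rather than from PAM or BLOSUM, which assign a separate score to every pair of amino acids.

```python
def identity_score(a: str, b: str) -> int:
    """Identity matrix: 1 for identical characters, 0 otherwise."""
    return 1 if a == b else 0


def match_mismatch_score(a: str, b: str, match: int = 1, mismatch: int = -1) -> int:
    """A simple similarity-style score that penalises mismatches."""
    return match if a == b else mismatch


def ungapped_score(x: str, y: str, score=identity_score) -> int:
    """Sum the per-position scores of two equal-length sequences."""
    if len(x) != len(y):
        raise ValueError("sequences must have the same length")
    return sum(score(a, b) for a, b in zip(x, y))
```

With `x = "GATTACA"` and `y = "GACTATA"` there are five matching positions and two mismatches, so the identity score is 5 and the match/mismatch score is 3.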

    Pairwise Sequence Algorithms

    • Optimal alignments can be constructed based on previous solutions for smaller subsequences.
    • Dynamic programming is used to optimize the process, by calculating the best alignment that ends at a given pair of positions (i,j).
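One common way to write this idea down is the recurrence below, using the F(i, j) and s(x_i, y_j) notation that appears in the quiz. The linear gap penalty d > 0 is an assumed symbol; with the gap score of -2 used in the quiz, d = 2.

```latex
F(i, j) = \max
\begin{cases}
F(i-1,\, j-1) + s(x_i, y_j) & \text{align } x_i \text{ with } y_j \\
F(i-1,\, j) - d             & \text{align } x_i \text{ with a gap} \\
F(i,\, j-1) - d             & \text{align } y_j \text{ with a gap}
\end{cases}
\qquad
F(0, 0) = 0, \quad F(i, 0) = -i\,d, \quad F(0, j) = -j\,d
```

Each cell depends only on its upper-left, upper, and left neighbours, which is why the best alignment ending at (i, j) can be built from already-solved smaller subproblems.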

    Key Elements of Sequence Alignment Algorithms

    • Global pairwise alignment aims to align the entire length of two sequences, while local alignment identifies the best possible local alignment within the sequences.

    Needleman-Wunsch Algorithm

    • This algorithm produces an optimal global alignment between two sequences.
    • It uses a 2D matrix to store partial alignment scores and iteratively calculates the best alignment.
    • Scoring functions are used to assign values to matches, mismatches, and gaps.
    • The actual alignment is constructed during the traceback phase, which starts from the bottom-right cell of the matrix (the score of the full global alignment) and traces back to the top-left cell.
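The steps above can be sketched in Python as follows. This is a minimal illustration, not the course's reference solution: the function name and the default scores (match = 1, mismatch = -1, gap = -2, as used in the quiz) are assumptions, and when several paths tie the traceback simply prefers the diagonal move.

```python
def needleman_wunsch(x: str, y: str, match: int = 1, mismatch: int = -1, gap: int = -2):
    """Return (score, aligned_x, aligned_y) for one optimal global alignment."""
    n, m = len(x), len(y)
    # F[i][j] holds the best score for aligning x[:i] with y[:j].
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0] = i * gap
    for j in range(1, m + 1):
        F[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if x[i - 1] == y[j - 1] else mismatch
            F[i][j] = max(F[i - 1][j - 1] + s,  # diagonal: align two characters
                          F[i - 1][j] + gap,    # vertical: gap in y
                          F[i][j - 1] + gap)    # horizontal: gap in x
    # Traceback from the bottom-right cell to the top-left cell.
    ax, ay = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            s = match if x[i - 1] == y[j - 1] else mismatch
            if F[i][j] == F[i - 1][j - 1] + s:
                ax.append(x[i - 1])
                ay.append(y[j - 1])
                i, j = i - 1, j - 1
                continue
        if i > 0 and F[i][j] == F[i - 1][j] + gap:
            ax.append(x[i - 1])
            ay.append("-")
            i -= 1
        else:
            ax.append("-")
            ay.append(y[j - 1])
            j -= 1
    return F[n][m], "".join(reversed(ax)), "".join(reversed(ay))
```

For instance, `needleman_wunsch("AC", "A")` returns a score of -1 with the alignment `AC` over `A-`: one match (+1) and one gap (-2).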

    Smith-Waterman Algorithm

    • This algorithm focuses on local alignments, seeking the most significant region of similarity within two sequences.
    • It modifies the Needleman-Wunsch algorithm to account for local alignment by setting negative scores to 0.
    • The traceback process starts at the highest-scoring cell and ends when 0 is encountered.
    • The algorithm can also be used to identify all local alignments that exceed a designated threshold.

    Computational Complexity of Needleman-Wunsch Algorithm

    • The algorithm has a computational complexity of O(n^2) in both time and memory for two sequences of length n.
    • Advanced versions can reduce memory usage to O(n), achieving linear space complexity.
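One way to see where the O(n) memory figure comes from: each row of the score matrix depends only on the row above it, so the optimal score can be computed while keeping just two rows. The sketch below (hypothetical function name, quiz scoring values assumed) returns only the score; recovering the alignment itself in linear space needs a more involved divide-and-conquer scheme such as Hirschberg's algorithm, which is not shown here.

```python
def nw_score_linear_space(x: str, y: str, match: int = 1, mismatch: int = -1, gap: int = -2) -> int:
    """Optimal global alignment score using two rows instead of a full matrix."""
    prev = [j * gap for j in range(len(y) + 1)]  # row 0 of the matrix
    for i in range(1, len(x) + 1):
        curr = [i * gap] + [0] * len(y)          # first column of row i
        for j in range(1, len(y) + 1):
            s = match if x[i - 1] == y[j - 1] else mismatch
            curr[j] = max(prev[j - 1] + s, prev[j] + gap, curr[j - 1] + gap)
        prev = curr                               # the older row is discarded
    return prev[-1]
```

Time is still proportional to the product of the two sequence lengths; only the memory use drops to a single row of length len(y) + 1 (plus the row being built).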

    Global vs. Local Alignment

    • Global alignment seeks to align the entire lengths of two sequences.
    • Local alignment identifies the best possible local region of similarity within two sequences.

    Smith-Waterman Algorithm

    • The initialization step sets the scores for the first row and column of the matrix to 0.
    • The iteration step sets each cell to the maximum of four values: 0, the score from aligning the current characters, the score with a gap in the first sequence, and the score with a gap in the second sequence.
    • The termination and traceback phase involves identifying the cell with the highest overall score and tracing back until a cell with score 0 is reached to reconstruct the optimal local alignment.
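
The three phases can be sketched as below. This is an illustrative version under assumed scoring values (match=2, mismatch=-1, gap=-2) and a hypothetical function name. Note the floor of 0 in the iteration step and the traceback that stops at the first 0.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Return (score, aligned_a, aligned_b) for the best local alignment."""
    n, m = len(a), len(b)
    # Initialization: first row and first column stay at 0.
    H = [[0] * (m + 1) for _ in range(n + 1)]
    best, best_pos = 0, (0, 0)
    # Iteration: negative scores are replaced by 0.
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(0,
                          H[i - 1][j - 1] + s,
                          H[i - 1][j] + gap,
                          H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Traceback: start at the highest-scoring cell, stop at a 0.
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        s = match if a[i - 1] == b[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + s:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i -= 1
            j -= 1
        elif H[i][j] == H[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append('-')
            i -= 1
        else:
            out_a.append('-')
            out_b.append(b[j - 1])
            j -= 1
    return best, ''.join(reversed(out_a)), ''.join(reversed(out_b))
```

For example, smith_waterman("TTTACGTGGG", "CCACGTCC") picks out the shared ACGT region and ignores the unrelated flanks.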

    Homework Assignment

    • Students are required to complete a matrix for the Needleman-Wunsch algorithm and recover the corresponding alignment.
    • They also need to create a "B" matrix in Python.

    Recursion Example - Fibonacci Sequence

    • The Fibonacci sequence is a numerical series where each number is the sum of the two preceding numbers.
    • The sequence starts with 0 and 1.
    • To find the Fibonacci sequence:
      • Add the last two numbers to get the next number.
      • Repeat the process to generate the remaining numbers.
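
As a small recursive sketch of this definition: the function name fib and the use of functools.lru_cache are choices made here, not taken from the course material. The cache stores earlier results so each value is computed once, the same reuse of smaller solutions that dynamic programming relies on.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def fib(n):
    """Return the n-th Fibonacci number, with fib(0) = 0 and fib(1) = 1."""
    if n < 2:
        return n  # base cases: the sequence starts with 0 and 1
    return fib(n - 1) + fib(n - 2)  # sum of the two preceding numbers


first_ten = [fib(i) for i in range(10)]
print(first_ten)  # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
```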

    Dictionary Hash Tables

    • Python's dict is a hash table data structure that efficiently stores key-value pairs.
    • Dicts are created by using curly braces {} and are written as a series of key:value pairs, e.g., dict = {key1:value1, key2:value2, ...}.
    • An empty dict is represented by an empty pair of curly braces {}.

    Dict Access Operations

    • Accessing or setting a value within a dict uses square brackets:
      • dict['foo'] looks up the value associated with the key 'foo'.
      • dict['foo'] = bar assigns the value bar to the key 'foo'.
    • Keys can be any hashable value, such as strings, numbers, and tuples of hashable items; lists cannot be used as keys.
    • Values can be of any data type.
    • Attempting to access a non-existent key raises a KeyError.
    • Use the in operator to check if a key exists: 'foo' in dict.
    • The dict.get(key) method returns the value associated with the key, or None if the key is not present.
    • The dict.get(key, not-found) method allows specifying a default value to return if the key is not found.
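
The access operations above can be tried out in a few lines. The variable name d is used here instead of dict to avoid shadowing Python's built-in dict type. The keys and values are arbitrary examples.

```python
d = {'a': 'alpha', 'g': 'gamma'}
d['o'] = 'omega'            # set a value with square brackets

value = d['a']              # look up an existing key
has_z = 'z' in d            # membership test on keys
missing = d.get('z')        # None, because 'z' is not present
fallback = d.get('z', 'not-found')  # explicit default instead of None

try:
    d['z']                  # square brackets on a missing key
    raised = False
except KeyError:
    raised = True           # a KeyError was raised
```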

    Example of Building a Dict

    • A dict can be built incrementally by starting with an empty dict {} and adding key-value pairs using the assignment operator:
      • dict['a'] = 'alpha'
      • dict['g'] = 'gamma'
      • dict['o'] = 'omega'

    Using Dictionaries as N-dimensional Arrays

    • Dictionaries can function like N-dimensional arrays, storing values associated with multi-dimensional keys.
    • The keys can be tuples or other composite data structures.
    • Example: matrix[1, 2] = 5 assigns the value 5 to the key (1, 2).
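
A short sketch of the tuple-key idea, with made-up coordinates and values. Writing matrix[1, 2] is the same as matrix[(1, 2)], since Python builds the tuple from the comma-separated indices. Using get with a default of 0 is one possible way to treat unset cells as zeros.

```python
matrix = {}
matrix[1, 2] = 5        # key is the tuple (1, 2)
matrix[(0, 0)] = 1      # explicit tuple, same mechanism

same_cell = matrix[(1, 2)]          # 5
unset_cell = matrix.get((3, 3), 0)  # 0: cell was never assigned
stored_cells = len(matrix)          # only assigned cells take up space
```

Only cells that were assigned are stored, which can suit sparse data; a dense alignment matrix may be simpler as a list of lists.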

    Dict Iteration and Ordering

    • By default, iterating over a dictionary (for key in dict: ...) iterates through its keys.
    • Since Python 3.7, key-value pairs maintain the order in which they were originally inserted, and iteration follows that order. Older versions gave no ordering guarantee.
    • dict.keys() returns a view of the keys.
    • dict.values() returns a view of the values.
    • dict.items() returns a view of (key, value) tuples, a convenient way to access keys and values together.
    • In Python 3 these views are not lists; wrap them in list() if a list is needed, or pass them to sorted(), which returns a new sorted list.
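
A brief sketch of iteration and sorting, using arbitrary example data and assuming Python 3.7 or later, where iteration follows insertion order:

```python
d = {'o': 'omega', 'a': 'alpha', 'g': 'gamma'}

insertion_order = list(d)            # keys in the order they were inserted
sorted_keys = sorted(d.keys())       # new list of keys, alphabetically
sorted_values = sorted(d.values())   # new list of values, alphabetically

pairs = []
for key, value in d.items():         # unpack each (key, value) tuple
    pairs.append(f"{key} -> {value}")
```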

    Performance Advantages of Dicts

    • Dictionaries provide a significant performance advantage for looking up and managing data based on keys.
    • Utilize dictionaries when possible to organize and access data efficiently.

    Incremental Development (Rapid Prototyping) in Python

    • Don't write entire Python programs at once.
    • Identify smaller milestones and write code to reach them one at a time.
    • Use print statements to inspect data structures at each milestone.
    • Use sys.exit(0) to halt program execution after a milestone to focus on specific areas.
    • This gradual approach allows for quick testing and iteration, making it easier to build and refine complex programs.
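
One possible shape for this workflow is sketched below. The function names, the stop_after_parse flag, and the input format ("name sequence" per line) are all hypothetical. The idea is to print the data structure at the first milestone and optionally exit there before later stages exist or are trusted.

```python
import sys


def parse_records(lines):
    """Milestone 1: turn 'name sequence' lines into a dict."""
    records = {}
    for line in lines:
        name, seq = line.split()
        records[name] = seq
    return records


def main(lines, stop_after_parse=False):
    records = parse_records(lines)
    print(records)                 # inspect the structure at milestone 1
    if stop_after_parse:
        sys.exit(0)                # halt here while working on milestone 1

    # Milestone 2: added only once milestone 1 looks right.
    lengths = {name: len(seq) for name, seq in records.items()}
    print(lengths)
    return lengths
```

Once the first milestone prints what is expected, the early exit can be moved further down or removed.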

    Introduction to Lab 4 - Creating a Dictionary of Codons and Amino Acids

    • The lab focuses on building a dictionary representing the mapping between codons (three-nucleotide sequences) and their corresponding amino acids.
    • You will complete a dictionary (aminos) containing this information.
    • The lab provides code to translate DNA sequences into amino acid sequences using the aminos dictionary.

    Lab 4 Key Concepts:

    • The code includes a function pick_a_codon that randomly selects a codon for a given amino acid (if there are multiple codons that map to the same amino acid).
    • The lab emphasizes understanding the structure and manipulation of dictionaries in Python for tasks like translation (DNA -> Amino Acid).
    • It also introduces concepts like open reading frames (sequences that are translated into proteins until a stop codon is encountered).
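
To illustrate the kind of code the lab describes, here is a hedged sketch. It is not the lab's actual code. The aminos table below is deliberately partial (the full genetic code has 64 codons), the single-letter amino acid codes and '*' for stop are a choice made here, and the signatures of translate and pick_a_codon are assumptions.

```python
import random

# Partial codon table for illustration only.
aminos = {
    'ATG': 'M', 'TGG': 'W',
    'GCT': 'A', 'GCC': 'A', 'GCA': 'A', 'GCG': 'A',
    'AAA': 'K', 'AAG': 'K',
    'TAA': '*', 'TAG': '*', 'TGA': '*',  # stop codons
}


def translate(dna):
    """Translate codon by codon from position 0 until a stop codon."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        amino = aminos.get(dna[i:i + 3], 'X')  # 'X' marks codons not in the table
        if amino == '*':
            break  # a stop codon ends the open reading frame
        protein.append(amino)
    return ''.join(protein)


def pick_a_codon(amino):
    """Randomly pick one of the codons that map to the given amino acid."""
    choices = [codon for codon, aa in aminos.items() if aa == amino]
    return random.choice(choices)  # raises IndexError if there is no such codon
```

For example, translate('ATGGCTAAATGGTAAGCT') reads ATG, GCT, AAA, TGG and then stops at TAA, giving 'MAKW'.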

    Description

    Test your knowledge on the major genome browsers: UCSC, NCBI, and Ensembl. Learn about their features, databases, and offerings, including ClinVar and dbSNP. This quiz covers essential details that every genomics enthusiast should know.
