Genome Browsers Overview Quiz

Questions and Answers

What does ClinVar primarily focus on?

  • Nucleotide database management
  • Publicly submitted variations and observations of diseases (correct)
  • Gene expression studies
  • Automated data extraction tools

Which of the following is a database that contains information on single nucleotide polymorphisms?

  • GEO
  • Genomic Data Commons
  • dbSNP (correct)
  • Ensembl

Which resource is known for containing hundreds of thousands of expression experiments?

  • GEO (correct)
  • PubMed
  • NCBI
  • ClinVar

What is the significance of the star rating in ClinVar results?

  • It indicates the level of confidence in the results (correct)

What is one potential drawback of using the NCBI extraction tools?

  • They can be overly complicated (correct)

Which of the following links to a faster European mirror site for genomic data?

  • http://genome-euro.ucsc.edu (correct)

What does the acronym GEO stand for?

  • Gene Expression Omnibus (correct)

What is a feature that UCSC Genome Browser offers for exploring gene data?

  • Track Viewing (correct)

What feature makes Ensembl particularly useful for data extraction?

  • Automated extraction tools (correct)

What does the 'zoom in by factor of 3X' feature enable a user to do?

  • Increase the detail of visible genetic data (correct)

Which database is specifically known for providing standardized clinical significance of genetic variants?

  • ClinVar (correct)

What does the direction of the arrows in a gene representation indicate?

  • The gene is on the reverse (3' to 5') strand (correct)

What distinguishes GRCh38 from GRCh37 in the UCSC Genome Browser?

  • GRCh38 has fewer gaps and newer annotations. (correct)

What tool does Ensembl provide for data extraction that is not covered in UCSC?

  • Table Browser (correct)

How can a user highlight a specific region for zooming?

  • By dragging the mouse over the scale or chromosome track (correct)

What type of analysis can gnomAD provide when looking up variants?

  • Minor allele frequency and clinical significance (correct)

What information can be accessed by clicking on a transcript?

  • A page describing the transcript and links to resources (correct)

Which browser would be best for database queries related to genomic data?

  • NCBI (correct)

What is the purpose of the UCSC tracks feature?

  • To organize data by types such as Mapping and Genes (correct)

What functionality does the 'Zoom in/Zoom out' feature provide in the UCSC browser?

  • Focusing on a specific genomic region (correct)

What can be observed when zoomed in to the level of DNA and amino acids?

  • All six coding frames, three forward and three reverse (correct)

Which is NOT a part of the anatomy of the UCSC Genome Browser?

  • Variant classifications (correct)

What do the coordinates chr11:110,045,605-110,767,437 represent?

  • The position of a protein coding gene on chromosome 11 (correct)

What happens when a user mouses over an exon?

  • The corresponding intron number is displayed (correct)

What programming languages are mentioned as having a REST API for data extraction?

  • Perl and Java (correct)

What type of data structure is indicated by the term 'lookup'?

  • Dictionary (correct)

What is the primary biotype of the BRCA2 gene as described in the lookup?

  • protein_coding (correct)

Which gene is associated with Bardet-Biedl syndrome according to the database lookup?

  • BBS2 (correct)

What assembly name is indicated for the genes mentioned?

  • GRCh38 (correct)

In the example provided, what is used to retrieve information about multiple gene symbols?

  • symbol_post (correct)

What is the display name for the BRCA2 gene as mentioned in the lookup output?

  • BRCA2 (correct)

What does the strand value of BRCA2 indicate in the lookup result?

  • 1 (positive) (correct)

Which of the following is NOT a gene symbol mentioned in the content?

  • BBS5 (correct)

What is the description output for the BRCA2 gene?

  • BRCA2 DNA repair associated (correct)

What type of biotype is BBS4 classified as?

  • protein_coding (correct)

What is contained in the 'description' field for BBS4?

  • Bardet-Biedl syndrome 4 [Source: HGNC Symbol; Acc: HGNC:969] (correct)

Which server is used to obtain the cDNA for the transcript_id?

  • http://rest.ensembl.org (correct)

What does the variable 'transcript_id' contain after processing from the lookup?

  • ENST00000380152 (correct)

What is the value of the 'display_name' field for BBS4?

  • BBS4 (correct)

Which of the following fields indicates the genomic range of BBS4?

  • start and end (correct)

What type of molecule is represented in the sequence obtained for the transcript_id?

  • dna (correct)

How is the response from the REST API request formatted for the sequence?

  • text/x-fasta (correct)

What was the conclusion of the 2017 review regarding first-generation GM crop animal feed?

  • There is no clear evidence of adverse effects on animal health. (correct)

What effect would switching to non-GMO animal feed have according to research by Iowa State University?

  • Increase greenhouse gas emissions and food prices. (correct)

What modern agricultural technique has been used for the last 100 years to create new crop varieties?

  • Mutation breeding using chemicals or radiation. (correct)

How many studies have been conducted examining the health and environmental safety of GMO crops?

  • Over 3,000 studies. (correct)

What is a potential consequence of continued use of GMO crops without proper assessment?

  • Emergence of resistant pest species. (correct)

How many processed pseudogenes are mentioned in the database?

  • Approximately 10,668 (correct)

What is the main goal of the Earth BioGenome Project?

  • To sequence the genomes of all known eukaryotic species (correct)

Which of the following is a characteristic of non-processed pseudogenes?

  • They are not translated into proteins. (correct)

What is the approximate size of the Y chromosome in the human genome?

  • 52 Mbp (correct)

Which term describes genes that have diverged from a common ancestor?

  • Paralogues (correct)

Which of the following is NOT an application of metagenomics?

  • Enhancing data extraction methods (correct)

What is a potential purpose of pseudogenes in the genome?

  • To serve as a genetic reservoir for evolution (correct)

Which gene is known as a phosphatase that removes phosphates from proteins?

  • PTEN (correct)

What is a common feature of pseudogenes?

  • They resemble existing gene sequences but are non-functional (correct)

What is one potential health implication associated with the human gut biome?

  • Disruptions in the gut biome can lead to worse cancer treatment outcomes (correct)

Which chromosome contains the globin gene clusters mentioned?

  • Chromosome 11 and 16 (correct)

Which statement best describes orthologues?

  • Genes that perform the same function in different species (correct)

What percentage of the human genome is made up of repeated sequences?

  • 50% (correct)

What is the estimated cost of the Earth BioGenome Project?

  • $4.7 billion (correct)

Which pseudogene is recognized as a non-processed pseudogene?

  • HBBP1 (correct)

What can a genomic tree of life help to understand?

  • Processes of speciation, adaptation, and organism dependencies in ecosystems (correct)

What defines a protein as hypothetical?

  • It is inferred from a genome sequence. (correct)

What is the main role of post-translational modifications?

  • They add phosphates and sugars to proteins. (correct)

What vitamin deficiency is highlighted as a major issue among children between ages 0-5?

  • Vitamin A deficiency (correct)

What is Golden Rice genetically modified to contain?

  • Beta-carotene (correct)

What has been the primary resistance to the introduction of Golden Rice in some countries?

  • Concerns over GMO seeds (correct)

How many children become blind each year due to vitamin A deficiency?

  • 250,000 to 500,000 (correct)

What type of modification has been specifically mentioned as an addition to the COVID-19 spike protein?

  • Glycosylation (correct)

What does the addition of a phosphate by a kinase typically influence in proteins?

  • It modifies their behavior. (correct)

What is the consequence of cells undergoing senescence?

  • They irreversibly withdraw from the cell cycle. (correct)

Which of the following accurately describes Long-non-coding RNAs (lncRNAs)?

  • Long sequences greater than 200 nucleotides. (correct)

How frequently do Single Nucleotide Polymorphisms (SNPs) occur in unrelated individuals?

  • Approximately 0.1% of sites. (correct)

What privacy concern is raised by the use of consumer DNA sites by law enforcement?

  • Potential access to private genetic information without consent. (correct)

What common feature do microRNAs possess?

  • They are approximately 22 nucleotides in length. (correct)

What legal implication arose from the use of GEDmatch by law enforcement?

  • A judge approved a warrant to search the database. (correct)

Which gene's involvement was critical in solving the 1984 murder case mentioned?

  • Thomas Garner. (correct)

How many differences are estimated between any two unrelated individuals in terms of SNPs?

  • Approximately 3 million, that is, 0.1% of roughly 3 × 10^9 sites. (correct)

What is the score when traversing left from a specific cell that contains a score of -4?

  • -6 (correct)

Which of the following directions is evaluated for scoring in this algorithm?

  • Left (correct)

If a cell has a gap score of -2, what is inferred about the corresponding alignment?

  • An extended gap (correct)

What is the score associated with a mismatch in this context?

  • -1 (correct)

What is the total score for the direction denoted by cell C 4 within the scoring matrix?

  • -8 (correct)

What is the score when traversing the LEFT direction from cell A1?

  • -4 (correct)

Which of the following scores would result from traversing vertically from cell A2?

  • -4 (correct)

What direction gives a score of 1 when starting from cell A0?

  • Vertical (correct)

What is the gap score used when evaluating all directions from a starting cell?

  • -2 (correct)

What score would be obtained if traversing diagonally from cell A3?

  • -6 (correct)

What does the Needleman-Wunsch algorithm primarily accomplish?

  • Produces optimal global alignment (correct)

What is a key feature of scoring matrices used in sequence alignment?

  • They help quantify similarity or identity between sequences. (correct)

In the context of pairwise sequence algorithms, what is dynamic programming utilized for?

  • To build optimal alignments from smaller subsequences. (correct)

How does the Smith-Waterman algorithm differ from the Needleman-Wunsch algorithm?

  • It focuses on local alignment. (correct)

What is the purpose of including a gap penalty in alignment algorithms?

  • To account for insertions and deletions in sequences. (correct)

Which method is used to trace back through an alignment score matrix in the Needleman-Wunsch algorithm?

  • Recursive traversal (correct)

What type of scoring technique does an Identity Matrix use for aligning identical bases?

  • A score of 1 for matches and 0 otherwise (correct)

What is a primary advantage of the BLOSUM matrices over PAM matrices?

  • They provide broader evolutionary coverage. (correct)

What score is assigned when a match occurs during traversal of the algorithm?

  • 1 (correct)

Which direction provides the maximum score during the evaluation process?

  • VERTICAL (correct)
  • LEFT (correct)
  • DIAGONAL (correct)

What should be kept to potentially produce the maximum value in the algorithm?

  • All arrows (correct)

What is assigned to the cell for a mismatch during the traversal?

  • -1 (correct)

What happens to the score in a cell when the maximum value is reached?

  • It becomes the maximum value (correct)

Which of the following scores indicates a gap in the evaluation?

  • -2 (correct)

During evaluation, scoring for cell values should prioritize which direction?

  • The direction yielding the best score (correct)

What is the effect of selecting only the arrows that contribute to the maximum score?

  • Maximizes efficiency (correct)

What value is assigned to the cell when traversing from diagonal to the next cell?

  • 1 (correct)

What score does the cell contain if no valid path is discovered?

  • -4 (correct)

What is the time complexity of the basic Needleman-Wunsch algorithm?

  • O(n^2) (correct)

Which modification of the Needleman-Wunsch algorithm is central to the Smith-Waterman algorithm?

  • Setting the initialization scores to zero (correct)

What is the purpose of tracing back in the Smith-Waterman algorithm?

  • To find the maximum score path (correct)

In the Smith-Waterman algorithm, what is done if F(i, j) is found to be less than a threshold t?

  • The value is replaced with zero (correct)

Which of the following statements about the memory requirements of algorithms is true?

  • Memory requirements for the Needleman-Wunsch algorithm can be improved to O(n). (correct)

What is the scoring method used in the Smith-Waterman algorithm for calculating F(i, j)?

  • max(F(i - 1, j - 1) + s(xi, yj), 0) (correct)

What is the result of the FOPT calculation in the Smith-Waterman algorithm?

  • It finds the highest score in the matrix. (correct)

Which of the following methods can be used to initiate a local alignment in the Smith-Waterman algorithm?

  • Starting at cells scoring above a threshold (correct)

What is the correct syntax for defining an empty dictionary in Python?

  • dict = {} (correct)

Which of the following correctly handles a situation when looking up a value that does not exist in a dictionary?

  • Using 'in' before accessing the dictionary will prevent a KeyError. (correct)
  • Using dict.get('nonexistent') will return a default value unless specified. (correct)

Which types of objects are valid keys in a Python dictionary?

  • Strings and numbers (correct)

What is the primary function of a dictionary in Python compared to a traditional array?

  • It retrieves values based on custom keys. (correct)

What would happen if you try to look up a key that does not exist in a dictionary without handling it properly?

  • It will raise a KeyError. (correct)

Which method is used to obtain a list of keys from a dictionary?

  • dict.keys() (correct)

What syntax allows you to iterate through both keys and values in a dictionary?

  • for k, v in dict.items(): (correct)

What is the expected output of print(dict.items()) given a dictionary where keys are 'a', 'o', 'g'?

  • [('a', 'alpha'), ('o', 'omega'), ('g', 'gamma')] (correct)

What is one recommended strategy for building a Python program effectively?

  • Identify and tackle milestones incrementally. (correct)

What performance advantage does a dictionary provide in programming?

  • It organizes data through key-value pairs for quick access. (correct)

What will be the result of trying to print the value of a key that does not exist in a dictionary?

  • It raises a KeyError. (correct)

Which method can be used to safely access a value in a dictionary without raising an error if the key is not present?

  • dict.get(key) (correct)

What happens if you try to delete a key that is not present in the dictionary using the del statement?

  • It raises a KeyError. (correct)

How are keys stored in a Python dictionary when items are inserted?

  • They maintain the order of insertion. (correct)

Which of the following statements about dictionaries is incorrect?

  • The order of items is not preserved. (correct)

What method should be used to remove an item from a dictionary while avoiding exceptions?

  • Check for existence before using del or pop. (correct)

When iterating over a dictionary, what is the default behavior concerning the elements accessed?

  • It iterates over the keys. (correct)

What does the statement matrix[1,2] = 5 signify in the context of dictionaries?

  • Inserts key (1, 2) with value 5 into matrix. (correct)

What does the function pick_a_codon return?

  • A codon based on a specific amino acid (correct)

What does the variable 'stop_test' represent in the context of the code?

  • The position of the first stop codon detected (correct)

What will happen if the count of codons is equal to two in the select_random calculation?

  • It will select between 0 and 1 (correct)

Which statement is true about seq1 in the code?

  • seq1 is not an open reading frame. (correct)

What is the purpose of the print statement after checking seq1?

  • To indicate whether seq1 contains a stop codon (correct)

How is the codon_list created in the pick_a_codon function?

  • By matching amino acids with their corresponding codons (correct)

What should be modified in the approach to selecting a new codon from the codon_list?

  • Use select_random without adjusting for the index (correct)

What is the role of the make_aminos function indicated in the sample code?

  • To create a dictionary of amino acids (correct)

What is the purpose of the 'make_aminos' function in the provided code?

  • To create a dictionary mapping codons to amino acids. (correct)

In Part 2 of the process, what is primarily stored in the 'protein1' string?

  • The growing sequence of amino acids translated from seq1. (correct)

What programming structure is suggested for pulling codons from seq1?

  • A while loop that pulls three nucleotides at a time. (correct)

When developing the dictionary to be used in the program, which of the following is NOT a characteristic of the entries?

  • A codon can correspond to multiple amino acids. (correct)

How is the 'new_seq1' string expected to be modified during the translation process?

  • It accumulates new codons based on the translated amino acid sequence. (correct)

What type of data structure is used to store the codon-to-amino acid mappings?

  • A dictionary. (correct)

Which sequence is confirmed to be composed of multiples of 3 for processing?

  • seq1. (correct)

What should the initial value of the loop variable be to start translating seq1?

  • 0 (correct)

Study Notes

Genome Browsers

  • UCSC, NCBI, and Ensembl are the “Big 3” genome browsers
  • NCBI includes ClinVar, a database with publicly submitted variations, and dbSNP, a database of SNPs
  • gnomAD (the Genome Aggregation Database) is also a good source of variant information

UCSC

  • UCSC Genome Browser is available at http://genome.ucsc.edu/
  • UCSC provides the Track View and Table Browser
  • UCSC offers two human assemblies, GRCh38 and GRCh37, with different levels of annotation
  • GRCh38 has fewer gaps and is newer, while GRCh37 is older and has more annotations

NCBI

  • NCBI is available at https://www.ncbi.nlm.nih.gov/
  • NCBI focuses on database queries.
  • ClinVar displays submitted variations and observed disease associations
  • dbSNP (a database of SNPs) is also housed within NCBI
  • GEO (Gene Expression Omnibus) offers over 100,000 expression experiments
  • PubMed lists publications

Ensembl

gnomAD

  • gnomAD is available at https://gnomad.broadinstitute.org/
  • Provides information on position, change, minor allele frequency (MAF), consequence, and clinical significance
  • gnomAD is frequently updated, so its current content may not match the screenshots.

UCSC Anatomy

  • A location can be entered as a gene symbol or as genomic coordinates (for example, chr11:110,045,605-110,767,437)
  • Track View displays various data types
  • You can zoom in/out and pan left/right
  • You can access DNA view and analyze at the amino acid level
  • The direction of the arrows drawn along a gene shows its strand, revealing whether the gene is on the reverse (3’ to 5’) strand

UCSC Tracks

  • The Track View is organized by type, such as mapping, genes, phenotype, etc.
  • Users can turn on/off different tracks of data
  • Tracks provide links to additional information

NCBI

  • NCBI's data extraction tools are complicated

Ensembl Automation

  • Ensembl provides automated extraction tools
  • It offers a Perl API for programmatic data extraction and a REST (representational state transfer) interface that supports Perl, Python, Java, and other languages
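
As a rough illustration of the REST interface, the sketch below builds three requests in Python with the standard library only and leaves sending them to a separate helper. The endpoint paths (`/lookup/symbol/...` and `/sequence/id/...`), the `https` base URL, and all helper names are assumptions based on the public Ensembl REST service rather than anything stated in these notes. The gene symbols and the transcript ID are the ones that appear in the quiz.

```python
import json
import urllib.request

# Assumed base URL of the public Ensembl REST service.
SERVER = "https://rest.ensembl.org"


def lookup_symbol_request(symbol, species="homo_sapiens"):
    """Build a GET request that looks up one gene symbol and asks for JSON."""
    url = f"{SERVER}/lookup/symbol/{species}/{symbol}"
    return urllib.request.Request(url, headers={"Content-Type": "application/json"})


def lookup_symbols_request(symbols, species="homo_sapiens"):
    """Build a POST request that looks up several gene symbols in one call."""
    url = f"{SERVER}/lookup/symbol/{species}"
    body = json.dumps({"symbols": list(symbols)}).encode("utf-8")
    headers = {"Content-Type": "application/json", "Accept": "application/json"}
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


def cdna_request(transcript_id):
    """Build a GET request for a transcript's cDNA sequence in FASTA format."""
    url = f"{SERVER}/sequence/id/{transcript_id}?type=cdna"
    return urllib.request.Request(url, headers={"Content-Type": "text/x-fasta"})


def fetch_text(request):
    """Send a prepared request and return the body as text (needs network access)."""
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")
```

With network access, `json.loads(fetch_text(lookup_symbol_request("BRCA2")))` would be expected to return a dictionary with fields such as `display_name`, `biotype`, `start`, `end`, and `strand`.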

Proteomics

  • Proteins need to be experimentally verified to exist and for their properties to be measured
  • Post-translational modifications change protein behavior from what the genetic sequence alone indicates
  • Addition of a phosphate by a kinase is a post-translational modification
  • Addition of sugar to a protein is a post-translational modification, like in the COVID-19 spike protein

Vitamin A Deficiency

  • 250,000 to 500,000 children become blind every year due to vitamin A deficiency
  • Data on vitamin A deficiency worldwide collected from 1991 to 2013

Golden Rice

  • Genetically modified to contain beta-carotene, a precursor to vitamin A
  • Anti-GMO groups fought against the introduction of golden rice in Bangladesh

Concerns about GMOs

  • A 2017 review found no evidence that GMO animal feed has adverse effects on animal health
  • Trillions of animals have been fed with GMO feed without negative health consequences
  • Switching to non-GMO feed has negative consequences
    • Increases greenhouse gas emissions by 7%
    • Increases land use
    • Raises food prices

Mutation Breeding

  • Modern agriculture has practiced mutation breeding for the last 100 years
  • Chemicals or radiation are used to increase mutations, resulting in new varieties
  • This process is not considered GMO
  • A 2017 PNAS report found no evidence that foods from GE crops are less safe than foods from non-GE crops

Earth BioGenome Project

  • Proposed to sequence the genomes of all named extant eukaryotes, about 2 million species
  • Aims to create a digital library of life on earth
  • Projected to cost $4.7 billion

Metagenomics

  • Analysis of all the genomes present in a single, coherent environmental sample
  • Examples of environments studied: 1 ml of ocean water, 1 gram of soil, and human gut

Applications of Metagenomics

  • Studies human gut biome and its relation to health
  • Applies to agriculture, environmental remediation, anthropology, and biotech

Gene Terminology

  • Paralogues are genes that diverged from a common ancestral gene through duplication within a genome
  • Orthologues are genes in different species that descend from a common ancestral gene and typically perform the same function

Pseudogenes

  • Are typically non-functional genes
  • Appear like existing gene sequences
  • Likely arise from DNA duplication or retro-transposition

Pseudogene.org

  • Comprehensive database and comparison platform for pseudogene annotation
  • Estimates approximately 10,668 processed pseudogenes and 14,000 non-processed pseudogenes

PTEN: Phosphatase and Tensin Homolog

  • Is a phosphatase, which removes phosphates from a protein

Gene Names

  • Ambiguous, have aliases, synonyms, and historical names
  • Example: Hemoglobin Subunit Gamma 1

Globin Gene Clusters

  • Located on chromosomes 16 and 11

Human Genome

  • Contains 3.2 × 10^9 nucleotides (haploid)
  • Chromosome sizes range from 49 Mbp to 279 Mbp
  • Repeated sequences make up 50% of the genome

Cell Senescence

  • Can result from the normal shortening of telomeric DNA
  • Cells irreversibly withdraw from the cell cycle and fail to respond to proliferation-inducing stimuli

RNA

  • Thousands of transcripts (alternatively spliced) in a human genome
  • LncRNAs: Long-non-coding RNAs (length > 200 nucleotides)
  • MicroRNAs: ~22 nucleotides in length whose function is RNA silencing

SNPs

  • Single Nucleotide Polymorphisms
  • Any two unrelated individuals differ at approximately 0.1% of sites (1 in 1,000).
  • 3 × 10^9 × 0.001 = 3 × 10^6 = 3,000,000 differences

Privacy Concerns

  • 20 million people have uploaded their genetic profiles to consumer DNA sites
  • Concerns that police can access this genetic data to solve crimes
  • First case in which a judge approved a warrant to penetrate GEDmatch and search its database

Case Example

  • A Florida jury found Thomas Garner guilty of first-degree murder for the 1984 death of Navy recruit Pamela Cahanes using genetic data

Atomic Level Interactions

  • DNA binding is a significant process at the atomic level.

Cross-Species Genome Similarity

  • "Islands of Similarity" indicate areas of shared genetic material between different species.

Scoring Matrices

  • Identity matrices assign a score of 1 for identical characters and 0 for non-identical ones.
  • Similarity matrices consider the similarity between characters, allowing for scores that reflect the evolutionary relationship between them.
  • PAM and BLOSUM matrices are examples of commonly used scoring matrices.
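
A minimal sketch of identity scoring, assuming a DNA alphabet and a dictionary keyed by character pairs. The function names are illustrative only. PAM and BLOSUM tables would replace the 1/0 values with scores derived from observed substitutions.

```python
def identity_score(a, b):
    """Identity scoring: 1 for identical characters, 0 otherwise."""
    return 1 if a == b else 0


def build_identity_matrix(alphabet):
    """Return the identity matrix as a dict keyed by (row_char, col_char) tuples."""
    return {(a, b): identity_score(a, b) for a in alphabet for b in alphabet}


def ungapped_score(seq1, seq2, matrix):
    """Sum the matrix scores over the columns of two equal-length sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must have the same length")
    return sum(matrix[(a, b)] for a, b in zip(seq1, seq2))


dna_identity = build_identity_matrix("ACGT")
```

Here `ungapped_score("ACGT", "ACCT", dna_identity)` counts the three identical positions.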

Pairwise Sequence Algorithms

  • Optimal alignments can be constructed based on previous solutions for smaller subsequences.
  • Dynamic programming is used to optimize the process, by calculating the best alignment that ends at a given pair of positions (i,j).
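
Written as a recurrence, the idea looks roughly like this. F(i, j) is the best score of an alignment ending at positions (i, j), and s(x_i, y_j) is the score for aligning the two characters. The symbol d, a positive linear gap penalty, is an assumption added here for illustration.

```latex
F(i, j) = \max
\begin{cases}
F(i-1, j-1) + s(x_i, y_j) & \text{align } x_i \text{ with } y_j \\
F(i-1, j) - d & \text{align } x_i \text{ with a gap} \\
F(i, j-1) - d & \text{align } y_j \text{ with a gap}
\end{cases}
```

Each cell depends only on its diagonal, upper, and left neighbours, which is why the matrix can be filled row by row.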

Key Elements of Sequence Alignment Algorithms

  • Global pairwise alignment aims to align the entire length of two sequences, while local alignment identifies the best possible local alignment within the sequences.

Needleman-Wunsch Algorithm

  • This algorithm produces an optimal global alignment between two sequences.
  • It uses a 2D matrix to store partial alignment scores and iteratively calculates the best alignment.
  • Scoring functions are used to assign values to matches, mismatches, and gaps.
  • The actual alignment is constructed during the traceback phase, which starts from the bottom-right cell of the matrix (the score of the full global alignment) and traces back to the top-left cell.
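
The following is a small sketch of global alignment, not the course's reference solution. It assumes the example scores used in the quiz questions (match 1, mismatch -1, gap -2) and a linear gap penalty. As is standard for global alignment, the traceback here starts at the bottom-right cell. The function and variable names are illustrative.

```python
def needleman_wunsch(x, y, match=1, mismatch=-1, gap=-2):
    """Return (score, aligned_x, aligned_y) for one optimal global alignment."""
    n, m = len(x), len(y)
    # F[i][j] holds the best score for aligning x[:i] with y[:j].
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0] = i * gap
    for j in range(1, m + 1):
        F[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if x[i - 1] == y[j - 1] else mismatch
            F[i][j] = max(
                F[i - 1][j - 1] + s,  # diagonal: align x[i-1] with y[j-1]
                F[i - 1][j] + gap,    # vertical: x[i-1] against a gap
                F[i][j - 1] + gap,    # left: y[j-1] against a gap
            )

    # Traceback from the bottom-right cell to the top-left cell.
    aligned_x, aligned_y = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            s = match if x[i - 1] == y[j - 1] else mismatch
            if F[i][j] == F[i - 1][j - 1] + s:
                aligned_x.append(x[i - 1])
                aligned_y.append(y[j - 1])
                i -= 1
                j -= 1
                continue
        if i > 0 and F[i][j] == F[i - 1][j] + gap:
            aligned_x.append(x[i - 1])
            aligned_y.append("-")
            i -= 1
        else:
            aligned_x.append("-")
            aligned_y.append(y[j - 1])
            j -= 1
    return F[n][m], "".join(reversed(aligned_x)), "".join(reversed(aligned_y))
```

When several directions tie, this sketch prefers the diagonal, then the vertical move, so it reports only one of the possibly many optimal alignments.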

Smith-Waterman Algorithm

  • This algorithm focuses on local alignments, seeking the most significant region of similarity within two sequences.
  • It modifies the Needleman-Wunsch algorithm to account for local alignment by setting negative scores to 0.
  • The traceback process starts at the highest-scoring cell and ends when 0 is encountered.
  • The algorithm can also be used to identify all local alignments that exceed a designated threshold.

Computational Complexity of Needleman-Wunsch Algorithm

  • The algorithm has a computational complexity of O(n^2) in both time and memory for two sequences of length n.
  • Advanced versions can reduce memory usage to O(n), achieving linear space complexity.
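
One way to see the memory saving: if only the score is needed, each row of the matrix depends only on the row above it, so two rows are enough. The sketch below assumes the same example scores as the quiz (match 1, mismatch -1, gap -2) and uses an illustrative function name. Recovering the alignment itself in linear space needs a divide-and-conquer scheme such as Hirschberg's algorithm, which is not shown.

```python
def nw_score_linear_space(x, y, match=1, mismatch=-1, gap=-2):
    """Global alignment score computed while keeping only two rows in memory."""
    previous = [j * gap for j in range(len(y) + 1)]
    for i in range(1, len(x) + 1):
        current = [i * gap]  # first column: x[:i] aligned against gaps only
        for j in range(1, len(y) + 1):
            s = match if x[i - 1] == y[j - 1] else mismatch
            current.append(max(
                previous[j - 1] + s,   # diagonal
                previous[j] + gap,     # vertical
                current[j - 1] + gap,  # left
            ))
        previous = current
    return previous[-1]
```

The running time is still proportional to the product of the two sequence lengths; only the memory use drops.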

Global vs. Local Alignment

  • Global alignment seeks to align the entire lengths of two sequences.
  • Local alignment identifies the best possible local region of similarity within two sequences.

Smith-Waterman Algorithm

  • The initialization step sets the scores for the first row and column of the matrix to 0.
  • The iteration step computes the score of each cell as the maximum of 0 and the values obtained from three possible directions: alignment with a gap in the first sequence, alignment with a gap in the second sequence, or aligning the current characters.
  • The termination and traceback phase involves identifying the cell with the highest overall score and tracing back until a cell with a score of 0 is reached, which reconstructs the optimal local alignment.
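
The three steps can be sketched as follows. This assumes the quiz's example scores (match 1, mismatch -1, gap -2) and a linear gap penalty, with scores clamped at 0 and the traceback stopping at the first cell that scores 0. The names are illustrative and only one best local alignment is returned.

```python
def smith_waterman(x, y, match=1, mismatch=-1, gap=-2):
    """Return (score, aligned_x, aligned_y) for one best local alignment."""
    n, m = len(x), len(y)
    # Initialization: first row and first column stay at 0.
    F = [[0] * (m + 1) for _ in range(n + 1)]
    best, best_pos = 0, (0, 0)

    # Iteration: each cell is the maximum of 0 and the three directions.
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if x[i - 1] == y[j - 1] else mismatch
            F[i][j] = max(0, F[i - 1][j - 1] + s, F[i - 1][j] + gap, F[i][j - 1] + gap)
            if F[i][j] > best:
                best, best_pos = F[i][j], (i, j)

    # Termination and traceback: start at the best cell, stop at a 0.
    aligned_x, aligned_y = [], []
    i, j = best_pos
    while i > 0 and j > 0 and F[i][j] > 0:
        s = match if x[i - 1] == y[j - 1] else mismatch
        if F[i][j] == F[i - 1][j - 1] + s:
            aligned_x.append(x[i - 1])
            aligned_y.append(y[j - 1])
            i -= 1
            j -= 1
        elif F[i][j] == F[i - 1][j] + gap:
            aligned_x.append(x[i - 1])
            aligned_y.append("-")
            i -= 1
        else:
            aligned_x.append("-")
            aligned_y.append(y[j - 1])
            j -= 1
    return best, "".join(reversed(aligned_x)), "".join(reversed(aligned_y))
```

To report every local alignment above a threshold instead of just the best one, the traceback could be started from each cell whose score exceeds that threshold.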

Homework Assignment

  • Students are required to complete a matrix for the Needleman-Wunsch algorithm and recover the corresponding alignment.
  • They also need to create a "B" matrix in Python.

Recursion Example - Fibonacci Sequence

  • The Fibonacci sequence is a numerical series where each number is the sum of the two preceding numbers.
  • The sequence starts with 0 and 1.
  • To find the Fibonacci sequence:
    • Add the last two numbers to get the next number.
    • Repeat the process to generate the remaining numbers.
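Because each term is defined by the two terms before it, the sequence maps directly onto a recursive function. A minimal sketch (the name fib is an assumption; this naive form recomputes subproblems and is meant only to show the recursion, not to be fast):

```python
def fib(n):
    """Return the n-th Fibonacci number, counting from fib(0) = 0."""
    if n < 0:
        raise ValueError("n must be non-negative")
    if n < 2:
        # Base cases: the sequence starts with 0 and 1.
        return n
    # Recursive case: the sum of the two preceding numbers.
    return fib(n - 1) + fib(n - 2)


first_ten = [fib(i) for i in range(10)]
print(first_ten)
```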

Dictionary Hash Tables

  • Python's dict is a hash table data structure that efficiently stores key-value pairs.
  • Dicts are created by using curly braces {} and are written as a series of key:value pairs, e.g., dict = {key1:value1, key2:value2, ...}.
  • An empty dict is represented by an empty pair of curly braces {}.

Dict Access Operations

  • Accessing or setting a value within a dict uses square brackets:
    • dict['foo'] looks up the value associated with the key 'foo'.
    • dict['foo'] = bar assigns the value bar to the key 'foo'.
  • Keys in dicts can be any hashable type, such as strings, numbers, and tuples of hashable items; mutable types such as lists cannot be keys.
  • Values can be of any data type.
  • Attempting to access a non-existent key raises a KeyError.
  • Use the in operator to check if a key exists: 'foo' in dict.
  • The dict.get(key) method returns the value associated with the key, or None if the key is not present.
  • The dict.get(key, not-found) method allows specifying a default value to return if the key is not found.
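The access operations above can be tried out in a few lines. The variable name greek and its contents are made up for the example; it is also named greek rather than dict so that Python's built-in dict type is not shadowed.

```python
greek = {'a': 'alpha', 'g': 'gamma'}

looked_up = greek['a']          # look up the value stored under 'a'
greek['o'] = 'omega'            # assign a value to a new key

has_z = 'z' in greek            # membership test on keys
missing = greek.get('z')        # None, because 'z' is not a key
fallback = greek.get('z', 'unknown')  # explicit default instead of None

try:
    greek['z']                  # square brackets on a missing key
except KeyError:
    raised = True               # a KeyError is raised
else:
    raised = False
```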

Example of Building a Dict

  • A dict can be built incrementally by starting with an empty dict {} and adding key-value pairs using the assignment operator:
    • dict['a'] = 'alpha'
    • dict['g'] = 'gamma'
    • dict['o'] = 'omega'

Using Dictionaries as N-dimensional Arrays

  • Dictionaries can function like N-dimensional arrays, storing values associated with multi-dimensional keys.
  • The keys can be tuples or other hashable composite values; lists cannot be used because they are mutable.
  • Example: matrix[1, 2] = 5 assigns the value 5 to the key (1, 2).
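A short sketch of the tuple-key idea, useful for sparse matrices where most cells are empty. The variable names and the default value of 0 for unset cells are assumptions for the example.

```python
matrix = {}
matrix[1, 2] = 5        # the key is the tuple (1, 2)
matrix[(0, 0)] = 1      # the parentheses are optional inside the brackets

value = matrix[(1, 2)]                 # same key, written with parentheses
empty_cell = matrix.get((3, 3), 0)     # treat cells that were never set as 0
stored_cells = len(matrix)             # only assigned cells take up space
```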

Dict Iteration and Ordering

  • By default, iterating over a dictionary (for key in dict: ...) iterates through its keys.
  • Since Python 3.7, dictionaries preserve the order in which keys were originally inserted, and iteration follows that order; in older versions the order was arbitrary.
  • dict.keys() returns a view of the keys (a list in Python 2).
  • dict.values() returns a view of the values (a list in Python 2).
  • dict.items() returns a view of (key, value) tuples, a convenient way to loop over all the key-value data.
  • The output of these methods can be sorted using the sorted() function, which returns a new list.
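A brief sketch of these iteration patterns, assuming Python 3.7 or later for the ordering behavior. The dictionary contents are made up for the example.

```python
greek = {'o': 'omega', 'a': 'alpha', 'g': 'gamma'}

# Iterating over the dict itself yields keys in insertion order.
keys_in_insertion_order = [key for key in greek]

# sorted() accepts the keys view and returns a new, sorted list.
sorted_keys = sorted(greek.keys())

# items() yields (key, value) tuples that can be unpacked in the loop.
lines = []
for key, value in sorted(greek.items()):
    lines.append(f"{key} -> {value}")
```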

Performance Advantages of Dicts

  • Dictionaries provide a significant performance advantage for looking up and managing data based on keys.
  • Utilize dictionaries when possible to organize and access data efficiently.

Incremental Development (Rapid Prototyping) in Python

  • Don't write entire Python programs at once.
  • Identify smaller milestones and write code to reach them one at a time.
  • Use print() calls to inspect data structures at each milestone.
  • Use sys.exit(0) to halt program execution after a milestone to focus on specific areas.
  • This gradual approach allows for quick testing and iteration, making it easier to build and refine complex programs.

Introduction to Lab 4 - Creating a Dictionary of Codons and Amino Acids

  • The lab focuses on building a dictionary representing the mapping between codons (three-nucleotide sequences) and their corresponding amino acids.
  • You will complete a dictionary (aminos) containing this information.
  • The lab provides code to translate DNA sequences into amino acid sequences using the aminos dictionary.

Lab 4 Key Concepts:

  • The code includes a function pick_a_codon that randomly selects a codon for a given amino acid (if there are multiple codons that map to the same amino acid).
  • The lab emphasizes understanding the structure and manipulation of dictionaries in Python for tasks like translation (DNA -> Amino Acid).
  • It also introduces concepts like open reading frames (stretches of sequence that begin at a start codon and are translated codon by codon until a stop codon is encountered).
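The lab's own code is not reproduced here, so the following is only a rough sketch of the idea. It assumes that aminos maps codon strings to one-letter amino acid codes with '*' marking stop codons, uses a deliberately partial table (the full genetic code has 64 codons), and guesses at a signature for pick_a_codon; the translate helper is hypothetical.

```python
import random

# Partial codon table for illustration only; the lab's table is assumed
# to cover all 64 codons.
aminos = {
    'ATG': 'M',
    'TTT': 'F', 'TTC': 'F',
    'AAA': 'K', 'AAG': 'K',
    'GGT': 'G', 'GGC': 'G', 'GGA': 'G', 'GGG': 'G',
    'TAA': '*', 'TAG': '*', 'TGA': '*',
}


def translate(dna, table):
    """Translate dna codon by codon, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        amino = table.get(dna[i:i + 3], '?')  # '?' marks codons not in the table
        if amino == '*':
            break
        protein.append(amino)
    return ''.join(protein)


def pick_a_codon(amino, table):
    """Randomly pick one of the codons that encode the given amino acid."""
    candidates = [codon for codon, a in table.items() if a == amino]
    return random.choice(candidates)


protein = translate('ATGTTTAAAGGGTAAGGG', aminos)
print(protein)
```

Here translation stops at TAA, so the trailing GGG codon is never translated.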
