Questions and Answers
What does ClinVar primarily focus on?
Which of the following is a database that contains information on single nucleotide polymorphisms?
Which resource is known for containing hundreds of thousands of expression experiments?
What is the significance of the star rating in ClinVar results?
What is one potential drawback of using the NCBI extraction tools?
Which of the following links to a faster European mirror site for genomic data?
What does the acronym GEO stand for?
What is a feature that UCSC Genome Browser offers for exploring gene data?
What feature makes Ensembl particularly useful for data extraction?
What does the 'zoom in by factor of 3X' feature enable a user to do?
Which database is specifically known for providing standardized clinical significance of genetic variants?
What does the direction of the arrows in a gene representation indicate?
What distinguishes GRCh38 from GRCh37 in the UCSC Genome Browser?
What tool does Ensembl provide for data extraction that is not covered in UCSC?
How can a user highlight a specific region for zooming?
What type of analysis can gnomAD provide when looking up variants?
What information can be accessed by clicking on a transcript?
Which browser would be best for database queries related to genomic data?
What is the purpose of the UCSC tracks feature?
What functionality does the 'Zoom in/Zoom out' feature provide in the UCSC browser?
What can be observed when zoomed in to the level of DNA and amino acids?
Which is NOT a part of the anatomy of the UCSC Genome Browser?
What do the coordinates chr11:110,045,605-110,767,437 represent?
What happens when a user mouses over an exon?
What programming languages are mentioned as having a REST API for data extraction?
What type of data structure is indicated by the term 'lookup'?
What is the primary biotype of the BRCA2 gene as described in the lookup?
Which gene is associated with Bardet-Biedl syndrome according to the database lookup?
What assembly name is indicated for the genes mentioned?
In the example provided, what is used to retrieve information about multiple gene symbols?
What is the display name for the BRCA2 gene as mentioned in the lookup output?
What does the strand value of BRCA2 indicate in the lookup result?
Which of the following is NOT a gene symbol mentioned in the content?
What is the description output for the BRCA2 gene?
What type of biotype is BBS4 classified as?
What is contained in the 'description' field for BBS4?
Which server is used to obtain the cDNA for the transcript_id?
What does the variable 'transcript_id' contain after processing from the lookup?
What is the value of the 'display_name' field for BBS4?
Which of the following fields indicates the genomic range of BBS4?
What type of molecule is represented in the sequence obtained for the transcript_id?
How is the response from the REST API request formatted for the sequence?
What was the conclusion of the 2017 review regarding first-generation GM crop animal feed?
What effect would switching to non-GMO animal feed have according to research by Iowa State University?
What modern agricultural technique has been used for the last 100 years to create new crop varieties?
How many studies have been conducted examining the health and environmental safety of GMO crops?
What is a potential consequence of continued use of GMO crops without proper assessment?
How many processed pseudogenes are mentioned in the database?
What is the main goal of the Earth BioGenome Project?
Which of the following is a characteristic of non-processed pseudogenes?
What is the approximate size of the Y chromosome in the human genome?
Which term describes genes that have diverged from a common ancestor?
Which of the following is NOT an application of metagenomics?
What is a potential purpose of pseudogenes in the genome?
Which gene is known as a phosphatase that removes phosphates from proteins?
What is a common feature of pseudogenes?
What is one potential health implication associated with the human gut biome?
Which chromosome contains the globin gene clusters mentioned?
Which statement best describes orthologues?
What percentage of the human genome is made up of repeated sequences?
What is the estimated cost of the Earth BioGenome Project?
Which pseudogene is recognized as a non-processed pseudogene?
What can a genomic tree of life help to understand?
What defines a protein as hypothetical?
What is the main role of post-translational modifications?
What vitamin deficiency is highlighted as a major issue among children between ages 0-5?
What is Golden Rice genetically modified to contain?
What has been the primary resistance to the introduction of Golden Rice in some countries?
How many children become blind each year due to vitamin A deficiency?
What type of modification has been specifically mentioned as an addition to the COVID-19 spike protein?
What does the addition of a phosphate by a kinase typically influence in proteins?
What is the consequence of cells undergoing senescence?
Which of the following accurately describes Long-non-coding RNAs (lncRNAs)?
How frequently do Single Nucleotide Polymorphisms (SNPs) occur in unrelated individuals?
What privacy concern is raised by the use of consumer DNA sites by law enforcement?
What common feature do microRNAs possess?
What legal implication arose from the use of GEDmatch by law enforcement?
Which gene's involvement was critical in solving the 1984 murder case mentioned?
How many differences are estimated between any two unrelated individuals in terms of SNPs?
What is the score when traversing left from a specific cell that contains a score of -4?
Which of the following directions is evaluated for scoring in this algorithm?
If a cell has a gap score of -2, what is inferred about the corresponding alignment?
What is the score associated with a mismatch in this context?
What is the total score for the direction denoted by cell C 4 within the scoring matrix?
What is the score when traversing the LEFT direction from cell A1?
Which of the following scores would result from traversing vertically from cell A2?
What direction gives a score of 1 when starting from cell A0?
What is the gap score used when evaluating all directions from a starting cell?
What score would be obtained if traversing diagonally from cell A3?
What does the Needleman-Wunsch algorithm primarily accomplish?
What is a key feature of scoring matrices used in sequence alignment?
In the context of pairwise sequence algorithms, what is dynamic programming utilized for?
How does the Smith-Waterman algorithm differ from the Needleman-Wunsch algorithm?
What is the purpose of including a gap penalty in alignment algorithms?
Which method is used to trace back through an alignment score matrix in the Needleman-Wunsch algorithm?
What type of scoring technique does an Identity Matrix use for aligning identical bases?
What is a primary advantage of the BLOSUM matrices over PAM matrices?
What score is assigned when a match occurs during traversal of the algorithm?
Which direction provides the maximum score during the evaluation process?
What should be kept to potentially produce the maximum value in the algorithm?
What is assigned to the cell for a mismatch during the traversal?
What happens to the score in a cell when the maximum value is reached?
Which of the following scores indicates a gap in the evaluation?
During evaluation, scoring for cell values should prioritize which direction?
What is the effect of selecting only the arrows that contribute to the maximum score?
What value is assigned to the cell when traversing from diagonal to the next cell?
What score does the cell contain if no valid path is discovered?
What is the time complexity of the basic Needleman-Wunsch algorithm?
Which modification of the Needleman-Wunsch algorithm is central to the Smith-Waterman algorithm?
What is the purpose of tracing back in the Smith-Waterman algorithm?
In the Smith-Waterman algorithm, what is done if F(i, j) is found to be less than a threshold t?
Which of the following statements about the memory requirements of algorithms is true?
What is the scoring method used in the Smith-Waterman algorithm for calculating F(i, j)?
What is the result of the FOPT calculation in the Smith-Waterman algorithm?
Which of the following methods can be used to initiate a local alignment in the Smith-Waterman algorithm?
What is the correct syntax for defining an empty dictionary in Python?
Which of the following correctly handles a situation when looking up a value that does not exist in a dictionary?
Which types of objects are valid keys in a Python dictionary?
What is the primary function of a dictionary in Python compared to a traditional array?
What would happen if you try to look up a key that does not exist in a dictionary without handling it properly?
Which method is used to obtain a list of keys from a dictionary?
What syntax allows you to iterate through both keys and values in a dictionary?
What is the expected output of print(dict.items()) given a dictionary where keys are 'a', 'o', 'g'?
What is one recommended strategy for building a Python program effectively?
What performance advantage does a dictionary provide in programming?
What will be the result of trying to print the value of a key that does not exist in a dictionary?
Which method can be used to safely access a value in a dictionary without raising an error if the key is not present?
What happens if you try to delete a key that is not present in the dictionary using the del statement?
How are keys stored in a Python dictionary when items are inserted?
Which of the following statements about dictionaries is incorrect?
What method should be used to remove an item from a dictionary while avoiding exceptions?
When iterating over a dictionary, what is the default behavior concerning the elements accessed?
What does the statement matrix[1,2] = 5 signify in the context of dictionaries?
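The dictionary behaviours these questions ask about (empty literal, safe lookup with get, KeyError on a missing key, keys/items, insertion order, safe removal with pop, tuple keys) fit in one short script. A minimal sketch; the variable names counts and matrix are illustrative only.

```python
# {} creates an empty dict (an empty set needs set())
counts = {}

# Keys must be hashable: str, int, tuple, ... are fine; lists are not
for letter in "aogag":
    # .get returns the default (0) instead of raising when the key is missing
    counts[letter] = counts.get(letter, 0) + 1

print(counts.items())       # dict_items([('a', 2), ('o', 1), ('g', 2)])  (insertion order)
print(list(counts.keys()))  # ['a', 'o', 'g']

for key in counts:          # iterating a dict directly yields its keys
    print(key)
for key, value in counts.items():  # keys and values together
    print(key, value)

# Plain indexing of a missing key raises KeyError
try:
    value = counts["t"]
except KeyError as err:
    print("missing key:", err)      # missing key: 't'

# del counts["t"] would also raise KeyError; pop with a default does not
removed = counts.pop("t", None)     # None, dict unchanged

# A tuple key makes a dict act like a sparse 2-D matrix: matrix[1, 2] is matrix[(1, 2)]
matrix = {}
matrix[1, 2] = 5
print(matrix[(1, 2)])               # 5
print(matrix.get((0, 0), 0))        # 0 for a cell that was never set
```

Lookups by key take roughly constant time on average, which is the usual performance argument for a dict over searching a list.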
What does the function pick_a_codon return?
What does the variable 'stop_test' represent in the context of the code?
What will happen if the count of codons is equal to two in the select_random calculation?
Which statement is true about seq1 in the code?
What is the purpose of the print statement after checking seq1?
How is the codon_list created in the pick_a_codon function?
What should be modified in the approach to selecting a new codon from the codon_list?
What is the role of the make_aminos function indicated in the sample code?
What is the purpose of the 'make_aminos' function in the provided code?
In Part 2 of the process, what is primarily stored in the 'protein1' string?
What programming structure is suggested for pulling codons from seq1?
When developing the dictionary to be used in the program, which of the following is NOT a characteristic of the entries?
How is the 'new_seq1' string expected to be modified during the translation process?
What type of data structure is used to store the codon-to-amino acid mappings?
Which sequence is confirmed to be composed of multiples of 3 for processing?
What should the initial value of the loop variable be to start translating seq1?
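The questions above refer to course code (pick_a_codon, make_aminos, seq1, protein1, new_seq1) that is not reproduced here. The sketch below is a hypothetical reconstruction, not the original: it stores the standard genetic code in a dictionary, translates a sequence codon by codon starting at index 0 in steps of 3, and picks a random synonymous codon for each amino acid. CODON_TABLE and SYNONYMS are names introduced only for this sketch, and sequences are assumed to be uppercase DNA.

```python
import random

BASES = "TCAG"
# Standard genetic code: amino acids for codons TTT, TTC, TTA, TTG, TCT, ..., GGG ('*' = stop)
AMINOS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
CODON_TABLE = dict(zip(CODONS, AMINOS))  # codon -> one-letter amino acid

# Reverse map: amino acid -> list of all codons that encode it
SYNONYMS = {}
for codon, amino in CODON_TABLE.items():
    SYNONYMS.setdefault(amino, []).append(codon)


def make_aminos(seq):
    """Translate an uppercase DNA sequence codon by codon (hypothetical version)."""
    protein = ""
    # Start at index 0 and step by 3; a trailing partial codon is ignored
    for i in range(0, len(seq) - 2, 3):
        protein += CODON_TABLE[seq[i:i + 3]]
    return protein


def pick_a_codon(amino, rng=random):
    """Return a randomly chosen codon that encodes `amino` (hypothetical version)."""
    return rng.choice(SYNONYMS[amino])


seq1 = "ATGGCCTTTTAA"                      # length is a multiple of 3
protein1 = make_aminos(seq1)
new_seq1 = "".join(pick_a_codon(a) for a in protein1)
print(protein1)                            # MAF*
print(make_aminos(new_seq1) == protein1)   # True: synonymous codons give the same protein
```

Building the 64-entry table from the TCAG ordering avoids typing every codon by hand; the reverse map is what makes random synonymous recoding a single dictionary lookup.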
Study Notes
Genome Browsers
- UCSC, NCBI, and Ensembl are the “Big 3” genome browsers
- NCBI includes ClinVar, a database with publicly submitted variations, and dbSNP, a database of SNPs
- gnomAD (Genome Aggregation Database) is also a good source of variant information
UCSC
- UCSC Genome Browser is available at http://genome.ucsc.edu/
- UCSC provides the Track View and Table Browser
- For human, UCSC offers two assemblies, GRCh38 (hg38) and GRCh37 (hg19), with different levels of annotation
- GRCh38 has fewer gaps and is newer, while GRCh37 is older and has more annotations
NCBI
- NCBI is available at https://www.ncbi.nlm.nih.gov/
- NCBI focuses on database queries.
- ClinVar displays submitted variations and observed disease associations
- dbSNP (a database of SNPs) is also housed within NCBI
- GEO (Gene Expression Omnibus) offers over 100,000 expression experiments
- PubMed lists publications
Ensembl
- Located at http://useast.ensembl.org/index.html
- Ensembl is a European Genome Browser
- Ensembl combines features of UCSC and NCBI, such as Track Viewing and Table Browser
gnomAD
- gnomAD is available at https://gnomad.broadinstitute.org/
- Provides information on position, change, minor allele frequency (MAF), consequence, and clinical significance
- gnomAD is updated frequently, so current results may differ from those shown in screenshots
UCSC Anatomy
- A region can be entered as a gene symbol or as coordinates (e.g., chr11:110,045,605-110,767,437)
- Track View displays various data types
- You can zoom in/out and pan left/right
- You can open the DNA view and inspect the sequence down to the amino acid level
- The direction of the arrows on a gene shows its strand; arrows pointing left indicate the reverse (3’ to 5’) strand
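Three of the operations above (entering coordinates, zooming, and reading a reverse-strand gene) can be sketched as small helpers. A minimal sketch under stated assumptions: the function names are hypothetical, zoom only approximates the browser's buttons (UCSC's exact rounding may differ), and reverse_complement gives the reverse-strand sequence from the forward-strand sequence the browser displays.

```python
import re


def parse_region(text):
    """Split 'chr11:110,045,605-110,767,437' into ('chr11', 110045605, 110767437)."""
    m = re.fullmatch(r"\s*([^:\s]+):([\d,]+)-([\d,]+)\s*", text)
    if m is None:
        raise ValueError(f"not a chrom:start-end position: {text!r}")
    chrom, start, end = m.groups()
    return chrom, int(start.replace(",", "")), int(end.replace(",", ""))


def format_region(chrom, start, end):
    """Inverse of parse_region, with thousands separators as the browser shows them."""
    return f"{chrom}:{start:,}-{end:,}"


def zoom(chrom, start, end, factor=3):
    """Approximate 'zoom in' (factor > 1) or 'zoom out' (factor < 1) around the window centre."""
    center = (start + end) // 2
    width = max(1, round((end - start + 1) / factor))
    new_start = max(1, center - width // 2)
    return chrom, new_start, new_start + width - 1


_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Opposite-strand sequence, read 5' to 3'."""
    return seq.translate(_COMPLEMENT)[::-1]


region = parse_region("chr11:110,045,605-110,767,437")
print(region)                    # ('chr11', 110045605, 110767437)
print(format_region(*zoom(*region)))
print(reverse_complement("ATGC"))  # GCAT
```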
UCSC Tracks
- The Track View is organized by type, such as mapping, genes, phenotype, etc.
- Users can turn on/off different tracks of data
- Tracks provide links to additional information
NCBI Extraction
- NCBI's data extraction tools are complicated
Ensembl Automation
- Ensembl provides automated extraction tools
- It offers a Perl API for programmatic data extraction and a REST (Representational State Transfer) interface that can be called from Perl, Python, Java, and other languages
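The questions above mention looking up BRCA2 and BBS4, looking up several gene symbols at once, and fetching the cDNA for a transcript_id. A minimal Python sketch using only the standard library, assuming the public server at https://rest.ensembl.org and its lookup/symbol and sequence/id endpoints; the helper names are hypothetical and the course's own code may differ.

```python
import json
import urllib.parse
import urllib.request

SERVER = "https://rest.ensembl.org"  # public Ensembl REST server


def _get(path, content_type="application/json"):
    """GET SERVER + path and return the response body as text."""
    req = urllib.request.Request(SERVER + path, headers={"Content-Type": content_type})
    with urllib.request.urlopen(req) as resp:
        return resp.read().decode("utf-8")


def lookup_symbol(symbol, species="homo_sapiens"):
    """GET /lookup/symbol/:species/:symbol and return the gene record as a dict."""
    return json.loads(_get(f"/lookup/symbol/{species}/{urllib.parse.quote(symbol)}"))


def lookup_symbols(symbols, species="homo_sapiens"):
    """POST /lookup/symbol/:species to look up several symbols in one request."""
    body = json.dumps({"symbols": list(symbols)}).encode("utf-8")
    req = urllib.request.Request(
        f"{SERVER}/lookup/symbol/{species}",
        data=body,
        headers={"Content-Type": "application/json", "Accept": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))  # {symbol: record, ...}


def cdna(transcript_id):
    """GET /sequence/id/:id?type=cdna and return the bare cDNA sequence as text."""
    return _get(f"/sequence/id/{transcript_id}?type=cdna", content_type="text/plain")


def summarize(record):
    """One-line summary built from fields of a lookup record."""
    strand = "forward" if record["strand"] == 1 else "reverse"  # Ensembl uses 1 / -1
    return (f"{record['display_name']} ({record['id']}, {record['biotype']}) "
            f"{record['seq_region_name']}:{record['start']}-{record['end']} "
            f"{strand} strand, {record['assembly_name']}")
```

With network access, summarize(lookup_symbol("BRCA2")) reports the gene's biotype, location, strand and assembly, and lookup_symbols(["BRCA2", "BBS4"]) returns both records keyed by symbol; the values depend on the current Ensembl release.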
Proteomics
- Proteins must be experimentally verified to exist, and their properties must be measured
- Post-translational modifications change protein behavior from what the genetic sequence alone indicates
- Addition of a phosphate by a kinase is a post-translational modification
- Addition of sugars (glycosylation) is a post-translational modification, as in the COVID-19 spike protein
Vitamin A Deficiency
- 250,000 to 500,000 children become blind every year due to vitamin A deficiency
- Data on vitamin A deficiency worldwide collected from 1991 to 2013
Golden Rice
- Genetically modified to contain beta-carotene, a precursor to vitamin A
- Anti-GMO groups fought against the introduction of Golden Rice in Bangladesh
Concerns about GMOs
- A 2017 review found no evidence that GMO animal feed has adverse effects on animal health
- Trillions of animals have been fed with GMO feed without negative health consequences
- Switching to non-GMO feed would have negative consequences (research by Iowa State University):
  - Increases greenhouse gas emissions by 7%
  - Increases land use
  - Raises food prices
Mutation Breeding
- Modern agriculture has practiced mutation breeding for the last 100 years
- Chemicals or radiation are used to increase mutations, resulting in new varieties
- This process is not considered GMO
- A 2017 PNAS report found no evidence that foods from GE crops are less safe than foods from non-GE crops
Earth BioGenome Project
- Proposed to sequence the genomes of all named extant eukaryotes, about 2 million species
- Aims to create a digital library of life on Earth
- Projected to cost $4.7 billion
Metagenomics
- Analysis of all the genetic material (genomes) recovered from a single environmental sample
- Examples of environments studied: 1 ml of ocean water, 1 gram of soil, and human gut
Applications of Metagenomics
- Studies human gut biome and its relation to health
- Applies to agriculture, environmental remediation, anthropology, and biotech
Gene Terminology
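- Paralogues are genes within one genome that diverged from a common ancestral gene after a gene duplication
- Orthologues are genes in different species that descend from a common ancestral gene through speciation and usually perform the same function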
Pseudogenes
- Are typically non-functional copies of genes
- Resemble existing functional gene sequences
- Likely arise from DNA duplication (non-processed pseudogenes) or from retrotransposition of mRNA (processed pseudogenes)
Pseudogene.org
- Comprehensive database and comparison platform for pseudogene annotation
- Estimates approximately 10,668 processed pseudogenes and 14,000 non-processed pseudogenes
PTEN: Phosphatase and Tensin Homolog
- Is a phosphatase, which removes phosphates from a protein
Gene Names
- Gene names are ambiguous: genes have aliases, synonyms, and historical names
- Example: Hemoglobin Subunit Gamma 1
Globin Gene Clusters
- Located on chromosome 16 (alpha-globin cluster) and chromosome 11 (beta-globin cluster)
Human Genome
- Contains 3.2 × 10^9 nucleotides (haploid)
- Chromosome sizes range from 49 Mbp to 279 Mbp
- Repeated sequences make up 50% of the genome
Cell Senescence
- Can result from the normal shortening of telomeric DNA
- Cells irreversibly withdraw from the cell cycle and fail to respond to proliferation-inducing stimuli
RNA
- The human genome produces thousands of transcripts, including alternatively spliced variants
- LncRNAs: long non-coding RNAs (length > 200 nucleotides)
- MicroRNAs: small RNAs of ~22 nucleotides that function in RNA silencing
SNPs
- Single Nucleotide Polymorphisms
- Any two unrelated individuals differ at approximately 0.1% of sites (1 in 1,000).
- 3 × 10^9 × 0.001 = 3 × 10^6, i.e., about 3,000,000 differences
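The arithmetic above can be checked with a short Python sketch. It counts differing positions between two aligned sequences of equal length, then scales the 0.1% difference rate to a genome of 3 × 10^9 nucleotides. The function names and toy sequences are illustrative assumptions. Real variant calling works from read alignments, not from two clean strings.

```python
def count_differences(seq1, seq2):
    """Count positions at which two aligned, equal-length sequences differ."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for a, b in zip(seq1, seq2) if a != b)


def expected_differences(genome_size, rate):
    """Expected number of differing sites for a given per-site difference rate."""
    return round(genome_size * rate)


print(count_differences("ACGTACGTAC", "ACGTTCGTAA"))  # 2
print(expected_differences(3 * 10**9, 0.001))          # 3000000
```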
Privacy Concerns
- 20 million people have uploaded their genetic profiles to consumer DNA sites
- Concerns that police can access this genetic data to solve crimes
- A judge approved a warrant allowing police to search the GEDmatch database, the first case of its kind
Case Example
- A Florida jury found Thomas Garner guilty of first-degree murder in the 1984 death of Navy recruit Pamela Cahanes, in a case that used genetic data
Atomic Level Interactions
- DNA binding is a significant process at the atomic level.
Cross-Species Genome Similarity
- "Islands of Similarity" indicate areas of shared genetic material between different species.
Scoring Matrices
- Identity matrices assign a score of 1 for identical characters and 0 for non-identical ones.
- Similarity matrices consider the similarity between characters, allowing for scores that reflect the evolutionary relationship between them.
- PAM and BLOSUM matrices are examples of commonly used scoring matrices.
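Identity and similarity matrices can both be stored as Python dicts keyed by character pairs, as in the sketch below. The scores are hypothetical: match +2, transition (A↔G or C↔T) −1, transversion −2. Real PAM and BLOSUM matrices are 20 × 20 protein matrices derived from observed substitutions. This toy nucleotide scheme only shows the mechanics of an identity matrix versus a similarity matrix.

```python
# Hypothetical nucleotide scoring scheme, for illustration only:
# match = +2, transition (A<->G or C<->T) = -1, transversion = -2.
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def identity_matrix(alphabet):
    """Identity matrix: 1 for identical characters, 0 otherwise."""
    return {(a, b): 1 if a == b else 0 for a in alphabet for b in alphabet}


def similarity_matrix(alphabet, match=2, transition=-1, transversion=-2):
    """Similarity matrix that penalizes transitions less than transversions."""
    matrix = {}
    for a in alphabet:
        for b in alphabet:
            if a == b:
                matrix[a, b] = match
            elif {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
                matrix[a, b] = transition
            else:
                matrix[a, b] = transversion
    return matrix


def score_ungapped(seq1, seq2, matrix):
    """Sum the matrix scores of an ungapped, position-by-position alignment."""
    return sum(matrix[a, b] for a, b in zip(seq1, seq2))


identity = identity_matrix("ACGT")
similarity = similarity_matrix("ACGT")
print(score_ungapped("ACGT", "ACGA", identity))    # 3
print(score_ungapped("ACGT", "GCGT", similarity))  # 5 (A/G is a transition)
```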
Pairwise Sequence Algorithms
- Optimal alignments can be constructed based on previous solutions for smaller subsequences.
- Dynamic programming is used to optimize the process, by calculating the best alignment that ends at a given pair of positions (i,j).
Key Elements of Sequence Alignment Algorithms
- Global pairwise alignment aims to align the entire length of two sequences, while local alignment identifies the best possible local alignment within the sequences.
Needleman-Wunsch Algorithm
- This algorithm produces an optimal global alignment between two sequences.
- It uses a 2D matrix to store partial alignment scores and iteratively calculates the best alignment.
- Scoring functions are used to assign values to matches, mismatches, and gaps.
- The actual alignment is constructed during the traceback phase, starting from the bottom-right cell (which holds the optimal global alignment score) and tracing back to the top-left cell.
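The fill and traceback steps above can be sketched in Python as follows. The scoring parameters (match +1, mismatch −1, linear gap −1) are illustrative assumptions, not values prescribed by the course. The matrix is a dict keyed by (i, j) tuples, in keeping with the dict-as-array idea covered later in these notes.

```python
def needleman_wunsch(s, t, match=1, mismatch=-1, gap=-1):
    """Global alignment with a linear gap penalty.

    Returns (score, aligned_s, aligned_t). Default scores are illustrative.
    """
    n, m = len(s), len(t)

    def sub(i, j):
        # Score for aligning s[i-1] with t[j-1] (the matrix is 1-indexed).
        return match if s[i - 1] == t[j - 1] else mismatch

    # F[i, j] holds the best score for aligning s[:i] with t[:j].
    F = {(0, 0): 0}
    for i in range(1, n + 1):
        F[i, 0] = i * gap          # s[:i] aligned against gaps only
    for j in range(1, m + 1):
        F[0, j] = j * gap          # t[:j] aligned against gaps only

    # Fill: each cell takes the best of the diagonal, up and left moves.
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i, j] = max(F[i - 1, j - 1] + sub(i, j),
                          F[i - 1, j] + gap,
                          F[i, j - 1] + gap)

    # Traceback from the bottom-right cell back to (0, 0).
    aligned_s, aligned_t = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and F[i, j] == F[i - 1, j - 1] + sub(i, j):
            aligned_s.append(s[i - 1])
            aligned_t.append(t[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and F[i, j] == F[i - 1, j] + gap:
            aligned_s.append(s[i - 1])
            aligned_t.append("-")
            i -= 1
        else:
            aligned_s.append("-")
            aligned_t.append(t[j - 1])
            j -= 1
    return F[n, m], "".join(reversed(aligned_s)), "".join(reversed(aligned_t))


score, top, bottom = needleman_wunsch("GATTACA", "GATACA")
print(score)
print(top)
print(bottom)
```

When several alignments tie for the best score, the order of the checks in the traceback (diagonal, then up, then left) decides which one is returned.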
Smith-Waterman Algorithm
- This algorithm focuses on local alignments, seeking the most significant region of similarity within two sequences.
- It modifies the Needleman-Wunsch algorithm to account for local alignment by setting negative scores to 0.
- The traceback process starts at the highest-scoring cell and ends when 0 is encountered.
- The algorithm can also be used to identify all local alignments that exceed a designated threshold.
Computational Complexity of Needleman-Wunsch Algorithm
- The algorithm has a computational complexity of O(n^2) in both time and memory (O(nm) for sequences of lengths n and m).
- Advanced versions, such as Hirschberg's algorithm, reduce memory usage to O(n), achieving linear space complexity.
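Each row of the Needleman-Wunsch matrix depends only on the row above it. A minimal sketch can therefore compute the optimal score while keeping just two rows in memory. The scoring parameters are illustrative assumptions. Running time stays O(nm) and memory drops to O(m). This version returns only the score. Recovering the alignment itself in linear space requires a divide-and-conquer method such as Hirschberg's, which is not shown here.

```python
def nw_score_linear_space(s, t, match=1, mismatch=-1, gap=-1):
    """Needleman-Wunsch score using memory proportional to len(t).

    Only the previous row is kept, so the optimal score is available
    but the alignment cannot be traced back from this function.
    """
    previous = [j * gap for j in range(len(t) + 1)]   # row 0
    for i in range(1, len(s) + 1):
        current = [i * gap] + [0] * len(t)             # column 0 of row i
        for j in range(1, len(t) + 1):
            sub = match if s[i - 1] == t[j - 1] else mismatch
            current[j] = max(previous[j - 1] + sub,    # diagonal
                             previous[j] + gap,        # up
                             current[j - 1] + gap)     # left
        previous = current
    return previous[-1]


print(nw_score_linear_space("GATTACA", "GATACA"))  # 5
```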
Global vs. Local Alignment
- Global alignment seeks to align the entire lengths of two sequences.
- Local alignment identifies the best possible local region of similarity within two sequences.
Smith-Waterman Algorithm
- The initialization step sets the scores for the first row and column of the matrix to 0.
- The iteration step computes each cell's score as the maximum of four values: aligning the current characters (diagonal), a gap in the first sequence, a gap in the second sequence, and 0, so no cell is ever negative.
- The termination and traceback phase identifies the cell with the highest overall score and traces back until a cell with score 0 is reached, reconstructing the optimal local alignment.
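The initialization, iteration and traceback steps above map onto the Python sketch below. The default scores (match +2, mismatch −1, linear gap −1) are illustrative assumptions. As in the Needleman-Wunsch version, the matrix is a dict keyed by (i, j) tuples.

```python
def smith_waterman(s, t, match=2, mismatch=-1, gap=-1):
    """Local alignment with a linear gap penalty.

    Returns (best_score, aligned_s, aligned_t). Default scores are illustrative.
    """
    n, m = len(s), len(t)

    def sub(i, j):
        return match if s[i - 1] == t[j - 1] else mismatch

    # Initialization: the first row and first column are all 0.
    H = {(i, 0): 0 for i in range(n + 1)}
    H.update({(0, j): 0 for j in range(m + 1)})

    # Iteration: best of diagonal, up, left, or 0 (start a new local alignment).
    best, best_cell = 0, (0, 0)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            H[i, j] = max(0,
                          H[i - 1, j - 1] + sub(i, j),
                          H[i - 1, j] + gap,
                          H[i, j - 1] + gap)
            if H[i, j] > best:
                best, best_cell = H[i, j], (i, j)

    # Traceback: start at the highest-scoring cell, stop at a cell holding 0.
    aligned_s, aligned_t = [], []
    i, j = best_cell
    while H[i, j] > 0:
        if H[i, j] == H[i - 1, j - 1] + sub(i, j):
            aligned_s.append(s[i - 1])
            aligned_t.append(t[j - 1])
            i, j = i - 1, j - 1
        elif H[i, j] == H[i - 1, j] + gap:
            aligned_s.append(s[i - 1])
            aligned_t.append("-")
            i -= 1
        else:
            aligned_s.append("-")
            aligned_t.append(t[j - 1])
            j -= 1
    return best, "".join(reversed(aligned_s)), "".join(reversed(aligned_t))


print(smith_waterman("AAACCCGGG", "TTTCCCTTT"))  # (6, 'CCC', 'CCC')
```

Only the shared CCC block scores above 0 in this example, so the surrounding mismatched regions are excluded from the local alignment.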
Homework Assignment
- Students are required to complete a matrix for the Needleman-Wunsch algorithm and recover the corresponding alignment.
- They also need to create a "B" matrix in Python.
Recursion Example - Fibonacci Sequence
- The Fibonacci sequence is a numerical series where each number is the sum of the two preceding numbers.
- The sequence starts with 0 and 1.
- To generate the Fibonacci sequence:
  - Add the last two numbers to get the next number.
  - Repeat the process to generate the remaining numbers.
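Since the heading frames Fibonacci as a recursion example, here is a minimal sketch of both a recursive and an iterative version. The function names are illustrative. The recursive version caches results in a dict so that each number is computed only once. Without the cache, naive recursion recomputes the same values exponentially many times.

```python
def fib(n, memo=None):
    """Return the n-th Fibonacci number recursively (fib(0) = 0, fib(1) = 1).

    A dict caches results so each value is computed only once.
    """
    if memo is None:
        memo = {0: 0, 1: 1}
    if n not in memo:
        memo[n] = fib(n - 1, memo) + fib(n - 2, memo)
    return memo[n]


def fib_sequence(count):
    """Iterative version: start with 0 and 1, then keep adding the last two numbers."""
    sequence = [0, 1]
    while len(sequence) < count:
        sequence.append(sequence[-1] + sequence[-2])
    return sequence[:count]


print(fib(10))           # 55
print(fib_sequence(10))  # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
```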
Dictionary Hash Tables
- Python's dict is a hash table data structure that efficiently stores key-value pairs.
- Dicts are created with curly braces {} and written as a series of key:value pairs, e.g., d = {key1: value1, key2: value2, ...}.
- An empty dict is written as an empty pair of curly braces: {}.
Dict Access Operations
- Accessing or setting a value within a dict d uses square brackets:
  - d['foo'] looks up the value associated with the key 'foo'.
  - d['foo'] = bar assigns the value bar to the key 'foo'.
- Keys in dicts can be strings, numbers, and tuples; in general any hashable (immutable) value can be a key, so lists cannot.
- Values can be of any data type.
- Attempting to access a non-existent key raises a KeyError.
- Use the in operator to check whether a key exists: 'foo' in d.
- The d.get(key) method returns the value associated with the key, or None if the key is not present.
- The d.get(key, not_found) method returns the default value not_found instead of None when the key is missing.
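The access operations listed above are exercised together in this short sketch. The keys and values are arbitrary examples chosen for illustration.

```python
d = {"foo": 1, (1, 2): "tuple key", 3: [10, 20]}   # str, tuple and int keys

print(d["foo"])          # 1: look up by key
d["foo"] = "bar"         # overwrite the value for an existing key
print(d["foo"])          # bar

try:
    d["missing"]
except KeyError:
    print("KeyError")    # looking up an absent key raises KeyError

print("foo" in d)                     # True
print(d.get("missing"))               # None
print(d.get("missing", "not found"))  # not found

try:
    d[[1, 2]] = "list key"            # lists are mutable, hence unhashable
except TypeError:
    print("TypeError")
```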
Example of Building a Dict
- A dict can be built incrementally by starting with an empty dict, d = {}, and adding key-value pairs with the assignment operator:
  - d['a'] = 'alpha'
  - d['g'] = 'gamma'
  - d['o'] = 'omega'
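The incremental build above produces the dict shown in this sketch. A second, added illustration applies the same pattern in a loop to count bases in a DNA string. That DNA example is an assumption for illustration, not part of the original notes. It uses get() with a default of 0 so the first occurrence of each base does not raise a KeyError.

```python
d = {}
d["a"] = "alpha"
d["g"] = "gamma"
d["o"] = "omega"
print(d)   # {'a': 'alpha', 'g': 'gamma', 'o': 'omega'}

# The same pattern inside a loop: count each base in a DNA string.
counts = {}
for base in "GATTACA":
    counts[base] = counts.get(base, 0) + 1   # start at 0 the first time a base is seen
print(counts)   # {'G': 1, 'A': 3, 'T': 2, 'C': 1}
```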
Using Dictionaries as N-dimensional Arrays
- Dictionaries can function like N-dimensional arrays, storing values associated with multi-dimensional keys.
- The keys can be tuples or other composite data structures.
- Example: matrix[1, 2] = 5 assigns the value 5 to the key (1, 2).
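Tuple keys let a dict stand in for a 2-D matrix, as in the sketch below. This is the same idea used for dynamic-programming alignment matrices. The multiplication table is an arbitrary example. get() with a default works well for sparse matrices, where most cells are never stored.

```python
matrix = {}
matrix[1, 2] = 5                 # the key is the tuple (1, 2)
print(matrix[(1, 2)])            # 5: same key, written with explicit parentheses

# A 3 x 4 multiplication table stored with (row, column) keys.
table = {(i, j): i * j for i in range(3) for j in range(4)}
for i in range(3):
    print(" ".join(str(table[i, j]) for j in range(4)))
# 0 0 0 0
# 0 1 2 3
# 0 2 4 6

# Missing cells can fall back to a default, which suits sparse matrices.
print(matrix.get((0, 0), 0))     # 0
```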
Dict Iteration and Ordering
- Iterating over a dict (for key in d: ...) visits its keys. Since Python 3.7, dicts preserve insertion order, so keys come out in the order they were first inserted; in older versions the order was arbitrary.
- d.keys() returns a view of the keys (Python 2 returned a list; wrap a view in list() if you need one).
- d.values() returns a view of the values.
- d.items() returns a view of (key, value) tuples, the most efficient way to loop over all the key-value data.
- Any of these can be passed to the sorted() function to get a sorted list, e.g., sorted(d.items()).
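The iteration and ordering behaviour described above can be seen in the following sketch. It assumes Python 3.7 or later, where insertion order is guaranteed. The keys are inserted deliberately out of alphabetical order so the effect of sorted() is visible.

```python
d = {"o": "omega", "a": "alpha", "g": "gamma"}

for key in d:                    # insertion order: o, a, g
    print(key)

print(list(d.keys()))            # ['o', 'a', 'g']
print(list(d.values()))          # ['omega', 'alpha', 'gamma']
print(list(d.items()))           # [('o', 'omega'), ('a', 'alpha'), ('g', 'gamma')]

for key, value in sorted(d.items()):   # sorted by key: a, g, o
    print(key, value)
```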
Performance Advantages of Dicts
- Looking up, inserting, or deleting a value by key takes constant time on average, whereas finding an item in a list requires scanning it element by element (linear time).
- Use dictionaries whenever data is naturally organized and accessed by key.
Incremental Development (Rapid Prototyping) in Python
- Don't write an entire Python program at once.
- Identify smaller milestones and write code to reach them one at a time.
- Use print() calls to inspect data structures at each milestone.
- Use sys.exit(0) (after import sys) to halt execution right after a milestone, so you can focus on that part of the program.
- This gradual approach allows quick testing and iteration, making it easier to build and refine complex programs.
Introduction to Lab 4 - Creating a Dictionary of Codons and Amino Acids
- The lab focuses on building a dictionary representing the mapping between codons (three-nucleotide sequences) and their corresponding amino acids.
- You will complete a dictionary (aminos) containing this information.
- The lab provides code to translate DNA sequences into amino acid sequences using the aminos dictionary.
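Below is one possible sketch of a complete codon dictionary and a translation function. It is not the lab's provided code. It assumes the aminos dict maps uppercase codons to single-letter amino acid codes, with '*' for stop codons. The table is built from the standard genetic code listed in T, C, A, G order. Mapping unreadable codons to 'X' is a hypothetical choice.

```python
from itertools import product

# Standard genetic code, listed with bases in the order T, C, A, G.
# '*' marks a stop codon. The single-letter format is an assumption;
# the lab's own aminos dict may be laid out differently.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# product() yields TTT, TTC, TTA, TTG, TCT, ... in the same order as AMINO_ACIDS.
aminos = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(dna):
    """Translate DNA codon by codon; unknown codons become 'X' (hypothetical choice)."""
    dna = dna.upper()
    # Step through complete codons only; a trailing partial codon is ignored.
    return "".join(aminos.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3))


print(len(aminos))             # 64
print(translate("ATGGCCTAA"))  # MA*
```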
Lab 4 Key Concepts:
- The code includes a function pick_a_codon that randomly selects a codon for a given amino acid (when multiple codons map to the same amino acid).
- The lab emphasizes understanding the structure and manipulation of dictionaries in Python for tasks like translation (DNA -> amino acid).
- It also introduces open reading frames: stretches of sequence, typically beginning with a start codon (ATG), that are translated into protein until a stop codon is encountered.
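The lab's pick_a_codon and open-reading-frame code is not reproduced in these notes, so the sketch below is a hypothetical reconstruction. It builds a reverse dict from each amino acid to its synonymous codons and picks one at random. It then scans the three forward reading frames for ATG...stop stretches. Several simplifications are assumed: only the forward strand is scanned, and ORFs that run off the end without a stop codon are ignored.

```python
import random
from itertools import product

BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
aminos = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# Reverse mapping: amino acid -> list of codons that encode it.
codons_for = {}
for codon, aa in aminos.items():
    codons_for.setdefault(aa, []).append(codon)


def pick_a_codon(amino_acid):
    """Hypothetical pick_a_codon: choose one synonymous codon at random."""
    return random.choice(codons_for[amino_acid])


def longest_orf(dna):
    """Longest forward-strand ORF: ATG through the first in-frame stop codon."""
    dna = dna.upper()
    best = ""
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i                        # open a reading frame
            elif start is not None and aminos.get(codon) == "*":
                orf = dna[start:i + 3]           # close it at the stop codon
                if len(orf) > len(best):
                    best = orf
                start = None
    return best


print(pick_a_codon("L") in codons_for["L"])   # True
print(longest_orf("CCATGAAATTTTAGCC"))        # ATGAAATTTTAG
```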
Description
Test your knowledge on the major genome browsers: UCSC, NCBI, and Ensembl. Learn about their features, databases, and offerings, including ClinVar and dbSNP. This quiz covers essential details that every genomics enthusiast should know.