Questions and Answers
What role do epigenetic modifications play in gene expression?
- They completely deactivate all non-coding genes.
- They influence how genes are expressed but do not change the genetic code. (correct)
- They directly alter the DNA sequence of the gene.
- They are part of larger symbiotic networks that enhance gene functions.
What is the significance of alternative splicing in gene function?
- It allows the replication of DNA strands to happen more rapidly.
- It enables a single gene to produce multiple RNA and protein isoforms. (correct)
- It ensures only non-coding genes are expressed in the cell.
- It prevents the expression of regulatory regions.
Which type of molecules can non-coding genes produce?
- Functional RNA molecules such as tRNA and miRNA. (correct)
- Only mechanical enzymes that aid in cellular movement.
- Only structural proteins.
- DNA polymerases and ligases.
What is a common misconception about genes?
What aspect of gene definition has evolved post-ENCODE?
What percentage of the final project grade is attributed to the oral presentation?
Which topic is NOT one of the components covered in the course on single cell RNA seq?
What is the main focus of the learning objectives in this course?
How many homework assignments contribute to the overall grading rubric?
What does the grading rubric indicate is the largest portion of the grading weight?
What does principal component analysis (PCA) primarily aim to achieve?
Which of the following is a factor affecting outcomes in this computational biology course?
What is one of the challenges mentioned regarding the course content?
What does the matrix notation X represent in a regression model?
How is the confusion between mathematical and statistical conventions often manifested?
What mathematical concept is highlighted in the learning objectives?
Which of the following statements about the PCA derived by SVD is true?
In a data table following statistical convention, what does 'n' represent?
What type of data is single-cell RNA-seq primarily considered to be?
Which aspect of single-cell RNA-seq analysis will NOT be covered in the course?
Why are the methods taught in the course relevant across different computational biology applications?
What is the primary purpose of Discord in the course context?
Which of the following descriptions best represents the limitations addressed in the course?
What was the term 'gene' coined to describe?
What aspect of DNA did Franklin, Crick, and Watson discover in 1953?
Which of the following statements is associated with George Gamow?
What does the 'coding problem' refer to?
What concept did Crick, Griffith, and Orgel suggest regarding genetic coding?
What is the significance of the 'RNA tie club'?
Which statement best describes a matrix as used in the given context?
What common misconception might students have about the number of amino acids and nucleotides?
What is the dimension of the image of the matrix $\begin{pmatrix} 1 & 1 & 4 \\ 0 & 1 & 2 \end{pmatrix}$?
What is the result of applying the matrix $\begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}$ to the vector (1, 0)?
How does the given example of a gene expression matrix $\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}$ represent a function?
How are singular values important in the context of Singular Value Decomposition?
In the Singular Value Decomposition, U, V are orthogonal matrices representing the...
Flashcards
Computational Biology
The field that uses computational methods to investigate biological problems and data.
Gene Expression Matrix
A matrix that represents the expression levels of genes across different cells.
Matrix as a Linear Map
A mathematical representation of a linear transformation, transforming vectors from one space to another.
Simpson's Paradox
Single Cell RNA Sequencing (scRNA-seq)
scRNA-seq Experimental and Analytical Pipeline
Computational Biology Applications
Genes are not always continuous
Gene regulatory networks
Alternative splicing
Epigenetic modifications
Non-coding genes
What is a gene?
What is a matrix?
What is gene expression?
What is the Central Dogma of Molecular Biology?
What is the genetic code?
What is the adaptor hypothesis?
What is a comma-free code?
Explain how proteins are synthesized.
Matrix as a Function
Rank of a Matrix
Singular Value Decomposition (SVD)
Orthogonal Matrix
Sigma Matrix (Σ)
Statistics Table Convention
Low Rank Data Approximation
Principal Component Analysis (PCA)
Principal Components
Dimension Reduction
Epigenetics
Study Notes
Course Information
- Course Title: Introduction to Computational Genomics and Systems Biology
- Instructor: Vanessa D Jonsson
- University: University of California, Santa Cruz
- Course: BME 230A
- Semester: Winter 2025
Instructors
- Vanessa Jonsson
- Benedict Paten
- Josh Stuart
Teaching Assistants
- Lydia Mok (Jonsson)
- Gabriel Penunuri (Paten/Stuart)
Class Overview
- Models and methods for single cell RNA seq analysis (5 weeks)
- Efficient sequence comparison (2.5 weeks)
- Somatic Genomics (1 week)
- Intro to AI in biology (1.5 weeks)
Course Policies and Logistics
- Lectures: Tuesdays and Thursdays, 3:20-4:55 pm, PhysSciences 110
- Lectures will be posted on Canvas
- 3 problem sets, posted and submitted on Canvas
- Jupyter notebooks posted and submitted on Canvas
- Collaboration allowed on all homework except midterm and final
- Students must record collaborators and contributions
- Questions and homework discussions on Discord
- Some problems will involve rudimentary programming using Google Colab or Jupyter notebooks
Grading Rubric
- 15% Project proposal presentation (3 slides / 3 minutes per presentation): background, problem, proposed work
- 25% In-class Jupyter notebooks (submitted at end of lecture day)
- 10% Homework 1: (Jonsson/Mok theoretical + practice, due Jan 21)
- 10% Homework 2: (Jonsson/Mok theoretical + practice, due Feb 11)
- 10% Homework 3: (Paten, due Feb 11)
- 30% Final Project (written report 15%, oral presentation 15%)
Models and Methods in Single Cell RNA Seq
- Introduction
- Dimensionality Reduction/Clustering
- Modeling Counts (statistical distributions)
- Generalized Linear Models
- Variance Stabilization
- Differential Testing
- Multiple Testing, Part 1
- Multiple Testing, Part 2
- Biophysical Models, Part 1 (Subject to change)
- Biophysical Models, Part 2 (Subject to change)
Acknowledgements
- Several slides sourced from and inspired by "An Elementary Introduction to Computational Biology of single-cell RNA-seq." by Lior Pachter, California Institute of Technology.
Learning Objectives
- Understand the scope of the class
- Topics covered in the first 5 weeks
- Understand gene expression matrices
- Matrices as linear maps
- Data/Simpson's paradox
- Purpose of single-cell RNA seq
- Single-cell RNA-seq experimental and analytical pipeline
What the Course is About
- Computational biology: Study of models and methods used in biology and for interpreting biological data
- Warnings: there is much jargon (underlined for emphasis); outcomes depend on prior knowledge of the prerequisites; the course focuses on one specific area for pedagogical clarity.
Be Careful with Jargon
- Computational biology draws on multiple disciplines (biology, bioengineering, CS, EE, math, stats).
- Terminology varies between disciplines, which can be confusing.
(Ideal) Prerequisites
- Mathematics (single/multivariate calculus, linear algebra, measure theory, discrete math)
- Probability and statistics (probability theory, applied stats, theoretical stats)
- Computer science (programming, algorithms, data structures, software engineering)
- Biology (molecular/cell biology, immunology, neuroscience, evolution, biophysics)
- Domain-specific biological knowledge
Focus for the Single Cell Course
- Focus on common ideas and concepts across many applications.
- Focus on single-cell RNA seq (Lecture 2 will provide more detail).
- Primary focus on the gene expression matrix.
- Background material will be referenced at the end of each slide deck
Single-cell RNA-seq
- Single-cell RNA-seq is not single, cell, or RNA (Lecture 2)
- Single-cell RNA-seq: Group of constantly improving technologies/analysis tools
- Input: Cells
- Output: Gene expression matrix (proxy)
What is a Gene Expression Matrix?
- Expression: Process where information in a gene is used to generate protein/non-coding RNA
- Gene: Term coined by Johannsen (1857-1927) in 1909; its meaning has evolved ever since
- Matrix: Rectangular array of numbers for representing linear maps
The Meaning of "Gene Expression"
- Central Dogma: Information cannot exit a protein
- Sequence of amino acids carries information
- Directionality of information: DNA to RNA to Protein
Watson's Confusion
- The familiar diagram showing DNA replication, transcription, and translation depicts a reaction network, not the central dogma
A Reaction Network of Biopolymers
- Interactions between biopolymers such as DNA, RNA, and protein can be modeled mathematically with differential equations
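As a rough illustration of what such a differential-equation model can look like, here is a minimal sketch of a two-species mRNA/protein system integrated with forward Euler. The model form (constant transcription, first-order decay, translation proportional to mRNA), the function name `simulate_expression`, and all rate constants are assumptions made for this example and are not taken from the lecture.

```python
def simulate_expression(k_m, g_m, k_p, g_p, t_end=200.0, dt=0.01):
    """Forward-Euler integration of an assumed mRNA/protein model.

    dm/dt = k_m - g_m * m        (transcription minus mRNA decay)
    dp/dt = k_p * m - g_p * p    (translation minus protein decay)
    """
    m, p = 0.0, 0.0  # start with no mRNA and no protein
    for _ in range(int(t_end / dt)):
        dm = k_m - g_m * m
        dp = k_p * m - g_p * p
        m += dt * dm
        p += dt * dp
    return m, p


# Hypothetical rate constants, chosen only for illustration.
m_final, p_final = simulate_expression(k_m=2.0, g_m=0.5, k_p=1.0, g_p=0.1)

# Setting both derivatives to zero gives the steady state of this model.
m_steady = 2.0 / 0.5             # k_m / g_m
p_steady = 1.0 * m_steady / 0.1  # k_p * m_steady / g_p

print(m_final, m_steady)
print(p_final, p_steady)
```

After a long enough simulation the numerical values should sit close to the analytical steady state, which is a quick sanity check on the model.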
Deciphering the "Genetic Code"
- 1953: DNA double helix discovery
- 1954: RNA Tie Club
- 1955: Adaptor hypothesis suggested
- 1956: Exploring how many nucleotides are required to code for each amino acid
4^2 < 20 < 4^3
- Theory on amino-acid order in nucleic acid strand
- 20 naturally occurring amino acids versus 4 nucleotides: pairs of nucleotides give only 4^2 = 16 combinations, while triplets give 4^3 = 64
A Sense and Nonsense Proposal
- Mechanism for coding frame determination (using “comma-free code”)
- Specific assignment of nucleotide triplets for "sense" and "nonsense" codons needed
Two Simple Observations, a Question, and a Theorem
- Triplets of one repeated letter (e.g. AAA) and cyclic shifts of sense triplets must be nonsense.
- Question: Maximum size of a triplet comma-free code for a 4-letter alphabet?
- Theorem: Maximum size is 20.
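The counting argument behind the two observations can be checked by enumeration. The sketch below groups the 64 triplets into classes of cyclic shifts and counts the classes that could contribute a sense codon. It only reproduces the upper bound (64 - 4) / 3 = 20; it does not construct a code that attains it. The helper names and the choice of the alphabet "ACGU" are assumptions for this example.

```python
from itertools import product

ALPHABET = "ACGU"
triplets = ["".join(t) for t in product(ALPHABET, repeat=3)]


def cyclic_shifts(word):
    """Return the set of all cyclic rotations of a word."""
    return {word[i:] + word[:i] for i in range(len(word))}


def is_comma_free(code):
    """True if no codeword appears in a shifted frame of two adjacent codewords."""
    code = set(code)
    for a in code:
        for b in code:
            pair = a + b
            if pair[1:4] in code or pair[2:5] in code:
                return False
    return True


# Group the 64 triplets into classes of cyclic shifts.
classes = {frozenset(cyclic_shifts(t)) for t in triplets}

# Classes of size 1 are the repeats (AAA, CCC, GGG, UUU), which can never be sense.
# Each class of size 3 can contribute at most one sense triplet.
usable = [c for c in classes if len(c) == 3]

print(len(ALPHABET) ** 2, len(triplets))  # 16 64
print(len(usable))                        # 20

print(is_comma_free({"ACG"}))         # True
print(is_comma_free({"ACG", "CGA"}))  # False: CGA is a shift of ACG
print(is_comma_free({"AAA"}))         # False: a repeat reads the same in every frame
```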
The Genetic Code
- Determined through experimental series
- Key contributors: Grunberg-Manago, Holley, Khorana, Nirenberg, and Singer
The Genetic Code (Amino Acid Table)
- Amino acid 3-letter codes and single-letter codes
There's Plenty of Room at the Bottom
- Richard Feynman's 1959 Caltech lecture on potential of manipulating matter on atomic/molecular scale
- Importance of having tools developed to get “to the bottom” of gene expression
What is a Gene?
- Gene expression influenced by epigenetic modifications and regulatory networks
- Single gene can produce multiple RNA/protein isoforms through alternative splicing
- Non-coding genes produce functional RNA molecules such as tRNA and miRNA
Why a Gene Expression Matrix?
- A rectangular array is sufficient to represent a set of variables
A Matrix is Code for a (Linear) Function
- Matrix represents function, mapping input vectors to output vectors
How a Matrix Describes (is code for) a Function
- Matrix multiplication as a function
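A small sketch of this idea, using the 2 x 2 matrix from the quiz questions: multiplying by the matrix is the function, and the columns of the matrix are the outputs for the standard basis vectors. NumPy and the function name `f` are choices made for this example.

```python
import numpy as np

# The matrix is "code" for the function f(x) = A x.
A = np.array([[1, 1],
              [0, 1]])


def f(x):
    """Apply the linear map encoded by A to the vector x."""
    return A @ x


e1 = np.array([1, 0])
e2 = np.array([0, 1])

print(f(e1))  # [1 0]  (first column of A)
print(f(e2))  # [1 1]  (second column of A)

# Linearity: f(2*e1 + 3*e2) equals 2*f(e1) + 3*f(e2).
print(f(2 * e1 + 3 * e2))     # [5 3]
print(2 * f(e1) + 3 * f(e2))  # [5 3]
```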
The Rank of a Matrix
- Rank: Dimension of the image (result of the transformation).
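For the 2 x 3 matrix that appears in the quiz questions, the rank can be checked numerically. This is a minimal sketch using NumPy's `matrix_rank`; the variable names are illustrative.

```python
import numpy as np

# Maps vectors in R^3 to vectors in R^2.
B = np.array([[1, 1, 4],
              [0, 1, 2]])

rank = np.linalg.matrix_rank(B)
print(rank)  # 2: the image is all of R^2

# Rank-nullity: 3 input dimensions - rank 2 = 1 dimension that is sent to zero.
kernel_vector = np.array([-2, -2, 1])
print(B @ kernel_vector)  # [0 0]
```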
Singular Value Decomposition
- Decomposing a matrix into a product of three matrices: U, Σ, and V*
- U, V are orthogonal matrices, Σ is a diagonal matrix
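The decomposition can be computed and verified with `numpy.linalg.svd`, which returns U, the singular values, and V* (as `Vt` below). This sketch uses the reduced form (`full_matrices=False`), so for a non-square input V* is not square, but its rows are still orthonormal. The example matrix is reused from the quiz questions.

```python
import numpy as np

M = np.array([[1.0, 1.0, 4.0],
              [0.0, 1.0, 2.0]])

# Reduced SVD: U is 2x2, s holds the 2 singular values, Vt is 2x3.
U, s, Vt = np.linalg.svd(M, full_matrices=False)
Sigma = np.diag(s)

# The product of the three factors recovers M.
reconstructed = U @ Sigma @ Vt
print(np.allclose(reconstructed, M))  # True

# Columns of U and rows of Vt are orthonormal.
print(np.allclose(U.T @ U, np.eye(2)))    # True
print(np.allclose(Vt @ Vt.T, np.eye(2)))  # True

# Singular values are non-negative and sorted in decreasing order;
# the number of non-zero ones equals the rank.
print(int(np.sum(s > 1e-12)))  # 2
```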
Recall PCA (BME 205)
- Principal Component Analysis, used to find patterns in multivariate datasets through Singular Value Decomposition
Principal Component Analysis Can Be Derived by SVD
- Finding eigenvectors/eigenvalues of the covariance matrix is linked to SVD
- Data matrix decomposition via SVD produces principal components
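The link between the two routes can be checked numerically. The sketch below assumes the statistics convention (rows are observations, columns are features), uses synthetic random data, and compares the SVD of the centered data matrix with the eigendecomposition of its sample covariance matrix. All variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: 50 observations (rows) of 4 correlated features (columns).
X = rng.normal(size=(50, 4)) @ rng.normal(size=(4, 4))
n = X.shape[0]

# PCA requires centering each feature.
Xc = X - X.mean(axis=0)

# Route 1: SVD of the centered data matrix.
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
variances_svd = s**2 / (n - 1)  # variance explained by each component
scores = Xc @ Vt.T              # data expressed in principal-component coordinates

# Route 2: eigendecomposition of the sample covariance matrix.
C = Xc.T @ Xc / (n - 1)
evals, evecs = np.linalg.eigh(C)  # eigh returns eigenvalues in ascending order
evals = evals[::-1]               # reorder to match the SVD (descending)

print(np.allclose(variances_svd, evals))  # True
print(np.allclose(scores, U * s))         # True
```

The rows of `Vt` are the principal directions; they match the covariance eigenvectors up to sign.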
Application to Dimension Reduction
- Using SVD to reduce dimensionality of datasets
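One way to sketch this: keep only the k largest singular values and their vectors. The example below builds a synthetic matrix that is approximately rank 2 (an assumption for illustration, not real expression data) and shows both the low-rank approximation and the reduced k-dimensional coordinates.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic data: a rank-2 signal plus a small amount of noise.
signal = rng.normal(size=(30, 2)) @ rng.normal(size=(2, 10))
data = signal + 0.01 * rng.normal(size=(30, 10))

U, s, Vt = np.linalg.svd(data, full_matrices=False)

k = 2  # number of components to keep
approx = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]  # best rank-k approximation
reduced = U[:, :k] * s[:k]                      # 30 rows described by k numbers each

# The Frobenius error of the truncation equals the norm of the discarded singular values.
error = np.linalg.norm(data - approx)
print(np.isclose(error, np.sqrt(np.sum(s[k:] ** 2))))  # True
print(reduced.shape)                                   # (30, 2)
```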
A (Statistics) Convention for Tables
- Matrices in statistics represent observations (rows) and features (columns)
- Regression models described as linear combinations of explanatory variables (Xβ + ε)
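A minimal sketch of this convention with simulated data: X has n rows (observations) and p columns (features), and the response is generated as Xβ + ε. The coefficient values, noise level, and variable names are assumptions for the example; the fit uses ordinary least squares via `numpy.linalg.lstsq`.

```python
import numpy as np

rng = np.random.default_rng(2)

n, p = 100, 3                  # n observations (rows), p features (columns)
X = rng.normal(size=(n, p))    # design matrix in the statistics convention
beta_true = np.array([2.0, -1.0, 0.5])  # hypothetical coefficients
eps = 0.1 * rng.normal(size=n)          # noise term

y = X @ beta_true + eps        # the regression model y = X beta + eps

# Ordinary least squares estimate of beta.
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

print(X.shape)   # (100, 3)
print(beta_hat)  # close to beta_true
```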
Confusion between the Mathematics and Statistics Conventions
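- Seurat and Scanpy use different conventions for rows/columns in data tables: Seurat stores genes as rows and cells as columns, while Scanpy (AnnData) stores cells as rows and genes as columns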
Summary
- Expression: Process where gene information creates RNA/protein products
- Gene: Term coined by Johannsen, meaning slightly altered over the years
- Matrix: Rectangular array for representing a linear map.
- A gene expression matrix is more than a table: it is not immediately about expression and, without context, may yield uncertain interpretations.
Why Single-Cell RNA-seq?
- Bulk RNA-seq averaging is problematic; single-cell is needed to resolve biases
- Simpson's paradox in bulk data analysis reveals averaging problems
- Increasing resolution introduces more uncertainty
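The averaging problem can be illustrated with a toy example of Simpson's paradox. In the made-up numbers below, two genes are perfectly negatively correlated within each of two hypothetical cell types, yet the correlation turns positive when the cell types are pooled, as they would be without cell-level resolution. The values are invented for illustration and are not from any dataset.

```python
import numpy as np


def corr(x, y):
    """Pearson correlation between two 1-D arrays."""
    return np.corrcoef(x, y)[0, 1]


# Hypothetical expression of two genes in two cell types (3 cells each).
gene_a_type1 = np.array([1.0, 2.0, 3.0])
gene_b_type1 = np.array([3.0, 2.0, 1.0])
gene_a_type2 = np.array([7.0, 8.0, 9.0])
gene_b_type2 = np.array([9.0, 8.0, 7.0])

r_type1 = corr(gene_a_type1, gene_b_type1)
r_type2 = corr(gene_a_type2, gene_b_type2)

# Pooling ignores the cell-type label.
gene_a_pooled = np.concatenate([gene_a_type1, gene_a_type2])
gene_b_pooled = np.concatenate([gene_b_type1, gene_b_type2])
r_pooled = corr(gene_a_pooled, gene_b_pooled)

print(r_type1, r_type2)  # both negative (approximately -1)
print(r_pooled)          # positive (approximately 0.86)
```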
Getting to the Bottom
- Lack of resolution in gene/cell expression masks important relationships/factors
- In this context, getting to the bottom means having proper cell and isoform resolution
The Purpose of Single-Cell RNA-seq
- Decompose tissue/organ expression into its constituent parts.
- Differentiate cell types with molecular signatures.
- Determine cellular differentiation trajectories.
- Develop biomarkers for disease
Overview of a Single-Cell RNA-Seq Experiment and Analysis
- Full workflow diagram of experiment preparation and analysis steps.
- Includes variance stabilization, normalization, SVD, linear regression, dimensionality reduction, etc.
Single-cell RNA-seq as a Theme for Computational Biology
- Data analysis in single-cell RNA-seq utilizes advanced data structures and algorithms pertinent to genomics.
- Statistical challenges are common in biological sciences.
- Mathematical models of biological mechanisms are critical.
- The methods are frequently used in three different computational biology applications.
- Large single-cell datasets are readily available.
Single-cell RNA-seq as a Theme: Limitations
- Computational biology has vast unexplored areas.
- Single-cell RNA-seq is primarily count data; other aspects are not covered
- This course is only a survey; a vast number of topics are skipped.
Discord
- Use the Discord platform for general questions and for student-to-student and instructor-student discussion.
Additional References
- Provide examples of Simpson's paradox in computational biology.
- Link to relevant resources, like Strang's video lectures on linear algebra.
Description
This quiz covers the key concepts and methods discussed in the BME 230A course, including single cell RNA sequencing, sequence comparison, and somatic genomics. Assess your understanding of AI applications in biology as well. Prepare to test your comprehension of the introductory materials and course logistics.