Questions and Answers
What virus is associated with nearly all cervical cancers?
- Hepatitis B virus (HBV)
- Human immunodeficiency virus (HIV)
- Human papilloma virus (HPV) (correct)
- Epstein-Barr virus (EBV)
In what year did Jonas Salk use HeLa cells to develop the polio vaccine?
- 1943
- 1953 (correct)
- 1973
- 1963
What is the name of the protein that is known as the guardian of the genome?
- KRAS
- BRCA1
- VEGF
- p53 (correct)
What is the typical number of chromosomes in normal human cells?
From what laboratory was HeLa originally obtained?
What type of cancer did Henrietta Lacks have?
What is the name of the enzyme that rebuilds telomeres in HeLa cells?
Who established the first cell line of HeLa cells?
What causes cells to age and ultimately undergo apoptosis or cell death?
What unique ability do HeLa cells possess that most normal cells do not?
Flashcards
HeLa cell chromosome count
Normal human cells have 46 chromosomes, while HeLa cells have 76 to 80 heavily mutated chromosomes.
HeLa cell telomerase activity
In normal cell division, telomeres (DNA at chromosome tips) shorten, leading to aging and cell death. HeLa cells have an overactive telomerase enzyme that rebuilds telomeres.
HeLa growth rate
HeLa cells grow unusually fast; their rapid growth was already apparent within 24 hours of culturing the first HeLa sample.
HeLa cell division
HeLa and the polio vaccine
Study Notes
Algorithmic Complexity
- Algorithmic complexity measures the resources an algorithm uses to solve a problem.
- Big O notation is used to express the upper bound of resource usage growth rate as input size increases.
- It describes how an algorithm's running time grows as the input size grows, rather than measuring absolute speed.
Big O Examples
- O(1): Excellent
- O(log n): Great
- O(n): Good
- O(n log n): Fair
- O(n^2): Bad
- O(2^n): Horrible
- O(n!): Nightmare
Importance
- Performance: Helps understand algorithm performance with large inputs.
- Scalability: Determines if an algorithm can handle increasing data.
- Optimization: Guides in choosing the most efficient algorithm.
- Resource Management: Enables estimating computation resource requirements.
- Choosing the right algorithm for a given use case is an important part of software engineering.
How To Calculate
- Identify the operations performed most frequently as the input size grows.
- Count the number of times dominant operations are executed, expressed as a function of input size (n).
- Drop constant factors and lower-order terms, keeping only the highest-order term.
- Express the result using Big O notation.
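The steps above can be sketched in code. This is a minimal illustration rather than a general analysis tool: the function name `count_operations` and the constants inside it are made up for the example. It tallies the operations of a routine that has a constant part, a linear part, and a quadratic part, and shows that the quadratic part dominates as n grows.

```python
def count_operations(n):
    """Tally the operations of a hypothetical routine on an input of size n."""
    operations = 3                # constant setup work, independent of n
    for _ in range(n):
        operations += 1           # linear part: runs n times
        for _ in range(n):
            operations += 2       # dominant part: runs n * n times, 2 operations each
    return operations             # exact total: 2*n^2 + n + 3


for n in (10, 100, 1000):
    total = count_operations(n)
    # The ratio total / n^2 settles near the constant factor 2, which Big O drops.
    print(n, total, round(total / n**2, 3))
```

Dropping the constant factor 2 and the lower-order terms n + 3 leaves O(n^2).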
Big O Examples
O(1)
- Example:
def get_first_element(items):
    return items[0]
- Only one operation is performed, regardless of list size.
O(n)
def print_all_elements(items):
    for element in items:
        print(element)
- One operation is performed for each element in the list.
O(n^2)
def print_all_possible_pairs(items):
    for element1 in items:
        for element2 in items:
            print(element1, element2)
- n operations are performed for each of the n elements in the list, giving n × n = n^2 operations in total.
Machine Learning
- Machine learning is the automatic detection of patterns in data.
- It uses those patterns to make predictions about future data.
Machine Learning Examples
- Predicting customer ad clicks
- Predicting house prices
- Identifying blog topics
- Detecting spam emails
- Recognizing faces in images
- Recommending movies/products
ML Types
- Supervised Learning: Training data includes desired outputs.
- Unsupervised Learning: Training data does not include desired outputs.
- Reinforcement Learning: An agent learns to maximize a reward in an environment.
Supervised Learning
- Aims to predict a target variable $y$ given input variables $x$.
- Training data includes both inputs and desired outputs.
Supervised Learning Types
- Regression: Target variable is continuous.
- Classification: Target variable is discrete.
Regression Examples
- Predicting house prices
- Predicting tomorrow's temperature
Classification Examples
- Predicting customer ad clicks
- Identifying blog topics
- Detecting spam emails
- Recognizing faces in images
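To make the regression/classification distinction concrete, here is a small sketch that uses the same nearest-neighbor rule for both tasks. Nearest-neighbor prediction is just one convenient choice for illustration, and the function name, house sizes, prices, and word counts are all made-up assumptions. The only difference between the two tasks is the type of the target: a continuous number versus a discrete label.

```python
def nearest_neighbor_predict(train_x, train_y, query):
    """Return the target of the training point whose input is closest to query."""
    best_index = min(range(len(train_x)), key=lambda i: abs(train_x[i] - query))
    return train_y[best_index]


# Regression: the target is continuous (hypothetical house size in m^2 -> price).
sizes = [50.0, 80.0, 120.0]
prices = [150000.0, 240000.0, 330000.0]
predicted_price = nearest_neighbor_predict(sizes, prices, 85.0)

# Classification: the target is discrete (hypothetical suspicious-word count -> label).
suspicious_word_counts = [0, 1, 7, 9]
labels = ["not spam", "not spam", "spam", "spam"]
predicted_label = nearest_neighbor_predict(suspicious_word_counts, labels, 6)

print(predicted_price)   # a number
print(predicted_label)   # a category
```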
Unsupervised Learning
- Purpose is to discover patterns in data.
- Training data lacks desired outputs.
Unsupervised Learning Types
- Clustering: Grouping similar data points.
- Dimensionality Reduction: Reducing the number of variables to represent data.
- Anomaly Detection: Identifying unusual data points.
Clustering Examples
- Grouping customers by behavior
- Grouping documents by content
Dimensionality Reduction Examples
- Reducing variables to represent images
- Reducing variables for gene expression data
Anomaly Detection Examples
- Detecting fraudulent transactions
- Detecting malfunctioning equipment
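As a rough illustration of anomaly detection without desired outputs, the sketch below flags values that lie far from the mean, measured in standard deviations (a z-score rule). This is only one simple approach; the function name `find_anomalies`, the threshold of 2.0, and the transaction amounts are assumptions chosen for the example.

```python
import statistics


def find_anomalies(values, threshold=2.0):
    """Return the values whose z-score magnitude exceeds the threshold."""
    mean = statistics.mean(values)
    stdev = statistics.pstdev(values)   # population standard deviation
    if stdev == 0:
        return []                       # all values identical: nothing stands out
    return [v for v in values if abs(v - mean) / stdev > threshold]


# Hypothetical transaction amounts; no labels say which one is fraudulent.
amounts = [20.0, 22.5, 19.0, 21.0, 23.0, 20.5, 500.0]
print(find_anomalies(amounts))
```

Note that a single extreme value also inflates the mean and standard deviation, so this rule can miss anomalies in very small samples.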
Reinforcement Learning
- Aims to train an agent to take optimal actions in an environment so as to maximize a reward.
- The agent learns via trial and error and receives reward or punishment feedback.
Reinforcement Learning Examples
- Training a robot to walk
- Training a program to play chess
- Training a program to trade stocks
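The trial-and-error loop can be sketched with a multi-armed bandit, a deliberately simplified reinforcement learning setting with no state. The agent below uses an epsilon-greedy rule, which is one common choice among many; the function name `run_bandit`, the reward probabilities, and the parameter values are assumptions for illustration.

```python
import random


def run_bandit(reward_probabilities, steps=5000, epsilon=0.1, seed=0):
    """Epsilon-greedy agent on a bandit; returns the estimated value of each action."""
    rng = random.Random(seed)
    n_actions = len(reward_probabilities)
    estimates = [0.0] * n_actions       # running average reward per action
    counts = [0] * n_actions
    for _ in range(steps):
        if rng.random() < epsilon:
            action = rng.randrange(n_actions)                            # explore
        else:
            action = max(range(n_actions), key=lambda a: estimates[a])   # exploit
        # The environment gives reward 1 with the action's hidden probability, else 0.
        reward = 1.0 if rng.random() < reward_probabilities[action] else 0.0
        counts[action] += 1
        # Incremental update of the running average for the chosen action.
        estimates[action] += (reward - estimates[action]) / counts[action]
    return estimates


estimates = run_bandit([0.2, 0.5, 0.8])
best_action = max(range(len(estimates)), key=lambda a: estimates[a])
print("Estimated values:", [round(v, 2) for v in estimates])
print("Best action found:", best_action)
```

The agent never sees the hidden probabilities; it learns which action pays best only from the rewards it receives.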
Supervised Learning Details
- Given training data $(\mathbf{x}_1, y_1), (\mathbf{x}_2, y_2), \dots, (\mathbf{x}_N, y_N)$, where $\mathbf{x}_i$ is the input and $y_i$ is the desired output.
- Learn a function $f(\mathbf{x})$ to map inputs to outputs.
Loss Function
- Measures the difference between the predicted output $f(\mathbf{x})$ and the desired output $y$.
Loss Function Examples
- Regression: Squared error loss: $L(y, f(\mathbf{x})) = (y - f(\mathbf{x}))^2$
- Classification: 0-1 loss: $L(y, f(\mathbf{x})) = \begin{cases} 0 & \text{if } y = f(\mathbf{x}) \\ 1 & \text{if } y \ne f(\mathbf{x}) \end{cases}$
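The two loss functions translate directly into code. The function names below are illustrative, and `prediction` stands for the value $f(\mathbf{x})$.

```python
def squared_error_loss(y, prediction):
    """Squared error loss for regression: (y - f(x))^2."""
    return (y - prediction) ** 2


def zero_one_loss(y, prediction):
    """0-1 loss for classification: 0 if the prediction is correct, 1 otherwise."""
    return 0 if y == prediction else 1


print(squared_error_loss(3.0, 2.5))        # 0.25
print(zero_one_loss("spam", "spam"))       # 0
print(zero_one_loss("spam", "not spam"))   # 1
```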
Goal
- Minimize the average loss over the training data: $\min_f \frac{1}{N} \sum_{i=1}^N L(y_i, f(\mathbf{x}_i))$
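A minimal sketch of this goal, assuming squared error loss and a deliberately tiny set of candidate functions $f(x) = w x$: the code computes the average training loss of each candidate and keeps the one with the smallest value. The data points, the candidate slopes, and the helper names are made up for illustration; practical methods search over far richer function families.

```python
def average_loss(f, inputs, targets):
    """Average squared error loss of f over the training data."""
    return sum((y - f(x)) ** 2 for x, y in zip(inputs, targets)) / len(inputs)


def make_model(w):
    """Return the candidate function f(x) = w * x."""
    return lambda x: w * x


# Hypothetical training data, roughly following y = 2x.
inputs = [1.0, 2.0, 3.0]
targets = [2.1, 3.9, 6.2]

# Candidate slopes 0.0, 0.5, ..., 3.0.
candidate_slopes = [i * 0.5 for i in range(7)]

best_slope = min(
    candidate_slopes,
    key=lambda w: average_loss(make_model(w), inputs, targets),
)
print("Best slope:", best_slope)
print("Average loss:", average_loss(make_model(best_slope), inputs, targets))
```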
Python Data Visualization (Lab 4)
Objectives
- Learn to create data visualizations using matplotlib and seaborn.
- Understand the appropriate visualizations for different data types.
- Gain experience in customizing visualizations.
Materials Needed
- Python 3.x
- matplotlib library
- seaborn library
- Jupyter Notebook
- Dataset (CSV file)
Introduction
- Data visualization helps in understanding patterns and relationships in data.
- Using Python's matplotlib and seaborn libraries allows for creating visualizations.
Setting Up Environment
- Install matplotlib and seaborn using pip:
pip install matplotlib seaborn
- For Jupyter Notebook, import the libraries and set up inline plotting:
import matplotlib.pyplot as plt
import seaborn as sns
# For inline plotting in Jupyter Notebook
%matplotlib inline
Loading Data
- Load dataset into a pandas DataFrame:
import pandas as pd
data = pd.read_csv('your_data_file.csv')
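The plotting code in this lab refers to columns named 'x', 'y', 'category', and 'value'. If no suitable dataset is at hand, a small synthetic CSV with those columns can be generated using only the standard library. The sketch below is one way to do that; the file name `sample_lab4_data.csv`, the function name, and the distributions are arbitrary assumptions, not part of the lab.

```python
import csv
import random


def write_sample_csv(path, n_rows=50, seed=0):
    """Write a synthetic dataset with the columns the lab's plots expect."""
    rng = random.Random(seed)
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["x", "y", "category", "value"])
        for i in range(n_rows):
            x = i * 0.2                             # evenly spaced, starting at 0.0
            y = 10 * x + rng.gauss(0, 5)            # roughly linear trend with noise
            category = rng.choice(["A", "B", "C"])  # three arbitrary groups
            value = rng.gauss(50, 10)               # bell-shaped values for histograms
            writer.writerow([round(x, 2), round(y, 2), category, round(value, 2)])


write_sample_csv("sample_lab4_data.csv")
# In the lab, load it with: data = pd.read_csv("sample_lab4_data.csv")
```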
Basic Plots With Matplotlib
Line Plot
- Line plots are useful for showing trends over time or across a continuous variable.
plt.figure(figsize=(10, 6))
plt.plot(data['x'], data['y'], marker='o', linestyle='-')
plt.title('Line Plot of X vs Y')
plt.xlabel('X Axis')
plt.ylabel('Y Axis')
plt.grid(True)
plt.show()
Scatter Plot
- Used to display the relationship between two continuous variables.
plt.figure(figsize=(8, 6))
plt.scatter(data['x'], data['y'], color='red', alpha=0.5)
plt.title('Scatter Plot of X vs Y')
plt.xlabel('X Axis')
plt.ylabel('Y Axis')
plt.show()
Bar Plot
- Bar plots are used to compare categorical data.
category_counts = data['category'].value_counts()
plt.figure(figsize=(8, 6))
plt.bar(category_counts.index, category_counts.values, color='skyblue')
plt.title('Bar Plot of Category Counts')
plt.xlabel('Category')
plt.ylabel('Count')
plt.xticks(rotation=45, ha='right') # Rotate x-axis labels for readability
plt.show()
Histogram
- Used to display the distribution of a single variable.
plt.figure(figsize=(8, 6))
plt.hist(data['value'], bins=20, color='lightgreen', edgecolor='black')
plt.title('Histogram of Value')
plt.xlabel('Value')
plt.ylabel('Frequency')
plt.show()
Advanced Plots with Seaborn
- Seaborn offers a high-level interface for creating aesthetically pleasing visualizations.
Displot
- Versatile; creates histograms and KDE plots.
sns.displot(data['value'], kde=True, color='purple')
plt.title('Displot of Value')
plt.xlabel('Value')
plt.ylabel('Count') # displot histograms show counts by default
plt.show()
Scatter Plot with Regression Line
sns.regplot(x='x', y='y', data=data, scatter_kws={'alpha':0.6}, line_kws={'color':'red'})
plt.title('Scatter Plot with Regression Line')
plt.xlabel('X Axis')
plt.ylabel('Y Axis')
plt.show()
Box Plot
- Used to display the distribution of data across categories.
sns.boxplot(x='category', y='value', data=data, color='orange')
plt.title('Box Plot of Value by Category')
plt.xlabel('Category')
plt.ylabel('Value')
plt.show()
Violin Plot
- Similar to box plots, but provide more information about the shape of the distribution.
sns.violinplot(x='category', y='value', data=data, color='lightgreen')
plt.title('Violin Plot of Value by Category')
plt.xlabel('Category')
plt.ylabel('Value')
plt.show()
Heatmap
- Used to display the correlation between multiple variables.
correlation_matrix = data.corr(numeric_only=True) # Skip non-numeric columns such as 'category'
sns.heatmap(correlation_matrix, annot=True, cmap='coolwarm')
plt.title('Correlation Heatmap')
plt.show()
Pair Plot
- Used to visualize the relationships between multiple variables.
sns.pairplot(data, hue='category')
plt.suptitle('Pair Plot of Data', y=1.02) # Adjust title position
plt.show()
Customizing Visualizations
- Customizing visualizations makes them clearer and easier to interpret.
Customization Options
- Titles and Labels: Indicate what the plot represents.
- Colors: Use consistent color palettes.
- Annotations: Highlight specific data points or trends.
- Legends: Identify different groups.
- Axis Limits: Restrict the axes to the relevant portion of the data.
plt.figure(figsize=(10, 6))
plt.plot(data['x'], data['y'], marker='o', linestyle='-', color='green', label='Data')
plt.title('Customized Line Plot of X vs Y', fontsize=16)
plt.xlabel('X Axis', fontsize=14)
plt.ylabel('Y Axis', fontsize=14)
plt.grid(True)
plt.legend()
plt.xlim(0, 10) # Set x axis limits
plt.ylim(0, 100) # Set y axis limits
plt.text(2, 80, 'Important Point', fontsize=12, color='red') # Add annotation
plt.show()
Lab 4 Exercises
- Load a dataset (e.g., from Kaggle or the UCI Machine Learning Repository).
- Create a line plot showing the trend of a continuous variable over time.
- Create a scatter plot showing the relationship between two continuous variables.
- Create a bar plot to compare the counts of different categories.
- Create a histogram to visualize the distribution of a single variable.
- Create a box plot to compare the distribution of a variable across different categories.
- Create a heatmap to visualize the correlation between multiple variables.
- Customize the visualizations.
Conclusion
- This lab detailed how to create data visualizations using matplotlib and seaborn.