Understanding Algorithmic Complexity

Algorithmic Complexity

Algorithmic complexity measures the resources, such as time and memory, that an algorithm uses to solve a problem.
Big O notation expresses an upper bound on how quickly that resource usage grows as the input size increases.
It describes how an algorithm's running time or memory use scales with input size, not its exact speed on a particular machine.

Big O Examples

O(1): Excellent
O(log n): Great
O(n): Good
O(n log n): Fair
O(n^2): Bad
O(2^n): Horrible
O(n!): Nightmare
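
To make the ratings above concrete, the following sketch computes approximate operation counts for each class at a given input size. The function name `growth_counts` is hypothetical, and the counts ignore constant factors.

```python
import math

def growth_counts(n):
    """Return approximate operation counts for each complexity class at input size n."""
    return {
        "O(1)": 1,
        "O(log n)": math.log2(n),
        "O(n)": n,
        "O(n log n)": n * math.log2(n),
        "O(n^2)": n ** 2,
        "O(2^n)": 2 ** n,
        "O(n!)": math.factorial(n),
    }

# Doubling n barely changes the first rows but explodes the last ones.
for n in (8, 16):
    print(n, growth_counts(n))
```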

Importance

Performance: Helps understand algorithm performance with large inputs.
Scalability: Determines if an algorithm can handle increasing data.
Optimization: Guides in choosing the most efficient algorithm.
Resource Management: Enables estimating computation resource requirements.
Choosing an appropriate algorithm for each use case is an important part of software engineering.

How To Calculate

Identify the operations performed most frequently as the input size grows.
Count the number of times dominant operations are executed, expressed as a function of input size (n).
Drop constant factors and lower-order terms, keeping only the highest-order term.
Express the result using Big O notation.
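
The counting steps above can be sketched on a small made-up function. `count_operations` is a hypothetical example, not a general tool: it tallies the work done by a nested loop, a single loop, and some constant setup.

```python
def count_operations(n):
    """Count basic operations for a nested loop, a single loop, and constant setup."""
    operations = 0
    for _ in range(n):
        for _ in range(n):
            operations += 1  # dominant term: runs n * n times
    for _ in range(n):
        operations += 1      # lower-order term: runs n times
    operations += 5          # constant setup work, independent of n
    return operations        # total is n^2 + n + 5

for n in (10, 100, 1000):
    total = count_operations(n)
    print(n, total, total / n ** 2)  # the ratio to n^2 approaches 1 as n grows
```

Because the total is n^2 + n + 5, dropping the constant and the lower-order term leaves O(n^2).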

Big O Examples

O(1)

Example:

def get_first_element(items):
    return items[0]

Only one operation is performed, regardless of list size.

O(n)

def print_all_elements(items):
    for element in items:
        print(element)

One operation is performed for each element added to the list.

O(n^2)

def print_all_possible_pairs(items):
    for element1 in items:
        for element2 in items:
            print(element1, element2)

n operations are performed for each of the n elements in the list, giving n × n = n^2 operations in total.
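
The examples above do not include an O(log n) case. Binary search over a sorted list is a common illustration; the sketch below is an addition to these notes, and the `steps` counter is there only to show how few comparisons are needed.

```python
def binary_search(sorted_items, target):
    """Return (index, steps); index is -1 if target is not present."""
    low, high = 0, len(sorted_items) - 1
    steps = 0
    while low <= high:
        steps += 1
        mid = (low + high) // 2
        if sorted_items[mid] == target:
            return mid, steps
        if sorted_items[mid] < target:
            low = mid + 1   # discard the lower half
        else:
            high = mid - 1  # discard the upper half
    return -1, steps

index, steps = binary_search(list(range(1024)), 1023)
print(index, steps)
```

Each pass halves the remaining range, so a list of 1024 items needs at most 11 comparisons.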

Machine Learning

Machine learning is the automatic detection of patterns in data.
It uses those patterns to make predictions about new, unseen data.

Machine Learning Examples

Predicting customer ad clicks
Predicting house prices
Identifying blog topics
Detecting spam emails
Recognizing faces in images
Recommending movies/products

ML Types

Supervised Learning: Training data includes desired outputs.
Unsupervised Learning: Training data does not include desired outputs.
Reinforcement Learning: An agent learns to maximize a reward in an environment.

Supervised Learning

Aims to predict a target variable $y$ given input variables $x$.
Training data includes both inputs and desired outputs.

Supervised Learning Types

Regression: Target variable is continuous.
Classification: Target variable is discrete.

Regression Examples

Predicting house prices
Predicting tomorrow's temperature

Classification Examples

Predicting customer ad clicks
Identifying blog topics
Detecting spam emails
Recognizing faces in images

Unsupervised Learning

Purpose is to discover patterns in data.
Training data lacks desired outputs.

Unsupervised Learning Types

Clustering: Grouping similar data points.
Dimensionality Reduction: Reducing the number of variables to represent data.
Anomaly Detection: Identifying unusual data points.

Clustering Examples

Grouping customers by behavior
Grouping documents by content

Dimensionality Reduction Examples

Reducing variables to represent images
Reducing variables for gene expression data

Anomaly Detection Examples

Detecting fraudulent transactions
Detecting malfunctioning equipment

Reinforcement Learning

It aims to train an agent for optimal action in an environment to maximize a reward.
The agent learns via trial and error and receives reward or punishment feedback.
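
The trial-and-error loop can be sketched with a toy problem. The example below assumes a hypothetical environment with three actions and fixed rewards, and uses an epsilon-greedy rule. That rule is one common choice, not something these notes prescribe, and real environments usually return noisy rewards.

```python
import random

def run_bandit(rewards, steps=200, epsilon=0.1, seed=0):
    """Learn reward estimates for each action by trial and error."""
    rng = random.Random(seed)
    estimates = [0.0] * len(rewards)
    counts = [0] * len(rewards)
    for step in range(steps):
        if step < len(rewards):
            action = step  # try every action once
        elif rng.random() < epsilon:
            action = rng.randrange(len(rewards))  # explore a random action
        else:
            # exploit the action with the best estimate so far
            action = max(range(len(rewards)), key=lambda a: estimates[a])
        reward = rewards[action]  # feedback from the (toy) environment
        counts[action] += 1
        # incremental average of the rewards seen for this action
        estimates[action] += (reward - estimates[action]) / counts[action]
    return estimates

estimates = run_bandit([0.2, 0.8, 0.5])
best_action = max(range(len(estimates)), key=lambda a: estimates[a])
print(estimates, best_action)
```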

Reinforcement Learning Examples

Training a robot to walk
Training a program to play chess
Training a program to trade stocks

Supervised Learning Details

Given training data $(\mathbf{x}_1, y_1), (\mathbf{x}_2, y_2), \dots, (\mathbf{x}_N, y_N)$, where $\mathbf{x}_i$ is the input and $y_i$ is the desired output.
Learn a function $f(\mathbf{x})$ to map inputs to outputs.

Loss Function

Measures the difference between the predicted output $f(\mathbf{x})$ and the desired output $y$.

Loss Function Examples

Regression: Squared error loss: $L(y, f(\mathbf{x})) = (y - f(\mathbf{x}))^2$
Classification: 0-1 loss: $L(y, f(\mathbf{x})) = \begin{cases} 0 & \text{if } y = f(\mathbf{x}) \\ 1 & \text{if } y \ne f(\mathbf{x}) \end{cases}$
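
The two loss functions above translate directly into code. The function names here are illustrative.

```python
def squared_error_loss(y, prediction):
    """Squared error loss for regression: (y - f(x))^2."""
    return (y - prediction) ** 2

def zero_one_loss(y, prediction):
    """0-1 loss for classification: 0 if the prediction is correct, 1 otherwise."""
    return 0 if y == prediction else 1

print(squared_error_loss(3.0, 1.0))
print(zero_one_loss("spam", "spam"), zero_one_loss("spam", "ham"))
```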

Goal

Minimize the average loss over the training data: $\min_f \frac{1}{N} \sum_{i=1}^N L(y_i, f(\mathbf{x}_i))$
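
A minimal sketch of this minimization, assuming a hypothetical one-parameter model $f(x) = wx$ with squared error loss. For this model, setting the derivative of the average loss to zero gives $w = \sum_i x_i y_i / \sum_i x_i^2$. The data values are made up for illustration.

```python
def average_loss(w, xs, ys):
    """Average squared error of the model f(x) = w * x over the training data."""
    return sum((y - w * x) ** 2 for x, y in zip(xs, ys)) / len(xs)

def fit_slope(xs, ys):
    """Closed-form minimizer of the average squared error for f(x) = w * x."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)

# Made-up training data that follows y = 2x exactly.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]

w = fit_slope(xs, ys)
print(w, average_loss(w, xs, ys))
```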

Python Data Visualization (Lab 4)

Objectives

Learn to create data visualizations using matplotlib and seaborn.
Understand which visualizations are appropriate for different data types.
Gain experience in customizing visualizations.

Materials Needed

Python 3.x
matplotlib library
seaborn library
Jupyter Notebook
Dataset (CSV file)

Introduction

Data visualization helps in understanding patterns and relationships in data.
Python's matplotlib and seaborn libraries provide the tools for creating these visualizations.

Setting Up Environment

Install matplotlib and seaborn using pip: pip install matplotlib seaborn
For Jupyter Notebook, import libraries and set up inline plotting:

import matplotlib.pyplot as plt
import seaborn as sns

# For inline plotting in Jupyter Notebook
%matplotlib inline

Loading Data

Load dataset into a pandas DataFrame:

import pandas as pd

data = pd.read_csv('your_data_file.csv')
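
The plotting code in this lab assumes the columns `x`, `y`, `category`, and `value`. If no dataset is at hand, a small synthetic CSV with those columns can be generated with the standard library. The file name `sample_data.csv`, the function name, and the value ranges below are illustrative assumptions.

```python
import csv
import random

def write_sample_csv(path, rows=50, seed=0):
    """Write a synthetic CSV with the columns used by the plots in this lab."""
    rng = random.Random(seed)
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["x", "y", "category", "value"])
        for i in range(rows):
            x = i / 5                       # evenly spaced from 0.0 up to 9.8
            y = 10 * x + rng.gauss(0, 5)    # roughly linear trend with noise
            category = rng.choice(["A", "B", "C"])
            value = rng.gauss(50, 10)       # bell-shaped values for histograms
            writer.writerow([x, round(y, 2), category, round(value, 2)])

write_sample_csv("sample_data.csv")
```

The resulting file can then be passed to `pd.read_csv` in place of `'your_data_file.csv'`.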

Basic Plots With Matplotlib

Line Plot

Line plots are useful for showing trends over time or across a continuous variable.

plt.figure(figsize=(10, 6))  
plt.plot(data['x'], data['y'], marker='o', linestyle='-')
plt.title('Line Plot of X vs Y')
plt.xlabel('X Axis')
plt.ylabel('Y Axis')
plt.grid(True) 
plt.show()

Scatter Plot

Used to display the relationship between two continuous variables.

plt.figure(figsize=(8, 6))
plt.scatter(data['x'], data['y'], color='red', alpha=0.5)
plt.title('Scatter Plot of X vs Y')
plt.xlabel('X Axis')
plt.ylabel('Y Axis')
plt.show()

Bar Plot

Bar plots are used to compare categorical data.

category_counts = data['category'].value_counts()
plt.figure(figsize=(8, 6))
plt.bar(category_counts.index, category_counts.values, color='skyblue')
plt.title('Bar Plot of Category Counts')
plt.xlabel('Category')
plt.ylabel('Count')
plt.xticks(rotation=45, ha='right')  # Rotate x-axis labels for readability
plt.show()

Histogram

Used to display the distribution of a single variable.

plt.figure(figsize=(8, 6))
plt.hist(data['value'], bins=20, color='lightgreen', edgecolor='black')
plt.title('Histogram of Value')
plt.xlabel('Value')
plt.ylabel('Frequency')
plt.show()

Advanced Plots with Seaborn

Seaborn offers a high-level interface for creating aesthetically pleasing visualizations.

Displot

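A versatile function that draws histograms, optionally overlaid with a kernel density estimate (KDE).

sns.displot(data=data, x='value', kde=True, color='purple')
plt.title('Displot of Value')
plt.xlabel('Value')
plt.ylabel('Count')  # displot draws a count histogram by default
plt.show()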

Scatter Plot with Regression Line

sns.regplot(x='x', y='y', data=data, scatter_kws={'alpha':0.6}, line_kws={'color':'red'})
plt.title('Scatter Plot with Regression Line')
plt.xlabel('X Axis')
plt.ylabel('Y Axis')
plt.show()

Box Plot

Used to display the distribution of data across categories.

sns.boxplot(x='category', y='value', data=data, color='orange')
plt.title('Box Plot of Value by Category')
plt.xlabel('Category')
plt.ylabel('Value')
plt.show()

Violin Plot

Similar to box plots, but they also show the shape of the distribution within each category.

sns.violinplot(x='category', y='value', data=data, color='lightgreen')
plt.title('Violin Plot of Value by Category')
plt.xlabel('Category')
plt.ylabel('Value')
plt.show()

Heatmap

Used to display the correlation between multiple variables.

correlation_matrix = data.corr(numeric_only=True)  # skip non-numeric columns such as 'category'
sns.heatmap(correlation_matrix, annot=True, cmap='coolwarm')
plt.title('Correlation Heatmap')
plt.show()
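
Each cell of the heatmap is a correlation coefficient between two columns. Assuming the default Pearson method of `DataFrame.corr()`, one such coefficient can be computed by hand as in the sketch below. The function name is illustrative.

```python
import math

def pearson_correlation(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations from the means
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return covariance / (spread_x * spread_y)

print(pearson_correlation([1, 2, 3, 4], [2, 4, 6, 8]))  # perfectly increasing
print(pearson_correlation([1, 2, 3, 4], [8, 6, 4, 2]))  # perfectly decreasing
```

Values near 1 or -1 indicate a strong linear relationship, and values near 0 indicate little linear relationship.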

Pair Plot

Used to visualize the pairwise relationships between multiple variables.

sns.pairplot(data, hue='category')
plt.suptitle('Pair Plot of Data', y=1.02)  # Adjust title position
plt.show()

Customizing Visualizations

Customizing visualizations makes them easier to read and interpret.

Customization Options

Titles and Labels: Indicate what the plot and each axis represent.
Colors: Use consistent color palettes across plots.
Annotations: Highlight specific data points or trends.
Legends: Identify the different groups shown in the plot.
Axis Limits: Restrict the axes to the relevant portion of the data.

plt.figure(figsize=(10, 6))
plt.plot(data['x'], data['y'], marker='o', linestyle='-', color='green', label='Data')
plt.title('Customized Line Plot of X vs Y', fontsize=16)
plt.xlabel('X Axis', fontsize=14)
plt.ylabel('Y Axis', fontsize=14)
plt.grid(True)
plt.legend()
plt.xlim(0, 10) # Set x axis limits
plt.ylim(0, 100) # Set y axis limits
plt.text(2, 80, 'Important Point', fontsize=12, color='red') # Add annotation
plt.show()

Lab 4 Exercises

Load a dataset (e.g., from Kaggle or the UCI Machine Learning Repository).
Create a line plot showing the trend of a continuous variable over time.
Create a scatter plot showing the relationship between two continuous variables.
Create a bar plot to compare the counts of different categories.
Create a histogram to visualize the distribution of a single variable.
Create a box plot to compare the distribution of a variable across different categories.
Create a heatmap to visualize the correlation between multiple variables.
Customize the visualizations.

Conclusion

This lab detailed how to create data visualizations using matplotlib and seaborn.
