Questions and Answers
Which of the following file paths indicates a document named 'study_gude1.html'?
- file:///Users/Documents/study_guide.pdf
- file:///Root/System/Important/data.txt
- file:///Users/ash/Data_231/Untitled-1.html
- file:///Users/ash/Data_231/study_gude1.html (correct)
If a series of files are named 'Untitled-1.html', which attribute is most likely being sequentially updated to differentiate them?
- A page or version number (correct)
- The user ID
- The directory path
- The file extension
Based on the file paths, what can be inferred about the user 'ash'?
- They have a directory named 'Data_231' for organizing files. (correct)
- They do not use the 'Users' directory.
- They are exclusively working with system files.
- They primarily work with PDF documents.
What is the most likely reason for the multiple files named 'Untitled-1.html'?
Which of the following is the most relevant to understanding the context of the listed files?
If several files are located under /Users/ash/Data_231/, which statement is most likely true?
Given the series of HTML files 'Untitled-1.html', what does the '.html' suffix indicate?
If 'study_gude1.html' and 'Untitled-1.html' exist in the same directory, which is likely true?
Study Notes
Loss Function
- Measures the error between predicted values and actual values in a model
- The goal is to minimize error by optimizing model parameters
Likelihood and Probability
- Likelihood measures how well a specific set of parameters explains the observed data
- Probability measures how likely an outcome is to be observed, given fixed parameters
- Probability deals with data given fixed parameters
- Likelihood measures the "fit" of parameters given observed data
- For a set of data points X = {x1, x2,..., xn} and parameter θ, the likelihood is L(θ|X) = product from i=1 to n of P(xi|θ)
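A minimal Python sketch of the product form above, using hypothetical Bernoulli (coin-flip) data and a few candidate values of θ; the data and parameter values are made up for illustration:

```python
import numpy as np

# Hypothetical observed coin flips (1 = heads, 0 = tails)
X = np.array([1, 0, 1, 1, 0, 1, 1, 1])

def bernoulli_likelihood(theta, data):
    """L(theta | X) = product over i of P(x_i | theta) under a Bernoulli model."""
    return np.prod(theta ** data * (1 - theta) ** (1 - data))

for theta in (0.3, 0.5, 0.75):
    print(f"theta={theta:.2f}  L(theta|X)={bernoulli_likelihood(theta, X):.6f}")
```

The value θ = 0.75 (the sample mean of this data) gives the largest likelihood, which previews the MLE idea in the next section.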
Maximum Likelihood Estimation (MLE)
- Estimates the parameter θ that maximizes the likelihood of observing the given data
- θMLE = arg max L(θ|X)
Negative Log-Likelihood (NLL)
- Transforms the product of probabilities (likelihood) into a sum for easier optimization
- Minimizes NLL instead of maximizing the likelihood
- NLL is commonly used as a loss function in classification models, especially in probabilistic models
- Minimizing the NLL is equivalent to maximizing the likelihood
- min(-log L(θ|X)) is equivalent to max L(θ|X)
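A small sketch (hypothetical Gaussian data, with the variance assumed known) showing that numerically minimizing the NLL recovers the same estimate as the closed-form MLE, the sample mean:

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(0)
X = rng.normal(loc=2.0, scale=1.0, size=50)   # hypothetical observations

def nll(mu, data, sigma=1.0):
    """Negative log-likelihood of the data under a Gaussian with known sigma."""
    return 0.5 * np.sum(((data - mu) / sigma) ** 2 + np.log(2 * np.pi * sigma ** 2))

result = minimize_scalar(nll, args=(X,))
print("mu that minimizes the NLL:", result.x)      # ~ sample mean
print("closed-form MLE (sample mean):", X.mean())
```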
Maximum A Posteriori (MAP)
- Estimates the parameter θ that maximizes the posterior probability by incorporating prior knowledge
- Bayes' Rule defines the posterior as P(θ|X) = P(X|θ)P(θ) / P(X)
- Posterior = likelihood × prior / evidence
- Use the posterior when incorporating prior knowledge about a parameter
- To find the parameter using the posterior, solve θMAP = arg max P(θ|X)
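As an illustration of MAP versus MLE, here is a sketch for a Bernoulli parameter with a Beta(a, b) prior, for which the MAP estimate has a closed form; the counts and prior hyperparameters are hypothetical:

```python
# MAP vs. MLE for a Bernoulli parameter theta with a Beta(a, b) prior.
heads, tails = 3, 1          # hypothetical small sample: 4 coin flips
a, b = 2.0, 2.0              # Beta prior expressing mild belief in a fair coin

theta_mle = heads / (heads + tails)                          # arg max L(theta|X)
theta_map = (heads + a - 1) / (heads + tails + a + b - 2)    # arg max P(theta|X)

print("MLE estimate:", theta_mle)   # 0.75 -- driven entirely by the data
print("MAP estimate:", theta_map)   # ~0.67 -- pulled toward the prior mean of 0.5
```

With only four observations the prior noticeably shifts the estimate, matching the point above that priors matter most for small samples.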
When is a prior useful
- Small sample size
- Real background knowledge existing outside of current dataset
- When the prior serves as a regularizer (e.g., Lasso and Ridge regularization)
Machine Learning Workflow
- Data is acquired
- A model with parameters is selected
- A loss function is optimized to fit the model to the data
Loss Functions
- Four types
- Negative Log-Likelihood (NLL)
- Used in probabilistic models or classification tasks
- Minimizes the difference between predicted and actual probability distributions
- Sum or Mean Absolute Error (MAE)
- Appropriate for regression tasks where deviations between actual and predicted values are penalized linearly rather than quadratically
- More robust to outliers than Mean Squared Error (MSE)
- Lasso (L1 Loss/Regularization)
- Promotes sparsity in the model by forcing some coefficients to zero
- Great for feature selection
- Ridge (L2 Loss/Regularization)
- Penalizes large model coefficients without forcing them to zero
- Prevents overfitting in regression models
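A short sketch of these losses as plain NumPy functions; the labels, predictions, and weights are hypothetical, and the penalty functions show only the regularization term that would be added to a base loss:

```python
import numpy as np

def mae(y_true, y_pred):
    """Mean Absolute Error: average |actual - predicted|."""
    return np.mean(np.abs(y_true - y_pred))

def nll_bernoulli(y_true, p_pred, eps=1e-12):
    """Negative log-likelihood for binary labels and predicted probabilities."""
    p = np.clip(p_pred, eps, 1 - eps)
    return -np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))

def l1_penalty(w, lam):
    """Lasso-style penalty: lam * sum |w|, pushes some coefficients to exactly zero."""
    return lam * np.sum(np.abs(w))

def l2_penalty(w, lam):
    """Ridge-style penalty: lam * sum w^2, shrinks coefficients without zeroing them."""
    return lam * np.sum(w ** 2)

# Hypothetical values just to show the calls
y_true = np.array([1, 0, 1, 1])
p_pred = np.array([0.9, 0.2, 0.7, 0.6])
w = np.array([0.5, -1.2, 0.0, 3.0])
print(mae(y_true, p_pred), nll_bernoulli(y_true, p_pred),
      l1_penalty(w, 0.1), l2_penalty(w, 0.1))
```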
Naive Bayes: Parameters, Features, and Labels
- Parameters are the underlying values the model learns, like mean and variance in Gaussian Naive Bayes
- Features are the input variables to use for prediction
- Discrete Labels are the possible categories the model predicts such as spam or not spam
Starting Point of Naive Bayes Classifiers
- Start using Bayes' Rule
- P(Y|X) = P(X|Y)P(Y) / P(X)
- It conditions on the predictor, meaning it calculates the probability of the class given known features
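A tiny worked example of conditioning on a predictor with Bayes' Rule; all of the probabilities below are hypothetical:

```python
# Hypothetical spam example: P(spam | message contains the word "offer")
p_spam = 0.4                 # prior P(Y = spam)
p_ham = 0.6                  # prior P(Y = ham)
p_offer_given_spam = 0.7     # likelihood P(X = "offer" | spam)
p_offer_given_ham = 0.1      # likelihood P(X = "offer" | ham)

# Bayes' Rule: P(Y|X) = P(X|Y) P(Y) / P(X), where P(X) sums over both classes
evidence = p_offer_given_spam * p_spam + p_offer_given_ham * p_ham
p_spam_given_offer = p_offer_given_spam * p_spam / evidence
print(f"P(spam | 'offer') = {p_spam_given_offer:.3f}")   # ~0.824
```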
Categorical Naive Bayes
- The main assumption is that features are conditionally independent given the class label
- This assumption simplifies learning because it computes probabilities independently, reducing computational complexity
- Steps for Categorical Naive Bayes:
- Calculate the base rates (priors): P(Y)
- Compute the probability of each class/predictor: P(X|Y)
- Divide the count of each feature value within a class by that class's total count to get the conditional probabilities P(X|Y)
- The sleep deprivation and symptoms (mild, moderate, severe) example illustrates these steps (a sketch with stand-in data follows below)
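A minimal pandas sketch of these steps. The table below is hypothetical stand-in data (the actual sleep-deprivation example is not reproduced here): the feature is symptom severity and the label is whether the person is sleep deprived:

```python
import pandas as pd

# Hypothetical categorical data: symptom severity vs. sleep-deprivation label
df = pd.DataFrame({
    "symptom":  ["mild", "severe", "moderate", "severe", "mild", "moderate"],
    "deprived": ["no",   "yes",    "yes",      "yes",    "no",   "no"],
})

# Step 1: base rates (priors) P(Y)
priors = df["deprived"].value_counts(normalize=True)

# Step 2: conditional probabilities P(X|Y) -- counts of each feature value
# within a class, divided by that class's total count
cond = pd.crosstab(df["deprived"], df["symptom"], normalize="index")

print(priors)
print(cond)
```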
Pros of Naive Bayes Classification
- No extensive training needed
- Relatively fast
- Works on both categorical and continuous data
- Insensitive to irrelevant data
Cons of Naive Bayes Classification
- A zero probability issue exists; a categorical value missing from the training data will cause the model to assign it zero probability
- It has a strong independence assumption, but in reality features are often correlated, which affects prediction accuracy
- Probabilities can be misleading because the actual values of computed probabilities are often incorrect
When to Use Different Types of Naive Bayes?
- Categorical Naive Bayes (CategoricalNB) is used when features are categorical (e.g., presence/absence of symptoms)
- Gaussian Naive Bayes (GaussianNB) is used when features are continuous and follow a normal distribution (e.g., height, weight, temperature)
- Bernoulli Naive Bayes (BernoulliNB) is used when features are binary (e.g., 0/1 values in text classification)
- Multinomial Naive Bayes (MultinomialNB) is used when dealing with count-based data, such as word frequencies in documents (e.g., spam detection)
- Complement Naive Bayes (ComplementNB) is used when class imbalances exist, meaning one class has significantly more examples than another (e.g., rare disease detection)
- Optimal Naive Bayes (OptimalNB) is used when an optimized form of Naive Bayes is needed, often tuned for specific datasets
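A scikit-learn sketch of two of these variants on synthetic data (continuous features for GaussianNB, count features for MultinomialNB); note that scikit-learn's `sklearn.naive_bayes` module provides the first five classes listed above, while "OptimalNB" is not a standard scikit-learn estimator:

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB, MultinomialNB

rng = np.random.default_rng(0)
y = np.repeat([0, 1], 50)

# Continuous features (e.g., height, weight) -> GaussianNB
X_cont = rng.normal(size=(100, 2)) + np.repeat([[0.0, 0.0], [2.0, 2.0]], 50, axis=0)
print("GaussianNB accuracy:", GaussianNB().fit(X_cont, y).score(X_cont, y))

# Count-based features (e.g., word frequencies) -> MultinomialNB
X_counts = rng.poisson(lam=np.where(y[:, None] == 1, 3.0, 1.0), size=(100, 5))
print("MultinomialNB accuracy:", MultinomialNB().fit(X_counts, y).score(X_counts, y))
```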
Gaussian Naive Bayes (GNB) Assumption
- Rather than categorical probabilities, assumes feature values follow a normal (Gaussian) distribution: P(Xj|Y) = N(Xj | µY, σ^2Y), where Xj is the feature, µY is the mean, and σ^2Y is the variance of the feature within class Y
Numerical Issues: Underflow & Zero Probability
- Underflow occurs when multiplying many small probabilities together; the result can be so small that it rounds to zero due to floating-point limitations
- Use the logarithm trick: Instead of computing the product of probabilities, sum their logarithms: log P(Y|X) = log P(Y) + ∑ log P(Xj|Y)
- The spam-ham example uses Multinomial Naive Bayes to model word counts
- The zero probability problem occurs when a word never appears in a given class in the training dataset, Naive Bayes assigns it zero probability, eliminating that class from consideration
- If words missing from a class are simply ignored and only the words that do appear in the spam-ham example are used, the model might be overconfident in its predictions and fail when encountering new words
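A quick numerical demonstration of the underflow problem and the logarithm trick; the per-word probabilities are hypothetical:

```python
import numpy as np

# 2,000 hypothetical per-word probabilities, each small
probs = np.full(2000, 1e-3)

product = np.prod(probs)          # underflows to exactly 0.0 in float64
log_sum = np.sum(np.log(probs))   # stays finite and can still be compared across classes

print("direct product:", product)   # 0.0
print("sum of logs:   ", log_sum)   # about -13815.5
```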
Solutions to the Zero Probability Problem
- Apply Additive Smoothing (Laplace Smoothing)
- Add a small positive number α to all counts: P(Xj|Y) = (count(Xj, Y) + α) / Σk (count(Xk, Y) + α), where the sum runs over all feature values k for class Y
- This method prevents zero probabilities without significantly altering large counts
- Adjust Smoothing Parameter α
- Choosing α carefully (commonly α = 1) balances between handling zero probabilities and keeping real probability distributions intact
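A minimal sketch of additive smoothing with hypothetical word counts for a single class; note how the zero count becomes a small but nonzero probability:

```python
import numpy as np

# Hypothetical counts for the words ["offer", "free", "prize", "win"] in one class;
# "prize" never appeared in the training data for this class.
counts = np.array([12, 7, 0, 3])
alpha = 1.0   # common default for Laplace smoothing

unsmoothed = counts / counts.sum()                     # contains an exact zero
smoothed = (counts + alpha) / (counts + alpha).sum()   # every probability is > 0

print("without smoothing:", unsmoothed)
print("with smoothing:   ", smoothed)
```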
K-Nearest Neighbors (KNN)
- KNN is a non-parametric, instance-based learning algorithm
- KNN classifies a new data point based on the majority class of its K nearest neighbors
- KNN is a non-parametric algorithm, which means it does not assume a specific functional form for the data
- KNN memorizes the training data and makes decisions based on similarity
Increasing/Decreasing K in KNN
- Increasing K reduces variance and smooths decision boundaries, but can lead to underfitting
- Decreasing K increases variance and makes the model sensitive to noise, but can lead to overfitting
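A scikit-learn sketch (synthetic data) showing how the choice of K moves the model between memorization and smoother decision boundaries:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=300, n_features=4, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

for k in (1, 5, 25):
    knn = KNeighborsClassifier(n_neighbors=k).fit(X_tr, y_tr)
    print(f"K={k:>2}  train acc={knn.score(X_tr, y_tr):.2f}  "
          f"test acc={knn.score(X_te, y_te):.2f}")
```

K = 1 typically gives perfect training accuracy (memorization) with weaker test accuracy, while larger K smooths the boundary at the cost of some flexibility.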
Overfitting
- Occurs when a model learns noise rather than patterns in training data
- Results in good performance on the training set but poor performance on unseen data
- The model memorizes noise rather than generalizing the underlying data distribution
Underfitting
- Occurs when a model is too simple to capture the pattern in the data
- Leads to high bias and poor performance on both training and test data
Test Error
- Error rate on unseen data
- Used to measure a model's generalization ability
Overfitting vs. Underfitting (Train vs. Test Error)
- Overfitting: Low training error and high testing error
- Underfitting: High training error and high testing error
Cross-Validation (CV)
- A technique used to evaluate model performance by splitting data into multiple subsets and training/testing on different parts of the dataset
- Helps prevent overfitting, ensures the model generalizes well to unseen data, and helps select hyperparameters
Train-Test Split
- Splitting data into training and test sets ensures model evaluation is done on unseen data
- Provides a better measure of generalization
Leave-One-Out Cross-Validation (LOO-CV)
- A special case of k-fold cross-validation where each data point is used as a test set exactly once
- Useful for small datasets, but computationally expensive
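A scikit-learn sketch of k-fold and leave-one-out cross-validation on a small built-in dataset; the model choice here is arbitrary:

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import LeaveOneOut, cross_val_score

X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000)

# 5-fold CV: the data is split into 5 parts, each used once as the test fold
print("5-fold CV mean accuracy:", cross_val_score(model, X, y, cv=5).mean())

# LOO-CV: each of the 150 samples is the test set exactly once (150 model fits)
print("LOO-CV mean accuracy:  ", cross_val_score(model, X, y, cv=LeaveOneOut()).mean())
```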
Regularization
- Prevents overfitting
- Achieved by adding a penalty to large weights in the model
- Controls complexity
L1 vs. L2 Regularization
- L1 (Lasso) Regularization encourages sparse features by setting some coefficients to zero for feature selection
- Suitable when many features are irrelevant
- L2 (Ridge) Regularization shrinks weights smoothly with no zero coefficients
- Suitable when all features contribute, but need smaller magnitudes
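A short scikit-learn comparison of the two penalties on synthetic data in which only a few features are informative; the regularization strength alpha = 1.0 is an arbitrary choice:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, Ridge

# Synthetic regression problem: 10 features, only 3 of them informative
X, y = make_regression(n_samples=200, n_features=10, n_informative=3,
                       noise=5.0, random_state=0)

lasso = Lasso(alpha=1.0).fit(X, y)
ridge = Ridge(alpha=1.0).fit(X, y)

print("Lasso coefficients set to zero:", np.sum(lasso.coef_ == 0))  # usually several
print("Ridge coefficients set to zero:", np.sum(ridge.coef_ == 0))  # typically none
```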
Bias-Variance Tradeoff
- Bias is the error due to simplified assumptions (e.g., underfitting)
- Variance is the error due to sensitivity to noise (e.g., overfitting)
- The tradeoff is that increasing model complexity lowers bias but increases variance
- The goal is to find an optimal balance
Gradient Descent
- The gradient is the rate of change of a function with respect to its parameters
- Mathematically, the gradient of a function f(x) is ∇f(x) = (∂f/∂x1, ..., ∂f/∂xn)
- On a contour plot, the gradient points in the direction of steepest ascent, i.e., toward increasing function values
- Because the gradient points toward steepest ascent, minimization requires stepping in the opposite direction
Gradient Descent Algorithm
- Basic steps:
- Initialize weights randomly.
- Compute the gradient of the loss function.
- Update parameters in the opposite direction of the gradient.
- Repeat until convergence.
- Equation: θ(t+1) = θ(t) – α∇J(θ)
- where:
- α = learning rate
- ∇J(θ) = gradient of the loss function
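A minimal sketch of the update rule above on a one-dimensional quadratic loss, where the minimum is known to be at θ = 3:

```python
# Gradient descent on J(theta) = (theta - 3)^2, whose gradient is 2 * (theta - 3)
def grad(theta):
    return 2 * (theta - 3)

theta = 10.0     # arbitrary initial value
alpha = 0.1      # learning rate
for step in range(100):
    theta = theta - alpha * grad(theta)   # theta(t+1) = theta(t) - alpha * grad J(theta)

print("theta after gradient descent:", theta)   # very close to 3
```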
Learning Rate
- Controls how much weights are updated at each step
- If too small, convergence is slow
- If too large, the model might overshoot and never converge
When is Gradient Descent Valid
- The loss function must be differentiable and gradient updates must move the function towards a minimum
- To choose the learning rate:
- trial and error, using cross-validation (CV)
- adaptive methods (e.g., Adam, RMSProp)
Step Size
- This term determines how far to move in the direction of the gradient
- If too large, the model may miss the optimal point
- If too small, convergence will be slow
Gradient of the Loss
- The gradient tells how much, and in which direction, to adjust model parameters to minimize the error
- Mathematically, if the loss function is J(θ), the gradient is: ∇J(θ) = ∂J/∂θ
- Parameter update equation during gradient descent: θ(t+1) = θ(t) – α∇J(θ)
Step Size
- Controls how much parameters are adjusted in the direction of the gradient
- A small step size yields slow convergence but stable learning
- A large step size yields faster learning but the model may overshoot and not converge
- Adaptive step sizes (e.g., Adam optimizer) adjust automatically
Supervised vs. Unsupervised Learning
- Supervised learning occurs when a model learns from labeled data, meaning each input has a corresponding known output (label).
- Examples: classification (spam detection, image recognition) and regression (predicting house prices)
- Unsupervised learning occurs when a model learns patterns from unlabeled data (i.e., no explicit output labels).
- Examples: clustering (grouping customers based on purchase behavior) and dimensionality reduction (PCA for feature extraction)
Applying Supervised vs. Unsupervised Learning
- Supervised learning applies when labeled data and corresponding predictions are desired
- Unsupervised learning applies when unlabeled data exists and structure needs to be determined
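A side-by-side scikit-learn sketch on synthetic data: the supervised model uses the labels, while the unsupervised model ignores them and looks for structure:

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.linear_model import LogisticRegression

X, y = make_blobs(n_samples=200, centers=3, random_state=0)

# Supervised: labels y are available, so fit a classifier to predict them
clf = LogisticRegression(max_iter=1000).fit(X, y)
print("classifier accuracy on labeled data:", clf.score(X, y))

# Unsupervised: discard y and let the algorithm discover clusters on its own
clusters = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
print("cluster assignments for the first 10 points:", clusters[:10])
```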
Description
This lesson explores HTML file paths, naming conventions, and common scenarios like the creation of multiple 'Untitled' files. It covers file differentiation using sequential updates and inferences about user file organization. Understanding file extensions and directory structures is key.