HTML File Paths and Naming Conventions

Questions and Answers

Which of the following file paths indicates a document named 'study_gude1.html'?

  • file:///Users/Documents/study_guide.pdf
  • file:///Root/System/Important/data.txt
  • file:///Users/ash/Data_231/Untitled-1.html
  • file:///Users/ash/Data_231/study_gude1.html (correct)

If a series of files are named 'Untitled-1.html', which attribute is most likely being sequentially updated to differentiate them?

  • A page or version number (correct)
  • The user ID
  • The directory path
  • The file extension

Based on the file paths, what can be inferred about the user 'ash'?

  • They have a directory named 'Data_231' for organizing files. (correct)
  • They do not use the 'Users' directory.
  • They are exclusively working with system files.
  • They primarily work with PDF documents.

What is the most likely reason for the multiple files named 'Untitled-1.html'?

  • To store different drafts or versions of a document. (correct)

Which of the following is the most relevant to understanding the context of the listed files?

  • The sequence and dates of file modifications. (correct)

If several files are located under /Users/ash/Data_231/, which statement is most likely true?

  • The files are personal data of the user 'ash'. (correct)

Given the series of HTML files 'Untitled-1.html', what does the '.html' suffix indicate?

  • A hypertext markup document intended for web browsers. (correct)

If 'study_gude1.html' and 'Untitled-1.html' exist in the same directory, which is likely true?

  • 'study_gude1.html' is the original and 'Untitled-1.html' is a temporary file. (correct)

Study Notes

Loss Function

  • Measures the error between predicted values and actual values in a model
  • The goal is to minimize error by optimizing model parameters

Likelihood and Probability

  • Likelihood measures how well a specific set of parameters explains the observed data
  • Probability measures how likely an outcome is to be observed, given fixed parameters
  • In short: probability is about the data given fixed parameters, while likelihood measures the "fit" of parameters given observed data
  • For a set of data points X = {x1, x2, ..., xn} and parameter θ, the likelihood is L(θ|X) = ∏ P(xi|θ), the product of P(xi|θ) over i = 1, ..., n

Maximum Likelihood Estimation (MLE)

  • Estimates the parameter θ that maximizes the likelihood of observing the given data
  • θMLE = arg max L(θ|X)

Negative Log-Likelihood (NLL)

  • Taking the logarithm transforms the product of probabilities (the likelihood) into a sum, which is easier to optimize
  • Minimizes NLL instead of maximizing the likelihood
  • NLL is commonly used as a loss function in classification models, especially in probabilistic models
  • Minimizing the NLL is equivalent to maximizing the likelihood
  • min(-log L(θ|X)) is equivalent to max L(θ|X)
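
A minimal sketch of MLE via NLL minimization, assuming NumPy/SciPy and made-up Gaussian samples (the data values below are hypothetical):

    import numpy as np
    from scipy.optimize import minimize
    from scipy.stats import norm

    x = np.array([2.1, 1.9, 2.5, 2.3, 1.8])   # toy samples assumed for illustration

    def nll(theta):
        mu, log_sigma = theta                  # parameterize sigma on the log scale so it stays positive
        return -np.sum(norm.logpdf(x, loc=mu, scale=np.exp(log_sigma)))   # NLL = -sum of log-probabilities

    result = minimize(nll, x0=[0.0, 0.0])      # minimizing NLL is the same as maximizing the likelihood
    mu_mle, sigma_mle = result.x[0], np.exp(result.x[1])
    print(mu_mle, sigma_mle)                   # close to the sample mean and standard deviation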

Maximum A Posteriori (MAP)

  • Estimates the parameter θ that maximizes the posterior probability by incorporating prior knowledge
  • Bayes' Rule defines the posterior as P(θ|X) = P(X|θ)P(θ) / P(X)
  • Posterior = likelihood × prior / evidence
  • Use the posterior when incorporating prior knowledge about a parameter
  • To find the parameter using the posterior, solve θMAP = arg max P(θ|X); because the evidence P(X) does not depend on θ, it is enough to maximize P(X|θ)P(θ)
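
A minimal sketch of a MAP estimate, assuming made-up coin-flip data and a Beta(2, 2) prior (both hypothetical); since P(X) is constant in θ, the code maximizes log-likelihood plus log-prior:

    import numpy as np
    from scipy.optimize import minimize_scalar
    from scipy.stats import bernoulli, beta

    flips = np.array([1, 1, 0, 1, 1, 1, 0, 1])   # toy coin flips (1 = heads), assumed for illustration
    a, b = 2.0, 2.0                              # assumed Beta(2, 2) prior on the heads probability theta

    def neg_log_posterior(theta):
        log_lik = np.sum(bernoulli.logpmf(flips, theta))   # log P(X|theta)
        log_prior = beta.logpdf(theta, a, b)               # log P(theta)
        return -(log_lik + log_prior)                      # P(X) is constant in theta, so it is dropped

    theta_map = minimize_scalar(neg_log_posterior, bounds=(1e-6, 1 - 1e-6), method="bounded").x
    # Beta-Bernoulli has a closed form: (heads + a - 1) / (n + a + b - 2)
    print(theta_map, (flips.sum() + a - 1) / (len(flips) + a + b - 2))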

When is a prior useful

  • Small sample size
  • Real background knowledge existing outside of current dataset
  • When the prior serves as a regularizer (e.g., Lasso and Ridge regularization)

Machine Learning Workflow

  • Data is acquired
  • A model with parameters is selected
  • A loss function is optimized to fit the model to the data

Loss Functions

  • Four types
    • Negative Log-Likelihood (NLL)
      • Used in probabilistic models or classification tasks
      • Minimizes the difference between predicted and actual probability distributions
    • Sum or Mean Absolute Error (MAE)
      • Measures the average absolute deviation between predicted and actual values (e.g., in regression tasks)
      • More robust to outliers than Mean Squared Error (MSE) because errors are not squared
    • Lasso (L1 Loss/Regularization)
      • Promotes sparsity in the model by forcing some coefficients to zero
      • Great for feature selection
    • Ridge (L2 Loss/Regularization)
      • Penalizes large model coefficients without forcing them to zero
      • Prevents overfitting in regression models
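
A minimal NumPy sketch of how these quantities are computed; the targets, predictions, coefficients, and regularization strength below are made up for illustration:

    import numpy as np

    y_true = np.array([3.0, -0.5, 2.0, 7.0])     # toy targets, assumed for illustration
    y_pred = np.array([2.5,  0.0, 2.0, 8.0])     # toy predictions
    w = np.array([0.7, 0.0, -1.2])               # toy model coefficients
    lam = 0.1                                    # assumed regularization strength

    mae = np.mean(np.abs(y_true - y_pred))       # Mean Absolute Error
    mse = np.mean((y_true - y_pred) ** 2)        # Mean Squared Error, for comparison
    l1_penalty = lam * np.sum(np.abs(w))         # Lasso adds this penalty to the data loss
    l2_penalty = lam * np.sum(w ** 2)            # Ridge adds this penalty to the data loss
    print(mae, mse, l1_penalty, l2_penalty)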

Naive Bayes: Parameters, Features, and Labels

  • Parameters are the underlying values the model learns, like mean and variance in Gaussian Naive Bayes
  • Features are the input variables to use for prediction
  • Discrete Labels are the possible categories the model predicts such as spam or not spam

Starting Point of Naive Bayes Classifiers

  • Start using Bayes' Rule
  • P(Y|X) = P(X|Y)P(Y) / P(X)
  • It conditions on the predictor, meaning it calculates the probability of the class given known features

Categorical Naive Bayes

  • The main assumption is that features are conditionally independent given the class label
  • This assumption simplifies learning because it computes probabilities independently, reducing computational complexity
  • Steps for Categorical Naive Bayes:
    • Calculate the base rates (priors): P(Y)
    • Compute the probability of each class/predictor: P(X|Y)
    • Divide the count of each feature value within a class by the total count for that class to get the conditional probabilities P(Xk|Y)
  • Example: sleep deprivation and symptom severity (mild, moderate, severe)
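
A minimal sketch of these steps using scikit-learn's CategoricalNB (assuming a made-up, integer-encoded symptoms dataset); the same count-and-divide logic runs inside fit:

    import numpy as np
    from sklearn.naive_bayes import CategoricalNB

    # Toy integer-encoded categorical features (e.g., column 0: 0/1/2 = mild/moderate/severe symptoms)
    X = np.array([[0, 1], [1, 1], [2, 0], [0, 0], [2, 1], [1, 0]])
    y = np.array([0, 0, 1, 0, 1, 1])             # toy class labels (e.g., 1 = sleep deprived)

    clf = CategoricalNB(alpha=1.0)               # alpha is the additive (Laplace) smoothing parameter
    clf.fit(X, y)                                # internally: base rates P(Y) and per-feature P(Xk|Y) from counts

    print(np.exp(clf.class_log_prior_))          # the priors P(Y)
    print(clf.predict_proba([[2, 1]]))           # P(Y|X) for a new observation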

Pros of Naive Bayes Classification

  • No extensive training needed
  • Relatively fast
  • Works on both categorical and continuous data
  • Insensitive to irrelevant data

Cons of Naive Bayes Classification

  • A zero probability issue exists: a category value that never appears in the training data for a class is assigned zero probability
  • It has a strong independence assumption, but in reality features are often correlated, which affects prediction accuracy
  • Probabilities can be misleading: the predicted class is often right, but the actual values of the computed probabilities are often poorly calibrated

When to Use Different Types of Naive Bayes?

  • Categorical Naive Bayes (CategoricalNB) is used when features are categorical (e.g., presence/absence of symptoms)
  • Gaussian Naive Bayes (GaussianNB) is used when features are continuous and follow a normal distribution (e.g., height, weight, temperature)
  • Bernoulli Naive Bayes (BernoulliNB) is used when features are binary (e.g., 0/1 values in text classification)
  • Multinomial Naive Bayes (MultinomialNB) is used when dealing with count-based data, such as word frequencies in documents (e.g., spam detection)
  • Complement Naive Bayes (ComplementNB) is used when class imbalances exist, meaning one class has significantly more examples than another (e.g., rare disease detection)
  • Optimal Naive Bayes (OptimalNB) is used when an optimized form of Naive Bayes is needed, often tuned for specific datasets

Gaussian Naive Bayes (GNB) Assumption

  • Rather than categorical probabilities, assumes feature values follow a normal (Gaussian) distribution: P(Xj|Y) = N(Xj | µY, σ²Y)
    • Xj is the feature value
    • µY is the mean of the feature within class Y
    • σ²Y is the variance of the feature within class Y
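
A minimal sketch of the Gaussian class-conditional density, assuming made-up continuous feature values, shown both by hand and via scikit-learn's GaussianNB:

    import numpy as np
    from scipy.stats import norm
    from sklearn.naive_bayes import GaussianNB

    X = np.array([[160.0], [165.0], [170.0], [175.0], [180.0], [185.0]])   # toy heights
    y = np.array([0, 0, 0, 1, 1, 1])                                       # toy class labels

    # By hand: the per-class mean and variance define P(Xj|Y) = N(Xj | mu_Y, sigma^2_Y)
    mu_1, sigma_1 = X[y == 1].mean(), X[y == 1].std()
    print(norm.pdf(178.0, loc=mu_1, scale=sigma_1))   # class-conditional density of one feature value

    clf = GaussianNB().fit(X, y)                      # learns the same per-class means and variances
    print(clf.predict_proba([[178.0]]))               # posterior P(Y|X)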

Numerical Issues: Underflow & Zero Probability

  • Underflow occurs when multiplying many small probabilities together; the result can be so small that it rounds to zero due to floating-point limitations
  • Use the logarithm trick: instead of computing the product of probabilities, sum their logarithms: log P(Y|X) ∝ log P(Y) + ∑j log P(Xj|Y)
  • The spam-ham example uses Multinomial Naive Bayes to model word counts
  • The zero probability problem occurs when a word never appears in a given class in the training dataset; Naive Bayes then assigns it zero probability, eliminating that class from consideration
  • If words missing from a class are simply ignored and only the words that do appear in the spam-ham example are used, the model might become overconfident in its predictions and fail when encountering new words
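
A minimal NumPy sketch of the underflow problem and the logarithm trick, using made-up probability values:

    import numpy as np

    p = np.full(1000, 1e-4)              # 1000 small conditional probabilities (toy values)

    product = np.prod(p)                 # underflows to exactly 0.0 in floating point
    log_sum = np.sum(np.log(p))          # stays finite: 1000 * log(1e-4) ≈ -9210.3

    print(product, log_sum)
    # Classes can still be compared by their log-scores; the class with the largest log-score wins.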

Solutions to the Zero Probability Problem

  • Apply Additive Smoothing (Laplace Smoothing)
    • Add a small positive number α to all counts: P(Xj|Y) = (count(Xj, Y) + α) / ∑k (count(Xk, Y) + α), where the sum runs over all possible feature values k
    • This method prevents zero probabilities without significantly altering large counts
  • Adjust Smoothing Parameter α
    • Choosing α carefully (commonly α = 1) balances between handling zero probabilities and keeping real probability distributions intact
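
A minimal NumPy sketch of additive smoothing, assuming made-up word counts for a single class:

    import numpy as np

    counts = np.array([3, 0, 5, 2])      # toy word counts within one class; one word never appears
    alpha = 1.0                          # Laplace smoothing

    unsmoothed = counts / counts.sum()                                 # contains an exact zero
    smoothed = (counts + alpha) / (counts.sum() + alpha * len(counts))

    print(unsmoothed)                    # [0.3 0.  0.5 0.2]
    print(smoothed)                      # no zeros, and the large counts barely change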

K-Nearest Neighbors (KNN)

  • KNN is a non-parametric, instance-based learning algorithm
  • KNN classifies a new data point based on the majority class of its K nearest neighbors
  • Non-parametric means it does not assume a specific functional form for the data
  • KNN memorizes the training data and makes decisions based on similarity
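
A minimal scikit-learn sketch of KNN classification, assuming made-up 2-D points:

    import numpy as np
    from sklearn.neighbors import KNeighborsClassifier

    X = np.array([[1, 1], [1, 2], [2, 2], [6, 6], [7, 7], [6, 7]])   # toy 2-D points
    y = np.array([0, 0, 0, 1, 1, 1])                                 # their class labels

    knn = KNeighborsClassifier(n_neighbors=3)    # K = 3: majority vote among the 3 nearest neighbors
    knn.fit(X, y)                                # "training" just memorizes the data

    print(knn.predict([[2, 1]]), knn.predict([[6, 5]]))   # -> [0] [1]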

Increasing/Decreasing K in KNN

  • Increasing K reduces variance and smooths decision boundaries, but can lead to underfitting
  • Decreasing K increases variance and makes the model more sensitive to noise, which can lead to overfitting

Overfitting

  • Occurs when a model learns noise rather than patterns in training data
  • Results in good performance on the training set but poor performance on unseen data
  • The model memorizes noise rather than generalizing the underlying data distribution

Underfitting

  • Occurs when a model is too simple to capture the pattern in the data
  • Leads to high bias and poor performance on both training and test data

Test Error

  • Error rate on unseen data
  • Used to measure a model's generalization ability

Overfitting vs. Underfitting (Train vs. Test Error)

  • Overfitting: Low training error and high testing error
  • Underfitting: High training error and high testing error

Cross-Validation (CV)

  • A technique used to evaluate model performance by splitting data into multiple subsets and training/testing on different parts of the dataset
  • Helps prevent overfitting, gives a more reliable estimate of how well the model generalizes to unseen data, and helps select hyperparameters

Train-Test Split

  • Splitting the data into training and test sets ensures that model evaluation is done on unseen data
  • Provides a better measure of generalization

Leave-One-Out Cross-Validation (LOO-CV)

  • A special case of k-fold cross-validation where k equals the number of data points, so each data point is used as a test set exactly once
  • Useful for small datasets, but computationally expensive
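
A minimal scikit-learn sketch of 5-fold CV and LOO-CV, assuming made-up data and a KNN model:

    import numpy as np
    from sklearn.model_selection import KFold, LeaveOneOut, cross_val_score
    from sklearn.neighbors import KNeighborsClassifier

    rng = np.random.default_rng(0)
    X = rng.normal(size=(30, 2))                       # toy features
    y = (X[:, 0] + X[:, 1] > 0).astype(int)            # toy labels

    model = KNeighborsClassifier(n_neighbors=3)

    kfold_scores = cross_val_score(model, X, y, cv=KFold(n_splits=5, shuffle=True, random_state=0))
    loo_scores = cross_val_score(model, X, y, cv=LeaveOneOut())   # 30 separate fits: accurate but expensive

    print(kfold_scores.mean(), loo_scores.mean())      # estimates of accuracy on unseen data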

Regularization

  • Prevents overfitting
  • Achieved by adding a penalty to large weights in the model
  • Controls complexity

L1 vs. L2 Regularization

  • L1 (Lasso) Regularization encourages sparse features by setting some coefficients to zero for feature selection
    • Suitable when many features are irrelevant
  • L2 (Ridge) Regularization shrinks weights smoothly, with no coefficients forced to zero
    • Suitable when all features contribute, but need smaller magnitudes
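
A minimal scikit-learn sketch contrasting the two penalties, assuming made-up regression data in which only two of five features matter:

    import numpy as np
    from sklearn.linear_model import Lasso, Ridge

    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 5))                      # 5 toy features
    y = 3 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.1, size=100)   # only 2 features matter

    lasso = Lasso(alpha=0.1).fit(X, y)                 # L1: some coefficients driven exactly to zero
    ridge = Ridge(alpha=1.0).fit(X, y)                 # L2: coefficients shrunk but left non-zero

    print(lasso.coef_)                                 # sparse -- useful for feature selection
    print(ridge.coef_)                                 # small but dense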

Bias-Variance Tradeoff

  • Bias is the error due to simplified assumptions (e.g., underfitting)
  • Variance is the error due to sensitivity to noise (e.g., overfitting)
  • The tradeoff is that increasing model complexity lowers bias but increases variance
  • The goal is to find an optimal balance

Gradient Descent

  • The gradient is the rate of change of a function with respect to its parameters
  • Mathematically, the gradient of a function f(x) is ∇f(x) = (∂f/∂x1, ..., ∂f/∂xn)
  • On a contour plot, the gradient points in the direction of steepest ascent, i.e., toward increasing function values
  • Because the gradient points toward steepest ascent, minimization requires stepping in the opposite (negative gradient) direction

Gradient Descent Algorithm

  • Basic steps:
    • Initialize weights randomly.
    • Compute the gradient of the loss function.
    • Update parameters in the opposite direction of the gradient.
    • Repeat until convergence.
  • Equation: θ(t+1) = θ(t) – α∇J(θ)
  • where:
    • α = learning rate
    • ∇J(θ) = gradient of the loss function
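
A minimal sketch of the update rule θ(t+1) = θ(t) – α∇J(θ), assuming a toy one-parameter quadratic loss:

    def loss(theta):                          # toy loss: J(theta) = (theta - 3)^2
        return (theta - 3.0) ** 2

    def grad(theta):                          # its gradient: dJ/dtheta = 2 * (theta - 3)
        return 2.0 * (theta - 3.0)

    theta = 0.0                               # arbitrary initialization
    alpha = 0.1                               # learning rate

    for _ in range(100):                      # repeat until (approximate) convergence
        theta = theta - alpha * grad(theta)   # step in the opposite direction of the gradient

    print(theta, loss(theta))                 # theta ≈ 3, loss ≈ 0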

Learning Rate

  • Controls how much weights are updated at each step
  • If too small, convergence is slow
  • If too large, the model might overshoot and never converge

When is Gradient Descent Valid

  • The loss function must be differentiable and gradient updates must move the function towards a minimum
  • To choose the learning rate:
    • trial and error, using cross-validation (CV)
    • adaptive methods (e.g., Adam, RMSProp)

Step Size

  • This term determines how far to move in the direction of the gradient
  • If too large, the model may miss the optimal point
  • If too small, convergence will be slow

Gradient of the Loss

  • The gradient tells how much, and in which direction, to adjust model parameters to minimize the error
  • Mathematically, if the loss function is J(θ), the gradient is: ∇J(θ) = ∂J/∂θ
  • Parameter update equation during gradient descent: θ(t+1) = θ(t) – α∇J(θ)

Step Size

  • Controls how much parameters are adjusted in the direction of the gradient
  • A small step size yields slow convergence but stable learning
  • A large step size yields faster learning but the model may overshoot and not converge
  • Adaptive step sizes (e.g., Adam optimizer) adjust automatically

Supervised vs. Unsupervised Learning

  • Supervised learning occurs when a model learns from labeled data, meaning each input has a corresponding known output (label).
    • Examples: classification (spam detection, image recognition) and regression (predicting house prices)
  • Unsupervised learning occurs when a model learns patterns from unlabeled data (i.e., no explicit output labels).
    • Examples: clustering (grouping customers based on purchase behavior) and dimensionality reduction (PCA for feature extraction)

Applying Supervised vs. Unsupervised Learning

  • Supervised learning applies when labeled data and corresponding predictions are desired
  • Unsupervised learning applies when unlabeled data exists and structure needs to be determined

Description

This lesson explores HTML file paths, naming conventions, and common scenarios like the creation of multiple 'Untitled' files. It covers file differentiation using sequential updates and inferences about user file organization. Understanding file extensions and directory structures is key.
