Probability and Statistics Quiz

30 Questions

What is the term for the set of all possible outcomes in an experiment?

Sample space

Can two events with non-zero probabilities be both mutually exclusive and independent?

No, it is not possible

What is the term for a normal distribution with a standard deviation of 1 and a mean of 0?

Standard normal distribution

In a standard normal probability distribution, what is the area to the left of the mean?

0.5

The weight of football players is normally distributed with a mean of 200 pounds and a standard deviation of 25 pounds. What is the probability of a player weighing more than 241.25 pounds?

0.0495

What type of distribution is the Poisson probability distribution?

Discrete probability distribution

What is the range of values that the coefficient of determination can take?

Values between 0 and 1

What is the conclusion when testing the hypothesis of slope in question 6?

Reject the null hypothesis

What is the assumption about the variance of error in linear regression?

It is the same for all values of the independent variable

What is the purpose of calculating the R-squared value?

To measure the goodness of fit of the model

What is the interval estimate of the mean value of y for a given value of x?

A confidence interval

What is the correct interpretation of a 95% confidence interval for B1?

We can be 95% confident that the true value of B1 lies within the interval

What is the formula for dissimilarity computation between two objects for categorical variables?

D(i, j) = (p - m) / p, where p is the total number of variables and m is the number of matching variables

Which measure of deviation is more affected by an outlier in a data set?

Standard deviation

What is the benefit of standardizing the data during clustering analysis?

It makes the variables more comparable

Which of the following is NOT a possible termination condition in K-Means?

When all observations are assigned to a single cluster

What is the main advantage of using mean absolute deviation over standard deviation?

It is less sensitive to outliers

Which of the following is a possible scenario where standardization is not beneficial during clustering analysis?

When the variables have an absolute value

What is the correct name of the library used to build a decision tree model?

DecisionTreeClassifier (a class in scikit-learn's sklearn.tree module, not a library in its own right)

Does the Gini Index enforce the resulting tree to have multiway splits?

False

What shape represents chance nodes in a decision tree?

Circles

What is the measure of uncertainty of a random variable in a decision tree?

Entropy

What is the solution to biased trees created by decision tree learners due to dominant classes?

Balance the dataset prior to fitting

What type of tree is needed to predict the price of a house using a decision tree?

Regression Tree

What is the number of clusters formed if a horizontal line is drawn on the y-axis for y=2?

4

Which clustering algorithm primarily uses a merging approach?

Hierarchical

What is the primary use of Hierarchical clustering?

Exploration

Which metric is used to find dissimilarity between two clusters in Hierarchical clustering?

All of the above

What is true about K-means clustering with k=3, when two variables V1 and V2 have a correlation of 1?

The cluster centroids will be in a straight line

Which clustering algorithm works well when the shape of the clusters is hyperspherical?

K-means

Study Notes

Decision Trees

  • Decision tree models are built using scikit-learn's DecisionTreeClassifier (classification) or DecisionTreeRegressor (regression).
  • The Gini index does not force the resulting tree to have multiway splits.
  • Chance nodes are represented by circles.
  • Entropy is the measure of uncertainty of a random variable, characterizing the impurity of an arbitrary collection of examples.
  • Decision nodes are represented by squares, and end nodes by triangles.
  • Decision tree learners may create biased trees if some classes dominate, and the solution is to balance the dataset prior to fitting.
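The impurity measures mentioned above can be illustrated with a short sketch. The function names and the sample labels are hypothetical and chosen only for illustration; entropy is computed in bits (log base 2), which is an assumption about the unit.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy, in bits, of a collection of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def gini(labels):
    """Gini impurity of a collection of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

# Hypothetical label sets: one evenly mixed, one pure.
balanced = ["yes", "no", "yes", "no"]
pure = ["yes", "yes", "yes", "yes"]

print(entropy(balanced), gini(balanced))  # maximum impurity for two classes
print(entropy(pure), gini(pure))          # no impurity
```

Both measures are zero for a pure node and largest when the classes are evenly mixed, which is why a dominant class can bias the splits a tree learner chooses.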

Statistics

  • A standard normal distribution has a mean of 0 and a standard deviation of 1.
  • The area to the left of the mean in a standard normal probability distribution is 0.5.
  • The probability of a player weighing more than 241.25 pounds in a normal distribution with a mean of 200 pounds and a standard deviation of 25 pounds is 0.0495.
  • The probability of a player weighing less than 250 pounds in the same distribution is 0.9772.
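The probabilities above can be checked with Python's standard library. This is a minimal sketch; the variable names are illustrative only.

```python
from statistics import NormalDist

# Player weights: mean 200 pounds, standard deviation 25 pounds.
weights = NormalDist(mu=200, sigma=25)

# P(X > 241.25): z = (241.25 - 200) / 25 = 1.65
p_more_than_241_25 = 1 - weights.cdf(241.25)

# P(X < 250): z = (250 - 200) / 25 = 2
p_less_than_250 = weights.cdf(250)

# Area to the left of the mean in a standard normal distribution.
p_left_of_mean = NormalDist().cdf(0)

print(round(p_more_than_241_25, 4))  # 0.0495
print(round(p_less_than_250, 4))     # 0.9772
print(p_left_of_mean)                # 0.5
```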

Probability

  • Two events having non-zero probabilities cannot be both mutually exclusive and independent.
  • A normal distribution with a standard deviation of 1 and a mean of 0 is a standard normal distribution.
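The first point follows from the definitions: mutually exclusive events have P(A and B) = 0, while independence requires P(A and B) = P(A) P(B), which is positive when both probabilities are non-zero. A small sketch with a fair die, where the events A and B are hypothetical examples:

```python
from fractions import Fraction

sample_space = {1, 2, 3, 4, 5, 6}  # one roll of a fair die
A = {1, 2}                         # hypothetical event
B = {3, 4}                         # hypothetical event, disjoint from A

def prob(event):
    """Probability of an event under equally likely outcomes."""
    return Fraction(len(event & sample_space), len(sample_space))

p_joint = prob(A & B)            # probability that both events occur
p_product = prob(A) * prob(B)    # what independence would require

mutually_exclusive = p_joint == 0
independent = p_joint == p_product

print(mutually_exclusive, independent)
```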

Regression Analysis

  • The coefficient of determination (R-squared) varies from 0 to 1.
  • In a regression analysis, the null hypothesis is rejected if the p-value is less than the significance level.
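  • A 95% confidence interval for B1 can also be used to test hypotheses about the slope: if the interval does not contain 0, the null hypothesis B1 = 0 is rejected at the 5% significance level.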
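As a rough illustration of the coefficient of determination, the sketch below fits a least-squares line and computes R-squared as 1 - SSE/SST. The data values and function names are made up for the example and do not come from the quiz.

```python
def fit_line(xs, ys):
    """Least-squares estimates (b0, b1) for the line y = b0 + b1 * x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1

def r_squared(xs, ys, b0, b1):
    """Coefficient of determination: 1 - SSE / SST."""
    y_bar = sum(ys) / len(ys)
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    sst = sum((y - y_bar) ** 2 for y in ys)
    return 1 - sse / sst

# Hypothetical data with a strong linear trend.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

b0, b1 = fit_line(xs, ys)
r2 = r_squared(xs, ys, b0, b1)
print(b0, b1, r2)
```

For a least-squares fit with an intercept, the result always falls between 0 and 1, with values near 1 indicating that the line explains most of the variation in y.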

Data Analytics

  • The variance of the error term is assumed to be the same for all values of the independent variable.
  • The interval estimate of the mean value of y (dependent variable) for a given value of x is a confidence interval: the predicted value of y plus or minus the margin of error. A prediction interval, by contrast, estimates an individual value of y.
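  • The formula for dissimilarity computation between two objects for categorical variables is D(i, j) = (p - m) / p, where p is the total number of variables and m is the number of variables on which the two objects match.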
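The categorical dissimilarity formula can be sketched as a small function. The function name and the example objects are hypothetical.

```python
def categorical_dissimilarity(obj_i, obj_j):
    """D(i, j) = (p - m) / p for two objects described by categorical variables.

    p is the total number of variables, m the number of matching variables.
    """
    if len(obj_i) != len(obj_j):
        raise ValueError("objects must have the same number of variables")
    p = len(obj_i)
    m = sum(a == b for a, b in zip(obj_i, obj_j))
    return (p - m) / p

# Hypothetical objects described by (colour, size, shape).
first = ("red", "small", "round")
second = ("red", "large", "round")

d = categorical_dissimilarity(first, second)
print(d)  # one mismatch out of three variables
```

A value of 0 means the objects match on every variable, and 1 means they match on none.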

Clustering

  • Standard deviation (std_f) is more affected by outliers than mean absolute deviation (s_f), because it squares each deviation from the mean.
  • Standardizing the data is beneficial during clustering analysis.
  • Possible termination conditions in K-Means include a fixed number of iterations, no change in assignment of observations to clusters between iterations, and no change in centroids between successive iterations.
  • Hierarchical clustering uses a merging approach.
  • Hierarchical clustering should primarily be used for exploration.
  • Single-link, complete-link, and average-link are all used to measure dissimilarity between two clusters in hierarchical clustering.
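  • K-means clustering is sensitive to correlation between variables: when two variables have a correlation of 1, all points lie on a straight line, so the cluster centroids do as well.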
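The difference in outlier sensitivity between standard deviation and mean absolute deviation can be seen on a small made-up dataset. This sketch assumes the population form of the standard deviation (dividing by n); the data values are hypothetical.

```python
import math

def std_dev(values):
    """Population standard deviation (divides by n)."""
    mean = sum(values) / len(values)
    return math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))

def mean_abs_dev(values):
    """Mean absolute deviation around the mean."""
    mean = sum(values) / len(values)
    return sum(abs(v - mean) for v in values) / len(values)

clean = [10, 12, 11, 13, 12]     # hypothetical measurements
with_outlier = clean + [100]     # same data plus one extreme value

# How many times larger each measure becomes once the outlier is added.
std_growth = std_dev(with_outlier) / std_dev(clean)
mad_growth = mean_abs_dev(with_outlier) / mean_abs_dev(clean)

print(std_growth, mad_growth)
```

On this data the standard deviation grows by a larger factor than the mean absolute deviation, since squaring amplifies the single large deviation.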

CART Model

  • CART is a supervised learning technique.
  • CART adopts a greedy approach.
  • CART is suitable for building decision trees.

Clustering Algorithms

  • K-means clustering works well when the shape of the clusters is hyperspherical.
  • Agglomerative Hierarchical clustering and Divisive Hierarchical clustering are suitable for building hierarchical clusters.
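The three K-means termination conditions listed in the Clustering notes can be sketched in a plain-Python loop. This is a simplified illustration rather than a production implementation: the function name, the initial centroids, and the sample points are all assumptions made for the example.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, centroids, max_iter=100):
    """Basic K-means; returns (centroids, assignments, iterations used)."""
    assignments = None
    iteration = 0
    for iteration in range(1, max_iter + 1):  # condition 1: fixed number of iterations
        new_assignments = [
            min(range(len(centroids)), key=lambda k: squared_distance(p, centroids[k]))
            for p in points
        ]
        if new_assignments == assignments:    # condition 2: assignments did not change
            break
        assignments = new_assignments

        new_centroids = []
        for k, old in enumerate(centroids):
            members = [p for p, a in zip(points, assignments) if a == k]
            if members:
                new_centroids.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centroids.append(old)     # keep an empty cluster's centroid in place
        if new_centroids == centroids:        # condition 3: centroids did not change
            break
        centroids = new_centroids
    return centroids, assignments, iteration

# Hypothetical, well-separated points and starting centroids.
points = [(0, 0), (0, 1), (10, 10), (10, 11)]
final_centroids, final_assignments, used = kmeans(points, [(0, 0), (10, 10)])
print(final_centroids, final_assignments, used)
```

Because each point is assigned to its nearest centroid by Euclidean distance, the approach suits compact, roughly hyperspherical clusters.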

Test your understanding of probability and statistics concepts, including events, experiments, sample spaces, and normal distributions. Evaluate your knowledge of mutually exclusive and independent events, and standard deviations.
