Machine Learning Unit-2 Quiz

Questions and Answers

What is the term used for the 100-quantiles?

Percentiles

What is the purpose of quartiles?

Quartiles provide information about a distribution's center, spread, and shape.

What are the most widely used forms of quantiles?

Median, quartiles, and percentiles

What is the relationship between the first quartile (Q1) and the 25th percentile?

They are the same.

What does the third quartile (Q3) represent?

The 75th percentile

What does the interquartile range (IQR) measure?

The spread of the middle half of the data

What is the formula for calculating the interquartile range (IQR)?

IQR = Q3 - Q1

Given the salary data (in thousands of dollars): 30, 36, 47, 50, 52, 52, 56, 60, 63, 70, 70, 110. What is the IQR for this data?

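18, using the median-of-halves method: Q1 = (47 + 50)/2 = 48.5 and Q3 = (63 + 70)/2 = 66.5, so IQR = 66.5 - 48.5 = 18. Other quartile conventions give slightly different values.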
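
A minimal Python sketch of that calculation, assuming the median-of-halves convention (for an odd count, the overall median is left out of both halves). The function name `quartiles_median_of_halves` is illustrative, not part of the quiz.

```python
from statistics import median


def quartiles_median_of_halves(values):
    """Return (Q1, Q3) as the medians of the lower and upper halves of the data.

    For an odd number of values, the overall median is excluded from both halves.
    """
    data = sorted(values)
    n = len(data)
    if n < 2:
        raise ValueError("need at least two values")
    half = n // 2
    lower = data[:half]
    # Skip the middle value when n is odd; otherwise split the data evenly.
    upper = data[half + 1:] if n % 2 else data[half:]
    return median(lower), median(upper)


salaries = [30, 36, 47, 50, 52, 52, 56, 60, 63, 70, 70, 110]  # thousands of dollars
q1, q3 = quartiles_median_of_halves(salaries)
iqr = q3 - q1
print(q1, q3, iqr)  # 48.5 66.5 18.0
```

Other conventions, such as the default linear interpolation in `numpy.percentile`, produce slightly different quartiles for the same data, so the expected answer depends on the method your course uses.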

What is the median of the weights of the boxes of raisins?

The median is 32 grams.

Identify the first quartile (Q1) from the weights of the boxes of raisins.

The first quartile (Q1) is 29 grams.

What is the maximum weight of the boxes of raisins?

The maximum weight is 38 grams.

Calculate the range of the weights of the boxes of raisins.

The range is 13 grams.

Construct the lower hinge for the box plot from the given monetary data.

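The lowest point of the plot (the minimum, at the end of the lower whisker) is $19. Note that in Tukey's terminology the lower hinge is the bottom edge of the box, approximately Q1, rather than the minimum.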

What is the third quartile (Q3) for the hourly collections from the Salvation Army kettle?

The third quartile (Q3) is $34.

Using the hourly collections, what is the sum of the values between Q1 and Q3?

The sum is $61.

How many data points lie above the median in the hourly collections?

There are 6 data points above the median.

What distinguishes ordinal data from nominal data?

Ordinal data can be arranged in a meaningful order based on assigned values, while nominal data cannot.

Why can't mathematical operations be performed on nominal data?

Mathematical operations cannot be performed on nominal data because it represents categories without inherent numerical value.

What is one key feature of interval data that makes it different from ratio data?

Interval data does not have a true zero point, whereas ratio data does.

Identify an example of ordinal data and explain its order.

An example of ordinal data is customer satisfaction levels like ‘Very Happy’, ‘Happy’, and ‘Unhappy’, which can be ranked from best to worst.

What types of central tendency measures can be utilized with ordinal data?

Mode and median can be used with ordinal data, but the mean is not meaningful because the distances between ranks are not defined.

How are ratio data characterized in terms of mathematical operations?

Ratio data can be added, subtracted, multiplied, or divided, so measures such as the mean (central tendency) and the standard deviation (spread) can be computed.

Provide an example of quantitative data and explain why it is considered measurable.

An example of quantitative data is height, which can be measured in units like centimeters.

How does measuring temperature illustrate the difference between interval data and ratio data?

In interval data, such as temperature in degrees Celsius, there is no true zero, so you cannot say one temperature is 'twice' another; in ratio data, such as weight (or temperature in kelvins), there is an absolute zero that allows such comparisons.
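
A small illustration of why ratios need a true zero, assuming the standard conversion K = °C + 273.15; the temperatures 20 °C and 10 °C are arbitrary examples.

```python
def celsius_to_kelvin(celsius):
    """Convert degrees Celsius to kelvins, a scale whose zero is absolute zero."""
    return celsius + 273.15


warm_c, cool_c = 20.0, 10.0  # arbitrary example temperatures in degrees Celsius

# Dividing the Celsius readings gives 2.0, but "twice as hot" is not physically meaningful.
naive_ratio = warm_c / cool_c
# On the Kelvin (ratio) scale the same two temperatures differ by only about 3.5%.
kelvin_ratio = celsius_to_kelvin(warm_c) / celsius_to_kelvin(cool_c)
print(naive_ratio, round(kelvin_ratio, 3))  # 2.0 1.035
```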

How do outliers affect the mean in a dataset?

Outliers can drastically shift the mean, causing it to misrepresent the data.
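
A quick check using the salary data from the IQR question, treating 110 as the outlier: removing it moves the mean from 58 to about 53.3, while the median only moves from 54 to 52.

```python
from statistics import mean, median

salaries = [30, 36, 47, 50, 52, 52, 56, 60, 63, 70, 70, 110]  # thousands of dollars
without_outlier = [s for s in salaries if s != 110]  # drop the extreme value

# The mean shifts by roughly 4.7 (thousand); the median shifts by only 2.
print(mean(salaries), median(salaries))
print(round(mean(without_outlier), 2), median(without_outlier))
```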

What is the weighted mean and why is it useful?

The weighted mean multiplies each value by a weight reflecting its importance (or frequency), sums the products, and divides by the sum of the weights. It is useful when values do not contribute equally to the average.

Calculate the final weighted average for three exams with scores 80, 80, and 95 and weights 40%, 40%, and 20%.

The final weighted average is 0.40 × 80 + 0.40 × 80 + 0.20 × 95 = 32 + 32 + 19 = 83.
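
The same calculation as a small Python function. Dividing by the sum of the weights, rather than assuming they add up to 1, is a design choice that lets the weights be given as percentages. The name `weighted_mean` is illustrative.

```python
def weighted_mean(values, weights):
    """Return sum(value * weight) / sum(weight)."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(v * w for v, w in zip(values, weights)) / total_weight


scores = [80, 80, 95]
weights = [40, 40, 20]  # percentages; they do not need to sum to 1
print(weighted_mean(scores, weights))  # 83.0
```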

What does a small deviation between mean and median suggest about a dataset?

It suggests that the dataset is less likely to have significant outliers.

Define the mode in a dataset and provide an example.

The mode is the most common value in a dataset; for example, in 1, 1, 2, 2, 2, the mode is 2.
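
Python's standard library computes this directly: `statistics.mode` returns the most common value, and `statistics.multimode` (Python 3.8+) returns every value tied for the highest count, which matters when a dataset has more than one mode. The mode also works on nominal data such as colour labels, where a mean or median would not make sense; the example values are illustrative.

```python
from statistics import mode, multimode

print(mode([1, 1, 2, 2, 2]))             # 2
print(multimode([1, 1, 2, 2, 3]))        # [1, 2]: two values tie for most frequent
print(mode(["red", "blue", "red"]))      # red: the mode applies to categories too
```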

What is the significance of missing values in an attribute like horsepower?

Missing values can lead to an incomplete analysis and affect overall statistical conclusions.

How does examining measures of data spread contribute to understanding a dataset?

Examining data spread provides insights into variability and the distribution of data points.

Why might certain attributes have significant deviations between mean and median?

Significant deviations often indicate the presence of outliers or skewed data distributions.

What is a potential disadvantage of ignoring tuples with missing values in a dataset?

Ignoring tuples leads to the loss of potentially useful information from other attributes in those tuples.

Why might manually filling in missing values be impractical?

It is time-consuming and may not be feasible for large datasets with many missing values.

What is the risk of using a global constant to fill in missing values?

An analysis or mining algorithm may mistakenly treat the shared constant (for example, "Unknown") as a meaningful concept, because all the affected tuples now have that value in common.

When should the mean be used to replace a missing value, and when should the median be favored?

The mean is appropriate for roughly symmetric (for example, normal) distributions, while the median is better for skewed distributions because it is robust to extreme values.
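
A minimal sketch of mean versus median imputation, assuming missing entries are stored as `None`; the income values are hypothetical, with 110 skewing the data to the right and pulling the mean above the median.

```python
from statistics import mean, median


def fill_missing(values, strategy="median"):
    """Replace None entries with the mean or median of the observed values."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("no observed values to impute from")
    if strategy == "mean":
        fill = mean(observed)
    elif strategy == "median":
        fill = median(observed)
    else:
        raise ValueError("strategy must be 'mean' or 'median'")
    return [fill if v is None else v for v in values]


incomes = [30, 36, 47, None, 52, 56, None, 110]  # hypothetical; 110 skews the data right
print(fill_missing(incomes, "mean"))    # gaps filled with about 55.17
print(fill_missing(incomes, "median"))  # gaps filled with 49.5
```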

How can the mean or median be utilized for filling missing values within classes?

Missing values can be replaced with the mean or median of that class, such as mean income for a specific credit risk category.
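
A sketch of class-conditional mean imputation, assuming records are plain dictionaries; the `risk` and `income` fields and their values are hypothetical stand-ins for the credit-risk example.

```python
from collections import defaultdict
from statistics import mean


def fill_by_class(rows, target, class_key):
    """Fill missing `target` values with the mean of that attribute within the same class.

    Rows with a missing value are returned as new dicts; the input rows are not modified.
    A class with no observed values raises KeyError in this sketch.
    """
    by_class = defaultdict(list)
    for row in rows:
        if row[target] is not None:
            by_class[row[class_key]].append(row[target])
    class_means = {cls: mean(vals) for cls, vals in by_class.items()}
    return [
        {**row, target: class_means[row[class_key]]} if row[target] is None else row
        for row in rows
    ]


customers = [  # hypothetical records
    {"risk": "low", "income": 70},
    {"risk": "low", "income": 90},
    {"risk": "low", "income": None},
    {"risk": "high", "income": 30},
    {"risk": "high", "income": 40},
    {"risk": "high", "income": None},
]
filled = fill_by_class(customers, "income", "risk")
print([row["income"] for row in filled])  # [70, 90, 80, 30, 40, 35]
```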

What methods can be used to determine the most probable value for filling in missing data?

Regression, inference-based tools using Bayesian methods, or decision tree induction can be employed.

Define min-max normalization in the context of data scaling.

Min-max normalization linearly maps values from their original range [min_A, max_A] to a specified range [new_min_A, new_max_A], typically [0.0, 1.0], using v' = (v - min_A) / (max_A - min_A) × (new_max_A - new_min_A) + new_min_A.
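
A minimal sketch of min-max normalization applied to the salary data from the IQR question. It implements v' = (v - min) / (max - min) × (new_max - new_min) + new_min; the function name `min_max_normalize` is illustrative.

```python
def min_max_normalize(values, new_min=0.0, new_max=1.0):
    """Map each v to (v - min) / (max - min) * (new_max - new_min) + new_min."""
    old_min, old_max = min(values), max(values)
    if old_max == old_min:
        raise ValueError("all values are equal, so the original range is zero")
    return [
        (v - old_min) / (old_max - old_min) * (new_max - new_min) + new_min
        for v in values
    ]


salaries = [30, 36, 47, 50, 52, 52, 56, 60, 63, 70, 70, 110]  # thousands of dollars
scaled = min_max_normalize(salaries)
print(scaled[0], scaled[9], scaled[-1])  # 0.0 0.5 1.0
```

Because the minimum and maximum define the mapping, a single extreme value (here 110) compresses the rest of the data into a narrow part of the new range.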

What is a potential drawback of using the most probable value approach for missing data?

It may oversimplify the data relationships and fail to account for variations within the dataset.

What is the interquartile range if the first quartile is 64 and the third quartile is 77?

The interquartile range is 13.

How do you calculate the first quartile when the sample size is odd and the lower half (excluding the median) consists of the values 64 and 64?

The first quartile is the median of the lower half; with two values it is their mean, (64 + 64)/2 = 64.

What does a box plot visually represent in terms of data distribution?

A box plot shows the distribution of numerical data through the median, the quartiles (the edges of the box), the whiskers, and any outliers.

What marks the mid-point of a data set in a box plot?

The median marks the mid-point of the data and is represented by the line dividing the box.

What is indicated by the lower whisker in a box plot?

The lower whisker covers scores below the middle 50%, extending from Q1 down to the lowest score that is not an outlier.

In a dataset, if 75% of scores fall below a certain value, what is this value called?

This value is called the upper quartile or the third quartile.

What is the five number summary in statistics?

The five number summary consists of the minimum, first quartile, median, third quartile, and maximum.
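
A sketch that computes the five-number summary for the salary data from the IQR question, assuming the median-of-halves quartile convention; `five_number_summary` is an illustrative name.

```python
from statistics import median


def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) using median-of-halves quartiles."""
    data = sorted(values)
    n = len(data)
    if n < 2:
        raise ValueError("need at least two values")
    half = n // 2
    lower = data[:half]
    # Exclude the middle value from the upper half when n is odd.
    upper = data[half + 1:] if n % 2 else data[half:]
    return data[0], median(lower), median(data), median(upper), data[-1]


salaries = [30, 36, 47, 50, 52, 52, 56, 60, 63, 70, 70, 110]  # thousands of dollars
print(five_number_summary(salaries))  # (30, 48.5, 54.0, 66.5, 110)
```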

How can you determine if there are outliers in a dataset based on a box plot?

Outliers are plotted as individual points beyond the ends of the whiskers; a common rule flags values more than 1.5 × IQR below Q1 or above Q3.
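
A sketch of the common 1.5 × IQR rule (Tukey's fences), assuming median-of-halves quartiles; `k` is the fence multiplier and `tukey_outliers` is an illustrative name. For the salary data from the IQR question, the fences are 48.5 - 27 = 21.5 and 66.5 + 27 = 93.5, so only 110 is flagged.

```python
from statistics import median


def tukey_outliers(values, k=1.5):
    """Return values below Q1 - k*IQR or above Q3 + k*IQR (median-of-halves quartiles)."""
    data = sorted(values)
    n = len(data)
    if n < 2:
        raise ValueError("need at least two values")
    half = n // 2
    q1 = median(data[:half])
    q3 = median(data[half + 1:] if n % 2 else data[half:])
    iqr = q3 - q1
    lower_fence, upper_fence = q1 - k * iqr, q3 + k * iqr
    return [v for v in data if v < lower_fence or v > upper_fence]


salaries = [30, 36, 47, 50, 52, 52, 56, 60, 63, 70, 70, 110]  # thousands of dollars
print(tukey_outliers(salaries))  # [110]
```

With this rule, the whiskers would end at the most extreme non-outlier values, 30 and 70.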

Flashcards

Nominal Data

Categorical data without a natural order or ranking.

Ordinal Data

Categorical data with a clear, ordered relationship among values.