BSAN 160 Exam Review Flashcards

Questions and Answers

Which of the following is not a linkage criterion used in clustering models?

Average linkage
Complete linkage
K-fold linkage (correct)
Single linkage

The input variables used in this model are not on the same scale, and this makes comparing the distance between students difficult. We need to convert the input variables to be on a similar scale by standardizing.

False (B)

Based on the distance values in Table 1, the first cluster in a hierarchical clustering model will include {Joe and Hannah}.

True (A)

Using the single linkage criterion, after creating the first cluster, the next step would be:

Add Sam to the cluster {Joe and Hannah} (C)

Using the complete linkage criterion, after creating the first cluster, the next step would be:

Create a cluster that includes {Sam and Max} (B)

When developing a data mining model, we split the original data into training data and testing data in order to evaluate the model performance in a dataset that was not used to develop the model.

True (A)

In evaluating a two-class classification model, the accuracy is __________________.

the number of correctly classified positives and negatives divided by the total number of positive (true and false) and negative (true and false) classifications.

In ________, the complete data set is randomly split into mutually exclusive subsets and tested multiple times on each left-out subset, using the others as a training set.

k-fold cross-validation

Perfect classification is represented by AUC = 0.5.

False (B)

Based on the distance values in Table 1, the first cluster in a hierarchical clustering model will include:

Dennis and Ross (D)

Using the single linkage criterion, after creating the first cluster, the next step would be:

Add Ben to the cluster {Dennis and Ross} (B)

Using the complete linkage criterion, after creating the first cluster, the next step would be:

Both A and B (B)

What is the false positive (FP) count?

20

How many mistakes (misclassifications) did the model make?

60

Based on the results shown in Table 2, the true positive rate is higher than the true negative rate.

False (B)

In a clustering model with two numerical input variables, if the input variables are not on the same scale, standardizing is used to convert the variables and compare them on a single scale.

True (A)
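
As a minimal sketch of standardizing, the snippet below converts two inputs measured on very different scales into z-scores (subtract the mean, then divide by the standard deviation). The variable names and values are hypothetical and are not taken from the exam's tables.

```python
import statistics

def standardize(values):
    """Return z-scores: (value - mean) / population standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std for v in values]

# Hypothetical inputs on very different scales.
study_hours = [2.0, 4.0, 6.0, 8.0]
family_income = [30000.0, 50000.0, 70000.0, 90000.0]

z_hours = standardize(study_hours)
z_income = standardize(family_income)
print(z_hours)
print(z_income)
```

After standardizing, both variables are centered at 0 with unit spread, so neither dominates a distance calculation simply because of its units.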

Due to the potential risk of model overfitting, rather than using all available data to build the model, we split the data into training and testing data; we use the training data for model development and evaluate model performance using the testing data.

True (A)

In text mining, tokenizing is the process of _________________.

breaking a text into simple units, like sentences or words

The Bag-of-Words method uses ____________ to extract features from textual data.

Word frequencies in a text

In text mining, what is a lexicon?

a catalog of words and scores (or categories) assigned to the words based on their meaning

After removing the stop words, the bigram method creates the vector: [ 'enjoy', 'taking', 'walk', 'rain']

False (B)

Web structure mining focuses on navigation through a website by analyzing the links in Web documents, and web content mining is related to extraction of information from the content of Web pages using text mining.

True (A)

If Lumos wants to look at the worst possible outcome for each decision, which decision would you recommend?

Choose to stay in current location

If Lumos wants to look at the best possible outcome for each decision, which decision would you recommend?

Choose to expand

Which decision would you recommend if Lumos wants to use the Expected Monetary Value (EMV) and pick the decision with the largest EMV?

Choose to expand

Which of the following is/are element(s) of decision models under uncertainty?

All of the above (A, B, and C) (D)

Decision modeling is a __________ analytics method.

Prescriptive

In decision modeling, using the Expected Monetary Value (EMV) criterion guarantees the best outcome.

False (B)

If we look at the worst possible outcome for each decision alternative and choose the decision that has the best 'worst outcome', which decision alternative should we choose?

Decision alternative 2

If we look at the best possible outcome for each decision alternative and choose the decision that has the best 'best outcome', which decision alternative should we choose?

Decision alternative 1

Using the Expected Monetary Value (EMV) criterion, which decision alternative should we choose?

Indifferent between Decision alternative 1 and Decision alternative 2

A probability node of a decision tree for decision modeling represents __________________.

a time when the result of an uncertain outcome becomes known

Which of the following statements is incorrect concerning sensitivity analysis in decision models?

Using sensitivity analysis, the selected decision cannot change if we use the same decision criterion. (B)

Optimization is a _________ analytics method.

prescriptive

A feasible solution of a linear programming model is a solution that represents the values for all decision variables that satisfies all the constraints.

True (A)

This is a linear programming model where the objective function is a maximization.

False (B)

What are the decision variables in this model?

Number of snacks, drinks, sun protection items, and clothing items

Which of the following is not a constraint of this model?

Minimum weight of all items combined in the backpack must be at least 1600 grams (A)

In this linear programming model, taking 5 snacks, 6 water bottles, 2 sunscreen items, and 3 clothing items is a feasible solution.

False (B)

The total weight of snacks in your backpack cannot exceed the total weight of water bottles. Which formulation below represents this new constraint?

60*x1

Decision support systems are computer-based support systems that integrate individuals' expertise and computer capabilities, and they have precise definitions agreed to by practitioners.

False (B)

What is Business Intelligence (BI)?

An umbrella term that combines architectures, databases, analytical tools, applications, and methodologies.

Data is a collection of observations, experiments, and experiences that do not necessarily represent absolute facts that are universally true.

True (A)

What is Descriptive Analytics?

Descriptive Analytics helps managers understand current events in the organization including causes, trends, and patterns.

What type of analytics seeks to recognize what is going on as well as the likely forecast and make decisions to achieve the best performance possible?

Prescriptive Analytics

Which of the following is/are predictive analytics method(s)? (Select all that apply)

Regression analysis (A), Clustering (B), Text analysis (E)

If a model is developed to forecast students at risk of dropping out after the first year of college, what kind of analytics application would this work represent?

Predictive Analytics

Which chart type below would be most helpful to compare the worldwide turnover rate with the tech sector turnover rate?

Bar chart (C)

Which chart type below would be most helpful to show the relative proportions of turnover rate of different categories within the tech sector?

Pie chart (B)

Original (raw) data is usually collected from multiple data sources including various formats, and it is readily usable by analytics tools and algorithms.

False (B)

During data transformation, numeric variables can be converted to categorical variables.

True (A)

Data reduction can be applied to rows (observations) and/or columns (variables) in a given dataset.

True (A)

In the data preprocessing step that reduces the dimension of the data prior to analysis, sampling the rows is more complex than selecting the columns (variables).

False (B)

The choice of a visualization method that meets the presentation requirements for a given dataset depends on the data types available, the purpose of the visual, and the context.

True (A)

Which of the below is not a data preprocessing step?

Data separation (B)

Which of the below is a method to deal with filling out the missing values in data?

Data imputation (D)

Which of the below statement(s) is/are correct?

Normalizing values allows for comparison of variables on a single scale. (A)

When analyzing the original data of household income, which of the following methods would be well-suited to prepare the data for descriptive analysis?

Fill in missing values with zeros. (A), Use the original dataset to avoid introducing additional noise. (B), Identify the outliers and replace them with the mean. (C), Identify the outliers and remove them. (D)

Which of the below statement(s) is/are correct?

Visual analytics combines data visualization with different analytics methods. (A), Information dashboards provide interactive visual displays of information. (B)

Which of the following data preprocessing activities fall under data transformation?

Convert numeric values into discrete categories. (A), Reduce the range of values using normalization. (B), Identify and replace extreme values. (E)

What chart should Mia use to visualize the relative proportion of market share of Peloton in 2020 compared to competitors?

Pie chart

What chart should Mia use to visualize the number of new members joining the Peloton community every month from 2012 to 2020?

Line chart

Which data preprocessing activity is not associated with data cleaning?

Deriving a new variable representing total time of class material from existing variables

What is the main purpose of imputation methods in data preprocessing?

Fill in missing values with the most appropriate values

Linear regression models represent the mathematical relationship between dependent variables to explain or predict a binary independent variable.

False (B)

Linear regression analysis can be used to predict an unknown value of a dependent variable using independent variables.

True (A)

Comparing two regression models, which statement(s) is/are correct?

Model 1 captures 42% of the variation in the given data. (A), Model 2 describes 79% of the variation in given data. (B)

Using the correlation between size and selling price, we can predict the selling price of a new house based on size.

False (B)

If the correlation between size and selling price is 0.85, the slope coefficient associated with size in the regression equation would have a positive sign.

True (A)

Assuming a regression model is developed to predict a student's final grade, this model is a multiple linear regression model.

True (A)

What method would help to test the hypothesis about students with higher GPA percentiles having a higher SAT score?

Simple linear regression with high school percentile as the independent variable and SAT as the dependent variable.

Which statement(s) is/are correct about fitting a regression line?

The intercept represents the SAT score at a GPA percentile of 0. (A), The slope will be positive. (B), The slope represents how much the SAT score changes with a 1% change in GPA percentile. (C), The intercept represents the GPA percentile at a SAT score of 0. (D)

What type of regression model would best suit predicting the Combined score of a student?

Multiple Linear

If Neal's SAT score is one point higher than Jimmy's, how much higher is Neal's predicted combined score?

0.06 higher than Jimmy's combined score

What would be the best suited regression model to predict if a student will be retained in the second year?

Logistic

What measure quantifies the ratio of retained students compared to those predicted to be retained?

Precision

What is the measure that counts the number of times the model predicted a student's retention correctly?

Recall

The relational data in a data warehouse are modified and analyzed using Online Analytical Processing (OLAP) tools.

True (A)

What OLAP function allows users to access detailed data from summarized data?

Drill Down

What OLAP function transforms data from rows into data grouped on several columns?

Pivot

What concept is critical in developing a data warehouse due to data growth and complexity?

Scalability

What does it mean that a data warehouse is non-volatile?

After data is entered, previous data is not erased when new data is added.

Classification learns to place new instances into their respective groups based on labeled items.

True (A)

Finding an affinity of two products to be commonly purchased is known as what?

Association rule mining

In Association Rule Mining, confidence is a metric representing the probability of observing items A and B together.

False (B)

What is the support for diapers and beer being purchased together?

60%

What is the confidence for milk and juice?

50%

Which data mining method would best suit predicting the length of night sleep based on variables like day of the week?

Linear Regression

Which data mining method would be best suited to predict the length of night sleep based on the past 60 days?

Time Series

Which method would best identify which days are similar regarding length of mid-day nap and night sleep?

Clustering

Which method is best suited to find frequently observed days and night sleep categories together?

Association Rule Analysis

Which method is best suited to predict the night sleep category using various input variables?

Logistic Regression

Which of the following is a segmentation model that classifies items in a dataset?

K-means Clustering (C)

Study Notes

Decision Support Systems & Business Intelligence

Decision support systems integrate human expertise and computer capabilities but lack precise definitions agreed upon by practitioners.
Business Intelligence (BI) encompasses architectures, databases, analytical tools, applications, and methodologies.

Data Characteristics

Data consists of observations and experiences, not always absolute facts.
Descriptive analytics helps managers understand current organizational events, including their causes, trends, and patterns.

Types of Analytics

Prescriptive analytics recognizes what is going on, considers the likely forecast, and recommends decisions that achieve the best possible performance.
Predictive analytics methods include regression analysis, clustering, and text analysis.

Data Visualization Techniques

Bar charts effectively compare categories like turnover rates across sectors.
Pie charts illustrate proportions of different categories within a sector.

Data Preprocessing

Original (raw) data is usually collected from multiple sources in various formats and is not readily usable by analytics tools until it has been preprocessed.
Data transformation may involve rescaling and converting data types.
Data reduction can focus on rows (observations) and columns (variables).
Imputation methods are used to fill missing data.
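
The preprocessing steps listed above can be sketched in a few lines. The example below shows mean imputation, min-max normalization, and conversion of a numeric variable into categories; the function names, the cutoff, and the data are all hypothetical.

```python
def impute_mean(values):
    """Replace None entries with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in values]

def min_max(values):
    """Rescale values to the range [0, 1]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

def to_category(value, cutoff):
    """Convert a numeric value into a two-level category."""
    return "high" if value >= cutoff else "low"

# Hypothetical ages with one missing value.
ages = [20, None, 40, 60]
filled = impute_mean(ages)                     # the gap is filled with 40.0
scaled = min_max(filled)                       # values now lie in [0, 1]
labels = [to_category(a, 40) for a in filled]  # numeric -> categorical
print(filled, scaled, labels)
```

Mean imputation is only one option; the most appropriate fill value depends on the variable and the analysis.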

Regression Analysis

Multiple linear regression models the relationship between one dependent variable and two or more independent variables.
Logistic regression is suited to predicting binary outcomes.
R-squared values measure model fit; higher values indicate better explanation of data variance.
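
To make the regression terms concrete, here is a minimal sketch of a simple (one-predictor) least-squares fit computed by hand. The house sizes and prices are hypothetical. It illustrates two points from the flashcards: a positive correlation implies a positive slope, and R-squared is the share of variation the model explains (for a single predictor it equals the squared correlation).

```python
import math

def fit_line(x, y):
    """Least-squares slope, intercept, correlation r, and R-squared."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    slope = sxy / sxx
    intercept = my - slope * mx
    r = sxy / math.sqrt(sxx * syy)
    predictions = [intercept + slope * a for a in x]
    ss_res = sum((b - p) ** 2 for b, p in zip(y, predictions))
    r_squared = 1 - ss_res / syy
    return slope, intercept, r, r_squared

# Hypothetical house sizes (sq ft) and selling prices (thousands of dollars).
size = [1000, 1500, 2000, 2500, 3000]
price = [200, 260, 330, 370, 450]
slope, intercept, r, r_squared = fit_line(size, price)
print(slope, intercept, r, r_squared)
```

The correlation alone describes the strength and direction of the relationship; predicting a price for a new house requires the fitted slope and intercept.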

OLAP Functions

OLAP functions such as drill down (moving from summarized data to detailed data) and pivot (regrouping rows into columns) support data interrogation and insight extraction.
Scalability is a critical consideration for data warehouse development to manage data growth and query complexity.

Data Mining Techniques

Classification categorizes data based on labeled training sets.
Association rule mining identifies product affinities.
Linear regression predicts continuous outcomes, while logistic regression is used for binary classifications.
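
For association rule mining, the difference between support and confidence is easy to mix up. The sketch below computes both on a small set of hypothetical baskets (not the exam's table): support is the share of transactions containing all the items, while confidence is the share of transactions containing the antecedent that also contain the consequent.

```python
def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    hits = sum(1 for t in transactions if items <= set(t))
    return hits / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Estimated P(consequent | antecedent)."""
    both = support(transactions, set(antecedent) | set(consequent))
    return both / support(transactions, antecedent)

# Hypothetical baskets, not the exam's data.
baskets = [
    {"diapers", "beer", "milk"},
    {"diapers", "beer"},
    {"diapers", "milk"},
    {"milk", "juice"},
    {"beer"},
]

support_diapers_beer = support(baskets, {"diapers", "beer"})
confidence_diapers_beer = confidence(baskets, {"diapers"}, {"beer"})
print(support_diapers_beer, confidence_diapers_beer)
```

In this toy data, diapers and beer appear together in 2 of 5 baskets, and 2 of the 3 baskets with diapers also contain beer.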

Clustering & Hierarchical Models

Hierarchical clustering repeatedly merges the closest items or clusters, based on pairwise distances, until all observations are linked.
Linkage criteria (e.g., single and complete) define how the distance between clusters is measured and therefore determine which clusters merge next.
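
The effect of the linkage criterion can be seen in a small sketch. The distance table below is hypothetical (points P, Q, R, S are not the exam's Table 1) and was chosen so that the two criteria disagree on the second merge: single linkage uses the smallest pairwise distance between clusters, complete linkage uses the largest.

```python
from itertools import combinations

# Hypothetical pairwise distances; not the exam's Table 1.
DIST = {
    frozenset("PQ"): 1, frozenset("PR"): 2, frozenset("QR"): 6,
    frozenset("PS"): 7, frozenset("QS"): 8, frozenset("RS"): 3,
}

def cluster_distance(a, b, linkage):
    """Distance between clusters a and b under single or complete linkage."""
    pair_distances = [DIST[frozenset((x, y))] for x in a for y in b]
    return min(pair_distances) if linkage == "single" else max(pair_distances)

def next_merge(clusters, linkage):
    """Return the pair of clusters that is closest under the linkage."""
    return min(combinations(clusters, 2),
               key=lambda pair: cluster_distance(pair[0], pair[1], linkage))

def agglomerate(points, linkage):
    """Merge the closest clusters until one remains; return the merge order."""
    clusters = [frozenset(p) for p in points]
    merges = []
    while len(clusters) > 1:
        a, b = next_merge(clusters, linkage)
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
        merges.append((set(a), set(b)))
    return merges

single_steps = agglomerate("PQRS", "single")
complete_steps = agglomerate("PQRS", "complete")
print(single_steps)
print(complete_steps)
```

Both runs start by joining P and Q. Single linkage then adds R to {P, Q}, while complete linkage forms a new cluster {R, S}, mirroring the kind of difference the flashcards ask about.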

Performance Metrics

In classification models, precision is the share of predicted positives that are actually positive, while recall is the share of actual positives that the model correctly identifies.
Model performance is evaluated on testing data kept separate from the training data so that the performance estimate is unbiased.

General Definitions

False Positive (FP): A negative instance that the model incorrectly classifies as positive; it occupies one cell of the confusion matrix. In the example provided, the FP count is 20.
Misclassification Count: The total number of misclassifications made by a model (false positives plus false negatives). In the example provided, it is 60.
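
The metrics above all come from the four cells of a confusion matrix. In the sketch below, FP = 20 and the 60 total mistakes come from the notes; assuming both figures describe the same matrix, FN = 60 - 20 = 40. The TP and TN counts are hypothetical, chosen only to complete the matrix.

```python
# FP and the total number of mistakes come from the notes (20 and 60),
# so FN = 40 if both describe the same matrix.
# TP and TN are hypothetical values used only for illustration.
tp, fn, fp, tn = 60, 40, 20, 80

accuracy = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp)
recall = tp / (tp + fn)              # also called the true positive rate
true_negative_rate = tn / (tn + fp)
misclassified = fp + fn

print(accuracy, precision, recall, true_negative_rate, misclassified)
```

With these assumed counts the true positive rate (0.6) is lower than the true negative rate (0.8), which is one way the flashcard statement about Table 2 could be false.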

K-fold Cross-Validation

Definition: Involves splitting a complete dataset into mutually exclusive subsets, testing multiple times on each left-out subset while using the remaining data for training.
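
A minimal sketch of the splitting step, assuming rows are identified by index: the data is shuffled, divided into k mutually exclusive folds, and each fold serves once as the test set while the rest form the training set. The function name and parameters are hypothetical.

```python
import random

def k_fold_splits(n_rows, k, seed=0):
    """Yield (train_indices, test_indices) for each of k folds."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]   # k mutually exclusive subsets
    for i, test in enumerate(folds):
        train = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train, test

splits = list(k_fold_splits(n_rows=10, k=5))
for train, test in splits:
    print(sorted(test), "held out;", len(train), "rows used for training")
```

Model performance would then be averaged over the k test folds; the model-fitting step is omitted here.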

Confusion Matrix Insights

True Positive Rate vs. True Negative Rate: The statement that the true positive rate is higher than the true negative rate is marked false for the results in Table 2.

Clustering Models

Hierarchical Clustering: Based on the distance values in Table 1, the first cluster includes Dennis and Ross, the pair with the smallest distance.
Single Linkage Criterion: After the first cluster forms, the next step is to add Ben to the cluster {Dennis and Ross}.
Complete Linkage Criterion: After the first cluster forms, the next step is either to create a cluster with Kristen and Ben or to add Kristen to the cluster {Dennis and Ross}.

Text Mining Concepts

Tokenizing: The process involves breaking text into simple components, such as sentences or words.
Bag-of-Words Method: Utilizes word frequencies to extract features from textual data.
Lexicon: A repository of words paired with scores or categories reflecting their meanings.
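
These text mining terms can be illustrated with a short sketch. The sentence, the stop-word list, and the lexicon scores below are hypothetical; the sentence was chosen so that its remaining tokens match the ones in the bigram flashcard.

```python
from collections import Counter

STOP_WORDS = {"i", "a", "in", "the"}    # tiny hypothetical stop-word list
LEXICON = {"enjoy": 1, "rain": -1}      # hypothetical word scores

def tokenize(text):
    """Split text into lowercase word tokens."""
    return text.lower().split()

def remove_stop_words(tokens):
    """Drop tokens that appear in the stop-word list."""
    return [t for t in tokens if t not in STOP_WORDS]

def bigrams(tokens):
    """Pair each token with the one that follows it."""
    return list(zip(tokens, tokens[1:]))

tokens = remove_stop_words(tokenize("I enjoy taking a walk in the rain"))
bag_of_words = Counter(tokens)                        # word frequencies
pairs = bigrams(tokens)                               # two-word units
sentiment = sum(LEXICON.get(t, 0) for t in tokens)    # lexicon lookup
print(tokens, pairs, sentiment)
```

The bigram step yields pairs of adjacent words rather than single words, which is why a vector of individual tokens is not a bigram representation.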

Decision Models

Expected Monetary Value (EMV): The probability-weighted average payoff of a decision made under uncertainty. In the Lumos example, staying at the current location has an EMV of 100,000, while expanding has an EMV of 120,000.
Decision Making Criteria:
- Worst Outcome: Staying in the current location has the best worst-case outcome.
- Best Outcome: Opting for expansion presents the most favorable outcome.
- Weighted Average: The largest EMV suggests that expanding is the optimal choice.
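
The three criteria can be compared side by side in a short sketch. The payoffs and probabilities below are hypothetical; they were chosen only so that the EMVs reproduce the 100,000 (stay) and 120,000 (expand) figures quoted in the notes.

```python
# Hypothetical payoffs and probabilities, chosen so the EMVs match the
# 100,000 (stay) and 120,000 (expand) figures quoted in the notes.
PROBABILITIES = {"strong demand": 0.6, "weak demand": 0.4}
PAYOFFS = {
    "stay": {"strong demand": 120_000, "weak demand": 70_000},
    "expand": {"strong demand": 250_000, "weak demand": -75_000},
}

def emv(decision):
    """Probability-weighted average payoff of a decision."""
    return sum(PROBABILITIES[s] * p for s, p in PAYOFFS[decision].items())

# Best worst-case outcome (pessimistic criterion).
maximin_choice = max(PAYOFFS, key=lambda d: min(PAYOFFS[d].values()))
# Best best-case outcome (optimistic criterion).
maximax_choice = max(PAYOFFS, key=lambda d: max(PAYOFFS[d].values()))
# Largest expected monetary value.
emv_choice = max(PAYOFFS, key=emv)

print(maximin_choice, maximax_choice, emv_choice)
```

Choosing the largest EMV maximizes the long-run average payoff but does not guarantee the best outcome in any single realization; here the EMV choice carries a possible loss.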

Constraints in Linear Programming

Feasible Solutions: A feasible solution assigns values to all decision variables such that every constraint is satisfied.
Decision Variables: Include quantities of snacks, drinks, sun protection items, and clothing.
Constraints Identification:
- The requirement that the combined weight of all items be at least 1600 grams is not a constraint of the sample model.
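
Checking feasibility amounts to testing a candidate solution against every constraint. The sketch below uses an invented backpack model: only the 60 echoes the 60*x1 term from the flashcards, and every other weight and limit is an assumption made for illustration, not the exam's formulation.

```python
# Hypothetical model: only the 60 echoes the 60*x1 term in the flashcards;
# every other weight and limit below is an assumption for illustration.
WEIGHT_GRAMS = {"snacks": 60, "drinks": 500, "sun_protection": 100, "clothing": 200}
MAX_TOTAL_WEIGHT = 3000    # assumed backpack capacity in grams
MIN_DRINKS = 2             # assumed minimum number of drinks

def is_feasible(solution):
    """A solution is feasible only if it satisfies every constraint."""
    total = sum(WEIGHT_GRAMS[item] * qty for item, qty in solution.items())
    constraints = [
        total <= MAX_TOTAL_WEIGHT,
        solution["drinks"] >= MIN_DRINKS,
        all(qty >= 0 for qty in solution.values()),
    ]
    return all(constraints)

light_pack = {"snacks": 5, "drinks": 2, "sun_protection": 2, "clothing": 3}
heavy_pack = {"snacks": 5, "drinks": 6, "sun_protection": 2, "clothing": 3}
print(is_feasible(light_pack), is_feasible(heavy_pack))
```

Violating a single constraint is enough to make a solution infeasible, regardless of how good its objective value would be.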

Weight Management in Constraints

Additional Constraint Example: Requiring that the total weight of snacks not exceed the total weight of water bottles adds a new constraint. Its snack-weight term is 60*x1, where x1 is the number of snacks; the rest of the formulation is not shown.
