Questions and Answers
Which of the following is not a linkage criterion used in clustering models?
The input variables used in this model are not on the same scale, and this makes comparing the distance between students difficult. We need to convert the input variables to be on a similar scale by standardizing.
False
Based on the distance values in Table 1, the first cluster in a hierarchical clustering model will include {Joe and Hannah}.
True
Using the single linkage criterion, after creating the first cluster, the next step would be:
Using the complete linkage criterion, after creating the first cluster, the next step would be:
When developing a data mining model, we split the original data into training data and testing data in order to evaluate the model performance in a dataset that was not used to develop the model.
In evaluating a two-class classification model, the accuracy is __________________.
In ________, the complete data set is randomly split into mutually exclusive subsets and tested multiple times on each left-out subset, using the others as a training set.
Perfect classification is represented by AUC = 0.5.
Based on the distance values in Table 1, the first cluster in a hierarchical clustering model will include:
Using the single linkage criterion, after creating the first cluster, the next step would be:
Using the complete linkage criterion, after creating the first cluster, the next step would be:
What is the false positive (FP) count?
How many mistakes (misclassifications) did the model make?
Based on the results shown in Table 2, the true positive rate is higher than the true negative rate.
In a clustering model with two numerical input variables, if the input variables are not on the same scale, standardizing is used to convert the variables so they can be compared on a single scale.
Due to the potential risk of model overfitting, rather than using all available data for development, we split the data into training and testing data: we use the training data for model development and evaluate model performance using the testing data.
In text mining, tokenizing is the process of _________________.
The Bag-of-Words method uses ____________ to extract features from textual data.
In text mining, what is a lexicon?
After removing the stop words, the bigram method creates the vector: [ 'enjoy', 'taking', 'walk', 'rain']
Web structure mining focuses on navigation through a website by analyzing the links in Web documents, and web content mining is related to extraction of information from the content of Web pages using text mining.
If Lumos wants to look at the worst possible outcome for each decision, which decision would you recommend?
If Lumos wants to look at the best possible outcome for each decision, which decision would you recommend?
Which decision would you recommend if Lumos wants to use the Expected Monetary Value (EMV) and pick the decision with the largest EMV?
Which of the following is/are element(s) of decision models under uncertainty?
Decision modeling is a __________ analytics method.
In decision modeling, using the Expected Monetary Value (EMV) criterion guarantees the best outcome.
If we look at the worst possible outcome for each decision alternative and choose the decision that has the best 'worst outcome', which decision alternative should we choose?
If we look at the best possible outcome for each decision alternative and choose the decision that has the best 'best outcome', which decision alternative should we choose?
Using the Expected Monetary Value (EMV) criterion, which decision alternative should we choose?
A probability node of a decision tree for decision modeling represents __________________.
Which of the following statements is incorrect concerning sensitivity analysis in decision models?
Optimization is a _________ analytics method.
A feasible solution of a linear programming model is a solution that represents the values for all decision variables that satisfies all the constraints.
This is a linear programming model where the objective function is a maximization.
What are the decision variables in this model?
Which of the following is not a constraint of this model?
In this linear programming model, taking 5 snacks, 6 water bottles, 2 sunscreen items, and 3 clothing items is a feasible solution.
The total weight of snacks in your backpack cannot exceed the total weight of water bottles. Which formulation below represents this new constraint?
Decision support systems are computer-based support systems that integrate individuals' expertise and computer capabilities, and they have precise definitions agreed to by practitioners.
What is Business Intelligence (BI)?
Data is a collection of observations, experiments, and experiences that do not necessarily represent absolute facts that are universally true.
What is Descriptive Analytics?
What type of analytics seeks to recognize what is going on as well as the likely forecast and make decisions to achieve the best performance possible?
Which of the following is/are predictive analytics method(s)? (Select all that apply)
If a model is developed to forecast students at risk of dropping out after the first year of college, what kind of analytics application would this work represent?
Which chart type below would be most helpful to show the comparison between worldwide turnover rate compared with tech sector turnover rate?
Which chart type below would be most helpful to show the relative proportions of turnover rate of different categories within the tech sector?
Original (raw) data is usually collected from multiple data sources including various formats, and it is readily usable by analytics tools and algorithms.
During data transformation, numeric variables can be converted to categorical variables.
Data reduction can be applied to rows (observations) and/or columns (variables) in a given dataset.
In the data preprocessing step that reduces the dimension of data prior to analysis, sampling the rows is more complex than selecting the columns (variables).
The choice of visualization method that meets the presentation requirements for a given data depends on the data types available, purpose of the visual, and context.
Which of the below is not a data preprocessing step?
Which of the below is a method to deal with filling out the missing values in data?
Which of the below statement(s) is/are correct?
When analyzing the original data of household income, which of the following methods would be well-suited to prepare the data for descriptive analysis?
Which of the below statement(s) is/are correct?
Which of the following data preprocessing activities fall under data transformation?
What chart should Mia use to visualize the relative proportion of market share of Peloton in 2020 compared to competitors?
What chart should Mia use to visualize the number of new members joining the Peloton community every month from 2012 to 2020?
Which data preprocessing activity is not associated with data cleaning?
What is the main purpose of imputation methods in data preprocessing?
Linear regression models represent the mathematical relationship between dependent variables to explain or predict a binary independent variable.
Linear regression analysis can be used to predict an unknown value of a dependent variable using independent variables.
Comparing two regression models, which statement(s) is/are correct?
Using the correlation between size and selling price, we can predict the selling price of a new house based on size.
If the correlation between size and selling price is 0.85, the slope coefficient associated with size in the regression equation would have a positive sign.
Assuming a regression model is developed to predict a student's final grade, this model is a multiple linear regression model.
What method would help to test the hypothesis about students with higher GPA percentiles having a higher SAT score?
Which statement(s) is/are correct about fitting a regression line?
What type of regression model would best suit predicting the Combined score of a student?
If Neal's SAT score is one point higher than Jimmy's, how much higher is Neal's predicted combined score?
What would be the best suited regression model to predict if a student will be retained in the second year?
What measure quantifies the ratio of retained students compared to those predicted to be retained?
What is the measure that counts the number of times the model predicted a student's retention correctly?
The relational data in a data warehouse are modified and analyzed using Online Analytical Processing (OLAP) tools.
What OLAP function allows users to access detailed data from summarized data?
What OLAP function transforms data from rows into data grouped on several columns?
What concept is critical in developing a data warehouse due to data growth and complexity?
What does it mean that a data warehouse is non-volatile?
Classification learns to place new instances into their respective groups based on labeled items.
Finding an affinity of two products to be commonly purchased is known as what?
In Association Rule Mining, confidence is a metric representing the probability of observing items A and B together.
What is the support for diapers and beer being purchased together?
What is the confidence for milk and juice?
Which data mining method would best suit predicting the length of night sleep based on variables like day of the week?
Which data mining method would be best suited to predict the length of night sleep based on the past 60 days?
Which method would best identify which days are similar regarding length of mid-day nap and night sleep?
Which method is best suited to find frequently observed days and night sleep categories together?
Which method is best suited to predict the night sleep category using various input variables?
Which of the following is a segmentation model that classifies items in a dataset?
Study Notes
Decision Support Systems & Business Intelligence
- Decision support systems integrate human expertise and computer capabilities but lack precise definitions agreed upon by practitioners.
- Business Intelligence (BI) encompasses architectures, databases, analytical tools, applications, and methodologies.
Data Characteristics
- Data consists of observations and experiences, not always absolute facts.
- Descriptive analytics clarify current organizational events, revealing causes, trends, and patterns.
Types of Analytics
- Prescriptive analytics builds on what is happening now and what is likely to happen next, and recommends decisions that achieve the best possible performance.
- Predictive analytics methods include regression analysis, clustering, and text analysis.
Data Visualization Techniques
- Bar charts effectively compare categories like turnover rates across sectors.
- Pie charts illustrate proportions of different categories within a sector.
Data Preprocessing
- Original data is often unstructured and needs preprocessing for analytical tools.
- Data transformation may involve rescaling and converting data types.
- Data reduction can focus on rows (observations) and columns (variables).
- Imputation methods are used to fill missing data.
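Two of the preprocessing steps listed above, imputation and rescaling, can be sketched in a few lines. This is a minimal illustration, assuming mean imputation and z-score standardization; the function names and the sample `hours` and `income` values are hypothetical and do not come from the quiz data.

```python
from statistics import mean, pstdev


def impute_mean(values):
    """Replace None entries with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]


def standardize(values):
    """Convert values to z-scores: (x - mean) / standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation
    return [(v - mu) / sigma for v in values]


# Hypothetical inputs on very different scales.
hours = [2, 4, None, 8, 6]
income = [30000, 45000, 60000, 75000, 90000]

hours_filled = impute_mean(hours)   # the missing entry becomes the mean of 2, 4, 8, 6
hours_z = standardize(hours_filled)
income_z = standardize(income)
```

After standardizing, both variables have mean 0 and standard deviation 1, so a distance computed on them is no longer dominated by the variable with the larger units.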
Regression Analysis
- Multiple linear regression models describe the relationship between one dependent variable and two or more independent variables.
- Logistic regression is suited for binary outcomes in data predictions.
- R-squared values measure model fit; higher values indicate better explanation of data variance.
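The idea of fitting a line and judging it by R-squared can be sketched with ordinary least squares for a single predictor. The house sizes and prices below are hypothetical, and `fit_line` and `r_squared` are illustrative helper names, not part of any course material.

```python
def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


def r_squared(x, y, slope, intercept):
    """Share of the variance in y explained by the fitted line."""
    y_bar = sum(y) / len(y)
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


# Hypothetical data: size in square feet, selling price in thousands.
size = [1000, 1500, 2000, 2500]
price = [200, 270, 330, 410]

slope, intercept = fit_line(size, price)
fit_quality = r_squared(size, price, slope, intercept)
```

Because the slope has the same sign as the covariance between the two variables, a positive correlation between size and price always gives a positive slope.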
OLAP Functions
- OLAP operations such as drill-down (moving from summarized data to detailed data) and pivot (regrouping rows of data across several columns) support data interrogation and insight extraction.
- Scalability is a critical consideration for data warehouse development to manage data growth and query complexity.
Data Mining Techniques
- Classification categorizes data based on labeled training sets.
- Association rule mining identifies product affinities.
- Linear regression predicts continuous outcomes, while logistic regression is used for binary classifications.
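For association rule mining, support and confidence are easy to confuse, so a small sketch may help. It assumes the usual definitions: support is the share of transactions containing all the items, and confidence of A -> B is support(A and B) divided by support(A). The baskets are made up for illustration.

```python
def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    hits = sum(1 for basket in transactions if items <= set(basket))
    return hits / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Estimated probability of the consequent given the antecedent."""
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)


# Hypothetical market baskets.
baskets = [
    {"diapers", "beer", "milk"},
    {"diapers", "beer"},
    {"milk", "juice"},
    {"diapers", "milk"},
    {"beer", "juice"},
]

support_diapers_beer = support(baskets, {"diapers", "beer"})            # 2 of 5 baskets
confidence_diapers_beer = confidence(baskets, {"diapers"}, {"beer"})    # 2 of the 3 diaper baskets
```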
Clustering & Hierarchical Models
- Hierarchical clustering organizes items based on pairwise distances until all observations are linked.
- Linkage criteria in clustering (e.g., single and complete) influence model outcome.
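The effect of the linkage criterion can be sketched with a tiny agglomerative loop. This is an illustrative sketch, not the course's procedure: the four one-dimensional points A to D are hypothetical, and `merge_order` is a helper name introduced here. Single linkage measures the distance between the closest members of two clusters, complete linkage the distance between the farthest members.

```python
from itertools import combinations
from math import dist


def single_linkage(a, b, points):
    """Distance between the two closest members of clusters a and b."""
    return min(dist(points[p], points[q]) for p in a for q in b)


def complete_linkage(a, b, points):
    """Distance between the two farthest members of clusters a and b."""
    return max(dist(points[p], points[q]) for p in a for q in b)


def merge_order(points, linkage):
    """Repeatedly merge the two closest clusters; return the merged clusters in order."""
    clusters = [frozenset([name]) for name in points]
    merges = []
    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2),
                   key=lambda pair: linkage(pair[0], pair[1], points))
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
        merges.append(a | b)
    return merges


# Hypothetical observations on one (already standardized) variable.
points = {"A": (0.0,), "B": (1.0,), "C": (2.5,), "D": (4.7,)}

single_steps = merge_order(points, single_linkage)
complete_steps = merge_order(points, complete_linkage)
```

With these points both criteria first join A and B. Single linkage then adds C to that cluster, while complete linkage starts a new cluster with C and D, which is the kind of divergence the linkage questions probe.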
Performance Metrics
- In classification models, precision measures the accuracy of positive predictions, while recall assesses the correct identification of positive instances.
- Model performance is evaluated using testing datasets separate from training data to ensure unbiased accuracy measurement.
General Definitions
- False Positive (FP): An actual negative instance that the model incorrectly classifies as positive; the count appears in the confusion matrix. In an example provided, the FP count is 20.
- Misclassification Count: Total number of misclassifications made by a model. For a specified case, it totals 60.
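The metrics named above all derive from the four cells of a two-class confusion matrix. The sketch below uses made-up counts (they are not the values from Table 2) and a hypothetical helper `classification_metrics`.

```python
def classification_metrics(tp, fn, fp, tn):
    """Standard two-class metrics from confusion-matrix counts."""
    total = tp + fn + fp + tn
    return {
        "accuracy": (tp + tn) / total,            # share of correct predictions
        "misclassifications": fp + fn,            # number of mistakes
        "precision": tp / (tp + fp),              # correct among predicted positives
        "recall": tp / (tp + fn),                 # true positive rate
        "true_negative_rate": tn / (tn + fp),
    }


# Hypothetical counts, chosen only for illustration.
metrics = classification_metrics(tp=50, fn=10, fp=5, tn=35)
```

Comparing `recall` with `true_negative_rate` answers questions about whether a model is better at finding positives or negatives.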
K-fold Cross-Validation
- Definition: Involves splitting a complete dataset into mutually exclusive subsets, testing multiple times on each left-out subset while using the remaining data for training.
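The splitting step described above can be sketched without any library. This is a minimal illustration that only produces row indices; `k_fold_splits` and the seed are assumptions made here, and model fitting is left out.

```python
import random


def k_fold_splits(n_rows, k, seed=0):
    """Yield (train_indices, test_indices) for each of k mutually exclusive folds."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)          # random assignment of rows to folds
    folds = [indices[i::k] for i in range(k)]     # k disjoint subsets covering all rows
    for i, test_idx in enumerate(folds):
        train_idx = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train_idx, test_idx


# Example: 10 rows, 5 folds; each row is used for testing exactly once.
splits = list(k_fold_splits(10, 5))
```

Averaging the test-fold performance over all k rounds gives a steadier estimate than a single train/test split.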
Confusion Matrix Insights
- True Positive Rate vs. True Negative Rate: The statement that the true positive rate is lower than the true negative rate is marked as false.
Clustering Models
- Hierarchical Clustering: In a sample with customers, the first cluster includes Dennis and Ross based on their spending patterns.
- Single Linkage Criteria: After forming the first cluster, the next step would be to add Ben to the cluster of Dennis and Ross.
- Complete Linkage Criteria: Following the initial formation, the next action is to create a cluster with Kristen and Ben or add Kristen to Dennis and Ross’ cluster.
Text Mining Concepts
- Tokenizing: The process involves breaking text into simple components, such as sentences or words.
- Bag-of-Words Method: Utilizes word frequencies to extract features from textual data.
- Lexicon: A repository of words paired with scores or categories reflecting their meanings.
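The text mining steps above can be chained in a short sketch. The sentence is an assumed source for the token list quoted in the quiz, and the stop-word list and lexicon scores are tiny hypothetical examples rather than a real resource.

```python
import re
from collections import Counter

STOP_WORDS = {"i", "a", "the", "in", "and", "to", "of"}   # hypothetical, deliberately small
LEXICON = {"enjoy": 1, "hate": -1}                        # hypothetical sentiment scores


def tokenize(text):
    """Break text into lowercase word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def remove_stop_words(tokens):
    return [t for t in tokens if t not in STOP_WORDS]


def bag_of_words(tokens):
    """Word-frequency features; word order is ignored."""
    return Counter(tokens)


def bigrams(tokens):
    """Pairs of adjacent tokens."""
    return list(zip(tokens, tokens[1:]))


sentence = "I enjoy taking a walk in the rain"
tokens = remove_stop_words(tokenize(sentence))
features = bag_of_words(tokens)
pairs = bigrams(tokens)
sentiment = sum(LEXICON.get(t, 0) for t in tokens)
```

Note that the list of single words is the unigram result; the bigram method yields adjacent pairs such as `('enjoy', 'taking')`.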
Decision Models
- Expected Monetary Value (EMV): The probability-weighted average payoff of a decision made under uncertainty. Calculated EMVs show staying at the current location results in an EMV of 100,000, while expanding yields 120,000.
- Decision Making Criteria:
  - Worst Outcome: Choosing to stay in the current location provides the best bad outcome.
  - Best Outcome: Opting for expansion presents the most favorable outcome.
  - Weighted Average: The largest EMV suggests that expanding is the optimal choice.
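The three criteria can be compared on a small payoff table. The payoffs and probabilities below are invented for illustration (the original decision table is not reproduced in these notes); they were chosen only so that the EMVs match the 100,000 and 120,000 quoted above.

```python
# Hypothetical payoffs per decision and state of nature.
payoffs = {
    "stay":   {"low demand": 80_000, "high demand": 120_000},
    "expand": {"low demand": 40_000, "high demand": 200_000},
}
# Hypothetical state probabilities.
probabilities = {"low demand": 0.5, "high demand": 0.5}


def maximin(payoffs):
    """Pessimistic rule: pick the decision with the best worst-case payoff."""
    return max(payoffs, key=lambda d: min(payoffs[d].values()))


def maximax(payoffs):
    """Optimistic rule: pick the decision with the best best-case payoff."""
    return max(payoffs, key=lambda d: max(payoffs[d].values()))


def expected_monetary_values(payoffs, probabilities):
    """Probability-weighted average payoff for each decision."""
    return {
        decision: sum(probabilities[state] * value for state, value in outcomes.items())
        for decision, outcomes in payoffs.items()
    }


emv = expected_monetary_values(payoffs, probabilities)
best_by_emv = max(emv, key=emv.get)
```

EMV is a long-run average, so choosing the largest EMV does not guarantee the best outcome in any single realization.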
Constraints in Linear Programming
- Feasible Solutions: These represent values for decision variables that adhere to all constraints, confirmed as true for a sample linear programming model.
- Decision Variables: Include quantities of snacks, drinks, sun protection items, and clothing.
- Constraints Identification:
  - The option that is not a constraint of the model is the requirement that the total weight of all items be at least 1600 grams; the notes add that maximum weights should also be considered.
Weight Management in Constraints
- Additional Constraint Example: Stipulating that the total weight of snacks must not exceed that of water bottles introduces a new constraint involving the term 60*x_1, where x_1 represents the number of snacks.
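Checking feasibility amounts to testing a candidate solution against every constraint. In the sketch below only the 60 grams per snack comes from the notes; the other item weights, the capacity limit, and the candidate quantities are assumptions made for illustration.

```python
# Weight per item in grams; only the snack weight (60) appears in the notes.
WEIGHT_G = {"snacks": 60, "water": 500, "sunscreen": 100, "clothing": 300}
MAX_TOTAL_WEIGHT_G = 5000   # hypothetical backpack capacity


def total_weight(solution):
    return sum(WEIGHT_G[item] * qty for item, qty in solution.items())


def is_feasible(solution):
    """A solution is feasible only if it satisfies all constraints at once."""
    non_negative = all(qty >= 0 for qty in solution.values())
    within_capacity = total_weight(solution) <= MAX_TOTAL_WEIGHT_G
    # Added constraint: total snack weight cannot exceed total water weight.
    snacks_vs_water = (WEIGHT_G["snacks"] * solution["snacks"]
                       <= WEIGHT_G["water"] * solution["water"])
    return non_negative and within_capacity and snacks_vs_water


candidate = {"snacks": 4, "water": 3, "sunscreen": 1, "clothing": 2}
too_many_snacks = {"snacks": 10, "water": 1, "sunscreen": 0, "clothing": 0}
```

Here `is_feasible(candidate)` holds under the assumed numbers, while `too_many_snacks` violates the snack-versus-water constraint even though it is well under the capacity limit.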
Description
Prepare for your BSAN 160 exam with these flashcards covering key concepts like Decision Support Systems and Business Intelligence. Each card tests your understanding and retention of crucial terms and definitions. Perfect for quick revisions before the exam!