BSAN 160 Exam Review Flashcards
93 Questions

Questions and Answers

Which of the following is not a linkage criterion used in clustering models?

  • Average linkage
  • Complete linkage
  • K-fold linkage (correct)
  • Single linkage

    The input variables used in this model are not on the same scale, and this makes comparing the distance between students difficult. We need to convert the input variables to be on a similar scale by standardizing.

    False

    Based on the distance values in Table 1, the first cluster in a hierarchical clustering model will include {Joe and Hannah}.

    True

    Using the single linkage criterion, after creating the first cluster, the next step would be:

    Add Sam to the cluster {Joe and Hannah}

    Using the complete linkage criterion, after creating the first cluster, the next step would be:

    Create a cluster that includes {Sam and Max}

    When developing a data mining model, we split the original data into training data and testing data in order to evaluate the model performance in a dataset that was not used to develop the model.

    True

    In evaluating a two-class classification model, the accuracy is __________________.

    the sum of correctly classified positives and correctly classified negatives divided by the sum of all positive (true and false) and negative (true and false) counts.
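
As a minimal sketch of this definition, accuracy can be computed from the four confusion-matrix counts. The function name and the counts below are hypothetical and are not taken from Table 2.

```python
def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Share of all cases that the model classified correctly."""
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("confusion matrix is empty")
    return (tp + tn) / total


# Hypothetical counts, chosen only for illustration.
example = accuracy(tp=50, tn=30, fp=10, fn=10)
print(example)
```

The numerator counts only the correct predictions (true positives and true negatives); the denominator counts every case.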

    In ________, the complete data set is randomly split into mutually exclusive subsets and tested multiple times on each left-out subset, using the others as a training set.

    k-fold cross-validation
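
The splitting step can be sketched in plain Python as follows. This is an illustrative sketch only: `k_fold_splits`, the row count, and the seed are assumptions, and a real project would usually rely on a library routine instead.

```python
import random


def k_fold_splits(n_rows: int, k: int, seed: int = 0):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation."""
    if not 2 <= k <= n_rows:
        raise ValueError("k must be between 2 and the number of rows")
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)  # random assignment of rows to folds
    folds = [indices[i::k] for i in range(k)]  # k mutually exclusive subsets
    for i, test in enumerate(folds):
        # Every fold except the held-out one is used for training.
        train = [row for j, fold in enumerate(folds) if j != i for row in fold]
        yield train, test


for train, test in k_fold_splits(n_rows=10, k=5):
    print(sorted(test), "held out;", len(train), "rows used for training")
```

Each row is held out exactly once, so every observation contributes to testing without ever being in the training set of the same round.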

    Perfect classification is represented by AUC = 0.5.

    False

    Based on the distance values in Table 1, the first cluster in a hierarchical clustering model will include:

    Dennis and Ross

    Using the single linkage criterion, after creating the first cluster, the next step would be:

    Add Ben to the cluster {Dennis and Ross}

    Using the complete linkage criterion, after creating the first cluster, the next step would be:

    Both A and B

    What is the false positive (FP) count?

    20

    How many mistakes (misclassifications) did the model make?

    60

    Based on the results shown in Table 2, the true positive rate is higher than the true negative rate.

    False

    In a clustering model with two numerical input variables used for clustering, if the input variables are not on the same scale, standardizing is used to convert the variables and compare them on a single scale.

    True
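
A minimal sketch of standardizing, assuming the common z-score form (subtract the mean, divide by the standard deviation). The two variables and their values are hypothetical.

```python
from statistics import mean, stdev


def standardize(values):
    """Convert values to z-scores: (value - mean) / standard deviation."""
    m = mean(values)
    s = stdev(values)  # sample standard deviation
    if s == 0:
        raise ValueError("cannot standardize a constant variable")
    return [(v - m) / s for v in values]


# Hypothetical inputs on very different scales.
study_hours = [2, 4, 6, 8]
family_income = [30_000, 45_000, 60_000, 75_000]

print(standardize(study_hours))
print(standardize(family_income))
```

After standardizing, both variables have mean 0 and standard deviation 1, so neither dominates a distance calculation simply because of its units.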

    Due to the potential risk of model overfitting, rather than using all available data, we split the data into training and testing data; we use the training data for model development and evaluate model performance using the testing data.

    True

    In text mining, tokenizing is the process of _________________.

    breaking a text into simple units, like sentences or words

    The Bag-of-Words method uses ____________ to extract features from textual data.

    Word frequencies in a text

    In text mining, what is a lexicon?

    a catalog of words and scores (or categories) assigned to the words based on their meaning

    After removing the stop words, the bigram method creates the vector: [ 'enjoy', 'taking', 'walk', 'rain']

    False
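
The text-mining steps above can be sketched as follows. The input sentence and the stop-word list are assumptions (the flashcard's original sentence is not shown), and whitespace splitting stands in for a real tokenizer.

```python
from collections import Counter

# Hypothetical sentence and stop-word list, for illustration only.
text = "I enjoy taking a walk in the rain"
stop_words = {"i", "a", "in", "the"}

tokens = text.lower().split()                       # tokenizing into words
kept = [t for t in tokens if t not in stop_words]   # stop-word removal
bag_of_words = Counter(kept)                        # word -> frequency
bigrams = list(zip(kept, kept[1:]))                 # pairs of adjacent tokens

print(kept)          # single words (unigrams)
print(bag_of_words)
print(bigrams)
```

A list of single words is a unigram representation; bigrams are pairs of adjacent words, which is consistent with the statement above being marked False.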

    Web structure mining focuses on navigation through a website by analyzing the links in Web documents, and web content mining is related to extraction of information from the content of Web pages using text mining.

    True

    If Lumos wants to look at the worst possible outcome for each decision, which decision would you recommend?

    Choose to stay in current location

    If Lumos wants to look at the best possible outcome for each decision, which decision would you recommend?

    Choose to expand

    Which decision would you recommend if Lumos wants to use the Expected Monetary Value (EMV) and pick the decision with the largest EMV?

    Choose to expand

    Which of the following is/are element(s) of decision models under uncertainty?

    All of the above (A, B, and C)

    Decision modeling is a __________ analytics method.

    Prescriptive

    In decision modeling, using the Expected Monetary Value (EMV) criterion guarantees the best outcome.

    False

    If we look at the worst possible outcome for each decision alternative and choose the decision that has the best 'worst outcome', which decision alternative should we choose?

    Decision alternative 2

    If we look at the best possible outcome for each decision alternative and choose the decision that has the best 'best outcome', which decision alternative should we choose?

    Decision alternative 1

    Using the Expected Monetary Value (EMV) criterion, which decision alternative should we choose?

    Indifferent between Decision alternative 1 and Decision alternative 2

    A probability node of a decision tree for decision modeling represents __________________.

    a time when the result of an uncertain outcome becomes known

    Which of the following statements is incorrect concerning sensitivity analysis in decision models?

    Using sensitivity analysis, the selected decision cannot change if we use the same decision criterion.

    Optimization is a _________ analytics method.

    prescriptive

    A feasible solution of a linear programming model is a set of values for all decision variables that satisfies all the constraints.

    True
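
To make the idea of feasibility concrete, here is a small sketch that checks a candidate solution against every constraint. The item weights, the weight limit, and the minimum number of drinks are all hypothetical and do not reproduce the flashcards' backpack model.

```python
# Hypothetical model: item weights in grams and limits are assumptions.
WEIGHTS = {"snacks": 60, "drinks": 200, "sun": 50, "clothing": 150}
MAX_TOTAL_WEIGHT = 1500   # assumed capacity, in grams
MIN_DRINKS = 2            # assumed minimum number of drinks


def is_feasible(solution: dict) -> bool:
    """Return True only if the solution satisfies every constraint."""
    total_weight = sum(WEIGHTS[item] * qty for item, qty in solution.items())
    constraints = [
        total_weight <= MAX_TOTAL_WEIGHT,
        solution["drinks"] >= MIN_DRINKS,
        all(qty >= 0 for qty in solution.values()),  # non-negativity
    ]
    return all(constraints)


print(is_feasible({"snacks": 3, "drinks": 2, "sun": 1, "clothing": 2}))
print(is_feasible({"snacks": 4, "drinks": 6, "sun": 2, "clothing": 3}))
```

Violating a single constraint is enough to make a solution infeasible, regardless of how good its objective value would be.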

    This is a linear programming model where the objective function is a maximization.

    False

    What are the decision variables in this model?

    Number of snacks, drinks, sun protection items, and clothing items

    Which of the following is not a constraint of this model?

    Minimum weight of all items combined in the backpack must be at least 1600 grams

    In this linear programming model, taking 5 snacks, 6 water bottles, 2 sunscreen items, and 3 clothing items is a feasible solution.

    False

    The total weight of snacks in your backpack cannot exceed the total weight of water bottles. Which formulation below represents this new constraint?

    60*x1

    Decision support systems are computer-based support systems that integrate individuals' expertise and computer capabilities, and they have precise definitions agreed to by practitioners.

    False

    What is Business Intelligence (BI)?

    An umbrella term that combines architectures, databases, analytical tools, applications, and methodologies.

    Data is a collection of observations, experiments, and experiences that do not necessarily represent absolute facts that are universally true.

    True

    What is Descriptive Analytics?

    Descriptive Analytics helps managers understand current events in the organization including causes, trends, and patterns.

    What type of analytics seeks to recognize what is going on as well as the likely forecast and make decisions to achieve the best performance possible?

    Prescriptive Analytics

    Which of the following is/are predictive analytics method(s)? (Select all that apply)

    Regression analysis

    If a model is developed to forecast students at risk of dropping out after the first year of college, what kind of analytics application would this work represent?

    Predictive Analytics

    Which chart type below would be most helpful to compare the worldwide turnover rate with the tech sector turnover rate?

    Bar chart

    Which chart type below would be most helpful to show the relative proportions of turnover rate of different categories within the tech sector?

    Pie chart

    Original (raw) data is usually collected from multiple data sources including various formats, and it is readily usable by analytics tools and algorithms.

    False

    During data transformation, numeric variables can be converted to categorical variables.

    True

    Data reduction can be applied to rows (observations) and/or columns (variables) in a given dataset.

    True

    In the data preprocessing step that reduces the dimension of data prior to analysis, sampling the rows is more complex than selecting the columns (variables).

    False

    The choice of visualization method that meets the presentation requirements for a given data depends on the data types available, purpose of the visual, and context.

    True

    Which of the below is not a data preprocessing step?

    Data separation

    Which of the below is a method for filling in missing values in data?

    Data imputation

    Which of the statements below is/are correct?

    Normalizing values allows for comparison of variables on a single scale.

    When analyzing the original data of household income, which of the following methods would be well-suited to prepare the data for descriptive analysis?

    Fill in missing values with zeros.

    Which of the statements below is/are correct?

    Visual analytics combines data visualization with different analytics methods.

    Which of the following data preprocessing activities fall under data transformation?

    Convert numeric values into discrete categories.

    What chart should Mia use to visualize the relative proportion of market share of Peloton in 2020 compared to competitors?

    Pie chart

    What chart should Mia use to visualize the number of new members joining the Peloton community every month from 2012 to 2020?

    Line chart

    Which data preprocessing activity is not associated with data cleaning?

    Deriving a new variable representing total time of class material from existing variables

    What is the main purpose of imputation methods in data preprocessing?

    Fill in missing values with the most appropriate values
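
One simple way to choose an "appropriate" fill value is to use the mean or median of the observed values. The sketch below assumes that approach; the `impute` helper and the data are hypothetical, and other imputation methods exist.

```python
from statistics import mean, median


def impute(values, strategy="mean"):
    """Replace None entries with the mean or median of the observed values."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("no observed values to impute from")
    fill = mean(observed) if strategy == "mean" else median(observed)
    return [fill if v is None else v for v in values]


# Hypothetical class-time data in minutes; None marks a missing value.
minutes = [40, None, 55, 45, None, 60]
print(impute(minutes))
print(impute(minutes, strategy="median"))
```

The median is often preferred when the variable is skewed, because a few extreme values pull the mean away from typical observations.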

    Linear regression models represent the mathematical relationship between dependent variables to explain or predict a binary independent variable.

    False

    Linear regression analysis can be used to predict an unknown value of a dependent variable using independent variables.

    True

    Comparing two regression models, which statement(s) is/are correct?

    Model 1 captures 42% of the variation in the given data.

    Using the correlation between size and selling price, we can predict the selling price of a new house based on size.

    False

    If the correlation between size and selling price is 0.85, the slope coefficient associated with size in the regression equation would have a positive sign.

    True
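
The link between the sign of the correlation and the sign of the slope can be checked numerically. In simple linear regression both share the same numerator (the sum of cross-deviations), so they always have the same sign. The house data below are hypothetical.

```python
from statistics import mean


def correlation_and_slope(x, y):
    """Return Pearson's r and the least-squares slope of y on x."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    r = sxy / (sxx * syy) ** 0.5
    slope = sxy / sxx  # equals r * (sd of y / sd of x)
    return r, slope


# Hypothetical house sizes (square feet) and selling prices (thousands).
size = [1000, 1500, 2000, 2500, 3000]
price = [150, 210, 240, 330, 360]
r, slope = correlation_and_slope(size, price)
print(round(r, 3), round(slope, 3))
```

The correlation alone measures strength and direction; the regression equation (slope and intercept) is what produces a predicted price for a new house.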

    Assuming a regression model is developed to predict a student's final grade, this model is a multiple linear regression model.

    True

    What method would help to test the hypothesis about students with higher GPA percentiles having a higher SAT score?

    Simple linear regression with high school percentile as the independent variable and SAT as the dependent variable.

    Which statement(s) is/are correct about fitting a regression line?

    The intercept represents the SAT score at a GPA percentile of 0.

    What type of regression model would best suit predicting the Combined score of a student?

    Multiple Linear

    If Neal's SAT score is one point higher than Jimmy's, how much higher is Neal's predicted combined score?

    0.06 higher than Jimmy's combined score

    What would be the best suited regression model to predict if a student will be retained in the second year?

    Logistic

    What measure quantifies the ratio of retained students compared to those predicted to be retained?

    Precision

    What is the measure that counts the number of times the model predicted a student's retention correctly?

    Recall
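
As a sketch of the standard definitions, precision and recall can both be computed from confusion-matrix counts. The counts below are hypothetical, with "positive" taken to mean that a student is retained.

```python
def precision(tp: int, fp: int) -> float:
    """Of the cases predicted positive, the share that are truly positive."""
    return tp / (tp + fp)


def recall(tp: int, fn: int) -> float:
    """Of the truly positive cases, the share the model predicted positive."""
    return tp / (tp + fn)


# Hypothetical counts: "positive" means the student is retained.
tp, fp, fn = 80, 20, 10
print(precision(tp, fp))
print(recall(tp, fn))
```

Both measures share the same numerator (true positives); they differ only in whether the denominator counts predicted positives or actual positives.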

    The relational data in a data warehouse are modified and analyzed using Online Analytical Processing (OLAP) tools.

    True

    What OLAP function allows users to access detailed data from summarized data?

    Drill Down

    What OLAP function transforms data from rows into data grouped on several columns?

    Pivot

    What concept is critical in developing a data warehouse due to data growth and complexity?

    Scalability

    What does it mean that a data warehouse is non-volatile?

    After data is entered, previous data is not erased when new data is added.

    Classification learns to place new instances into their respective groups based on labeled items.

    True

    Finding an affinity of two products to be commonly purchased is known as what?

    Association rule mining

    In Association Rule Mining, confidence is a metric representing the probability of observing items A and B together.

    False

    What is the support for diapers and beer being purchased together?

    60%

    What is the confidence for milk and juice?

    50%
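
The difference between support and confidence can be sketched on a small set of transactions. The transactions below are hypothetical (the flashcards' own transaction table is not shown), so the printed values are not expected to match the answers above.

```python
# Hypothetical transactions; the flashcards' own table is not reproduced here.
transactions = [
    {"milk", "juice", "bread"},
    {"milk", "bread"},
    {"diapers", "beer", "milk"},
    {"diapers", "beer"},
    {"milk", "juice"},
]


def support(items: set) -> float:
    """Share of transactions that contain every item in `items`."""
    hits = sum(1 for t in transactions if items <= t)
    return hits / len(transactions)


def confidence(antecedent: set, consequent: set) -> float:
    """Estimated P(consequent | antecedent) = support(A and B) / support(A)."""
    return support(antecedent | consequent) / support(antecedent)


print(support({"diapers", "beer"}))
print(confidence({"milk"}, {"juice"}))
```

Support is the probability of seeing the items together; confidence is conditional on the antecedent already being in the basket, which is why the confidence statement above is marked False.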

    Which data mining method would best suit predicting the length of night sleep based on variables like day of the week?

    Linear Regression

    Which data mining method would be best suited to predict the length of night sleep based on the past 60 days?

    Time Series

    Which method would best identify which days are similar regarding length of mid-day nap and night sleep?

    Clustering

    Which method is best suited to find frequently observed days and night sleep categories together?

    Association Rule Analysis

    Which method is best suited to predict the night sleep category using various input variables?

    Logistic Regression

    Which of the following is a segmentation model that classifies items in a dataset?

    K-means Clustering

    Study Notes

    Decision Support Systems & Business Intelligence

    • Decision support systems integrate human expertise and computer capabilities but lack precise definitions agreed upon by practitioners.
    • Business Intelligence (BI) encompasses architectures, databases, analytical tools, applications, and methodologies.

    Data Characteristics

    • Data consists of observations and experiences, not always absolute facts.
    • Descriptive analytics clarify current organizational events, revealing causes, trends, and patterns.

    Types of Analytics

    • Prescriptive analytics builds on what is currently happening and the likely forecast to recommend decisions that achieve the best possible performance.
    • Predictive analytics methods include regression analysis, clustering, and text analysis.

    Data Visualization Techniques

    • Bar charts effectively compare categories like turnover rates across sectors.
    • Pie charts illustrate proportions of different categories within a sector.

    Data Preprocessing

    • Original (raw) data usually comes from multiple sources in various formats and must be preprocessed before analytics tools and algorithms can use it.
    • Data transformation may involve rescaling and converting data types.
    • Data reduction can focus on rows (observations) and columns (variables).
    • Imputation methods are used to fill missing data.

    Regression Analysis

    • Multiple linear regression models describe the relationship between one dependent variable and two or more independent variables.
    • Logistic regression is suited for binary outcomes in data predictions.
    • R-squared values measure model fit; higher values indicate better explanation of data variance.

    OLAP Functions

    • OLAP functions such as drill down (moving from summarized to detailed data) and pivot (regrouping rows into columns) support data interrogation and insight extraction.
    • Scalability is a critical consideration for data warehouse development to manage data growth and query complexity.

    Data Mining Techniques

    • Classification categorizes data based on labeled training sets.
    • Association rule mining identifies product affinities.
    • Linear regression predicts continuous outcomes, while logistic regression is used for binary classifications.

    Clustering & Hierarchical Models

    • Hierarchical clustering organizes items based on pairwise distances until all observations are linked.
    • Linkage criteria in clustering (e.g., single and complete) influence model outcome.
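
The effect of the linkage criterion can be illustrated with a short sketch. The four items and their pairwise distances are hypothetical (Table 1 from the flashcards is not shown), and `linkage_distance` is an illustrative helper rather than a library function.

```python
# Hypothetical pairwise distances; Table 1 from the flashcards is not shown.
DIST = {
    frozenset({"A", "B"}): 1.0,
    frozenset({"A", "C"}): 2.0,
    frozenset({"B", "C"}): 4.0,
    frozenset({"A", "D"}): 6.0,
    frozenset({"B", "D"}): 5.0,
    frozenset({"C", "D"}): 3.0,
}


def linkage_distance(cluster_1, cluster_2, criterion="single"):
    """Distance between two clusters under a linkage criterion."""
    pairs = [DIST[frozenset({p, q})] for p in cluster_1 for q in cluster_2]
    if criterion == "single":
        return min(pairs)               # closest pair of members
    if criterion == "complete":
        return max(pairs)               # farthest pair of members
    if criterion == "average":
        return sum(pairs) / len(pairs)  # mean over all pairs
    raise ValueError(f"unknown criterion: {criterion}")


first = {"A", "B"}  # the closest pair merges first under every criterion
for criterion in ("single", "complete"):
    to_c = linkage_distance(first, {"C"}, criterion)
    c_d = linkage_distance({"C"}, {"D"}, criterion)
    print(criterion, "-> {A,B} to C:", to_c, "| C to D:", c_d)
```

With these assumed distances, single linkage adds C to {A, B} next (2.0 is smaller than 3.0), while complete linkage merges C and D into a new cluster instead (3.0 is smaller than 4.0), so the same data can produce different hierarchies.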

    Performance Metrics

    • In classification models, precision measures the accuracy of positive predictions, while recall assesses the correct identification of positive instances.
    • Model performance is evaluated using testing datasets separate from training data to ensure unbiased accuracy measurement.

    General Definitions

    • False Positive (FP): A negative instance that the model incorrectly classifies as positive; the FP count is the corresponding cell of the confusion matrix. In the example provided, the FP count is 20.
    • Misclassification Count: Total number of misclassifications made by a model. For a specified case, it totals 60.

    K-fold Cross-Validation

    • Definition: Involves splitting a complete dataset into mutually exclusive subsets, testing multiple times on each left-out subset while using the remaining data for training.

    Confusion Matrix Insights

    • True Positive Rate vs. True Negative Rate: The statement that the true positive rate is higher than the true negative rate is marked as false for the results in Table 2.

    Clustering Models

    • Hierarchical Clustering: In a sample with customers, the first cluster includes Dennis and Ross based on their spending patterns.
    • Single Linkage Criteria: After forming the first cluster, the next step would be to add Ben to the cluster of Dennis and Ross.
    • Complete Linkage Criteria: Following the initial formation, the next action is to create a cluster with Kristen and Ben or add Kristen to Dennis and Ross’ cluster.

    Text Mining Concepts

    • Tokenizing: The process involves breaking text into simple components, such as sentences or words.
    • Bag-of-Words Method: Utilizes word frequencies to extract features from textual data.
    • Lexicon: A repository of words paired with scores or categories reflecting their meanings.

    Decision Models

    • Expected Monetary Value (EMV): The probability-weighted average payoff of a decision alternative under uncertainty. In the Lumos example, staying at the current location has an EMV of 100,000, while expanding has an EMV of 120,000.
    • Decision Making Criteria:
      • Worst Outcome: Choosing to stay in the current location gives the best worst-case outcome.
      • Best Outcome: Opting for expansion presents the most favorable outcome.
      • Weighted Average: The largest EMV suggests that expanding is the optimal choice.
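
The three criteria above can be sketched side by side. The payoffs, probabilities, and state names below are hypothetical and are not the figures behind the EMVs quoted in these notes.

```python
# Hypothetical payoffs and probabilities, not the flashcards' figures.
PROBABILITIES = {"strong demand": 0.6, "weak demand": 0.4}
PAYOFFS = {
    "stay": {"strong demand": 120_000, "weak demand": 80_000},
    "expand": {"strong demand": 250_000, "weak demand": -50_000},
}


def maximin(payoffs):
    """Pick the alternative whose worst outcome is best."""
    return max(payoffs, key=lambda d: min(payoffs[d].values()))


def maximax(payoffs):
    """Pick the alternative whose best outcome is best."""
    return max(payoffs, key=lambda d: max(payoffs[d].values()))


def emv(payoffs, probabilities):
    """Probability-weighted average payoff of each alternative."""
    return {
        d: sum(probabilities[s] * v for s, v in outcomes.items())
        for d, outcomes in payoffs.items()
    }


values = emv(PAYOFFS, PROBABILITIES)
print(maximin(PAYOFFS), maximax(PAYOFFS), max(values, key=values.get))
print(values)
```

EMV is a long-run average, so the alternative with the largest EMV can still produce a poor result in any single realization; this is why the EMV criterion does not guarantee the best outcome.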

    Constraints in Linear Programming

    • Feasible Solutions: A feasible solution is a set of values for all decision variables that satisfies every constraint.
    • Decision Variables: Include quantities of snacks, drinks, sun protection items, and clothing.
    • Constraints Identification:
      • In the sample model, the statement that the combined weight of all items must be at least 1600 grams is not one of the constraints.

    Weight Management in Constraints

    • Additional Constraint Example: Requiring that the total weight of snacks not exceed that of water bottles introduces a new constraint defined by 60*x1, where x1 represents the number of snacks.


    Quiz Team

    Description

    Prepare for your BSAN 160 exam with these flashcards covering key concepts like Decision Support Systems and Business Intelligence. Each card tests your understanding and retention of crucial terms and definitions. Perfect for quick revisions before the exam!
