STU-SIT Quiz on Statistical Analysis
48 Questions

Questions and Answers

Which statistical technique should be used to test if there is a statistically significant difference in the mean ratings for AA hotel and BB hotel?

  • T-test (correct)
  • Chi-square test
  • Paired t-test
  • Kruskal-Wallis test
    The MPAA rating is considered a nominal scale.

    False

    What measurement scale is used for the feature 'Total Gross'?

    ratio

    The variable 'Release Date' is measured on the ___ scale.

    interval

    Match the following features with their corresponding measurement scales:

      • Genre = Nominal
      • Release Date = Interval
      • Total Gross = Ratio
      • MPAA Rating = Ordinal

    In a regression analysis, which variable is likely to be insignificant at the 5% significance level (95% confidence level)?

    Independent variable with p-value greater than 0.05
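The rule in the answer above can be sketched as a simple filter. The p-values below are hypothetical placeholders (the quiz's regression table is not reproduced on this page); only the predictor names come from the study notes.

```python
# Hypothetical p-values for illustration only; they are NOT the quiz's actual figures.
p_values = {"age": 0.003, "sleep_quality": 0.21, "pulse_rate": 0.04, "female": 0.62}

ALPHA = 0.05  # 5% significance level, i.e. 95% confidence level


def insignificant_predictors(p_values, alpha=ALPHA):
    """Return the predictor names whose p-value exceeds alpha, sorted by name."""
    return sorted(name for name, p in p_values.items() if p > alpha)


flagged = insignificant_predictors(p_values)
print(flagged)
```

With real output from a regression tool, the same comparison against 0.05 would be applied to each coefficient's reported p-value.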

    The Kruskal-Wallis test is suitable for comparing more than two independent groups.

    True
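As a rough illustration of how the Kruskal-Wallis test compares three independent groups, here is a minimal standard-library sketch. The group values and the helper names (`average_ranks`, `kruskal_wallis_h`) are made up for this example, no tie correction is applied, and the p-value uses the chi-square approximation, which is exact in closed form only because three groups give 2 degrees of freedom.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def kruskal_wallis_h(*groups):
    """H statistic (without tie correction) for two or more independent groups."""
    pooled = [v for g in groups for v in g]
    ranks = average_ranks(pooled)
    n_total = len(pooled)
    weighted = 0.0
    start = 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        weighted += rank_sum ** 2 / len(g)
        start += len(g)
    return 12.0 / (n_total * (n_total + 1)) * weighted - 3 * (n_total + 1)


# Hypothetical measurements for three independent groups.
group_a = [2.1, 3.4, 1.9, 2.8]
group_b = [4.5, 5.1, 3.9, 4.8]
group_c = [6.2, 7.0, 5.8, 6.6]

h_stat = kruskal_wallis_h(group_a, group_b, group_c)
p_value = math.exp(-h_stat / 2)  # chi-square survival function for df = 2 only
print(round(h_stat, 3), round(p_value, 4))
```

For groups this small the chi-square approximation is coarse, so a statistics package with exact or corrected p-values would be preferable in practice.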

    Which of the following techniques is NOT suitable for tumor cell detection in x-ray images?

    Linear Regression

    What is the primary purpose of a Chi-square test?

    To test the independence of categorical variables
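A chi-square test of independence compares observed counts with the counts expected if the two categorical variables were unrelated. The sketch below assumes a hypothetical 2x2 contingency table; the function names are illustrative, no continuity correction is applied, and the closed-form p-value is valid only for 1 degree of freedom (a 2x2 table).

```python
import math


def chi_square_statistic(table):
    """Pearson chi-square statistic for an r x c table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    return stat


def chi2_p_value_df1(stat):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(stat / 2))


# Hypothetical counts: rows = two customer groups, columns = two responses.
observed = [[30, 10],
            [20, 40]]

chi2 = chi_square_statistic(observed)
p_value = chi2_p_value_df1(chi2)
print(round(chi2, 3), p_value)
```

A small p-value here would suggest the row and column variables are not independent; larger tables need the chi-square distribution with (rows - 1) x (columns - 1) degrees of freedom.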

    All forms of data can be classified as either structured, semi-structured, or unstructured.

    True

    Name one advantage of using Neural Networks for detecting tumor cells.

    Neural Networks can learn complex patterns in data.

    PDF file format is developed by Adobe to present documents independent of __________.

    application software, hardware and operating systems

    Which of the following best describes a structured data type?

    Data organized into a predefined model

    A decision tree is a type of regression model.

    False

    How does the repetition of processes affect data classification?

    Data may be classified as repetitive or non-repetitive.

    Match the following data types with their characteristics:

      • Structured = Organized with a predefined schema
      • Semi-structured = Contains both fixed and variable fields
      • Unstructured = Not organized in a pre-defined format
      • Repetitive = Generated from recurrent processes

    Which statement is true regarding the population trends in Singapore from 1990 to 2015?

    There is a spike in the Total Fertility rate for the year 2000.

    The Total Fertility rate in Singapore has shown a consistent increase from 1990 to 2015.

    False

    What percentage of the age column is missing in the employee data?

    90%

    It is suggested to __________ the missing age value using k-NN.

    impute

    What is an appropriate data manipulation step for handling a column with 90% missing values?

    Impute the missing age value using k-NN.

    Replacing missing values with a new category 'missing' is a valid data manipulation technique.

    True

    Match the methods of data preparation with their descriptions:

      • Imputation using k-NN = Statistical method that predicts missing values based on similar records
      • Mean imputation = Replacing missing values with the average of available data
      • Eliminating records = Removing rows with missing values from the dataset
      • Categorical replacement = Creating a new category for missing data
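Two of the methods matched above, mean imputation and k-NN imputation, can be contrasted in a few lines. This is a minimal sketch on made-up employee records: the field names (`tenure`, `age`), the choice of k = 2, and the use of a single unscaled feature are all assumptions for illustration, not details from the quiz.

```python
import math
from statistics import mean


def mean_impute(values):
    """Replace None entries with the mean of the known entries."""
    fill = mean(v for v in values if v is not None)
    return [fill if v is None else v for v in values]


def knn_impute(rows, target, features, k=2):
    """Fill a missing target with the mean target of the k most similar complete rows."""
    complete = [r for r in rows if r[target] is not None]
    filled_rows = []
    for row in rows:
        filled = dict(row)
        if row[target] is None:
            point = [row[f] for f in features]
            nearest = sorted(
                complete,
                key=lambda c: math.dist(point, [c[f] for f in features]),
            )[:k]
            filled[target] = mean([n[target] for n in nearest])
        filled_rows.append(filled)
    return filled_rows


# Hypothetical employee records; the last one has a missing age.
employees = [
    {"tenure": 1, "age": 23},
    {"tenure": 2, "age": 25},
    {"tenure": 10, "age": 40},
    {"tenure": 12, "age": 44},
    {"tenure": 11, "age": None},
]

mean_filled = mean_impute([e["age"] for e in employees])
knn_filled = knn_impute(employees, target="age", features=["tenure"], k=2)
print(mean_filled[-1], knn_filled[-1]["age"])
```

Here the mean fills in the overall average age, while k-NN borrows from the two employees with the closest tenure, which is why the two estimates differ. With several features on different scales, the features would normally be standardized before computing distances.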

    What statistical analysis method would you use to determine the significance of the difference in meal costs?

    t-test or ANOVA

    What does the Certificate of Entitlement (COE) in Singapore allow an individual to do?

    Own and operate a vehicle

    The number of available COEs in each category is fixed and does not change.

    False

    What method is used to visualize the number of monthly confirmed cases?

    Data visualization

    The outcome variable in the predictive model is the __________ Indicator.

    Response

    Match the following data visualization components with their purposes:

      • Heat map = Mode of shipping
      • Line chart = Trend over time
      • Bar chart = Comparison between categories
      • Pie chart = Proportion of parts to a whole

    Which of the following is a potential issue with the input selection for the predictive model?

    Inputs lack diversity

    Visualizations should always be complicated to ensure thorough data representation.

    False

    Suggest a way to improve a heat map visualization.

    By adding labels or scaling colors for better interpretation.

    What is the primary purpose of creating visualizations for the Superstore dataset?

    To determine marketing strategy variations

    The visualizations should include a comparison of sales by product category and customer loyalty.

    False

    What factor is multiplied to determine sales in the Superstore dataset?

    Selling price and quantity

    The visualizations for investigating the proportion of product category bought should focus on _____ sold.

    quantity

    Match the following visualization purposes with their descriptions:

      • Comparison of sales = To determine effective marketing strategies
      • Proportion of product category = To understand purchasing behavior by region
      • Investigate selling price = To analyze price differences across segments and regions

    What is the main purpose of analyzing the relationship between advertisements and website traffic?

    To assess marketing effectiveness

    The exhibits provide adequate information to determine if increased advertisements cause increased website traffic.

    False

    What type of chart would best illustrate the relationship between the number of advertisements and website traffic?

    Scatter plot
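A scatter plot is usually read alongside a correlation coefficient. The sketch below computes Pearson's r for hypothetical monthly figures (the numbers and the names `ads` and `visits` are invented for illustration; the quiz exhibits are not available here).

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical monthly data: number of advertisements and website visits.
ads = [2, 4, 6, 8, 10]
visits = [120, 150, 210, 230, 300]

r = pearson_r(ads, visits)
print(round(r, 3))
```

Even a coefficient close to 1 only describes association. As the previous answer notes, it does not by itself show that more advertisements cause more traffic.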

    The y-axis of the suggested chart should represent _____ and the x-axis should represent _____ in analyzing the advertisements and website traffic.

    What information is crucial for determining whether increased advertisements affect website traffic?

    The monthly trends of website visitors and advertisements

    The exhibits provide all necessary information to conclude that increased advertisements lead to increased website traffic.

    False

    What type of chart would be most effective in illustrating the relationship between the number of advertisements and website traffic?

    Scatter plot

    Match the following types of data with their corresponding characteristics:

      • Structured Data = Organized in a predefined format with identifiable patterns
      • Unstructured Data = Information that does not have a predefined data model
      • Semi-structured Data = Contains elements of both structured and unstructured data
      • Qualitative Data = Descriptive information that cannot be measured numerically

    Which of the following is a valid method for checking data quality in the Sales dataset?

    Identifying duplicates in customer entries

    Replacing missing values with their mean is always the most accurate method of data cleaning.

    False

    What should be done to replace missing values for the monthly_premium column?

    Replace with the mean value of the column

    Study Notes

    Neo Teng Yong STU-SIT Quiz Notes

    • Attempt 4: Submission date: April 14, 2022, 12:15-12:16 PM.

    • Question 1: Customers rated AA and BB hotels independently. An independent two-sample t-test should be used to check for a statistically significant difference in mean ratings between the two hotels.

    • Question 1 Data:

      • AA hotel: 7, 6, 6, 10, 6, 6, 10, 1, 6, 5, 4, 10, 10, 1, 6, 6, 10, 1, 8, 6
      • BB hotel: 8, 10, 6, 5, 10, 3, 4, 9
    • Question 2: Movie title, Release date, Genre, MPAA rating, and Total Gross.

    • Question 2 Data: Film data.

    • Question 3: Fitness Index is the dependent variable.

    • Question 3 Data: Coefficient, Standard Error, t Stat, P-value of age, sleep quality, pulse rate, and female.

    • Question 3 Results: Identify insignificant variables at the 5% significance level (95% confidence level).

    • Question 4: Suggest the appropriate data visualisation technique for analyzing total government expenditure on education.

    • Question 4 Data: Total expenditure on education data by year.

    • Question 4 Results: Line chart would be suitable.

    • Question 5: Which variable should not be included in a statistical predictive model with outcome variable "Turnover"?

    • Question 5 Data: Variable Name, Role, Measurement Level, Description

    • Question 5 Results: Age should not be included.

    • Question 6: Which regression model should be chosen based on adjusted R-squared values?

    • Question 6 Data: Model C: Adjusted R-square = 0.68; Model B: Adjusted R-square = 0.88; Model D: Adjusted R-square = 0.26; Model A: Adjusted R-square = 0.79

    • Question 6 Results: Model B with highest adjusted R-squared.

    • Question 7: Appropriate statistical techniques for finding effect sizes of various factors on sales growth rate.

    • Question 7 Results: Logistic regression, decision tree, Neural Network Analysis, and Linear Regression are all potential techniques.

    • Question 8: Determine the validity of statements about the scatterplot of two variables.

    • Question 8 Data: Plot analysis.

    • Question 8 Results: Y's variability is unequal across X's range, and there is a positive linear correlation between X and Y.

    • Question 9: Appropriate techniques for detecting tumor cells in x-ray images.

    • Question 9 Results: Decision Tree, Support Vector Machines, and Neural Network are suitable; Linear Regression is not.

    • Question 10: Data classification type.

    • Question 10 Data: Data types, PDF File

    • Question 10 Results: PDF format data would be classified as structured repetitive

    • Question 11: N/A.

    • Question 12: Data preparation steps for missing age values in employee data.

    • Question 12 Results: Eliminating records with missing values or imputing missing values using K-NN or the mean are possible solutions.

    • Question 13: No data for analysis is presented.

    • Question 14: No data for analysis is presented.

    • Question 15: Data issues with the input selection for the study. (No details provided)

    • Question 16: No data for analysis is presented.

    • Question 17: Evaluation of suitability of datasets for analysis. (No details provided)

    • Question 18: Data preparation steps for predictive modeling for accident severity. (No details provided)

    • Question 19: No data for analysis is presented.

    • Question 20: Data cleaning steps for two datasets (Sales and Town). (No details provided)

    • Question 21: No data for analysis is presented.

    • Question 22: Creating visualizations for sales comparison by category and region, comparing product category purchases by region, and analyzing selling prices by region and customer segments. (No details provided)
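The Question 1 comparison in the notes above can be worked through with the ratings as transcribed there. This is a sketch under stated assumptions rather than the quiz's own solution: it uses Welch's unequal-variance form of the independent t-test (a choice made here, since the two samples differ in size), it approximates the p-value by numerically integrating the t density with Simpson's rule, and the BB list as transcribed in the notes may be incomplete.

```python
import math
from statistics import mean, variance

# Ratings as transcribed in the study notes for Question 1.
aa = [7, 6, 6, 10, 6, 6, 10, 1, 6, 5, 4, 10, 10, 1, 6, 6, 10, 1, 8, 6]
bb = [8, 10, 6, 5, 10, 3, 4, 9]


def welch_t(a, b):
    """Welch's t statistic and its approximate degrees of freedom."""
    va = variance(a) / len(a)
    vb = variance(b) / len(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))


def two_sided_p(t, df, steps=2000):
    """Two-sided p-value via Simpson's rule on [0, |t|]; steps must be even."""
    upper = abs(t)
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3  # probability mass between 0 and |t|
    return max(0.0, 1.0 - 2.0 * area)


t_stat, df = welch_t(aa, bb)
p_value = two_sided_p(t_stat, df)
print(round(t_stat, 3), round(df, 2), round(p_value, 3))
```

On these transcribed values the p-value is well above 0.05, so the sketch would not reject the hypothesis of equal mean ratings. A statistics package would give the same test with less code.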

    Description

    This quiz covers statistical analysis techniques using real data, including T-tests for comparing hotel ratings and regression analysis for understanding fitness indices. It also explores data visualization methods for government expenditure analysis. Test your knowledge of these important statistical concepts and practices.
