SHC-C-NYP-L2CDA Certified Data Analyst Past Paper PDF
Document Details
NYP
2022
Summary
This is a past paper for a certified data analyst course. It contains multiple-choice, short-answer and script-based questions covering various aspects of data analysis, including statistical techniques, regression analysis, data preparation and data visualization. The questions test the student's understanding of core data analysis concepts.
Full Transcript
- SHC-C-NYP-L2CDA - Certified Data Analyst - POLITEMall 14/4/22, 12:17 PM
Neo Teng Yong STU-SIT (username: [email protected])
Attempt 4
Written: 14 April 2022 12:15 PM - 14 April 2022 12:16 PM
https://lms.polite.edu.sg/d2l/lms/quizzing/user/quiz_submission…Popup=0&isprv=0&dnb=1&cfql=1&fromQB=0&d2l_body_type=1&ou=81200 Page 1 of 15
Submission View
Your quiz has been submitted successfully.

MCQs and Multi-answers

Question 1 (0 / 1 point)
The exhibit shows the ratings given by customers for AA hotel and BB hotel. The customers who rated AA hotel are independent of the customers who rated BB hotel. Which of the following statistical techniques should be used to test whether there is a statistically significant difference in the mean ratings of the two hotels?
Chi-square test
Kruskal-Wallis test
Paired t-test
T-test

Question 2 (0 / 1 point)
Given the above dataset for predicting the relationship between box office gross and MPAA ratings (G, PG, PG-13, R and NC-17) of Disney movies, match each of the following features/columns to the correct measurement scale (binary/nominal/ordinal/interval/ratio).
Genre: ___binary___ (nominal)
Release Date: ___binary___ (interval)
Total Gross: ___binary___ (ratio)
MPAA Rating: ___binary___ (ordinal)

Question 3 (0 / 1 point)
The exhibit shows the results of a regression analysis with Fitness Index as the dependent variable. Which of the following variables is insignificant at the 5% significance level (95% confidence level)?
Pulse rate
Sleep quality
Female
Age

Question 4 (0 / 1 point)
The exhibit shows the total government expenditure on education. Suggest the appropriate visualisation to use.
Line chart
Pie chart
Histogram
Stacked bar chart

Question 5 (0 / 1 point)
The exhibit shows the choice of inputs for a predictive model where the outcome variable is "Turnover". Suggest an input that should not be included.
Age
Gender
Employee ID
Duration

Question 6 (0 / 1 point)
An analyst has built four regression models on the same dependent variable, each using a different number of independent variables chosen from all those available to him. Which model should he choose?
Model C: Adjusted R-square = 0.68
Model B: Adjusted R-square = 0.88
Model D: Adjusted R-square = 0.26
Model A: Adjusted R-square = 0.79

Question 7 (0 / 1 point)
An analyst would like to understand the relative effect of various factors on sales growth rate. Choose the appropriate technique(s) he can use to find the effect size of each input variable, assuming he wants the output to be a numerical variable. Choose all options that apply.
Logistic Regression
Decision Tree
Neural Network Analysis
Linear Regression

Question 8 (0 / 1 point)
Given the exhibit, which of the following is/are true? Choose all options that apply.
Y's variability is unequal across X's range.
Y's variability is equal across X's range.
There is a positive linear correlation between X and Y.
There is a negative linear correlation between X and Y.
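Question 6 relies on the fact that adjusted R-square, unlike plain R-square, penalises each additional independent variable. That penalty is what makes it suitable for comparing models with different numbers of predictors. Below is a minimal Python sketch of the standard formula. The helper names are illustrative, and the sample size and predictor counts in the usage lines are hypothetical, because the question does not state them.

```python
def adjusted_r_squared(r_squared: float, n_obs: int, n_predictors: int) -> float:
    """Adjusted R-square = 1 - (1 - R^2) * (n - 1) / (n - p - 1)."""
    if n_obs - n_predictors - 1 <= 0:
        raise ValueError("need more observations than predictors + 1")
    return 1 - (1 - r_squared) * (n_obs - 1) / (n_obs - n_predictors - 1)


def best_model(adjusted_scores: dict[str, float]) -> str:
    """Return the name of the model with the highest adjusted R-square."""
    return max(adjusted_scores, key=adjusted_scores.get)


# Hypothetical figures (n = 50): a tiny gain in plain R-square from seven
# extra predictors still lowers adjusted R-square.
print(adjusted_r_squared(0.80, 50, 3))
print(adjusted_r_squared(0.81, 50, 10))

# Adjusted R-square values as listed in Question 6.
scores = {"Model A": 0.79, "Model B": 0.88, "Model C": 0.68, "Model D": 0.26}
print(best_model(scores))
```

Plain R-square never decreases when a variable is added, so ranking models on it would always favour the largest one. Ranking on adjusted R-square avoids that bias.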
Question 9 (0 / 1 point)
In the detection of tumour cells in X-ray images, where the outcome variable is "Positive" or "Negative", select the appropriate techniques to use. Choose all options that apply.
Decision Tree
Support Vector Machine
Neural Network
Linear Regression

Question 10 (0 / 1 point)
Data exists in different forms and sizes, but most of it can be classified as structured, semi-structured or unstructured, depending on its characteristics and whether the schema is represented in a data model. Data may also be classified as repetitive or non-repetitive, depending on whether it was generated by repetitive processes. Portable Document Format (PDF) is a file format developed by Adobe to present documents, including text formatting and images, in a manner independent of application software, hardware and operating systems. How would a PDF file be classified as a data type?
unstructured non-repetitive
structured repetitive
unstructured repetitive
structured repetitive

Question 11 (0 / 1 point)
The exhibit shows the gross reproduction rate, net reproduction rate and total fertility rate in Singapore. Which of the following is true?
The total population is decreasing over the years.
There is population growth each year.
The total fertility rate is on an increasing trend from 1990 to 2015.
There is a spike in the total fertility rate in 2000.

Question 12 (0 / 1 point)
The Human Resources (HR) department has approached you to perform a data manipulation step on the employee data before piping it into the dashboard.
You notice that around 90% of the values in the age column are missing. What form of data preparation can you set up?
Impute the missing age values using k-NN
Eliminate the records containing missing age values
Impute the missing age values using the mean age
Replace the missing values with a new category 'missing'

Short Answers

Question 13 (0 / 1 point)
Exhibit A
Exhibit B
Exhibit A shows the cost of meals in Restaurant A and Restaurant B. Given the statistical results in Exhibit B, interpret whether there is a statistically significant difference in the mean cost of meals between Restaurant A and Restaurant B.
- No text entered - This question has not been graded. The correct answer is not displayed for Written Response type questions.

Question 14 (0 / 1 point)
The Certificate of Entitlement (COE) gives the holder the right to own and operate a vehicle in Singapore. The number of COEs available in each category is determined by the Vehicle Quota. A taxi rental company would like to forecast the vehicle quota for the taxi category. Below is the monthly motor vehicle population collected from 2020 to 2021.
As part of data understanding, you have visualised the monthly confirmed cases.
Based on the dataset and visualisation above, identify the data issue with the dataset, citing an example from the data shown below.
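Questions 1 and 13 above both compare the means of two independent groups (two hotels, two restaurants). Below is a minimal standard-library sketch of the independent two-sample t statistic. It uses Welch's form, which does not assume equal variances, together with the Welch-Satterthwaite degrees of freedom. The meal costs are made-up numbers, not the exhibit data, and the function name is illustrative. In practice, `scipy.stats.ttest_ind(a, b, equal_var=False)` returns the same statistic along with its p-value.

```python
import math
import statistics


def welch_t_test(a: list[float], b: list[float]) -> tuple[float, float]:
    """Return (t statistic, degrees of freedom) for Welch's two-sample t-test."""
    mean_a, mean_b = statistics.mean(a), statistics.mean(b)
    # Squared standard errors: sample variance (n - 1 denominator) / sample size.
    se2_a = statistics.variance(a) / len(a)
    se2_b = statistics.variance(b) / len(b)
    t = (mean_a - mean_b) / math.sqrt(se2_a + se2_b)
    # Welch-Satterthwaite approximation to the degrees of freedom.
    df = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (len(a) - 1) + se2_b ** 2 / (len(b) - 1)
    )
    return t, df


# Hypothetical meal costs (dollars) from two independent groups of diners.
restaurant_a = [12.5, 14.0, 13.2, 15.1, 12.8, 14.6]
restaurant_b = [16.0, 15.4, 17.2, 16.8, 15.9, 17.5]
t_stat, dof = welch_t_test(restaurant_a, restaurant_b)
print(f"t = {t_stat:.3f}, df = {dof:.1f}")
```

The difference in means is statistically significant at the 5% level if |t| exceeds the two-sided critical value of the t distribution with `df` degrees of freedom (equivalently, p < 0.05). A paired t-test would not fit these questions, because the two groups consist of different customers.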
- No text entered - This question has not been graded. The correct answer is not displayed for Written Response type questions.

Question 15 (0 / 1 point)
The exhibit shows the choice of inputs from a training dataset for a predictive model where the outcome variable is "Response Indicator". Suggest an issue with the input selection.
- No text entered - This question has not been graded. The correct answer is not displayed for Written Response type questions.

Question 16 (0 / 1 point)
The heat map shows the mode of shipping for a company's transactions.
a. Suggest an issue with the chart.
b. How can the visualisation be improved?
- No text entered - This question has not been graded. The correct answer is not displayed for Written Response type questions.

Question 17 (0 / 1 point)
The visualisations below show the population, housing and infrastructure in Singapore. Based on the visualisations given, the analysts would like to find the most cluttered town and the town with the most facilities. Are these visualisations suitable for this purpose? If yes, describe how you would use them to answer the question. If not, describe what visualisations you would create instead.
- No text entered - This question has not been graded. The correct answer is not displayed for Written Response type questions.

Question 18 (0 / 1 point)
Due to many factors, such as negligence of traffic rules, many road traffic accidents happen around the world. You are required to build a predictive model to predict the severity of a traffic accident.
The first 10 examples of the dataset are shown in the table below. Identify two (2) data preparation steps you would apply to the dataset before modelling. Cite examples from the data shown where possible.
- No text entered - This question has not been graded. The correct answer is not displayed for Written Response type questions.

Question 19 (0 / 1 point)
Exhibit A
Exhibit B
The charts above show the website traffic and the number of advertisements for Company A in 2021. Exhibit A shows the number of visitors to the website each month. Exhibit B shows the number of advertisements shown on television each month.
a. Suppose the analyst would like to know whether showing more advertisements increases website traffic. What information is lacking in both exhibits?
b. Suggest what type of chart(s) would help the analyst decide.
c. Specify the x-axis and y-axis for the suggested chart(s) in part b.
- No text entered - This question has not been graded. The correct answer is not displayed for Written Response type questions.

Script & File

Question 20 (0 / 1 point)
This question contains 3 sub-questions on data quality using datasets called Sales (Sales.csv) and Town (Town.csv). Download the datasets and either the Python or R template to work on this question:
Data Quality Practice Test Template.ipynb
Data Quality Practice Test Template.py
Data Quality Practice Test Template.r
The final submission will be the Python/R template with the code filled in.
1. Read the Dataset
a. Read the Sales dataset into a dataframe and print the first five rows of the dataframe.
b. Merge the town column from the Town dataset into the Sales dataset.
2. Data Cleaning: Check for missing data
a. Find the columns that contain missing data.
3. Replace missing data with meaningful replacements
a. Replace the missing values in monthly_premium with the column's mean value. Verify that this column no longer contains missing data.
b. Replace the missing values in gender with the string 'Unknown'. Verify that this column no longer contains missing data.
c. Replace the missing values in total_claims with number_of_claims multiplied by per_claim_amount. Verify that this column no longer contains missing data.
- No text entered - This question has not been graded. The correct answer is not displayed for Written Response type questions.

Question 21 (0 / 1 point)
This question contains 3 sub-questions on a dataset called Job (Job.csv). Download the dataset and either the Python or R template to work on this question:
Exploratory Analysis Practice Test Template.ipynb
Exploratory Analysis Practice Test Template.py
Exploratory Analysis Practice Test Template.r
The final submission will be the Python/R template with the code filled in.
1. Read the Dataset
a. Read the Job dataset into a dataframe.
b. Display the first 5 rows of the Job dataset.
2. Descriptive Statistics
a. Find the basic statistics (mean, standard deviation, min, max) of the Job dataset.
3. Create calculated fields
a. Create a new Annual Salary column to calculate the annual salary of each employee, assuming there are 260 working days in a year (calculate using Pay Rate and 260).
b. Create a new Work Year column to calculate the number of years each employee has been working in the company as of 2021 (calculate using 2021 and Year of Hire).
c. Using the two new columns (Annual Salary and Work Year), create a new Total Salary column to calculate the total salary earned by each employee as of the end of 2021.
- No text entered - This question has not been graded.
The correct answer is not displayed for Written Response type questions.

Question 22 (0 / 1 point)
This question requires you to use any visualisation software or Python/R packages of your choice to create suitable charts for the business requirements below. The dataset you will use is Superstore (Superstore.csv).
1. Create a visualisation to compare sales (selling price * quantity, summed across all orders) by product category and region, to help management determine whether to vary marketing efforts in one or more specific areas.
2. Create a visualisation to investigate the proportion of each product category bought by companies in various regions, in terms of quantity sold.
3. Create a visualisation to investigate selling price across regions and customer segments.
- No text entered - This question has not been graded. The correct answer is not displayed for Written Response type questions.

Attempt Score: 0 / 22 - 0 %
Overall Grade (highest attempt): 11 / 22 - 50 %
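The script tasks in Questions 20 and 21 can be sketched in pandas. Because Sales.csv, Town.csv and Job.csv are not reproduced here, the sketch builds small hypothetical in-memory stand-ins; with the real files you would start from `pd.read_csv("Sales.csv")` and so on. Several details are assumptions, not taken from the question files:
- the join column `customer_id`;
- the sample rows;
- reading Pay Rate as a daily rate;
- treating Total Salary as Annual Salary multiplied by Work Year, which assumes a constant salary.

The helper names are illustrative.

```python
import pandas as pd


def clean_sales(sales: pd.DataFrame, town: pd.DataFrame,
                key: str = "customer_id") -> pd.DataFrame:
    """Question 20 sketch. `key` is a hypothetical join column."""
    # 1b. Bring the town column across from the Town dataset.
    df = sales.merge(town[[key, "town"]], on=key, how="left")

    # 2a. Columns that contain at least one missing value.
    missing_cols = df.columns[df.isna().any()].tolist()
    print("Columns with missing data:", missing_cols)

    # 3a. Mean imputation for monthly_premium.
    df["monthly_premium"] = df["monthly_premium"].fillna(df["monthly_premium"].mean())
    # 3b. Explicit 'Unknown' category for gender.
    df["gender"] = df["gender"].fillna("Unknown")
    # 3c. Derive total_claims from its components where it is missing.
    df["total_claims"] = df["total_claims"].fillna(
        df["number_of_claims"] * df["per_claim_amount"]
    )

    # Verification: each treated column should now report 0 missing values.
    print(df[["monthly_premium", "gender", "total_claims"]].isna().sum())
    return df


def add_salary_fields(job: pd.DataFrame) -> pd.DataFrame:
    """Question 21, part 3 sketch (assumes Pay Rate is a daily rate)."""
    df = job.copy()
    df["Annual Salary"] = df["Pay Rate"] * 260
    df["Work Year"] = 2021 - df["Year of Hire"]
    # Simplification: the same annual salary is assumed for every year worked.
    df["Total Salary"] = df["Annual Salary"] * df["Work Year"]
    return df


# Hypothetical stand-ins for Sales.csv and Town.csv.
sales = pd.DataFrame({
    "customer_id": [1, 2, 3],
    "monthly_premium": [100.0, None, 140.0],
    "gender": ["F", None, "M"],
    "number_of_claims": [2, 1, 3],
    "per_claim_amount": [50.0, 80.0, 40.0],
    "total_claims": [100.0, 80.0, None],
})
town = pd.DataFrame({"customer_id": [1, 2, 3], "town": ["Bedok", "Yishun", "Jurong"]})
print(sales.head())  # Q20 1a: first five rows
cleaned = clean_sales(sales, town)

# Hypothetical stand-in for Job.csv.
job = pd.DataFrame({"Pay Rate": [100.0, 150.0], "Year of Hire": [2015, 2019]})
print(job.head())      # Q21 1b: first five rows
print(job.describe())  # Q21 2a: count, mean, std, min, quartiles, max
print(add_salary_fields(job))
```

Filling `total_claims` from its two component columns, rather than from a column mean, keeps each imputed value consistent with the rest of that row. The mean is only a fallback for columns such as `monthly_premium`, which have no such relationship.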