GEA1000 Lecture Notes (2) PDF
Summary
This document contains questions and answers related to research methods in a scientific context, particularly focusing on controlled experiments and observational studies. The questions cover topics like random assignment, treatment and control groups, and different types of research questions. The document also includes exercises and examples of applying these methods.
Full Transcript
Exercise 1

1. The following is a research question from a scientific journal.

What percentage of Singaporeans are keen to take vaccine X?

What type of research question is this?
(A) Make an estimate about the population.
(B) Test a claim about the population.
(C) Compare two sub-populations.
(D) All of the other options.
Note: The population of interest is Singaporeans. The question seeks to estimate the percentage of Singaporeans keen to take vaccine X.

2. Drug X is a newly created drug. It is intended to be taken as a tablet by people who have skin allergy reactions. However, before pushing it into the market, researchers need to test the effectiveness of drug X. Thus, they designed a study with two groups, a treatment group and a control group. Subjects with skin allergy reactions were invited to the study and placed into either of the two groups. The subjects in the treatment group received a drug X tablet to consume. Subjects were studied to see if their allergic reactions were successfully alleviated and were marked as either ‘successful’ or ‘unsuccessful’.

What are the possible tablets to give to the control group subjects, for comparing drug X’s effectiveness, assuming all tablets look and taste the same? Select all that apply.
(A) An empty tablet.
(B) A tablet containing glucose. It is a definite known fact that glucose has a 2% better success rate than an empty tablet.
(C) A tablet containing salt, with an unknown success rate.
Note: The control group should receive an empty tablet, an existing treatment, or a placebo.

3. Which of the following scenarios is an example of random assignment in a controlled experiment?
(I) For each subject, Peter throws a fair six-sided die. Subjects are assigned based on the number shown on the top surface of the die. Before the start of the assignment, Peter determines that the numbers “1”, “2” and “6” will be assigned to the treatment group, and “3”, “4” and “5” to the control group.
Note on (I): this involves the use of chance.
(II) James lists all subjects in the experiment in alphabetical order, and selects subjects whose last name starts with “A”, “B”, “C” through “M” to place in the treatment group, while subjects whose last name starts with “N”, “O”, “P” through “Z” are placed in the control group.
(A) Only (I).
(B) Only (II).
(C) Neither (I) nor (II).
(D) Both (I) and (II).

4. Which of the following statements is/are always true about controlled experiments and observational studies?
(I) There is no control group in observational studies.
(II) Randomised assignment of subjects does not occur in observational studies.
(III) There are no confounders in controlled experiments.
(A) Only (I) and (II).
(B) Only (I) and (III).
(C) Only (II) and (III).
(D) Only (II).
Note on (I): in observational studies, participants self-assign to the control and treatment groups.
Note on (III): there can be confounders if subjects are not randomly assigned into the different groups.

Chapter 1. Getting Data

5. Siti conducted an investigation by randomly assigning each subject from a randomly selected sample of 50 participants, either to watch Netflix for 1 hour 4 times a week, or to listen to Symphony 924FM for 1 hour 4 times a week. After 6 months, the changes in the subjects’ blood pressure readings over the same period were recorded. The changes were compared for the two groups. Which of the following is true?
(A) This is a randomised experiment because blood pressure was measured at the beginning and end of the study.
(B) This is an observational study because the two groups were compared at the end of the study.
(C) This is a randomised experiment because the participants were randomly assigned to either activity.
(D) This is a randomised experiment because a random sample of participants was used.
Note: Option (B) is crossed out.

6. A medical researcher randomly assigned 80000 patients to receive either a new drug or an old drug. Among the 40123 patients who received the new drug, 24007 were male. Among the 80000 patients, what is the most likely proportion of females?
(A) 20%.
(B) 30%.
(C) 40%.
(D) 50%.
Note: Random assignment of a large number of subjects tends to produce groups that are similar in all aspects. This applies to the proportion of females since 80,000 is large. Among the patients on the new drug, males make up 24007/40123 ≈ 60%, so about 40% are female.

7. May, an owner of a tuition center, wishes to find out if using iPads during tuition class improves her students’ academic performance. She decided to conduct an experiment as follows:
1. She groups all the students in her center according to the day they come for tuition. For simplicity’s sake, we can assume each student only goes for tuition once per week, there is at least one class of tuition every day in her center, and no student drops out halfway.
2. Every student who goes for tuition on weekends will be given an iPad to use during class. The students who go for tuition on weekdays will not be given an iPad.
3. She then keeps track of all her students’ academic performance for the next 6 months.

Which of the following statements is/are true?
(I) She used a probability sampling method.
(II) This is a controlled experiment without random assignment.
(A) Only (I).
(B) Only (II).
(C) Both (I) and (II).
(D) Neither (I) nor (II).
Note: Probability was not used in the selection for treatment or control, so (I) is crossed out. The weekend group is the treatment group and the weekday group is the control group; no random assignment is involved. Options (A) and (C) are crossed out.

8. A researcher has invited 500 people to participate in his study. He uses random assignment to assign the subjects into the Treatment and Control groups. The Treatment group has 200 subjects, and the demographics of the 200 subjects are as follows:

         Male   Female
Old      83     18
Young    32     67

The 300 remaining subjects are in the Control group. The researcher should expect the number of young males in the Control group to be around ______.
(A) 32.
(B) 48.
(C) 51.
(D) 173.
Note: 32/200 = 0.16 (Treatment group); 0.16 × 300 = 48 (Control group).

9. Adam is a supervisor of the Call-Centre department of his company.
He is interested in knowing whether there is a relationship between having mid-day naps and the average number of calls completed per individual among all 500 workers in his department. Of the 500 workers, 400 are females and 100 are males. He uses a randomised mechanism in assigning 250 of the workers to the treatment group and 250 of the workers to the control group. Workers in the treatment group are given a mid-day nap between 2 p.m. and 2.30 p.m. every day, while those in the control group are not given the mid-day nap. We assume that all in the treatment group took their daily nap of 30 minutes. The number of calls cleared by each worker was recorded for a month, and it is noted that there is a positive association between nap and daily average number of completed calls among the 500 workers. Which of the following must be true?
(A) The study’s findings are applicable to the target population of interest.
(B) There will be 50 males and 200 females randomly assigned to the treatment group, and 50 males and 200 females randomly assigned to the control group.
(C) The above is an example of a randomised, double-blind controlled experiment.
(D) None of the other options.
Note on (A): the study is a census.
Note on (B): this option is crossed out.
Note on (C): it is not stated whether the administrators and workers were blinded.

10. Suppose that in an experimental study on tea consumption and its association with blood pressure level, we have 500 participants who were assigned into two groups, the treatment and control groups. Participants were randomly given a number from 1 to 500, following which participants numbered 1 to 250 were assigned to the treatment group whilst the rest were assigned to the control group. Individuals in the treatment group were given freshly brewed Osmanthus Tea, while those in the control group were given plain water. To alleviate any concerns about the nature of the drink, the drinks were prepared in front of the subjects and the assessors, regardless of whether the subjects were receiving tea or plain water.
Which of the following best describes the study above?
(A) The above is a non-randomised, non-blinded controlled experiment.
(B) The above is a randomised, non-blinded controlled experiment.
(C) The above is a non-randomised, single-blinded controlled experiment.
(D) The above is a randomised, single-blinded controlled experiment.
Note: Random assignment was used. Subjects and assessors can differentiate the drinks, so the study is non-blinded.

11. From the options given, select all possible words that can be used to complete the sentence below.
Probability sampling refers to a sampling process whereby the probability of selection of individuals within the sampling frame must be ______.
(A) non-zero
(B) known
(C) high
Note: By definition, selection is by a known randomised mechanism.

12. Paracetamol company NAS owns a tablet press machine that produces Paracetamol tablets. On one shift, 3000 batches of tablets were manufactured. Each batch contains 10 tablets, so a total of 30,000 tablets were manufactured. A researcher wants to ensure the dosage in the tablets is correct but has no time to check every single tablet. Hence, she decides to sample some of the tablets instead. Which of the following describes a probability sampling method? Select all that apply.
(A) Select 3000 tablets at random.
(B) Label all the tablets in each batch from 1 to 10, select a number from 1 to 10 at random, and select the unit from every batch that corresponds to that number.
(C) Select 300 batches at random, and then sample all tablets in every selected batch.
(D) Select the first 3000 tablets that were manufactured.
Note on (A): this is an example of simple random sampling.

13. A recent study revealed that Singapore is “the most tired country in the world, due to work and internet.” A researcher decided to conduct a further study on internet usage behaviour and working hours among all Singaporean adults in Singapore.
Data was collected by interviewing commuters alighting from Pasir Ris MRT (East), Woodlands MRT (North), Redhill MRT (South) and Jurong East MRT (West) from 8am to 11pm over a period of 7 days.
Which of the following statements is necessarily true?
(A) As data was collected from different parts of Singapore, it is generalisable to the population of Singapore.
(B) Due to the equal representation of Northern, Southern, Eastern and Western parts of Singapore, selection bias is minimised.
(C) In this example, non-response bias exists because of a bad sampling plan.
(D) None of the other options.
Note: This is not random sampling.

14. The United States government conducts a Census of Agriculture every five years. The census comprises farmland usage in all the 50 states in the country. John generated a sample of 3000 counties across all states from this census. He then collected data on the number of acres of land space these counties in the sample devoted to farms, and summarised his findings in a report as follows:
(I) Of the 3000 counties selected, 25 counties were selected more than once in the sampling process.
(II) 18% of the counties selected in this sample were from the state of Virginia, while none were from the states of Alaska, Arizona, Connecticut, Delaware, Hawaii, Rhode Island, Utah or Wyoming.
John claimed that he obtained the sample of 3000 counties by Stratified Sampling with replacement, with the stratum being every state in the United States. Assuming that statements (I) and (II) are true, which of the statements do not/does not support John’s claim on his sampling method?
(A) Only (I).
(B) Only (II).
(C) Neither (I) nor (II).
(D) Both (I) and (II).
Note on (I): this supports the claim. Where stratified sampling with replacement is done within each stratum, there is a chance of obtaining a repeated observation.
Note on (II): if Stratified Sampling is done such that every state is represented, it is not possible for some states to be unrepresented.

15.
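The two notes on question 14 can be illustrated with a small simulation. This is only a sketch under stated assumptions: the three "states", their "counties", and the helper name `stratified_sample_with_replacement` are made up for illustration and do not come from the census described in the question.

```python
import random

def stratified_sample_with_replacement(strata, per_stratum, seed=0):
    """Draw `per_stratum` units with replacement from every stratum."""
    rng = random.Random(seed)
    sample = []
    for name, units in strata.items():
        # random.choices samples with replacement, so repeats are possible.
        picks = rng.choices(units, k=per_stratum)
        sample.extend((name, unit) for unit in picks)
    return sample

# Hypothetical strata: three made-up "states" with a few "counties" each.
strata = {
    "State A": ["A1", "A2", "A3"],
    "State B": ["B1", "B2"],
    "State C": ["C1", "C2", "C3", "C4"],
}
sample = stratified_sample_with_replacement(strata, per_stratum=5)

# Every stratum contributes units, so every "state" appears in the sample.
states_in_sample = {state for state, _ in sample}
# Five draws from the two units of "State B" must repeat at least one unit.
has_repeats = len(set(sample)) < len(sample)
print(states_in_sample == set(strata), has_repeats)
```

Repeated units are compatible with sampling with replacement, whereas a stratum with no units in the sample is not compatible with drawing from every stratum.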
For the following two cases, determine which sampling plan was used.
Case 1: In an opinion poll, an airline company made a list of all its flights on 1 Jan 2022 and then selected a simple random sample of 30 flights. All the passengers on the selected flights were asked to fill out a questionnaire form.
Case 2: A departmental store wanted to find out if customers would be willing to pay slightly higher prices for their products in order to have a smartphone app which customers can use to help them locate items in the store. The store hired an interviewer, John, and placed him at the only entrance on a particular day. John was asked to collect a sample of 100 opinions by interviewing the next person who came through the entrance each time he finished an interview.
(A) Case 1: Cluster sampling plan; Case 2: Systematic sampling plan.
(B) Case 1: Stratified sampling plan; Case 2: Systematic sampling plan.
(C) Case 1: Stratified sampling plan; Case 2: Non-probability sampling.
(D) Case 1: Cluster sampling plan; Case 2: Non-probability sampling.
Note on Case 1: cluster sampling. For each randomly selected flight, all passengers were involved.
Note on Case 2: the next person selected was not randomly selected.

16. A military officer was interested in reducing the number of casualties sustained in aerial battle. His population of interest was all planes under his charge. He tasked his men to examine the planes that returned from the war front, and then take note of which parts of the planes sustained ammunition damage. He collated all the data and presented it on a single blueprint of the plane, as shown below (the dots denote where ammunition damage occurred):
The officer then concludes: “Based on my sample data, I propose to fortify the plane armour for regions where ammunition damage was concentrated (using the above blueprint as a guide), so as to help these planes survive better.” Would you agree with his assessment and why?
Note: imperfect sampling frame.
(A) Yes. The sample collected came from a good sampling frame.
(B) No.
The sample collected came from an imperfect sampling frame.
(C) Yes. The sample size is big enough.
(D) No. The sample size is too small.

17. If a sampling frame is ______ the target population, it will not lead to a loss in the generalisability of the results from the sample to the population. Which of the following can be used to fill the blank appropriately? Select all that apply.
(A) equal to
(B) smaller than
(C) larger than
Note: A sampling frame should be equal to or larger than the target population to achieve good coverage.

18. Tom selected 4 samples of 20 integers from the population {1, 2, ..., 100} using 4 different methods. They are
1. simple random sampling (SRS).
2. stratified sampling: the population was divided into the 10 strata {1, 2, ..., 10}, {11, 12, ..., 20}, ..., {91, 92, ..., 100}; and a SRS of 2 numbers was drawn from each of the 10 strata.
3. cluster sampling: the population was divided into 20 clusters {1, 2, 3, 4, 5}, {6, 7, 8, 9, 10}, ..., {96, 97, 98, 99, 100}; and a SRS of 4 of these clusters was selected.
4. systematic sampling: a random starting point between 1 and 5 was selected; and every 5th unit thereafter was selected too.
He created dot plots for exactly 3 of the samples generated. Identify the sampling method depicted by each of the following plots.
Sampling method depicted by Figure 1:
Sampling method depicted by Figure 2:
Sampling method depicted by Figure 3:

19. A study was conducted to find out the marital status of students who had graduated from university. The population of interest was: University XYZ students who had graduated in 2019. The study questionnaire was sent to all University XYZ 2019 graduates. 20% of the said graduates responded to the survey. Among those who responded, 30% are married, 65% are single and 5% are divorced/widowed. Are the above results likely to be a good reflection of the actual marital status of all XYZ graduates in 2019?
(A) Yes, because the questionnaire was sent to the entire population of interest.
(B) Yes, because there is good representation of every marital status in the results.
(C) No, because the sampling method is non-probability.
(D) No, because only 20% of the said graduates responded to the survey.
Note on (C): a census was conducted.
Note on (D): with a low response rate of 20%, the respondents do not represent the entire population.

20. A researcher wishes to estimate the average IQ of Primary 4 students studying in government schools in Singapore. He carries out the following procedures to arrive at his estimate.
- He collates a list of all government Primary Schools in Singapore located within a 5km radius of where he stays, since it makes traveling to the schools easier for him.
- From the collated list, he contacts the principal of each school and asks for permission to conduct the IQ test on 50 Primary 4 students selected from that school via simple random sampling. All the contacted principals were able to obtain consent from all the parents of the selected students to conduct the IQ test.
- He conducts the IQ test for all the selected students and then proceeds to calculate the average IQ.
You may assume that the following statements are true:
- He has not made any mistakes in the marking of the IQ test.
- He has not made any mistakes in the calculation of the average IQ.
- The selected students attempted the test to the best of their ability.
Based on the above description, which of the following statements must be true?
(A) The researcher has employed cluster sampling in the selection of students.
(B) There is no selection bias present in the study.
(C) The calculation of the average IQ is likely to be a good estimate of the average IQ of Primary 4 students studying in government schools.
(D) There is no non-response bias in his study.
Note: Options (A) and (C) are crossed out.
Note on (B): the list does not cover all schools in Singapore.

21. Select the correct word from the list for the respective blank in the sentence.
“The (1) is used to quantify the degree of spread relative to the (2) and is a useful statistic for comparing the degree of variation across different variables within a data set.”
List: Coefficient of variation, interquartile range, standard deviation, mean, median.
Note: (1) coefficient of variation; (2) mean. The coefficient of variation is $s_x / \bar{x}$, where $s_x$ is the standard deviation and $\bar{x}$ is the mean.

22. Let $x_1, x_2, \dots, x_n$ be values of a numerical variable $x$ within a data set containing $n$ points. Which of the following statements are definitely true with regards to the standard deviation? Select all that apply.
Note: $s_x = \sqrt{\dfrac{(x_1 - \bar{x})^2 + (x_2 - \bar{x})^2 + \dots + (x_n - \bar{x})^2}{n - 1}}$
(A) If the standard deviation of $x$ is 0, then $x_i = \bar{x}$ for all $i$ ranging from 1 through $n$.
(B) If the standard deviation of $x$ is 0, then $x_i = 0$ for all $i$ ranging from 1 through $n$.
(C) If $x_i = c$, for all $i$ ranging from 1 through $n$, where $c$ is a constant, then the standard deviation of $x$ is 0.
(D) If the mean of $x$ is 0 in the data set, then the standard deviation of $x$ is also 0 in the data set.

23. A telecommunication company is interested in understanding how many mobile phones people own. Their population of interest is all 2000 people in town X. They took a random sample of 100 people from town X. Assuming there is 100% response rate, which of the following statements is/are correct?
(I) If among the 100 people sampled, every person has 2 or more mobile phones, then the mean number of mobile phones in the sample will be greater than or equal to 2.
(II) If the mean number of mobile phones in this sample is greater than or equal to 2, then everyone among the 100 people sampled has 2 or more mobile phones.
(A) Only (I).
(B) Only (II).
(C) Both (I) and (II).
(D) Neither (I) nor (II).
Note: If all 100 numbers are greater than or equal to 2, the mean will also be greater than or equal to 2. The opposite is not always true: a large mean can result from some people having very large values while others have low values.

24.
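The standard deviation and coefficient of variation facts used in questions 21 to 23 can be checked numerically. The sketch below assumes the sample standard deviation with an n − 1 divisor (Python's `statistics.stdev`); the data sets and the helper name `coefficient_of_variation` are hypothetical and chosen only for illustration.

```python
import statistics

def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean."""
    return statistics.stdev(values) / statistics.mean(values)

# All values equal to a constant: the standard deviation is 0.
constant = [7, 7, 7, 7]
sd_constant = statistics.stdev(constant)

# Mean of 0 does not force a standard deviation of 0.
centred = [-2, -1, 0, 1, 2]
sd_centred = statistics.stdev(centred)

# Hypothetical sample of 100 people: half own 0 phones, half own 4.
phones = [0] * 50 + [4] * 50
mean_phones = statistics.mean(phones)
everyone_has_two = all(p >= 2 for p in phones)

print(sd_constant, sd_centred > 0)
print(mean_phones, everyone_has_two)
print(coefficient_of_variation([2, 4, 6]))
```

In the hypothetical phone sample the mean is at least 2 even though half of the people own fewer than 2 phones, which matches the note on statement (II) of question 23.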
An examination was given to Class A and Class B, which consisted of 20 students each. The score of each student is between 0 and 100. The range of scores in Class A is from 70 to 90. All the students in Class B scored less than 40 marks. Due to manpower shortages, Class A and Class B were combined to form Class C. Hence Class C now contains 40 students, who were previously from Class A and Class B. Which of the following statements about the relationship between the mean score in Class C and the mean score in Class A is always true?
(A) The mean score in Class C must be lower than the mean score in Class A.
(B) The mean score in Class C must be the same as the mean score in Class A.
(C) The mean score in Class C must be higher than the mean score in Class A.
(D) There is insufficient information to deduce the relationship between the mean score of Class C and the mean score of Class A.
Note: The mean for Class A is strictly higher than that of Class B. The overall mean is (mean of Class A + mean of Class B) divided by 2, which will be strictly less than the mean of Class A.

25. Consider the following numerical values: 14, 15, 18, 20, 24, 29, 33, 34, x, where x is unknown and x may not necessarily be greater than or equal to 34. Which of the following statements is/are necessarily true? Select all that apply.
(A) Regardless of the value of x, the median can never be higher than 24.
(B) If the median of the values is less than 24, the mode cannot be 24.
(C) The range cannot be 24.

26. Which of the following statements is/are always true, for any given data set?
(I) The first quartile, Q1, is less than the third quartile, Q3.
(II) The standard deviation is greater than 0.
(A) Only (I).
(B) Only (II).
(C) Both (I) and (II).
(D) Neither (I) nor (II).
Note on (I): Q1 = Q3 if the middle 50% have the same value, such as in the data set {1, 2, 2, 2, 2, 3}.
Note on (II): the standard deviation can be 0 if all data points are the same.

27.
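The mean and median reasoning in questions 24 and 25 can be explored with a short brute-force check. This is a sketch under assumptions that are not in the questions: the class means of 80 and 30 are hypothetical values consistent with the stated score ranges, and x is only tried over whole numbers from 0 to 60.

```python
import statistics

def combined_mean(mean_a, n_a, mean_b, n_b):
    """Mean of two groups pooled together (a weighted average)."""
    return (mean_a * n_a + mean_b * n_b) / (n_a + n_b)

# Question 24: hypothetical class means consistent with the stated ranges.
mean_c = combined_mean(mean_a=80, n_a=20, mean_b=30, n_b=20)

# Question 25: try whole-number values of x from 0 to 60 (an assumed range).
known = [14, 15, 18, 20, 24, 29, 33, 34]
medians = {x: statistics.median(known + [x]) for x in range(0, 61)}
ranges = {x: max(known + [x]) - min(known + [x]) for x in range(0, 61)}

largest_median = max(medians.values())
xs_with_range_24 = [x for x, r in ranges.items() if r == 24]
print(mean_c, largest_median, xs_with_range_24)
```

With equal class sizes the pooled mean is the simple average of the two class means, and over the values of x tried here the median never exceeds 24 while a range of exactly 24 does occur.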
The 2018 study “Tea Consumption and Longitudinal Change in High-Density Lipoprotein Cholesterol Concentration in Chinese Adults” published in the Journal of the American Heart Association found that tea appears to slow the natural decrease in heart-helping HDL cholesterol as a person ages. A total of 80182 participants were asked to report on their baseline characteristics including the following:
- Sex (Male / Female),
- Physical activity level (Inactive / Moderately active / Active),
- Income level (“1000” per month), and
- Body Mass Index (kg/m²).
Which of the following statements is true about the type of variables of Sex, Physical activity level, Income level and Body Mass Index, respectively?
(A) Categorical ordinal, Categorical nominal, Categorical ordinal, Categorical ordinal.
(B) Categorical ordinal, Categorical ordinal, Categorical nominal, Numerical.
(C) Categorical nominal, Categorical ordinal, Categorical ordinal, Numerical.
(D) Categorical nominal, Categorical nominal, Categorical ordinal, Numerical.
Note: BMI is the division of 2 numerical variables, so it is also numerical.

28. An instructor of a new industrial attachment module decided to conduct a quick post-course survey to gauge its reception among students. The information collected included the students’ sex (“1” for male, “2” for female), days attended (how many days each student attended the course), satisfaction level (“1” for not satisfied, “2” for neutral, “3” for very satisfied), and each student’s score on the final exam of the pre-requisite module (out of 100). The results are shown in the following table.

Name        Sex   Days attended   Satisfaction   Exam marks
Peter       1     2               1              73
Paul        1     2               1              77
Mary        2     3               2              84
Josephine   2     3               2              89
Frank       1     4               3              93

Note: Sex is nominal, Days attended is numerical, Satisfaction is ordinal, and Exam marks is numerical.

Which of the following is true about the type of variables of Sex, Days attended, Satisfaction, and Exam marks, respectively?
(A) Categorical ordinal, Numerical, Categorical ordinal, Numerical.
(B) Categorical ordinal, Categorical ordinal, Categorical nominal, Numerical.
(C) Categorical nominal, Numerical, Categorical ordinal, Numerical.
(D) Categorical nominal, Categorical nominal, Categorical ordinal, Numerical.

29. Consider the following 9 whole numbers: 14, 15, 15, 18, 20, 21, 24, 25, 26. There is a 10th unknown whole number, x, which belongs to the list of numbers above. We know that the interquartile range of these ten numbers is 9. Which of the following is/are possible values of x? Select all that apply.
(A) 10
(B) 15
(C) 20
(D) 25
Note: Q1 = 15 and Q3 = 24 give IQR = 9; Q1 = 15 and Q3 = 25 give IQR = 10.

30. A teacher wanted to know if there was any trend in terms of how his class scored for two different in-class quizzes within a particular semester. For all 10 students in his class, he obtained their scores for the two different quizzes, which we will refer to henceforth as Quiz A and Quiz B. The results from both quizzes are represented in the following dot plots, as follows:
Note: Quiz B scores are more spread out than Quiz A scores.
From the information given, which of the following statements is/are true?
(I) The coefficient of variation of Quiz A scores is greater than the coefficient of variation of Quiz B scores.
(II) For each student let x be the Quiz A score and let y denote the Quiz B score. The correlation coefficient between the two scores can be negative.
(A) (I) only.
(B) (II) only.
(C) Both (I) and (II).
(D) Neither (I) nor (II).
Note on (I): this statement is crossed out. The two quiz means are annotated as 3.6 and 3.1. Coefficient of variation = SD/mean, so the coefficient of variation of A will be lower than the coefficient of variation of B.
Note on (II): students who scored lower marks for Quiz A might have scored higher marks for Quiz B.

Exercise 2

1. On 19 June 2021, The Straits Times published the figure below, taken from a population census of Singapore. Each household may only belong to a single category.
Note: 100% − (13.3 + 2 + 10.2 + 10.6 + 10.4)% = 53.5%.
Therefore, a majority of households are earning more than $6,999 in 2020.
100% − (10.5 + 3.5 + 15.2 + 16.2 + 14.1)% = 40.5%.
Therefore, rate(Income > $6,999 | 2020) > rate(Income > $6,999 | 2010).

What can be said about the resident households earning more than 6,999 from work? From the following statements, select all that apply.
(A) A majority of resident households are earning more than 6,999 from work in 2020.
(B) A larger proportion of resident households are earning more than 6,999 from work in 2020, as compared to 2010.
(C) rate(Income > 6,999 | 2020) > rate(Income > 6,999 | 2010). Here “Income” represents Household monthly income from work.
(D) rate(Income > 6,999 | 2020) < rate(Income > 6,999 | 2010). Here “Income” represents Household monthly income from work.

2. A researcher collected data on his study subjects. Unfortunately, he spilled coffee on his table of values, resulting in some missing information. Based on the remaining information in the table, calculate rate(Male) in the study.

               X     Y     Row total
Female         300   100   400
Male           50    50    100
Column total   350   150   500

(A) rate(Male) = 0.2.
(B) rate(Male) = 1.
(C) rate(Male) = 0.7.
(D) rate(Male) = 0.1.
Note: rate(Male) = 100/500 = 0.2.

3. For the year 2020, the marginal death rate of country A is greater than the death rate among the females of country A, or in other words, rate(Death) > rate(Death | Female). Which of the following statements must be true in country A for the year 2020?
(A) rate(Male) < rate(Male | Death).
(B) rate(Male) > rate(Male | Death).
(C) rate(Male) = rate(Male | Death).
Note: By the basic rule, rate(Death | Male) > rate(Death) > rate(Death | Female). By the symmetry rule, rate(Male | Death) > rate(Male | No death). By the basic rule again, rate(Male | Death) > rate(Male) > rate(Male | No death).

Chapter 2. Categorical Data Analysis

4. Categorical variables A and B are associated with each other. This means that rate(A | B) ≠ rate(A | not B). Based on the given information, which of the following statements is/are always true?
(I) rate(B | A) ≠ rate(B | not A).
(II) rate(B | A) = rate(A | B).
(III) rate(A | B) ≠ rate(not A | B).
(A) Only (II).
(B) Only (II) and (III).
(C) Only (I).
(D) All the statements are true.
Note on (I): symmetry rule.

5. The following data comes from a survey on the effectiveness of coaching sessions on job hunting for fresh graduates. In this survey, three questions were asked of the participants:
Q1: How much is their salary or if they are unemployed?
Q2: If they have received coaching? (Answer YES or NO.)
Q3: If their job was a continuation of their internship? (Answer YES or NO.)
The table below summarises the answers received for the three questions.

                      Q2: NO               Q2: YES
Q1: Monthly salary    Q3: YES   Q3: NO     Q3: YES   Q3: NO     Total
> $4000               26        85         8         28         147
< $3000               5         31         0         7          43
$3000 to $4000        41        181        13        48         283
Unemployed            604                  141                  745

Based on the data provided above, which of the following statements is/are correct?
(I) There is an association between receiving coaching and having a salary above 4000.
(II) There is an association between receiving coaching and landing a job in continuation of an internship.
(A) Only (I).
(B) Only (II).
(C) Neither (I) nor (II).
(D) Both (I) and (II).
Note on Statement (I): rate(> $4000 | Coaching) = (8 + 28) / (8 + 28 + 0 + 7 + 13 + 48 + 141) = 0.15, while rate(> $4000 | No coaching) = (26 + 85) / (26 + 85 + 5 + 31 + 41 + 181 + 604) = 0.11. The 2 rates are different, so there is an association.
Note on Statement (II): rate(Intern | Coaching) = (8 + 0 + 13) / (8 + 0 + 13 + 28 + 7 + 48 + 141) = 0.086, while rate(Intern | No coaching) = (26 + 5 + 41) / (26 + 5 + 41 + 85 + 31 + 181 + 604) = 0.074. The 2 rates are different, so there is an association.

6. The contingency table below shows the classification of hair descriptions of students studying in an international school in Singapore.
Hair type      Straight           Curly
Hair colour    Male    Female     Male    Female     Total
Red            7       9          8       5          29
Brown          35      20         12      16         83
Blonde         51      55         38      27         171
Black          22      25         19      24         90
Total          115     109        77      72         373

The marginal rate, rate(Curly), is 39.95%; while the joint rate, rate(non-Black and Female), is 35.39%. Give each answer as a percentage correct to 2 decimal places.
Note: rate(Curly) = (77 + 72)/373 × 100%; rate(non-Black and Female) = (9 + 20 + 55 + 5 + 16 + 27)/373 × 100%.

7. A group of market researchers were commissioned to investigate the relationship between two food delivery companies (Grabfood and Foodpanda) and their punctuality of deliveries (whether they are on time or late). The following chart was used by the researchers to aid in the presentation of their findings.
Note: Compare rate(Late | Grabfood) with rate(Late | Foodpanda).
Which of the following statements is/are true based on the information given above? Select all that apply.
(A) Grabfood is positively associated with being on time for food deliveries.
(B) Grabfood is positively associated with being late for food deliveries.
(C) Foodpanda is positively associated with being on time for food deliveries.
(D) Foodpanda is positively associated with being late for food deliveries.

8. A newspaper article had a headline “30% of local university students admitted last year graduated from a polytechnic”. Assume there are only 2 universities (Uni A and Uni B). In Uni A, 50% of its local students admitted last year graduated from a polytechnic. In Uni B, the percentage of its local students admitted last year who graduated from a polytechnic must be ______.
(A) more than 50%.
(B) 40%.
(C) between 30% and 50%.
(D) less than 30%.
Note: By the basic rule on rates, since the overall rate is 30% and the rate at Uni A is 50%, the rate at Uni B must be less than 30%.

9. By “elderly”, we mean a person who is more than 65 years old. In Singapore, the percentage of elderlies among women is higher than the percentage of elderlies among men.
Which of the following statements must be true?
(I) In Singapore, the percentage of women among elderlies is higher than the percentage of women among the non-elderlies.
(II) In Singapore, the percentage of women is higher than the percentage of men among elderlies.
(A) Only (I).
(B) Only (II).
(C) Both (I) and (II).
(D) Neither (I) nor (II).
Note on (I): rate(Elderlies | Women) > rate(Elderlies | Men), so women and elderlies are positively associated, and rate(Women | Elderlies) > rate(Women | Non-elderlies).
Note on (II): this cannot be determined without more information.

10. Su is investigating the association between blood pressure and workaholism in a certain population. Someone who works more than 75 hours per week is considered a workaholic. The income level and blood pressure (high or normal) for each subject and whether or not they are classified as workaholic is recorded and summarised in the table below. Here “HBP” denotes “high blood pressure” while “NBP” denotes “normal blood pressure”.

Income Group      Low           Middle        High
                  HBP    NBP    HBP    NBP    HBP    NBP    Low Total
Workaholic        25     75     23     87     26     134    100
Non-workaholic    25     80     18     72     9      51     105

Consider the “Low” income level group. rate(HBP | Workaholic) is (1) 25% while rate(HBP) is (2) 24%. Fill in the blanks in the statement above, giving your answers as percentages correct to the nearest whole numbers.
Note: rate(HBP | Workaholic) = 25/100 × 100%; rate(HBP) = (25 + 25)/(100 + 105) × 100%.

11. The Lord of the Rings: The Fellowship of the Ring was released in December 2001. Suppose that
(I) Among the people in Singapore who were born before 2000, 10% watched the film.
(II) Among the people in Singapore who were born during or after 2000, 20% watched the film.
Choose the best option below. Among all the people in Singapore, the percentage who watched the film ______.
Note: 10% and 20% are the respective rates in the two groups, so the overall rate is between 10% and 20%.
(A) must be 15%.
(B) must be between 10% and 20%.
(C) can be less than 10%.
(D) can be more than 20%.

12. Darren is planning a surprise Bubble Tea Party for his class of 30 students during the last tutorial of GEA1000. Each student chooses either milk tea or fruit tea (but not both). The following information is what he has gathered about his tutorial class:
Of the 30 students, 40% are males.
60% of the students who drink milk tea at this party are males.
70% of the students who drink fruit tea at this party are females.
Which of the following statements can be concluded from the above information about bubble tea consumption in Darren’s tutorial class?
(I) There is positive association between being male and drinking milk tea.
(II) The majority of the 30 students are fruit tea drinkers.
(A) Only (I).
(B) Only (II).
(C) Both (I) and (II).
(D) Neither (I) nor (II).
Working: rate(Male) = 40%. rate(Female | Fruit tea) = 70%, so rate(Male | Fruit tea) = 30%. rate(Male | Milk tea) = 60% > rate(Male | Fruit tea) = 30%, so there is a positive association.

13. A researcher was studying the effects of Paxlovid for COVID-19 treatment on a group of patients. The researcher took notes on whether the treatment was successful or not (Success/Failure), and the gender of each patient (Male/Female). Which of the following statements is/are correct?
(I) rate(Success | Male) refers to the proportion of successful treatments that are males.
(II) rate(Failure & Male) refers to the proportion of males that had a failed treatment.
(III) rate(Female) refers to the proportion of females among the patients.
(A) Only (III).
(B) Only (I) and (II).
(C) Only (II) and (III).
(D) All three statements are correct.
Working: rate(Success | Male) is the proportion of males that had successful treatments. rate(Failure & Male) is the proportion of patients that were male and had a failed treatment.

14. Suppose that 70% of the male graduates in 2018 from university GER got married by 2022. In addition, 5000 of all 6250 graduates in 2018 from university GER got married by 2022.
The percentage of female graduates in 2018 from university GER who got married by 2022 is
(A) lower than 70%.
(B) between 70% and 80%.
(C) higher than 80%.
(D) impossible to determine from the information given.
Working: overall rate = 5000/6250 = 80%.

15. An incomplete breakdown of students in University S by its four faculties and by sex is as follows:

                Male   Female   Row Total
Engineering                        700
Arts             250     400       650
Architecture      50      50       100
Science                            550
Column Total    1200     800      2000

Based on the above, which of the following deductions must be true? Select all that apply.
(A) rate(Arts) is closer to rate(Arts | Male) than to rate(Arts | Female).
(B) There is a negative association between Arts and Male.
(C) rate(Science) is closer to rate(Science | Female) than to rate(Science | Male).
(D) There is no association between Architecture and Female.
Working: rate(Arts) = 650/2000 = 0.325; rate(Arts | Male) = 250/1200 = 0.208; rate(Arts | Female) = 400/800 = 0.5. For (B): rate(Arts | Male) < rate(Arts | Female).

16. A researcher conducted a study to investigate if the usage of ChatGPT is associated with the passing of an exam. He found the rate(pass | usage of ChatGPT) to be equal to 0.4 and rate(fail | no usage of ChatGPT) to be equal to 0.3. Which of the following statements is/are true? Select all that apply.
(A) There is insufficient information to make a conclusion about the association between the two variables.
Working: rate(pass | use) = 0.4
rate(email | no response), symmetry rule.
(C) 50% of those who were asked to do email surveys responded.
Working: rate(response | email) may not equal rate(email | response).
(D) The rate of response among email surveys is greater than the overall rate of response.
Working: rate(response | email) > rate(response) > rate(response | no email).

19. Consider the following partial contingency table that gives the breakdown of students by gender in department A and department B of a local university. We are told that there is no association between gender and department. The total number of students in both departments is 300.

      Male   Female   Row Total
A      30      60         90
B      70     140        210

20.
The table below provides the number of all the teachers employed in the different institutions in 2021 in Singapore, categorised by age and sex.
Using only the information in the table, fill in the blanks below (giving your answers to 2 decimal places).
Among all the teachers aged 30-39, (1) ___% of them were male secondary school teachers. In addition, (2) ___% of female primary school teachers were aged 45 years old and above.
Working: (1) male = zo4A-O 513 × 100%
Working: (2) female = (2228 + 1494 + 1166)/(68 + 1030 + 2341 + 2181 + 2495 + 2228 + 1494 + 1166) × 100%

21. Suppose that in a population, it is known that within males and within females, smoking and binge drinking are positively associated. However, Simpson’s Paradox is observed when the male and female subgroups are combined. From the statements below, select all that are true.
(A) Overall rate(Binge drinker | Smoker) ≤ overall rate(Binge drinker | Non-smoker).
(B) Overall rate(Smoker | Binge drinker) > overall rate(Smoker | Non-binge drinker).
(C) Overall rate(Smoker | Binge drinker) ≤ overall rate(Smoker | Non-binge drinker).
(D) Overall rate(Non-binge drinker | Smoker) ≥ overall rate(Non-binge drinker | Non-smoker).
Working: Overall, smoking and binge drinking are negatively associated or not associated, so smoking and not binge drinking are positively associated (or not associated).

22. A researcher wants to find out if drinking tea helps to reduce memory loss. He interviewed 100 elderly citizens from an Elder Care Center and inquired if they were tea drinkers. 60 of them were classified as tea drinkers, while the remaining 40 were not. He then asked them to play a specific memory game to test their memory. The researcher also noted that a potential confounding variable was “gender”. To control for this potential confounder (gender), the researcher could perform
(A) double blinding.
(B) random assignment.
(C) slicing of the data.
Working: This is an observational study. (A) does not control the confounder (gender). (B) is only for experimental studies.

23.
The table below shows male and female patients undergoing two treatment types, X or Y. The outcome of the treatment is designated as either successful or unsuccessful. The success rates of the respective treatments across genders are also calculated.

                    Male                                Female
         Patients   Succ. #   Succ. Rate     Patients   Succ. #   Succ. Rate
X            ?         ?         50%            40         32        80%
Y            ?         ?          ?              ?          ?        60%
Total       100        50        50%             ?          ?         ?

Working: entries filled in for Male: X, 80 patients and 40 successes; Y, 20 patients, 10 successes and a 50% success rate. For Female, the total success rate is between 60% and 80%.

Unfortunately, some of the data is missing. We know that all missing values are non-zero. Which of the following statements must be true?
(I) Simpson’s Paradox is observed when the subgroups of Treatment X and Treatment Y are combined, when considering the relationship between gender and outcome.
(II) Treatment type is a confounder between the variables gender and outcome.
(A) Only (I).
(B) Only (II).
(C) Neither (I) nor (II).
(D) Both (I) and (II).
Working: Simpson’s Paradox is not observed when the subgroups are combined. rate(X | Male) = 80 = rate(X | Female); treatment type is not necessarily associated with gender.

24. A tuition agency is interested to see whether teaching English via a new online platform is more effective compared to their current teaching methods for Primary 3 students. Every child has a choice whether he/she wants to enrol in the class that teaches using the online platform (denoted as Class A) or in the class without the online platform (denoted as Class B) and parental consent is also obtained. At the end, all students are given an English test and are awarded an “S” grade if they pass the test. The table below provides some of the information.

                  Class A                          Class B
          Number   S-grade   Rate (%)      Number   S-grade   Rate (%)
Males       200       40        20           100       10        10
Females      w         x                      y         z

You are given the following rates (for males) as shown in the table above:
rate(S-grade | Class A) = 40/200 = 20%.
rate(S-grade | Class B) = 10/100 = 10%.
By considering similar rates for females, which of the following are possible values of w, x, y, z such that Simpson’s Paradox will be observed in the above table when combining males and females together within Class A and Class B?
(A) w = 100, x = 20, y = 80, z = 8.
(B) w = 200, x = 140, y = 100, z = 20.
(C) w = 80, x = 30, y = 200, z = 50.
(D) w = 50, x = 40, y = 200, z = 150.
Working: Try all options to see where Simpson’s Paradox is observed.

25. A study was conducted to understand the relationship between a patient’s age and having cardiovascular disease (CVD). The information on the variables “Age” (Young/Old) and “CVD” (Has CVD/No CVD) was collected in the table below.

            Young   Old   Total
Has CVD      100     50    150
No CVD       100    200    300
Total        200    250    450

Furthermore, it is known that a third variable, “Smoking”, is associated with “CVD”. Using only the information given, which of the following statements must be true? Select all that apply.
(A) Young patients are positively associated with having CVD.
(B) “Smoking” is a confounder when examining the association between “Age” and “CVD”.
(C) “CVD” is a confounder when examining the association between “Age” and “Smoking”.
Working: rate(Has CVD | Young) = 100/200 = 0.5 > rate(Has CVD | Old) = 50/250 = 0.2, so young patients are positively associated with having CVD.

26. A study was conducted to determine if treatment types (A and B) were associated with how successful they were in curing disease X. The age of the subjects was also recorded as it is a possible confounder. Each subject, depending on his/her age, was classified as either “Old” or “Young”. The table below shows the result of the study.

             Treatment A           Treatment B
Age       Number   Success      Number   Success
Old         120       25          280       75
Young       190       45          250       65
Total       310       70          530      140

Working: rate(Success | Old) = (25 + 75)/(120 + 280) = 0.25; rate(Success | Young) = (45 + 65)/(190 + 250) = 0.25.

Which of the statements below is correct?
(A) Age is a confounder between treatment types and how successful they are. Simpson’s Paradox is observed in this study.
(B) Age is a confounder between treatment types and how successful they are. Simpson’s Paradox is not observed in this study.
(C) Age is not a confounder between treatment types and how successful they are.
(D) More information is needed before determining if age is a confounder between treatment types and how successful they are.

27. Su is investigating the association between blood pressure and “workaholism” in a certain population. Someone who works more than 75 hours per week is considered a workaholic. The income level and blood pressure (high or normal) for each subject and whether or not they are classified as “workaholic” are recorded and summarised in the table below. Here “HBP” denotes “high blood pressure” while “NBP” denotes “normal blood pressure”.

                           Income Group
                     Low          Middle         High
                  HBP   NBP     HBP   NBP     HBP   NBP
Workaholic         25    75      23    87      26   134
Non-workaholic     25    80      18    72       9    51

Working: For the “Low”, “Middle” and “High” levels, rate(HBP | Workaholic) > rate(HBP | Non-workaholic). For the “Low” level: 25/(25 + 75) > 25/(25 + 80).
Working: Overall, rate(HBP | Workaholic) = (25 + 23 + 26)/(100 + 110 + 160) = 0.2 and rate(HBP | Non-workaholic) = (25 + 18 + 9)/(105 + 90 + 60) = 0.204.

Which of the following statements is true?
(A) We have an instance of Simpson’s Paradox for this data set, when considering the association between being a “workaholic” and having “high blood pressure”, first for individual income levels (“Low”, “Middle”, “High”) and then overall.
(B) We do not have an instance of Simpson’s Paradox for this data set, when considering the association between being a “workaholic” and having “high blood pressure”, first for individual income levels (“Low”, “Middle”, “High”) and then overall.
(C) We are not able to determine if we have an instance of Simpson’s Paradox for this data set (or not), when considering the association between being a “workaholic” and having “high blood pressure”, first for individual income levels (“Low”, “Middle”, “High”) and then overall. There is insufficient information given.

28.
In NUS, the rate of coffee drinking among female students is 60% and the rate of coffee drinking among male students is also 60%. It was found that the rate of coffee drinking among scholarship students is 90%. Which of the following statements must be true?
(I) Coffee drinking is positively associated with scholarship students.
(II) When considering the association between coffee drinking/non-coffee drinking and scholarship students/non-scholarship students, Simpson’s Paradox is not observed when the male and female subgroups are combined.
(A) Only (I).
Working: rate(drink coffee | female) = rate(drink coffee | male) = 60%, so by the basic rule on rates, rate(drink coffee) = 60%.
r > r2.
(C) r > r1 > r2.
(D) r < r1 < r2.

20. A researcher is interested in the correlation between the amount of time an individual spends on social media and the individual’s level of happiness. Suppose that she observed that the correlation coefficient r1 for males only is 0.8, and that the correlation coefficient r2 for females only is also 0.8. Which of the following statements must be true for r, the correlation coefficient when the data for males and females are combined?
(A) 0 ≤ r ≤ 0.8.
(B) r = 0.8.
(C) 0.8 < r ≤ 1.
(D) None of the other given options is correct.

Chapter 3. Dealing with Numerical Data

21. Based on the scatter plot shown below, which of the following is closest to the equation for the regression line? Here, W is the weight of the car and C is the consumption.
(A) W = 3 − 0.1C.
(B) W = 5 − 0.1C.
(C) W = 3 + 0.8C.
(D) W = 5 + 0.8C.

22. Which of the following is/are true about a non-zero correlation coefficient? Select all that apply.
(A) The correlation coefficient does not change when we add 5 to all the values of one variable.
(B) The correlation coefficient is positive when the slope of the regression line is positive.
(C) The correlation coefficient does not change when we multiply all the values of one variable by 2.
(D) A correlation of −0.3 is stronger than a correlation of −0.8.

23. The relationship between the number of glasses of beer consumed daily (x) and blood alcohol content in percentage (y) was studied in young adults. The equation of the regression line is y = −0.015 + 0.02x for 1 ≤ x ≤ 10. The legal limit to drive in Singapore is having a blood alcohol content below 0.08%. Des, a young adult, had just finished 5 glasses of beer. After that, he wanted to take his car out for a drive. Is it legal for him to drive in Singapore?
(A) Yes.
(B) No.
(C) Unable to determine.

24. Three father-son pairs had their heights measured. The following table shows their heights:

Pair   Father (inches)   Son (inches)
A            68               72
B            70               71
C            66               70

Using these three data points, the standard deviation for the fathers would be 2 and for the sons it would be 1. From the table, what is the standard unit for the son from pair A?
(A) −1.
(B) 0.
(C) 1.
(D) 1.88.

Exercise 3

25. Suppose that there are 40 male students in a class and each student scored 5 less marks for his maths test than what he scored for his science test. What can we say about their maths and science test marks? Select all that apply.
(A) The interquartile range of science test marks is higher than that for maths test marks.
(B) If student A scored a higher mark for the maths test than student B, then he must have scored a higher mark than student B for the science test.
(C) The science test marks and maths test marks are perfectly negatively correlated.
(D) The standard deviation of maths test marks is equal to that of science test marks.

26. The regression line for Y vs X is given by Y = 0.82X + 59.1. The standard deviations for X and Y are 1.5 and 2.2 respectively. Suppose now we construct a regression line that uses Y to predict X.
Working: sX = 1.5, sY = 2.2, m = 0.82. r = m(sX/sY) = 0.82(1.5/2.2) = 0.559. The r-value doesn't change, so the new slope is b = r(sX/sY) = 0.559(1.5/2.2) = 0.38.
The predicted average increase of X when Y is increased by 1 unit is 0.38.
(Give your answer correct to 2 decimal places.)
Working: X = bY + c.

27. A professor wants to know the percentage of right-handed students in NUS. Since he is teaching a course in NUS this semester, he decides to do a survey in his class. From the single survey, he concluded that eighty percent of students in NUS are right-handed. Which one of the following fallacies was committed by the professor?
(A) Atomistic fallacy.
(B) Ecological fallacy.
(C) None of the other options.

28. The total number of people who are infected by a disease (denoted by y) can be predicted using the regression model y = 2^(x+1) − 1, where x is the number of days from the first infection, up till the 30th day. Based on the information above, which of the following is true?
(A) After 3 days from the first infection, there will be exactly 15 people infected.
(B) If there were 7 people infected, it means that exactly 2 days have passed from the first infection.
(C) After exactly 20 days, there will be approximately less than 2 million people infected.
(D) The relationship can be modelled as a simple linear regression Y = mX + c, where Y = y, X = 2^x, m = 2, and c = −1.

29. Bivariate numerical data can be represented in the form (x, y). Which of these 4 data sets, after having added an additional data point (2, 8), would have the magnitude of their correlation coefficient decrease as a result? Select all that apply.
(A) (2, 2), (8, 2), (8, 8)
(B) (2, 2), (4, 5), (6, 2)
(C) (2, 2), (5, 5), (8, 8)
(D) (2, 8), (5, 5), (8, 2)

30. “The relation between anxiety and BMI - is it all in our curves?” was published in the journal Psychiatry Research in 2016. As stated in the abstract of that research paper, “The relation between anxiety and excessive weight is unclear. The aims of the present study were three-fold: First, we examined the association between anxiety and Body Mass Index (BMI).
Second, we examined this association separately for female and male participants...”
The first result reported was: No linear correlation between anxiety scores and BMI among all the participants was observed. If the researchers had not proceeded to investigate the association between anxiety scores and BMI separately for female and male participants, but concluded straightaway from their first result that “there is no linear correlation between anxiety scores and BMI among the females and among the males separately”, what mistake would they have committed?
(A) Ecological fallacy
(B) Atomistic fallacy
(C) Confusing correlation and causation
(D) None of the other options is correct
Working: First result: r = 0. Ecological fallacy: using an ecological correlation to make an assumption about individual correlation. Atomistic fallacy: using an individual correlation to make an assumption about ecological correlation.

Chapter 4. Statistical Inference

Exercise 4

1. A researcher developed a new test to detect COVID-19 in humans and the test has a specificity of 0.90. He administers the test in a town of 100,000 people, of whom 1% have COVID-19, as indicated in the contingency table below.

                Positive   Negative   Row total
COVID-19                                 1000
No COVID-19                             99000
Column total                           100000

What can be said about the sensitivity of the test, assuming that the researcher obtained rate(COVID-19 | Negative) = 1/298 for his test?
(A) The sensitivity is less than 80%.
(B) The sensitivity is more than 80%.
(C) The sensitivity is equal to 80%.

2. A player rolls a fair six-sided die twice. You can assume the rolls are independent. We define the following events:
A: The first roll shows numbers 1 or 2.
B: The second roll shows numbers 5 or 6.
C: The sum of the two rolls is less than or equal to 7.
Consider the following statements:
(I) P(B | C) > P(B).
(II) P(A and C) = P(A) × P(C).
Working: P(A) = 1/3, P(B) = 1/3, P(C) = (15 + 6)/36 = 21/36. P(B | C) = P(2nd roll 5 or 6 | sum ≤ 7) = 1/7. Both (I) and (II) are marked as false.
Which of the statements above must be true?
(A) Only (I).
(B) Only (II).
(C) Neither (I) nor (II).
(D) Both (I) and (II).

3. A game is played using a fair six-sided die, a pawn and a simple board as shown below. (A pawn is a chess piece.)

S 1 2 3 4 5 E

Initially, the pawn is placed on square S. The game is played by throwing the die and moving the pawn back and forth in the following manner:

S 1 2 3 4 5 E 5 4 3 2 1 2 3 ...

Thus, for example, if the first and second throws of the die give a “5” and “4” respectively, the final position of the pawn will be on square “3”, because the first throw would send the pawn to square “5”, and the second throw would then send the pawn from square “5” to square “3”. The game will stop only when the pawn stops at square “E” after a die roll; passing by “E” does not end the game.
Let X denote the number of throws of the die required to move the pawn such that it stops at square “E”. Which of the following statements is/are true?
(I) P(X = 2) = 5/36.
(II) The events X = 1 and X = 2 are mutually exclusive.
(A) Only (I).
(B) Only (II).
(C) Neither (I) nor (II).
(D) Both (I) and (II).

4. There are 5 identical bags, except that 2 are coloured red and 3 are coloured blue. Each of the bags contains 4 identical balls, except that 3 are coloured yellow and 1 is coloured green. Let A be the event that a randomly selected bag is red, and B be the event that a ball randomly selected from the chosen bag is yellow. You are given that
P(A and B) = 0.3.
What can we say about the events A and B?
(I) The two events are mutually exclusive.
(II) The two events are independent.
(A) Only (I).
(B) Only (II).
(C) Neither (I) nor (II).
(D) Both (I) and (II).
Working: (I) is marked as false: P(A and B) = 0.3, so the circles for A and B intersect. For (II): P(B) = (2/5)(3/4) + (3/5)(3/4) = 3/4, so P(A) × P(B) = (2/5)(3/4) = 0.3 = P(A and B).

5. Suppose A and B are events with probabilities P(A) = 0.4 and P(B) = 0.7. Which of the following statements is/are correct?
(I) A and B can be mutually exclusive.
(II) P(A and B) = 0.4 + 0.7.
(A) Only (I).
(B) Only (II).
(C) Both (I) and (II).
(D) Neither (I) nor (II).

6. I have a fair 12-sided (dodecahedron) die with sides labelled 1, 2, ..., 12 respectively. I also have a fair 6-sided die with sides labelled 1, 2, ..., 6 respectively. I first roll the 12-sided die on the table, then roll the 6-sided die. Assuming that the two die rolls are independent, what is the probability that the sum of the numbers appearing face up on the two dice is 11?
(A) 1/12.
(B) 5/36.
(C) 1/18.
(D) 1/9.

7. We wish to deploy a certain number of sensors around a particular area so as to detect intruders moving through the area. We may assume that the sensors function independently and each has probability 0.9 of detecting an intruder in the area. We would like to achieve at least 99.5% success rate of detecting an intruder using the sensors. What is the minimum number of sensors we need to deploy in order to achieve this target?
(A) Two.
(B) Three.
(C) Four.
(D) Target cannot be achieved.

8. There is a new home test kit for HIV detection. This test kit is known to be 99% accurate, meaning that its sensitivity and specificity are both 99%. Which of the following statements must be correct?
(I) Among those with HIV, 1% will test negative using the kit.
(II) Among those who test negative using the kit, 1% will have HIV.
(A) Only (I).
(B) Only (II).
(C) Both (I) and (II).
(D) Neither (I) nor (II).
Working: Take a base rate of rate(HIV) = 0.1 in a group of 1000, so that 0.99(100) = 99 and 0.99(900) = 891:

                +ve    -ve   Row total
HIV              99      1      100
No HIV            9    891      900
Column total    108    892     1000

9. Mr Tan is doing a study on a fair coin. He flips it twice and records the results on a paper. If the result is ‘Heads’, he writes down ‘H’. If the result is ‘Tails’, he writes down ‘T’. Mr Chan happened to see part of the paper and saw a ‘H’. Mr Chan then tells Mr Tan: “I guess that both your coin flips are Heads”. What is the probability that Mr Chan is correct?
(A) 1/1
(B) 1/2
(C) 1/3
(D) 1/4

10.
John plays in the final round of a rock-paper-scissors tournament against his opponent. The final round consists of 5 games of rock-paper-scissors. The first person to win 3 games wins the final round. John has already won 1 game and lost 2 games in the final round. The probability of winning each game is independent of the other games. The probability of John winning a game is 0.4, and the probability that John wins the “best player award” is 0.4.
Which of the following statements must be true? Select all that apply.
(A) The probability that John wins the next game is greater than the probability that John wins the final round.
(B) P(John wins the next game | John wins best player award) = P(John wins best player award | John wins the next game).
(C) John winning the next game is independent of John winning the best player award.
(D) John winning the next game and John winning the best player award are mutually exclusive.

11. In a cohort of final year undergraduate statistics students, 70% are male and 30% are female. Every student has the option of completing either a research project or a final year internship but not both. 40% of the males and 70% of the females decided to take the research project option. If I randomly pick a student from this cohort, what is the exact probability that the student picked the option of a research project?
Working: P(M) = 0.7, P(R | M) = 0.4, P(F) = 0.3, P(R | F) = 0.7. P(R) = P(R | M)P(M) + P(R | F)P(F) = 0.4(0.7) + 0.7(0.3) = 0.49.

12. After coming across an old Channel News Asia article, “Should women do National Service now? Societal cost will ‘far outweigh’ benefits, says Ng Eng Hen.”, published on 9 May 2022, Chase and Jenny started chatting about whether most Singaporeans will think that women should not be enlisted into National Service (as deduced from the article).
Chase said that a randomly picked person from the Singaporean population who thinks that women should not be enlisted into National Service, is equally likely to be male or female. Jenny tried to represent what Chase said using a probability statement. Let A be the event that the person chosen thinks that women should not be enlisted into National Service, and B be the event that the person chosen is male. Which of the following statements correctly represents what Chase said?
(A) P(B | A) = P(not B | A).
(B) P(B | not A) = P(not B | not A).
(C) P(A | B) = P(A | not B).
(D) None of the other statements.

13. A poll conducted before a local election gives a 95% confidence interval for the percentage of voters who support candidate X as (54%, 60%). Based on the same poll result, which of the following can potentially be a 99% confidence interval for the percentage of voters who support candidate X?
(A) (56%, 58%).
(B) (52%, 62%).
(C) (54%, 60%).

14. A researcher takes a random sample from Country X’s population to estimate its unemployment rate. From the sample, the researcher obtains the 95% confidence interval for the population