Data Collection & Analysis in Research

Podcast

Play an AI-generated podcast conversation about this lesson

Questions and Answers

A marketing team wants to understand the effectiveness of two different ad campaigns. Which data collection method would be most suitable?

Relying on social media analytics alone to track brand mentions.
Conducting A/B testing by showing each ad to a random segment of their target audience. (correct)
Analyzing existing financial records to determine overall marketing spend.
Using government databases to find demographic information about potential customers.

A researcher aims to study the average income of adults in a city. Due to resource constraints, they cannot survey the entire population. What should the researcher do to get useable data?

Select a sample from the population and analyze only those individuals. (correct)
Analyze the entire population to ensure accuracy.
Rely on judgmental sampling only, selecting individuals they believe represent the average income.
Ignore the population create an imaginary cohort.

A marketing team wants to categorize customers based on their preferred social media platform (Facebook, Instagram, X). Which measurement scale is most appropriate for this classification?

Ordinal Scale
Nominal Scale (correct)
Interval Scale
Ratio Scale

In a study examining the effectiveness of a new drug, researchers measure patients' pain levels before and after treatment using a 1-10 scale. What type of variable is 'pain level' in this context?

Ordinal variable (C) Signup and view all the answers

A university wants to survey its alumni regarding their experiences after graduation. The alumni database contains contact information for all graduates. What does this database represent in the context of sampling?

A sampling frame, because it provides a listing from which a sample can be drawn. (B) Signup and view all the answers

A researcher is studying the relationship between advertising spending and sales revenue. What type of variable is 'advertising spending' when measured in dollars?

Ratio variable (C) Signup and view all the answers

A quality control team needs to inspect a batch of products. They decide to select every 20th item from the production line. Which sampling method are they employing?

Systematic Sampling (C) Signup and view all the answers

Which data collection method is most suitable for gathering in-depth information about customer experiences with a product, including their feelings and motivations?

Interviews (D) Signup and view all the answers

A company wants to gather feedback from its customers. They randomly select 50 customers from each of their three customer segments (high-value, medium-value, and low-value). Which sampling method is being used?

Stratified Sampling (B) Signup and view all the answers

A quality control manager counts the number of defective products in a manufacturing process. What type of variable is 'number of defective products'?

Discrete variable (D) Signup and view all the answers

A researcher wants to study the opinions of software engineers in different tech companies in a city. Instead of randomly selecting engineers from all companies, they randomly select five companies and survey all the engineers in those companies. Which sampling technique are they using?

Cluster Sampling (D) Signup and view all the answers

A city is conducting a survey about pedestrian safety. Surveyors stand on a busy street corner during rush hour and interview people as they walk by. What type of sampling method are they using?

Convenience Sampling (B) Signup and view all the answers

A company wants to understand how satisfied their employees are with their job. They ask employees to rate their satisfaction on a scale of 'very dissatisfied', 'dissatisfied', 'neutral', 'satisfied', and 'very satisfied'. Which measurement scale does this represent?

Ordinal Scale (D) Signup and view all the answers

Researchers are studying the effect of different teaching methods on student test scores. They randomly assign students to different groups, each receiving a different teaching method, and then compare the average test scores of each group. What data collection method are they using?

Experiments (C) Signup and view all the answers

A survey about political opinions includes the question, "Do you agree that the current government's policies are leading the country to ruin?" What type of survey error is most likely to occur because of this question?

Measurement error (A) Signup and view all the answers

A researcher measures the temperature of a room using the Celsius scale. Which measurement scale is being used?

Interval Scale (D) Signup and view all the answers

A researcher finds that in a dataset of customer satisfaction scores (1-5), the score '4' appears 75 times. What does this value represent?

The absolute frequency of the score '4' (B) Signup and view all the answers

In a dataset of 200 product ratings, a specific rating has an absolute frequency of 40. What is the percentage frequency of this rating?

20% (D) Signup and view all the answers

For what type of data is the 'mode' the MOST appropriate measure of central tendency?

Categorical Data (B) Signup and view all the answers

A real estate company wants to summarize typical home prices in a neighborhood. The prices range from $200,000 to $1,500,000, but many houses are clustered around $300,000 - $400,000, and a few luxury homes skew the data towards the higher end. Which measure of central tendency would BEST represent the typical home price?

Median (B) Signup and view all the answers

A teacher wants to analyze the scores of a recent test. The scores are normally distributed. Which measure of central tendency is BEST to use?

Mean (D) Signup and view all the answers

In descriptive statistics, what is the primary purpose of measures of variation?

To quantify the spread or dispersion of data points. (A) Signup and view all the answers

Which of the measures of central tendency can have multiple values in a single dataset?

Mode (A) Signup and view all the answers

A dataset concerning run times for a marathon had several outliers due to first-time runners. Which measure of central tendency would be least affected by these outliers?

The median (C) Signup and view all the answers

Which of the following statements accurately describes the relationship between variance and standard deviation?

Standard deviation is the square root of the variance. (A) Signup and view all the answers

A dataset has a maximum value of 105 and a minimum value of 20. What is the range of this dataset?

85 (C) Signup and view all the answers

In a dataset, a particular data point has a z-score of 3.2. According to the typical z-score outlier rule, what does this indicate?

The data point is an extreme outlier. (C) Signup and view all the answers

A distribution is described as having a 'long tail' to the right. What type of skewness does this indicate?

Positive skewness (B) Signup and view all the answers

If a distribution has positive kurtosis, what does this suggest about the tails of the distribution and the concentration of data points?

Heavy tails and a sharp peak, with more data points in the tails. (B) Signup and view all the answers

What information does the first quartile (Q1) provide about a dataset?

The median of the lower half of the dataset, separating the lowest 25% of the data. (B) Signup and view all the answers

Which Excel function is used to calculate the sample standard deviation?

<code>stdev.s</code> (A) Signup and view all the answers

In a dataset, Q1 is 12 and Q3 is 30. Calculate the upper bound for outlier detection.

57 (D) Signup and view all the answers

Which of the following statements is true regarding a right-skewed distribution as described by its quartiles?

The distance between Q2 and Q3 is greater than the distance between Q1 and Q2. (C) Signup and view all the answers

Which of the following is the correct Excel formula to calculate the Z-score of a data point in a dataset?

(point - mean) / standard deviation (C) Signup and view all the answers

Which of the following is NOT part of the five-number summary?

The mean (B) Signup and view all the answers

In a boxplot, what does the length of the 'box' itself represent?

The interquartile range (IQR). (B) Signup and view all the answers

If a dataset is symmetrically distributed, which of the following relationships between its quartiles is most likely to be observed?

The distance between Q1 and Q2 is approximately equal to the distance between Q2 and Q3. (D) Signup and view all the answers

In a boxplot, what does a longer left whisker and a median line closer to Q3 indicate about the distribution of the data?

The data is left-skewed. (C) Signup and view all the answers

Given a dataset where Q1 = 25, Median = 30 and Q3 = 42, what can you infer about the skewness of the distribution?

The distribution is right-skewed. (C) Signup and view all the answers

If a dataset is normally distributed, according to the Empirical Rule, approximately what percentage of data points will fall within two standard deviations of the mean?

95% (D) Signup and view all the answers

How does an increase in the IQR (Interquartile Range) affect the boxplot?

It lengthens the box. (D) Signup and view all the answers

Chebyshev’s Rule guarantees that at least what proportion of data will fall within $k$ standard deviations from the mean (where $k > 1$) for any distribution?

$1 - (1/k^2)$ (A) Signup and view all the answers

If a dataset has a lower bound of 5 and an upper bound of 95, what values would be considered outliers?

Any value less than 5 or greater than 95. (A) Signup and view all the answers

What does a negative covariance between two variables X and Y suggest?

As X increases, Y tends to decrease. (D) Signup and view all the answers

A correlation coefficient of 0.9 between two variables indicates what kind of relationship?

A strong positive linear relationship. (B) Signup and view all the answers

Flashcards

Frequency Distribution

A summary of how often different values occur in a dataset.

Absolute Frequency

The number of times a specific value appears in a dataset.