Untitled Quiz
48 Questions

Questions and Answers

Which of the following is the primary goal of descriptive statistics?

  • Describing the characteristics of a dataset without drawing broader inferences. (correct)
  • Making predictions about a larger group based on a smaller subset.
  • Developing new statistical theories and methods.
  • Determining the level of satisfaction of customers.

Inferential statistics is most useful in which of the following scenarios?

  • Describing the distribution of ages in a retirement home, using existing data.
  • Presenting data about the heights of all players on a basketball team.
  • Estimating the percentage of all voters who favor a particular candidate based on a poll. (correct)
  • Calculating the average score of students in a single class.

A researcher collects data on the fuel efficiency of 100 cars to estimate the average fuel efficiency of all cars in a country. What does the '100 cars' represent?

  • The parameter.
  • The statistic.
  • The population.
  • The sample. (correct)

Which of the following activities falls under the domain of descriptive statistics?

  • Creating a histogram to visualize the distribution of test scores. (correct)

A company wants to assess employee satisfaction. They randomly survey 200 employees out of their 2,000 employees. In this scenario, what constitutes the population?

  • The 2,000 employees of the company. (correct)

A quality control inspector randomly selects 50 items from an assembly line and measures their dimensions. They use this data to infer whether the entire production run meets the required specifications. This is an example of:

  • Inferential statistics. (correct)

What is the primary focus of mathematical statistics compared to applied statistics?

  • Developing new statistical theories and methods. (correct)

Which of the following statements best describes the relationship between a sample and a population?

  • A sample is a subset of a population. (correct)

What is a key disadvantage of using the range as a measure of dispersion?

  • It only considers the extreme values and ignores the distribution in between. (correct)

How does the interquartile range (IQR) mitigate the impact of outliers compared to the range?

  • By focusing on the central 50% of the data, using quartile values. (correct)

Which of the following statements is true regarding the variance and standard deviation?

  • The standard deviation is the square root of the variance and represents the average distance from the mean. (correct)

Why is the sample variance calculated using $n-1$ in the denominator instead of $n$?

  • To correct for the bias introduced by estimating the population mean from the sample. (correct)

A dataset has a mean of 50 and a standard deviation of 10. Which data point has a Z-score that falls within the range of -1 to 1 (inclusive)?

  • 40, since its Z-score is (40 - 50) / 10 = -1. (correct)

Which of the following is most affected by extreme values?

  • Range (correct)

Two datasets have the same mean. Dataset A has a standard deviation of 5, and Dataset B has a standard deviation of 10. What can be concluded from this information?

  • Data points in Dataset A are, on average, closer to the mean than those in Dataset B. (correct)

In a symmetric distribution, what is the relationship between the mean, median, and mode; and how does this affect the measures of dispersion?

  • In a symmetric, unimodal distribution they are all equal. Symmetry alone does not make the standard deviation larger or smaller. (correct)

A dataset has a mean of 50. If each value in the dataset is multiplied by 2 and then 5 is added to each of the new values, what is the new mean?

  • 105 (correct)

Which of the following is most susceptible to outliers?

  • Mean (correct)

In a dataset, the sum of deviations from the mean is always:

  • Zero. (correct)

Consider two populations. Population 1 has a size of 30 and a mean of 10. Population 2 has a size of 20 and a mean of 20. What is the combined mean of both populations?

  • 14, since (30 × 10 + 20 × 20) / (30 + 20) = 700 / 50. (correct)

A teacher calculates the weighted mean of student grades, giving more weight to exams than homework. What does this weighting primarily account for?

  • The relative importance of each assessment. (correct)

A researcher aims to study the job satisfaction of nurses in a large hospital. Due to time constraints, they decide to interview only the nurses who are readily available during their visit. Which non-probability sampling method is being used?

  • Convenience sampling (correct)

Which of the following statements is not a characteristic of the mean?

  • It is not influenced by extreme values. (correct)

In which sampling method does a researcher begin with an initial subject who then identifies other potential subjects that meet the research criteria?

  • Snowball/referral sampling (correct)

In what scenario would a weighted mean be most appropriate instead of a regular arithmetic mean?

  • When each data point has a different level of significance. (correct)

A market research company wants to survey households in a city. They divide the city into blocks and randomly select some blocks. All households within the selected blocks are then surveyed. Which probability sampling method is being used?

  • Cluster sampling (correct)

An NGO wants to gather data on the prevalence of a rare disease. Given the difficulty of finding affected individuals, they start with known patients who then refer other potential participants. What sampling technique are they employing?

  • Snowball sampling (correct)

If the sum of squared deviations from a certain value in a dataset is the minimum possible, what is that certain value?

  • The mean. (correct)

A university wants to survey its alumni. They obtain a list of all alumni, randomly select a starting point, and then select every 50th name on the list. Which probability sampling method is being used?

  • Systematic sampling (correct)

A researcher is studying the reading habits of students in a school. The researcher obtains a list of all students, assigns a number to each student, and then uses a random number generator to select the sample. Which sampling method is the researcher using?

  • Simple random sampling (correct)

In a study aiming to estimate the average income of adults in a city, the researcher divides the population into different age groups (e.g., 18-25, 26-35, 36-45, etc.) and then randomly selects participants from each age group. Which sampling method is being used?

  • Stratified random sampling (correct)

A quality control engineer wants to inspect a batch of 500 items. Instead of inspecting all items, they select one at random, then inspect every 25th item thereafter. What sampling interval k is used in this systematic sampling approach, and what is the sampling fraction?

  • $k = 25$; the sampling fraction is $1/25$. (correct)

How is the standard deviation of a dataset affected if a constant c is added to each observation?

  • It remains unchanged. (correct)

What happens to the standard deviation if each observation in a dataset is multiplied by a constant c?

  • It is multiplied by the absolute value of c. (correct)

Under what condition is the coefficient of variation (CV) not computable or considered meaningless?

  • When the mean is zero or negative. (correct)

What does a large value of the coefficient of variation (CV) indicate about a dataset?

  • The data set is highly variable. (correct)

Which of the following is the primary use of the coefficient of variation (CV)?

  • To compare the variability of datasets with different units or means. (correct)

Why is the standard score (Z-score) useful when comparing data points from different distributions?

  • It converts the data points to a common scale, accounting for differences in means and standard deviations. (correct)

A student scores 80 on a math test where the mean is 70 and the standard deviation is 10. They also score 75 on an English test where the mean is 65 and the standard deviation is 5. On which test did the student perform relatively better?

  • English test (correct)

Which of the following is not a characteristic of the standard score (Z-score)?

  • It is a measure of absolute dispersion. (correct)

An applicant, Nancy, took three different tests. Given her scores, the means, and standard deviations for each test: Law (141 sec, mean 180 sec, SD 30 sec), Accounting (7 min, mean 10 min, SD 2 min), and Scientific (33 min, mean 26 min, SD 5 min). In which test did Nancy perform relatively the best, considering the distribution of scores for each test?

  • Scientific (correct)

A company decides to give all its employees a 5% raise. If the original mean salary was $415,000 and the standard deviation was $54,200, what are the new mean and standard deviation after the raise?

  • Mean: $435,750, SD: $56,910 (both are multiplied by 1.05). (correct)

The annual salaries of 16 employees at a company are used to calculate a sample mean of $415,000 and a sample standard deviation of $54,200. Which formula represents the calculation of the sample standard deviation $s$?

  • $s = \sqrt{\frac{\sum_{i=1}^{16} (X_i - \overline{X})^2}{16 - 1}}$ (correct)

A dataset of salaries has a mean of $415,000. If each salary is then divided by 12, what happens to the new mean?

  • It decreases by a factor of 12. (correct)

The average test score in a class is 75 with a standard deviation of 7. If the teacher decides to add 5 points to each student's score, what will be the new mean and standard deviation?

  • Mean = 80, Standard Deviation = 7 (correct)

Consider a dataset representing the waiting times at a doctor's office. If the mean waiting time is 25 minutes with a standard deviation of 5 minutes, what does the standard deviation tell us about the spread of waiting times?

  • Most patients wait between 20 and 30 minutes. (correct)

Suppose you have two sets of data: Set A has a mean of 100 and standard deviation of 10, and Set B has a mean of 50 and standard deviation of 5. If you want to compare a value of 110 from Set A to a value of 58 from Set B, which of the following is the most appropriate next step?

  • Calculate the z-scores for both values. (correct)

Which of the following statements is NOT a property of the standard deviation?

  • It is affected by adding a constant to each data point. (correct)

Flashcards

Statistical Methods

Methods for collecting, presenting, analyzing, and interpreting data.

Descriptive Statistics

Methods to describe data without drawing conclusions about a larger group.

Inferential Statistics

Methods for predicting or making inferences about a larger group based on a subset.

Statistical Theory

Development of theories that underpin statistical methods.

Population

All elements under consideration in a statistical study.

Sample

A part or subset of the population from which information is collected.

Population (Example)

Example: All customers who purchased a kerosene heater.

Sample (Example)

Example: 5,000 customers contacted for feedback.

Target Population

The population from which information is desired.

Sampled Population

The collection of elements from which the sample is actually taken.

Population Frame

A listing of all individual units in the population.

Convenience Sampling

Selects units that are easily accessible.

Simple Random Sampling (SRS)

Every item in the population has an equal chance of selection.

SRS with Replacement (SRSWR)

Chosen element replaced before the next selection. An element may be chosen more than once.

Stratified Random Sampling

Population divided into strata, then a random sample from each.

(1-in-k) Systematic Sampling

Selecting every kth unit from an ordered population, with a random start.

Population Mean (µ)

The average of all values in a population.

Sample Mean (X̄)

The average of all values in a sample.

Mean's Data Usage

The mean uses all available data points in the data set.

Mean's Sensitivity

The mean can be heavily influenced by extreme highs or lows.

Mean's Value

The mean may not be an actual data point in the set.

Weighted Mean

Assigns different importance to each observation.

Weighted Mean Formula

The average where each data point has a weight.

Combined Mean

Mean of multiple populations combined.

Range

The difference between the maximum and minimum values in a dataset.

Range Calculation

The range is calculated using only the largest and smallest values.

Range Sensitivity

The range can be drastically changed by a single outlier.

Interquartile Range (IQR)

The difference between the upper (Q3) and lower (Q1) quartiles of a dataset.

IQR Robustness

Reduces the impact of extreme values by using quartiles.

IQR Data Coverage

The IQR spans the middle 50% of the dataset.

Variance

Average squared difference of each observation from the mean.

Standard Deviation

The square root of the variance.

Mean

A measure of the average value of a dataset.

Nancy's Score

Applicant's score in a test.

Calculating the Mean

Sum of values divided by the number of values.

Statistic

A single number summarizing a characteristic of a dataset.

Data Standardization

Adjusting data to a common standard.

Data Transformation

Changing the original data values.

New Data

Adding a constant to each old datapoint.

Standard Deviation and Addition/Subtraction

Adding/subtracting a constant from each data point doesn't change the standard deviation.

Standard Deviation and Multiplication/Division

Multiplying/dividing each data point by a constant scales the standard deviation by the absolute value of that constant.

Coefficient of Variation (CV)

The ratio of the standard deviation to the mean, expressed as a percentage.

Use of Coefficient of Variation

Compares variability of datasets with different means or units.

Interpreting CV Value

High CV indicates high variability in the dataset.

Standard Score (Z-score)

A measure of how many standard deviations an observation is from the mean.

Use of Standard Score

Comparing values from different series with different means or standard deviations.

Outlier Detection

Identifying potential outliers in a dataset.

Study Notes

  • Statistical methods (applied statistics) are the procedures and techniques used for data collection, presentation, analysis, and interpretation.

Descriptive Statistics

  • Methods involve data collection, description, and analysis to describe a data set without drawing conclusions about a larger group.
  • Focuses on clarifying obscure information within the given data.
  • Conclusions are limited to the data being analyzed.

Inferential Statistics

  • Methods involve making predictions or inferences about a larger data set, using information from a subset.

  • Aims to predict and infer beyond mere description.

  • Conclusions can be applied to a larger group, with the analyzed data considered a subset.

  • Statistical theory (mathematical statistics) develops the theory that forms the basis of statistical methods.

Population and Sample

  • Population: All elements under consideration in a study.
  • Sample: A subset of the population from which data is collected.

Population and Sample Example

  • A kerosene heater manufacturer wants to know if customers are satisfied.

  • They contact 5,000 of their 200,000 customers.

  • Population is all 200,000 customers.

  • Sample is the 5,000 contacted customers.

  • Parameter: A numerical characteristic of a population.

  • Statistic: A numerical characteristic of a sample.

Parameter and Statistic Example

  • A college polls 200 students and finds 12% smoke

  • Population: all students at the college

  • Sample: the 200 students polled

  • Parameter: proportion of all college students who smoke

  • Statistic: the 12% smoking rate in the sample

  • Variable: A characteristic or attribute that can have different values for different entities.

  • Measurement: Process of determining the value/label of a variable for an experimental or sampling unit.

  • Experimental/Sampling Unit: The individual or object on which a variable is measured.

  • Observation: A numerical recording of information on a variable.

  • Data: A collection of observations.

Data Classification

  • Primary Data: Original data measured by the researcher/agency that publishes it.
  • Secondary Data: Data republished by another agency.
  • Example: Philippine Statistics Authority data is primary, while subsequent publications are secondary.
  • Internal Data: Relates to the organization's operations and functions.
  • External Data: Relates to activity outside the organization collecting it.
  • SM's sales data is internal to SM, but external to Robinson's.

Data Collection

  • Census/Complete Enumeration: Gathers data from every unit in the population.
    • Not always timely, accurate, or economical.
  • Sample Survey: Gathers data from a representative subset of the population.
    • Faster and more timely.
    • More economical for gathering and analyzing information.
    • Can be more accurate, since a smaller team of skilled workers makes fewer errors.
    • Involves asking questions to obtain information.
    • Includes telephone, mailed, online, in-person, and mall intercept surveys.

Statistical Data Collection Methods

  • Observation: Recording behavior at the time of occurrence e.g. traffic count.
  • Experimental: Data collection under controlled conditions.
  • Existing Studies: Using census, health statistics, etc.
    • Can be documentary sources (published reports) or field sources (researchers).
  • Registration: Recording data such as car or student registration.

Sampling Techniques

  • Probability Sampling: Every population element has a known, nonzero chance of being selected.
  • Non-Probability Sampling: Some elements have no chance of selection, or their chances of selection are unknown.
  • Probability sampling is preferred because the reliability of the resulting estimates can be assessed objectively.
  • Target Population: The population from which information is desired.
  • Sampled Population: Collection of elements from which the sample is taken.
  • Population Frame: A list of all individual units in the population.

Non-Probability Sampling Methods

  • Purposive Sampling: Sample agrees with the population profile based on characteristics chosen beforehand.
  • Quota Sampling: Selects specific number of units with certain characteristics.
  • Convenience Sampling: Selects readily available sampling units
  • Judgment Sampling: Sample selected based on expert judgment.
  • Snowball/Referral Sampling: Initial subjects identify further subjects.

Probability Sampling Methods

  • Simple Random Sampling: Each item has an equal chance of selection.
  • Can be with replacement (SRSWR) or without replacement (SRSWOR).

Approaches to Simple Random Sampling

  • Lottery Method

  • Use of Random Numbers

    • Table of Random Numbers
    • Online Random Number Generator
  • Stratified Random Sampling: Divides population into strata, draws a random sample from each.

  • (1-in-k) Systematic Sampling: Selects every kth unit from an ordered population with a random start.

    • k is the sampling interval.

Probability Sampling Methods Continued

  • Cluster Sampling: Selects distinct groups/clusters, then takes a census within selected clusters.
  • Multistage Sampling: Divides the population hierarchically into sampling units across stages.

Types of Allocation

  • Equal Allocation: Used when the strata have similar numbers of units, variability, and sampling costs, or when nothing is known about the strata beforehand.
  • Proportional Allocation: Used when stratum sizes vary, so each stratum's sample size is proportional to its size.
  • Optimum (Neyman) Allocation: Used when variability differs between strata.
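
The notes name the allocation schemes without giving formulas. The sketch below uses the usual textbook forms, which are an assumption here: proportional allocation sets n_h = n × N_h / N, and Neyman allocation (with equal sampling costs) sets n_h proportional to N_h × σ_h. Function and parameter names are hypothetical, and the returned sizes are left unrounded.

```python
def proportional_allocation(n, stratum_sizes):
    """Split a total sample size n across strata in proportion to stratum size N_h."""
    total = sum(stratum_sizes.values())
    return {h: n * size / total for h, size in stratum_sizes.items()}

def neyman_allocation(n, stratum_sizes, stratum_sds):
    """Split n in proportion to N_h * sigma_h (assumes equal cost per unit in every stratum)."""
    weights = {h: stratum_sizes[h] * stratum_sds[h] for h in stratum_sizes}
    total = sum(weights.values())
    return {h: n * w / total for h, w in weights.items()}

# Hypothetical strata: "a" is larger, "b" is more variable.
sizes = {"a": 600, "b": 400}
sds = {"a": 10, "b": 30}
print(proportional_allocation(100, sizes))
print(neyman_allocation(90, sizes, sds))
```

In practice the fractional sizes would be rounded, and a stratum is never given more units than it contains.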

Data Presentation

Textual Presentation

  • Incorporates the data into a paragraph of text.
  • Describes figures and density

Tabular Presentation

  • Systematic arrangement of data in rows/columns.
  • Parts of a Formal Statistical Table:
    • Heading: Includes table number, title, and headnote.
    • Box Head: Contains column heads describing data.
    • Stub: The first column lists row captions.
    • Field: Main part of the table with the data.
    • Source Note: Cites data source.
    • Footnote: Additional notes.

Graphical Presentation

  • Represents numerical values or relationships in pictorial form.

Data Description

  • Aims to describe data without oversimplifying or overcomplicating.
  • Data can be presented raw, as frequency distributions, or as graphs.

Measures of Central Tendency

  • Central tendency identifies the "center" or typical value of a data set.
  • Facilitates the comparison of two or more data sets.
  • Includes the mean, median and mode

Characteristics of a Good Average

  • Easily understood
  • Objective and clearly defined
  • Stable
  • Amenable to statistical computation

Arithmetic Mean

  • Sum of all values divided by the number of observations.

Population Mean Formula

  • μ = (Σ Xi) / N

Sample Mean Formula

  • X̄ = (Σ Xi) / n

Characteristics of the Mean

  • Uses all available information
  • Influenced by extreme values, especially when the number of observations is small

Mean Modifications

  • May not be a value in the data set.
  • Has two mathematical properties:
    • Sum of deviations from the mean is zero.
    • Sum of squared deviations is smallest when the deviations are taken from the mean.
  • Always exists and is unique
  • Adding/subtracting a constant to every observation changes the mean by that amount.
  • Multiplying/dividing every observation by a constant scales the mean by that constant.
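
These properties of the mean can be checked numerically. The snippet below is a small sketch on a made-up data set whose mean is 50; the values and variable names are illustrative assumptions.

```python
from statistics import mean

data = [40, 45, 50, 55, 60]   # hypothetical values chosen so that the mean is 50
m = mean(data)

# Property 1: the deviations from the mean sum to zero.
sum_of_deviations = sum(x - m for x in data)

# Property 2: the sum of squared deviations is smallest when taken about the mean.
def sum_sq_dev(values, c):
    return sum((x - c) ** 2 for x in values)

about_mean = sum_sq_dev(data, m)
about_other = sum_sq_dev(data, m + 1)   # any other centre gives a larger total

# Multiply every value by 2, then add 5: the mean becomes 2 * 50 + 5.
transformed_mean = mean([2 * x + 5 for x in data])

print(m, sum_of_deviations, about_mean, about_other, transformed_mean)
```

The last line reproduces the quiz item in which a mean of 50 becomes 105 after multiplying by 2 and adding 5.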

Weighted Mean

  • Assigns weights to observations based on their importance.
  • X̄w = (Σ WiXi) / (Σ Wi)

Combined Mean

  • The combined population mean μc:
  • μc = (Σ Niμi) / (Σ Ni)
  • The combined sample mean X̄c:
  • X̄c = (Σ niX̄i) / (Σ ni)
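
The combined mean is a weighted mean in which each group mean is weighted by its group size, so one helper covers both formulas. A minimal sketch follows; the function name and the grade example are assumptions, while the population sizes and means come from the quiz above.

```python
def weighted_mean(values, weights):
    """Return sum(w_i * x_i) / sum(w_i)."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(w * x for w, x in zip(weights, values)) / total_weight

# Hypothetical grades: homework 80 with weight 1, exam 90 with weight 3.
course_grade = weighted_mean([80, 90], [1, 3])      # (80 + 270) / 4 = 87.5

# Combined mean: group means 10 and 20 with group sizes 30 and 20.
combined_mean = weighted_mean([10, 20], [30, 20])   # 700 / 50 = 14.0

print(course_grade, combined_mean)
```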

Trimmed Mean

  • Mitigates the effect of outliers by discarding the most extreme values.
  • The a%-trimmed mean drops the lowest a% and the highest a% of the ordered values, then takes the mean of the remaining values.
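
A trimmed mean can be sketched as follows. The notes do not say how to handle a trim count that is not a whole number, so rounding it down is an assumption of this example, as are the function name and the data.

```python
def trimmed_mean(values, percent):
    """Drop the lowest and highest `percent`% of the ordered values, then average the rest."""
    if not values:
        raise ValueError("values must not be empty")
    if not 0 <= percent < 50:
        raise ValueError("percent must be in [0, 50)")
    ordered = sorted(values)
    k = int(len(ordered) * percent / 100)   # number dropped from each end (rounded down)
    kept = ordered[k:len(ordered) - k]
    return sum(kept) / len(kept)

# Hypothetical data with one very low and one very high value.
data = [500, 50, 52, 54, 55, 56, 58, 60, 61, 1]
print(trimmed_mean(data, 0))    # ordinary mean, pulled upward by 500
print(trimmed_mean(data, 10))   # 55.75 after dropping 1 and 500
```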

Median

  • Divides the ordered data (the array) into two equal parts.
  • If n is odd, Md = X((n+1)/2), the middle value of the ordered data.
  • If n is even, Md = [X(n/2) + X(n/2 + 1)] / 2, the average of the two middle values.
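
The two cases translate directly into code once the 1-based positions in the formulas are shifted to Python's 0-based indexing. This is a minimal sketch; the function name is an assumption.

```python
def median(values):
    """Median from the ordered data, following the odd/even position formulas."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of empty data is undefined")
    if n % 2 == 1:
        # Position (n + 1) / 2 in 1-based terms is index (n + 1) // 2 - 1.
        return ordered[(n + 1) // 2 - 1]
    # Positions n/2 and n/2 + 1 in 1-based terms are indices n//2 - 1 and n//2.
    return (ordered[n // 2 - 1] + ordered[n // 2]) / 2

print(median([7, 1, 3]))             # 3
print(median([7, 1, 3, 5]))          # 4.0
print(median([1, 2, 3, 4, 1000]))    # 3, unaffected by the extreme value
```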

Characteristics of the Median

  • Positional measure.
  • Affected by item position, not value.
  • Less sensitive to extreme values than the mean.

Mode

  • The most frequently observed value in a data set.
  • A data set can have no mode, one mode (unimodal), two modes (bimodal), three modes (trimodal), and so on.
  • Locates the value where the observations are most densely concentrated; it may not exist and may not be unique.
  • Is not affected by extreme values
  • The mode can be used for qualitative as well as quantitative data.
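
A short sketch of finding the mode or modes with `collections.Counter`. Treating a data set in which no value repeats as having no mode is a convention assumed for this example; the function name is also hypothetical.

```python
from collections import Counter

def modes(values):
    """Return the list of most frequent values, or [] when no value repeats."""
    counts = Counter(values)
    if not counts:
        return []
    top = max(counts.values())
    if top == 1:
        return []   # assumed convention: all values distinct means no mode
    return [value for value, count in counts.items() if count == top]

print(modes([1, 2, 2, 3]))             # [2]        unimodal
print(modes([1, 1, 2, 2, 3]))          # [1, 2]     bimodal
print(modes([1, 2, 3]))                # []         no mode
print(modes(["red", "blue", "red"]))   # ['red']    qualitative data
```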

Choosing a Suitable measure of Central Tendency

  • This depends on the data distribution and objective.

Measures of dispersion

  • These describe data scatter or variability with respect to central tendency
  • Necessary to describe the dataset in addition to measures of central tendency

Measures of Dispersion Definition

  • Quantities indicating the extent to which individual items in a series are scattered about an average.

Measuring Dispersion

  • Determines the extent of the scatter so that steps can be taken to control the existing variation.
  • Serves as a measure of how reliable the average is.

Measures of Absolute Dispersion

  • Expressed in original observation units.
  • Cannot be used to compare variability when the data sets' averages differ greatly or when their units of measurement differ

Measures of Relative Dispersion

  • Unitless, used to compare distributions.

The Range

  • Difference between largest and smallest values (max - min).

Characteristics of the Range

  • Uses only extreme values.
  • Omits info about data clustering between extremes.
  • Sensitive to outliers.

The Interquartile Range

  • Difference between upper and lower quartiles (Q3 - Q1).

Characteristics of Interquartile Range

  • Uses quartile values, reducing extreme value influence
  • Contains middle 50% of data
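
The contrast between the range and the IQR shows up clearly when one value is replaced by an outlier. Quartile conventions differ between textbooks and software; this sketch assumes the `"inclusive"` method of `statistics.quantiles` from the Python standard library, and the data are made up.

```python
from statistics import quantiles

def data_range(values):
    """Largest value minus smallest value."""
    return max(values) - min(values)

def iqr(values):
    """Q3 - Q1, with quartiles from statistics.quantiles (inclusive method)."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return q3 - q1

clean = [1, 2, 3, 4, 5, 6, 7, 8, 9]
with_outlier = [1, 2, 3, 4, 5, 6, 7, 8, 90]   # 9 replaced by an extreme value

print(data_range(clean), iqr(clean))
print(data_range(with_outlier), iqr(with_outlier))
```

The range jumps from 8 to 89, while the IQR stays at 4 because Q1 and Q3 do not depend on the largest value.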

The Standard Deviation and Variance

  • Variance: Average squared difference from the mean.
  • Standard Deviation: Square root of the variance.

Variance Formula

  • σ² = Σ(Xi - μ)² / N

Standard Deviation Formula

  • σ = √[Σ(Xi - μ)² / N]

Sample Variance Formula

  • s² = Σ(Xi - X̄)² / (n-1)

Sample Standard Deviation Formula

  • s = √[Σ(Xi - X̄)² / (n-1)]
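
The four formulas differ only in the divisor (N versus n - 1) and in whether the square root is taken. A minimal sketch with hypothetical function names and data:

```python
import math

def population_variance(values):
    """Sum of squared deviations from the mean, divided by N."""
    mu = sum(values) / len(values)
    return sum((x - mu) ** 2 for x in values) / len(values)

def sample_variance(values):
    """Sum of squared deviations from the sample mean, divided by n - 1."""
    n = len(values)
    if n < 2:
        raise ValueError("sample variance needs at least two observations")
    xbar = sum(values) / n
    return sum((x - xbar) ** 2 for x in values) / (n - 1)

def population_sd(values):
    return math.sqrt(population_variance(values))

def sample_sd(values):
    return math.sqrt(sample_variance(values))

data = [2, 4, 4, 4, 5, 5, 7, 9]   # mean 5, sum of squared deviations 32
print(population_variance(data), population_sd(data))   # 32 / 8 and its square root
print(sample_variance(data), sample_sd(data))           # 32 / 7 and its square root
```

The standard library functions `statistics.pvariance`, `statistics.pstdev`, `statistics.variance`, and `statistics.stdev` compute the same quantities.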

Characteristics of Variance and Standard Deviation

  • Standard deviation is used most frequently
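  • Variance is expressed in squared units rather than the original units, so it is not treated as a measure of absolute dispersion
  • Affected by every observation, so extreme values can inflate them
  • Adding/subtracting a constant to every observation doesn't change the standard deviation.
  • Multiplying/dividing every observation by a constant c multiplies/divides the standard deviation by |c|.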
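
The shift and scale rules can be checked numerically. The scores below are made up; the 5% raise uses the salary standard deviation of $54,200 from the quiz above, and `sd_after_scaling` is a hypothetical helper.

```python
from statistics import pstdev

scores = [60, 70, 75, 80, 90]   # hypothetical test scores
sd = pstdev(scores)

# Adding 5 to every score leaves the standard deviation unchanged.
sd_shifted = pstdev([x + 5 for x in scores])

# Multiplying every score by -2 multiplies the standard deviation by |-2| = 2.
sd_scaled = pstdev([-2 * x for x in scores])

def sd_after_scaling(original_sd, c):
    """Standard deviation after every observation is multiplied by c."""
    return abs(c) * original_sd

# A 5% raise multiplies every salary by 1.05, so the SD becomes about 56,910.
new_salary_sd = sd_after_scaling(54_200, 1.05)

print(sd, sd_shifted, sd_scaled, new_salary_sd)
```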

Coefficient of Variation

  • Ratio of standard deviation to mean, in percentage.

Population Formula

  • CV = (σ / μ) × 100%

Sample Formula

  • CV = (s / X̄) × 100%

Characteristics of the Coefficient of Variation

  • Compares variability even with different means/units.
  • Expresses standard deviation as a percentage
  • High CV means high variability
  • Undefined when the mean is zero.
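
A minimal sketch of the CV calculation, using the figures that appear in the quiz above (Set A, Set B, and the salary data). The function name is an assumption.

```python
def coefficient_of_variation(sd, mean):
    """Standard deviation as a percentage of the mean."""
    if mean == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return 100.0 * sd / mean

cv_a = coefficient_of_variation(10, 100)               # Set A: 10%
cv_b = coefficient_of_variation(5, 50)                 # Set B: 10%
cv_salary = coefficient_of_variation(54_200, 415_000)  # roughly 13%

print(cv_a, cv_b, round(cv_salary, 2))
```

Sets A and B have different standard deviations (10 and 5) but the same relative variability, which is the kind of comparison the CV is meant for.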

The Standard Score

  • Z tells you how many standard deviations an observation is from the mean.
  • Population: Z = (X - μ) / σ
  • Sample: Z = (X - X̄) / s

Characteristics of the Standard Score

  • Not a measure of relative dispersion itself
  • Useful for comparing values from different series, especially with different means, standard deviations, or units
  • Helpful for outlier detection
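
The comparison of scores across tests can be worked through with a small helper. The numbers come from the quiz above (the math and English tests, and the values 110 and 58 from Sets A and B); the function name is an assumption.

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    if sd <= 0:
        raise ValueError("standard deviation must be positive")
    return (x - mean) / sd

math_z = z_score(80, 70, 10)     # 1.0
english_z = z_score(75, 65, 5)   # 2.0

set_a_z = z_score(110, 100, 10)  # 1.0
set_b_z = z_score(58, 50, 5)     # 1.6

print(math_z, english_z, set_a_z, set_b_z)
```

The English score is two standard deviations above its mean while the math score is only one above, so the relative performance is better in English even though the raw score is lower.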
