Introduction to statistics

Questions and Answers

In a study, researchers want to understand the sleep patterns of university students. They collect data from a sample of 200 students at one university. What does the population consist of?

  • The sleep patterns of all university students. (correct)
  • The sleep patterns of the 200 students surveyed.
  • The universities where students were surveyed.
  • The responses of all adults in Ghana.

Which of the following scenarios best illustrates the use of inferential statistics?

  • Calculating the average test score of students in a class.
  • Creating a pie chart to show the distribution of blood types in a sample.
  • Determining the range of salaries for employees in a company.
  • Using the results from a sample of voters to predict the outcome of an election. (correct)

In a clinical trial for a new drug, researchers measure the change in blood pressure for each participant after treatment. What type of data is being collected?

  • Quantitative (correct)
  • Ordinal
  • Nominal
  • Qualitative

What is the most significant limitation of using ordinal level of measurement in statistical analysis?

  • The differences between data entries are not meaningful. (correct)

Consider a dataset of temperature readings (in Celsius) taken every day for a year. Which level of measurement does this data represent?

  • Interval (correct)

A researcher wants to study the job satisfaction of employees at a large corporation. Due to resource constraints, they only survey employees in the marketing and finance departments. What type of sampling technique is being used?

  • Cluster Sample (correct)

Which of the following is least likely to have a class width of 5, when constructing a frequency distribution?

  • 35-41 (correct)

Which measure is most affected by outliers?

  • Mean (correct)

Consider a dataset with a mean of 50, a median of 45, and a mode of 40. What can you infer about the shape of the distribution?

  • The distribution is right skewed. (correct)

For any data set, what is the key difference between calculating the population variance and the sample variance?

  • Sample variance divides by $n-1$, population variance divides by $N$. (correct)

A dataset with the following values: 20, 22, 23, 25, 27, 28, 30, 50. Calculate the interquartile range (IQR).

  • 6.5, taking Q1 = 22.5 and Q3 = 29 as the medians of the lower and upper halves of the data (correct)

Which statement is correct about the characteristics of a boxplot?

  • If the median falls to the right of the center, the distribution is negatively skewed. (correct)

Which of the following scenarios involves an event and its sample space?

  • Tossing a coin and obtaining either heads or tails. (correct)

You roll a six-sided die. What is the probability of rolling a number less than 5?

  • 2/3 (correct)

A survey finds that 60% of adults in a city support building a new sports stadium. What is the probability that a randomly selected adult does not support the new stadium?

  • 0.4 (correct)

In a population, 40% of people have type A blood, and 10% have type B blood. If blood type A and B are independent, what is the probability that a randomly selected person has both type A and type B blood?

  • 0.04 (correct)

Event A and Event B are mutually exclusive, which of the following most accurately reflects the relationship between the events?

  • $P(A | B) = 0$ (correct)

In a lottery, a player needs to select 6 numbers out of 49 correctly to win the jackpot. How many different combinations of numbers are possible?

  • 13,983,816 (correct)

How many ways could the letters in the word 'statistics' be arranged?

  • 50,400 (correct)

What is the number of ways to arrange the letters in the word 'arrange'?

  • 1,260, that is 7! / (2! * 2!), since the letters 'a' and 'r' each appear twice (correct)

In a quality control process, a manufacturer assesses a sample of 5 items from a production lot of 100. If four or more items are defective, the entire lot is rejected. If in reality, the population mean is 3, is the variable discrete or continuous?

  • Discrete (correct)

A discrete probability distribution is defined by $P(X = x) = kx$ for $x = 1, 2, 3$. Find the value of $k$.

  • 1/6 (correct)

Consider a scenario in which $P(X)$ is not between zero and one. Which rule is violated?

  • Probability must be between zero and one (correct)

Given the discrete probability distribution of a fair die, where each of the values 1 through 6 has probability 1/6, determine the mean of the probability distribution.

  • 3.5 (correct)

A game involves drawing a ball from a bag containing 4 red balls and 6 blue balls. If you draw a red ball, you win $10; if you draw a blue ball, you lose $5. What is the expected value of playing this game?

  • $1 (correct)

In flipping a fair coin three times, what is the probability of getting at least two heads?

  • 0.5 (correct)

What are the requirements of a probability distribution?

  • The range of probabilities is between zero and one and the sum of the probabilities is one. (correct)

Why is randomness important in sample selection?

  • To remove any bias (correct)

A researcher measured the heart rates of 50 individuals after they completed a stressful task. The distribution of heart rates was right-skewed. Which measure of central tendency would best represent the typical heart rate of these individuals?

  • Median (correct)

Which of the following statements about boxplots is not generally true?

  • They always show the exact mean of the dataset. (correct)

Two events, A and B, are such that $P(A)$ is 0.6, $P(B)$ is 0.5, and $P(A \cup B)$ is 0.8. What is $P(A \cap B)$?

  • 0.3 (correct)

A box contains 7 red balls and 3 blue balls. Three balls are selected at random, without replacement. Determine the probability that there are exactly two red balls.

  • 21/40 (correct)

A store manager wants to implement a new customer service program. Before launching it, they survey a sample of 100 customers to gauge interest. Based on this sample, they estimate that 70% of all customers would be interested in the program. What type of statistics is being used here?

  • Inferential statistics (correct)

Which of the following data types is least suited for calculating a meaningful arithmetic mean?

  • Customer satisfaction ratings on a scale of 1 to 5 (correct)

In a study examining the effectiveness of a new teaching method, students in one class are taught using the new method, while students in another class are taught using the traditional method. At the end of the semester, both groups take the same exam, and the results are compared. The goal is to determine whether the new teaching method leads to higher exam scores. What would be the best approach?

  • Experiment (correct)

A researcher wants to analyze the distribution of household incomes in a city. However, they suspect that the presence of a few extremely high incomes could distort the results. Which of the following measures would be least affected by these outliers?

  • Median (correct)

You are given the following five-number summary for a dataset: Minimum = 10, Q1 = 25, Median = 30, Q3 = 40, Maximum = 75. What can you conclude about the shape of the distribution?

  • The distribution is skewed to the right (correct)

There are 100 students in a class. 60 students passed the first exam, and 70 students passed the second exam. If 40 students passed both exams, what is the probability that a randomly selected student passed at least one of the two exams?

  • 0.9 (correct)

A card is drawn randomly from a standard deck. Determine the probability of selecting a face card or a red card.

  • $\frac{32}{52}$ (correct)

Consider a scenario where you want to choose a committee of 5 people from a group of 10 men and 8 women. What is the total number of possible combinations?

  • 18C5 (correct)

What is the difference between discrete variables and continuous variables with regard to probability distributions?

  • Discrete variables have a countable number of values in their probability distributions, while continuous variables can assume any value in a range, and their probabilities are generally not uniformly distributed. (correct)

Suppose $X$ is the number of successes in 6 independent trials, where each trial has success probability $p = 0.35$. Compute $P(X=4)$.

  • 0.0951 (correct)

With probability distributions, what are some of the most commonly computed values? (Select all that apply)

  • The mean, $μ$ (correct)
  • The variance, $σ^2$ (correct)

Flashcards

What is Data?

Information from observations, counts, measurements or responses.

What is Statistics?

Science of collecting, organizing, analyzing, and interpreting data to make decisions.

What is population?

The entire group of individuals or items being studied.

What is a sample?

A subset of the population used for analysis.

What is a Parameter?

A number that describes a population characteristic

What is a Statistic?

A number that describes a sample characteristic.

What is Descriptive Statistics?

Organizing, summarizing, and displaying data.

What is Inferential Statistics?

Using a sample to draw conclusions about a population.

What is Qualitative Data?

Consists of attributes, labels, or non-numerical entries.

What is Quantitative Data?

Consists of numerical measurements or counts.

What is Nominal Level?

Qualitative data that cannot be ordered.

What is Ordinal Level?

Data that can be ordered, but differences are not meaningful.

What is Interval Level?

Data that can be ordered; differences between data entries are meaningful. Zero is a position on a scale.

What is Ratio Level?

Similar to interval level, but zero is an inherent zero; ratios of data values are meaningful.

Data Collection Techniques?

Observational Study, Experiment, Simulation, Survey

Sampling Techniques?

Simple Random, Stratified, Cluster, Systematic

What is Frequency Distribution?

Organized display of raw data in table form, using classes and frequencies.

Frequency Distribution Graphs?

Graphs such as Histogram and Frequency Polygon

Cumulative Frequency Graphs?

Graphs such as Ogive and Pie Chart

Stem-and-leaf Plot?

Data plot for quantitative data values

What is Mean?

Sum of all data entries divided by the number of entries.

What is Median?

The middle value of an ordered data set.

What is Mode?

The data entry that occurs most frequently.

Mean of Grouped Data?

Measures the average value

Weighted Mean?

Mean in which each data entry is weighted by importance.

What is Symmetric Distribution?

A distribution where mean = median = mode

What is Left Skewed Distribution?

mean < median < mode

What is Right Skewed Distribution?

mode < median < mean

What is Range?

The difference between the maximum and minimum data entries.

What is Deviation?

The difference between the data entry and the mean.

Population variance?

Measures the spread of the data

Sample variance?

Measures variability within a sample

What is Population Standard Deviation?

σ, the square root of the population variance.

What is Sample Standard Deviation?

s, the square root of the sample variance.

What are Quartiles?

Values which divide the data into four equal parts.

What are Deciles?

Values which divide the data into ten equal parts.

What are Percentiles?

Values which divide the data into one hundred equal parts.

What is Box Plot?

Five-number summary for data: min, Q1, median, Q3, max.

Probability Experiment?

A chance process that leads to well defined results called outcomes

What is an outcome?

The result of a single trial of a probability experiment

What is Sample Space?

The set of all possible outcomes

What is event?

A subset of the sample space.

Study Notes

  • Statistical Methods I is MATH 153 and is taught by Collins Abaitey.

Course Outline

  • The course covers Introduction to Statistics, Frequency Distributions and Graphs, Measures of Central Tendency, Measures of Variation, Measures of Position, Probability and Counting Rules, Random Variables, and Discrete Probability Distributions.
  • A recommended textbook for the course is Elementary Statistics (A Step by Step Approach) by Allan G. Bluman.

Introduction to Statistics

  • Data consists of information from observations, counts, measurements, or responses.
  • Statistics is the science of collecting, organizing, analyzing, and interpreting data to make decisions.
  • A population includes all outcomes, responses, measurements, or counts of interest.
  • A sample is a subset of a population.
  • For example, in a survey of 2,500 adults in Ghana on global warming, the population is the responses of all adults in Ghana.
  • The sample is the responses of the 2,500 adults in Ghana who were surveyed.
  • A parameter describes a population characteristic, like the average age of all people.
  • A statistic describes a sample characteristic, like the average age of a sample of people.
  • The average starting salary for petroleum engineers of $83,121 represents a statistic, as it is based on a sample.
  • The average cut-off point of aggregate 12 for the 2,182 students admitted to KNUST in 2009 is a parameter since it is derived from the entire population of admitted students.
  • Descriptive Statistics involves organizing, summarizing, and displaying data.
  • Inferential Statistics involves using a sample to draw a conclusion about a population.
  • In a study that followed men aged 48 for 18 years, descriptive statistics include statements like "70% of unmarried men were alive at age 65" and "90% of married men were alive at age 65."
  • A possible inference from the men study is that being married is associated with a longer life for men.

Data Classification

  • Qualitative data consists of attributes, labels, or non-numerical entries like place of birth, eye color, or political affiliation.
  • Quantitative data consists of numerical measurements or counts, such as age, weight, or temperature.
  • Nominal level measurements are qualitative and cannot be ordered.
  • Ordinal level measurements are qualitative or quantitative, can be ordered, but differences between data entries are not meaningful.
  • Top five TV programs from 5/4/09 to 5/10/09 (American Idol, Dancing with the Stars, NCIS, The Mentalist) are ranked on an ordinal level.
  • Network Affiliates in Pittsburgh, PA (WTAE, WPXT, KDKA and WPGH) are an example of a nominal level.
  • Interval level measurements are quantitative, can be ordered, have meaningful differences between data entries, and zero represents a position on a scale, but a ratio doesn't make sense.
  • Ratio level measurements are similar to interval, but zero is inherent and implies 'none', which allows for meaningful ratios.
  • The New York Yankees' World Series victories (years) are measured on an interval level due to the meaningful differences between the years.
  • 2009 American League Home Run Totals are measured on a ratio level because a ratio of the home runs can be calculated and is meaningful.

Data Collection and Sampling Techniques

  • Data collection techniques include observational studies, experiments, simulations, and surveys.
  • Sampling techniques include simple random sampling, stratified sampling, cluster sampling, and systematic sampling.

Frequency Distribution and Graphs

  • A frequency distribution organizes raw data into a table using classes and frequencies.
  • For the class 1–10, where the next class begins at 11, the class width is 11 − 1 = 10, the difference between consecutive lower class limits.
  • The class mark (midpoint) of the class 1–10 is (1 + 10) / 2 = 5.5.
  • The lower class limit for the class 1–10 is 1.
  • The upper class limit for the class 1–10 is 10.
  • The lower class boundary for the class 1–10 is 0.5.
  • The upper class boundary for the class 1–10 is 10.5.
  • Class boundaries are found by subtracting 0.5 from the lower class limit and adding 0.5 to the upper class limit.
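
The class calculations above can be sketched in a few lines of Python. This is only an illustration: the variable names are made up for the example, and the 0.5 adjustment assumes data recorded as whole numbers, as in the class 1–10.

```python
# Limits of the class 1-10 from the notes; the next class is assumed to start at 11.
lower_limit = 1
upper_limit = 10
next_lower_limit = 11

# Class width: difference between consecutive lower class limits.
class_width = next_lower_limit - lower_limit

# Class mark (midpoint): average of the lower and upper class limits.
class_mark = (lower_limit + upper_limit) / 2

# Class boundaries: move each limit outward by 0.5 (whole-number data assumed).
lower_boundary = lower_limit - 0.5
upper_boundary = upper_limit + 0.5

print(class_width, class_mark, lower_boundary, upper_boundary)
```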

Measures of Central Tendency

  • Mean is the average found by summing all data entries and dividing by the number of entries: μ = Σx / N for a population mean and x̄ = Σx / n for a sample mean.
  • Median is the middle value of an ordered data set.
  • Mode is the most frequently occurring data entry.
  • For the ordered data {200, 300, 400, 400, 500, 600, 700}, the mean is 442.9, the median is 400, and the mode is 400.
  • For the ordered data {388, 397, 397, 427, 782, 872}, the mean is 543.8, the median is 412, and the mode is 397.
  • For the data {100, 101, 102, 103, 104, 105, 106}, the mean and median are both 103, and there's no mode.
  • For the data {250, 300, 300, 350, 350, 400, 450, 2000}, the mean is 550, the median is 350 and the modes are 300 and 350, making the set bimodal.
  • Outliers greatly affect the mean.
  • Mean takes into account every entry of a data set.
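
As a rough check of the last data set above, the three measures can be computed with Python's standard `statistics` module (`multimode` needs Python 3.8 or later). Dropping the outlier 2000 is an extra step added here only to show how strongly it pulls the mean.

```python
from statistics import mean, median, multimode

data = [250, 300, 300, 350, 350, 400, 450, 2000]

data_mean = mean(data)        # sum of entries divided by their count
data_median = median(data)    # middle of the ordered data (average of the two middle entries)
data_modes = multimode(data)  # every value tied for the highest frequency

# Same data without the outlier 2000, to show its effect on the mean.
without_outlier = data[:-1]
mean_without = mean(without_outlier)
median_without = median(without_outlier)

print(data_mean, data_median, data_modes)
print(round(mean_without, 1), median_without)
```

The median stays at 350 with or without the outlier, while the mean drops from 550 to about 342.9.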

Mean of Grouped Data

  • In grouped data, the mean is calculated using x̄ = Σfx / Σf, where x is the midpoint of each class and f is its frequency.
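
A minimal sketch of the grouped-data formula follows. The frequency table is invented purely for illustration and does not come from the course notes.

```python
# Hypothetical frequency table: (class midpoint x, frequency f).
table = [(5.5, 3), (15.5, 5), (25.5, 2)]

sum_fx = sum(x * f for x, f in table)  # Σfx
sum_f = sum(f for _, f in table)       # Σf, the total number of entries

grouped_mean = sum_fx / sum_f
print(grouped_mean)
```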

Weighted Mean

  • Weighted mean is calculated using x̄ = Σwx / Σw, where w is the weight of each entry x.
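
The weighted mean can be sketched the same way. The scores and weights below are hypothetical values chosen for the example.

```python
def weighted_mean(entries, weights):
    """Return Σwx / Σw for paired entries x and weights w."""
    if len(entries) != len(weights):
        raise ValueError("entries and weights must have the same length")
    total_weight = sum(weights)
    return sum(w * x for x, w in zip(entries, weights)) / total_weight

# Hypothetical example: three scores weighted 5, 3 and 2.
scores = [80, 90, 70]
weights = [5, 3, 2]
result = weighted_mean(scores, weights)
print(result)
```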

Shapes of Frequency Distributions

  • In a symmetric distribution, mean = median = mode.
  • In a left-skewed distribution, mean < median < mode.
  • In a right-skewed distribution, mode < median < mean.
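
The comparison of mean and median above can be turned into a rough rule of thumb. The function name and the optional tolerance are assumptions made for this sketch, and real data rarely satisfies mean = median exactly.

```python
def describe_shape(mean, median, tolerance=0.0):
    """Label a distribution's shape by comparing its mean and median."""
    if abs(mean - median) <= tolerance:
        return "approximately symmetric"
    return "right-skewed" if mean > median else "left-skewed"

print(describe_shape(550, 350))  # mean above median
print(describe_shape(103, 103))  # mean equals median
```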

Measures of Variation

  • Range is the difference between the maximum and minimum data entries in a set.
  • Deviation is the difference between a data entry, x, and the mean of the data set (x - μ or x - x̄).
  • Population variance is σ² = Σ(x-μ)² / N.
  • Sample variance is s² = Σ(x-x̄)² / (n-1).
  • Population standard deviation is σ = √[Σ(x-μ)² / N].
  • Sample standard deviation is s = √[Σ(x-x̄)² / (n-1)].
  • For the data set {111, 112, 115, 117, 118, 119, 120}, the range is 9.
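
To make the difference between the two variance formulas concrete, the sketch below applies both to the data set above. Treating the same seven values once as a population and once as a sample is only for illustration.

```python
from math import sqrt
from statistics import pvariance, variance

data = [111, 112, 115, 117, 118, 119, 120]

data_range = max(data) - min(data)
mean = sum(data) / len(data)
squared_deviations = [(x - mean) ** 2 for x in data]

# Population variance divides by N; sample variance divides by n - 1.
population_variance = sum(squared_deviations) / len(data)
sample_variance = sum(squared_deviations) / (len(data) - 1)

population_sd = sqrt(population_variance)
sample_sd = sqrt(sample_variance)

# The standard library gives the same variances.
assert abs(population_variance - pvariance(data)) < 1e-9
assert abs(sample_variance - variance(data)) < 1e-9

print(data_range, round(population_variance, 2), sample_variance)
```

The sample variance is always the larger of the two, because the same sum of squared deviations is divided by a smaller number.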

Measures of Position

  • Quartiles divide an ordered data set into four approximately equal parts (Q1, Q2, Q3).
  • Deciles divide an ordered data set into ten equal parts (D1, D2 ... D9).
  • Percentiles divide an ordered data set into 100 equal parts (P1, P2 ... P99).
  • The interquartile range is calculated as IQR = Q3 - Q1.
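
Quartiles can be computed by several conventions that give slightly different values. The sketch below assumes the median-of-halves method (Q1 and Q3 are the medians of the lower and upper halves, excluding the middle entry when the count is odd); the helper name `quartiles` is made up for the example.

```python
from statistics import median

def quartiles(values):
    """Return (Q1, Q2, Q3) using the median-of-halves convention."""
    data = sorted(values)
    half = len(data) // 2
    lower_half = data[:half]   # entries below the median position
    upper_half = data[-half:]  # entries above the median position
    return median(lower_half), median(data), median(upper_half)

data = [111, 112, 115, 117, 118, 119, 120]
q1, q2, q3 = quartiles(data)
iqr = q3 - q1
print(q1, q2, q3, iqr)
```

The function needs at least two data values, and other software may report slightly different quartiles because it interpolates between entries.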

Box Plots

  • Box plots require a five-number summary: minimum entry, first quartile (Q1), second quartile/median (Q2), third quartile (Q3), and maximum entry.
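
A five-number summary can be assembled as below, again assuming the median-of-halves convention for Q1 and Q3. The function name is hypothetical, and the data set is reused from the measures of central tendency above.

```python
from statistics import median

def five_number_summary(values):
    """Return min, Q1, median, Q3 and max for at least two data values."""
    data = sorted(values)
    half = len(data) // 2
    return {
        "min": data[0],
        "Q1": median(data[:half]),   # median of the lower half
        "median": median(data),
        "Q3": median(data[-half:]),  # median of the upper half
        "max": data[-1],
    }

summary = five_number_summary([250, 300, 300, 350, 350, 400, 450, 2000])
print(summary)
```

Here the distance from Q3 to the maximum is far larger than the distance from the minimum to Q1, which is how a box plot reveals a long right tail.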

Probability

  • Probability is the chance of an event occurring.
  • Probability experiment is a process that leads to well-defined outcomes.
  • Outcome is the result of a single trial in a probability experiment.
  • Sample space is the set of all possible outcomes.
  • Event is a subset of the sample space.
  • For example, rolling a die is a probability experiment with sample space {1, 2, 3, 4, 5, 6}.
  • Classical probability is when outcomes in the sample space are equally likely, P(E) = n(E) / n(S).
  • Empirical probability is the relative frequency of an event, P(E) = f / n.
  • Subjective probability is based on intuition, guesses, and estimates.
  • In rolling a six-sided die: The probability of rolling a 3 (Event A) = 1/6 and probability of rolling a number less than 5 (Event C) = 2/3.
  • The probability of an event E lies between 0 and 1 inclusive: 0 ≤ P(E) ≤ 1.
  • The complement of event E, denoted E′, is the set of all outcomes in the sample space that are not in E.
  • The probability of the complement is P(E′) = 1 − P(E).
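
The die example can be checked with exact fractions. This sketch assumes equally likely outcomes (classical probability), and the function name is invented for the illustration.

```python
from fractions import Fraction

sample_space = {1, 2, 3, 4, 5, 6}

def classical_probability(event, space):
    """P(E) = n(E) / n(S), assuming every outcome in S is equally likely."""
    return Fraction(len(event & space), len(space))

event_a = {3}                                 # rolling a 3
event_c = {x for x in sample_space if x < 5}  # rolling a number less than 5

p_a = classical_probability(event_a, sample_space)
p_c = classical_probability(event_c, sample_space)
p_not_c = 1 - p_c                             # complement rule

print(p_a, p_c, p_not_c)
```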

Conditional Probability and the Multiplication Rule

  • Conditional probability is the probability of event B occurring given that event A has already occurred: P(B | A) = P(B ∩ A) / P(A).
  • Events are independent if the occurrence of one does not affect the probability of the other, that is, P(B | A) = P(B) or P(A | B) = P(A).
  • For independent events, P(A and B) = P(A) * P(B).
  • For example, the outcome of a coin toss does not affect the probability of rolling a 6 on a die.
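
The coin-and-die example can be verified by listing the combined sample space. The sketch assumes a fair coin and a fair die, so all 12 outcomes are equally likely; the names are chosen for illustration.

```python
from fractions import Fraction
from itertools import product

# Every (coin, die) pair: 2 * 6 = 12 equally likely outcomes.
space = set(product("HT", range(1, 7)))

def prob(event):
    """Classical probability of an event within the combined sample space."""
    return Fraction(len(event), len(space))

heads = {outcome for outcome in space if outcome[0] == "H"}
six = {outcome for outcome in space if outcome[1] == 6}

# Conditional probability: P(six | heads) = P(six and heads) / P(heads).
p_six_given_heads = prob(six & heads) / prob(heads)

print(p_six_given_heads, prob(six))
print(prob(heads & six) == prob(heads) * prob(six))
```

Because P(six | heads) equals P(six), the two events are independent, and the multiplication rule P(A and B) = P(A) * P(B) holds.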

Mutually Exclusive Events and the Addition Rule

  • Mutually exclusive events A and B cannot occur at the same time, so P(A ∩ B) = 0.
  • The probability that event A or event B occurs is P(A or B) = P(A ∪ B) = P(A) + P(B) − P(A ∩ B). If the events are mutually exclusive, this reduces to P(A or B) = P(A ∪ B) = P(A) + P(B).
  • For example, the blood types of donors are mutually exclusive, because each donor has exactly one blood type.
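
As an illustration of the addition rule, the sketch below builds a standard 52-card deck and compares a pair of overlapping events with a pair of mutually exclusive ones. The card example is an assumption of this sketch and is not taken from the notes above.

```python
from fractions import Fraction
from itertools import product

ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["hearts", "diamonds", "clubs", "spades"]
deck = set(product(ranks, suits))

def prob(event):
    """Classical probability of drawing a card in the event."""
    return Fraction(len(event), len(deck))

face = {card for card in deck if card[0] in {"J", "Q", "K"}}
red = {card for card in deck if card[1] in {"hearts", "diamonds"}}
hearts = {card for card in deck if card[1] == "hearts"}
spades = {card for card in deck if card[1] == "spades"}

# Overlapping events: subtract the intersection so it is not counted twice.
p_face_or_red = prob(face) + prob(red) - prob(face & red)

# Mutually exclusive events: the intersection is empty, so simply add.
p_heart_or_spade = prob(hearts) + prob(spades)

print(p_face_or_red, p_heart_or_spade)
```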

Probability and Counting

  • The fundamental counting principle states that if event M can occur in m ways and event N can occur in n ways, then the two events can occur in sequence in m * n ways.
  • For example, choosing a car manufacturer (Ford, GM, or Honda) as the first event, a size (compact or midsize) as the second event, and a colour (white (W), red (R), black (B), or green (G)) as the third event gives 3 * 2 * 4 = 24 ways.
  • Factorial notation: for any positive integer n, n! = n * (n - 1) * (n - 2) ... 2 * 1. For example, 5! = 120.
  • When some elements are repeated, as with the letters of a word, the number of distinct arrangements is n! / (n1! * n2! * ... * nk!), where n1, n2, ..., nk are the counts of each repeated element.
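
The counting rules above can be sketched with Python's standard library (`math.comb` and `math.prod` need Python 3.8 or later). The helper `distinct_arrangements` is a hypothetical name for the repeated-elements formula.

```python
from collections import Counter
from itertools import product
from math import comb, factorial, prod

# Fundamental counting principle: list every manufacturer/size/colour choice.
manufacturers = ["Ford", "GM", "Honda"]
sizes = ["compact", "midsize"]
colours = ["W", "R", "B", "G"]
car_choices = list(product(manufacturers, sizes, colours))

def distinct_arrangements(word):
    """n! divided by the product of the factorials of each letter's count."""
    letter_counts = Counter(word)
    repeats = prod(factorial(count) for count in letter_counts.values())
    return factorial(len(word)) // repeats

print(len(car_choices))                     # 3 * 2 * 4
print(factorial(5))                         # 5!
print(distinct_arrangements("statistics"))  # 10! / (3! * 3! * 2!)
print(comb(49, 6))                          # choosing 6 numbers from 49
```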

Discrete Probability Distributions

  • A random variable is a variable whose values are determined by chance, generally denoted X or Y.
  • A discrete random variable has a countable number of possible values, for example the number of chairs in a room.
  • A continuous random variable can assume all values within an interval between any two given values and is obtained by measurement, for example the temperature over 24 hours.
  • Requirements for a probability distribution: 0 ≤ P(X) ≤ 1 and ΣP(X) = 1.
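
The two requirements can be checked mechanically. In this sketch a distribution is represented as a dictionary mapping each value of X to its probability, which is a choice made for the example; exact fractions avoid rounding problems when testing that the probabilities sum to one. The mean Σ x * P(x) is included as a commonly computed summary.

```python
from fractions import Fraction

def is_valid_distribution(dist):
    """Check that 0 <= P(X) <= 1 for every value and that the probabilities sum to 1."""
    probabilities = dist.values()
    return all(0 <= p <= 1 for p in probabilities) and sum(probabilities) == 1

def expected_value(dist):
    """Mean of a discrete distribution: the sum of x * P(x)."""
    return sum(x * p for x, p in dist.items())

# Fair six-sided die: each face has probability 1/6.
die = {face: Fraction(1, 6) for face in range(1, 7)}

# P(X = x) = k * x for x = 1, 2, 3, with k = 1/6 so the probabilities sum to 1.
k = Fraction(1, 6)
linear = {x: k * x for x in (1, 2, 3)}

print(is_valid_distribution(die), expected_value(die))
print(is_valid_distribution(linear))
print(is_valid_distribution({0: Fraction(1, 2), 1: Fraction(3, 4)}))
```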
