Statistics and Variables

Questions and Answers

Which variable represents the house price of unit area?

  • date
  • dist
  • price (correct)
  • age

What does the variable 'dist' represent in the dataset?

  • The number of convenience stores.
  • The age of the house.
  • The transaction date.
  • The distance to the nearest MRT station. (correct)

In the models presented, what does R2 represent?

  • Standard deviation
  • R-squared (correct)
  • Sample size
  • Adjusted R-squared

What is the unit of measurement for the 'age' variable?

  • Years (correct)

Which of the following is NOT a variable in the dataset?

  • area (correct)

What does 'H0' typically represent in hypothesis testing?

  • Null Hypothesis (correct)

In an upper-tailed test, which of the following is the correct alternative hypothesis?

  • Ha: µ > µ0 (correct)

What does α (alpha) represent in hypothesis testing?

  • The probability of rejecting the null hypothesis when it is true. (correct)

In a lower-tailed test, the rejection region is located in which tail of the distribution?

  • The left tail (correct)

If the test statistic falls within the rejection region, what decision should be made regarding the null hypothesis?

  • Reject the null hypothesis. (correct)

What type of variable is 'gender'?

  • Categorical (correct)

Which of the following is a characteristic of a nominal variable?

  • Values with no inherent order (correct)

What defines a discrete numerical variable?

  • Finite or countably infinite possible values (correct)

What is the formula for the sample mean?

  • $\frac{1}{n} \sum_{i=1}^{n} x_i$ (correct)

Which of the following is a measure of variability?

  • Sample variance (correct)

What does $x_i - \bar{x}$ represent?

  • The ith deviation from the mean (correct)

What is 'number of teeth' considered in the example of Canadian children?

  • Discrete variable (correct)

Which of the following variables is most likely continuous?

  • Height of a tree (correct)

For any three events A, B, C, what is the formula for $P(A \cup B \cup C)$?

  • $P(A) + P(B) + P(C) - P(A \cap B) - P(A \cap C) - P(B \cap C) + P(A \cap B \cap C)$ (correct)

What does $P(A)$ represent for an event A?

  • The sum of the probabilities of all outcomes in A. (correct)

In an equally likely outcome experiment, how is the probability of an event A calculated?

  • The number of outcomes in A, divided by the total number of outcomes in the sample space S. (correct)

If S = {2, 4, 6, 8} with p(2) = 0.1, p(4) = 0.2, p(6) = 0.3, p(8) = 0.4, and A = {2, 6}, what is P(A)?

  • 0.4 (correct)

In the context of probability, what is a 'simple event'?

  • An event that cannot be broken down into smaller events. (correct)

What is the sum of the probabilities of all simple events in a sample space?

  • 1 (correct)

Given that 60% of families have TV cable and 80% have internet cable, what is the maximum possible percentage of families that have both?

  • 60% (correct)

If a fair four-sided die is tossed twice, what is the total number of outcomes in the sample space?

  • 16 (correct)

What is the effect on the confidence interval (CI) width when the sample size increases, assuming all other factors remain constant?

  • The CI width decreases (correct)

In the context of confidence intervals, what does 'margin of error' quantify?

  • The range within which the true population parameter is expected to fall (correct)

Which distribution is used when constructing a confidence interval for a population mean when the population standard deviation ($\sigma$) is unknown and the sample size is small?

  • T-distribution (correct)

What is the mean of the t-distribution?

  • 0 (correct)

What happens to the standard deviation of the t-distribution as the degrees of freedom increase?

  • Decreases (correct)

If you increase the confidence level for a confidence interval, what happens to the width of the interval, assuming all other factors remain constant?

  • The width increases. (correct)

For a t-distribution, what does the parameter 'v' represent?

  • Degrees of freedom (correct)

What is the formula for the upper bound of a 95% confidence interval for the population mean, when the population standard deviation ($\sigma$) is known?

  • $\bar{x} + 1.96 \cdot \frac{\sigma}{\sqrt{n}}$ (correct)

Which of the following is a key assumption when constructing a Z confidence interval?

  • The population standard deviation ($\sigma$) is known. (correct)

What happens to the width of a confidence interval as the confidence level increases, assuming all other factors remain constant?

  • The width increases. (correct)

What is the meaning of '$\bar{x}$' in the context of confidence intervals?

  • Sample mean (correct)

What does 'n' represent in the formula for calculating a confidence interval?

  • Sample size (correct)

If you construct a 95% confidence interval, what does this mean?

  • 95% of similarly constructed intervals would contain the true population mean. (correct)

Assuming the same confidence level, what leads to a smaller (narrower) confidence interval?

  • Larger sample size (correct)

With a fixed sample size and standard deviation, if you change from a 95% confidence interval to a 99% confidence interval, what happens to the margin of error?

  • The margin of error increases. (correct)

Flashcards

Date (Real Estate)

Date of the real estate transaction.

Categorical Variable

A variable whose possible values are non-numerical and can be grouped into categories.

Ordinal Variable

A categorical variable where the set of possible values has a meaningful order or ranking.

Age (Real Estate)

Age of the house in years.

Nominal Variable

A categorical variable where the set of possible values has no inherent order or ranking.

Distance to MRT

Distance to the nearest MRT station in meters.

Numerical Variable

Variables whose values are numerical.

Cstores

Number of convenience stores in the living circle.

Price (Real Estate)

House price per unit area.

Discrete Variable

Numerical variables with finite or countably infinite possible values.

Continuous Variable

Numerical variables with possible values that fall within an interval of numbers.

Sample Mean (x̄)

The sum of all values in a dataset divided by the number of values.

Sample Variance (s²)

A measure of how spread out the data points are in a sample, calculated using the squared deviations of each data point from the sample mean.

Null Hypothesis (H0)

The null hypothesis (H0) states that the population mean (µ) is equal to a specific value (µ0).

Upper-Tailed Test (Ha: µ > µ0)

The alternative hypothesis (Ha) for an upper-tailed test states that the population mean (µ) is greater than a specific value (µ0).

Lower-Tailed Test (Ha: µ < µ0)

The alternative hypothesis (Ha) for a lower-tailed test states that the population mean (µ) is less than a specific value (µ0).

Two-Tailed Test (Ha: µ ≠ µ0)

The alternative hypothesis (Ha) for a two-tailed test states that the population mean (µ) is not equal to a specific value (µ0).

Rejection Region (RR)

The rejection region (RR) is the range of values for the test statistic that leads to the rejection of the null hypothesis.

Confidence Interval

An interval with a fixed range calculated from sample data, used to estimate a population parameter.

Random Interval

An interval with a random range that attempts to capture the true population mean with certain probability.

Interpreting a 95% CI

Under repeated sampling, 95% of constructed confidence intervals contain the true population mean.

Narrower vs. Wider CI

A narrower CI provides a more precise estimate of the population parameter.

Reduce CI Width

Increase sample size or decrease the confidence level.

Calculate 99% CI for µ

Use the zα/2 value corresponding to the 99% confidence level (z0.005 ≈ 2.576) in the CI formula.

Two-Sided Z - CI

A 100(1 − α)% confidence interval for the population mean µ when σ is known and the population is normal or n ≥ 30.

Confidence Level

The confidence level is the long-run proportion of intervals, constructed by the same procedure under repeated sampling, that contain the true population mean.

P(A ∪ B ∪ C) Formula

For three events A, B, and C, the probability of their union equals the sum of individual probabilities, minus pairwise intersections, plus the intersection of all three.

P(A) as Sum of Outcomes

The probability of an event A is the sum of the probabilities of all the individual outcomes that make up event A.

Equally Likely Outcome

In an equally likely outcome experiment, each outcome has the same probability of occurring.

P(A) in Equally Likely Outcomes

In an equally likely experiment, the probability of event A is the number of outcomes in A divided by the total number of outcomes.

TV/Internet Cable Probability

Given percentages on TV/Internet subscriptions, find the probability of TV cable OR internet cable OR both. Use the general addition rule.

Electronic Dryer Probability

Given P(at most 1 electronic dryer) = 0.428, find the probability that at least two purchase an electronic dryer.

Dryer Type Probability

Given the probabilities of all five customers purchasing gas or electronic dryers, find the probability that at least one of each type is purchased.

Die Toss Probabilities

Tossing a fair die twice, calculate P(A), P(B), P(C) where A is at least one 4, B is sum of 5, and C is both tosses are 1.

Confidence Interval (CI)

The range within which the true population mean is likely to fall, with a specified degree of confidence.

Margin of Error (m)

The maximum expected difference between the true population mean and the sample mean at the stated confidence level; it is half the width of the confidence interval.

Factors Affecting CI Width

The width of a Z-CI depends on the sample size, the confidence level, and σ. A larger sample size results in a narrower confidence interval; a higher confidence level or a larger σ results in a wider one.

t-Distribution

Used when the population standard deviation (σ) is unknown and the sample size is small. It has degrees of freedom (v).

Degrees of Freedom (v)

A parameter in the t-distribution, calculated as n - 1, where n is the sample size.

P (T ≥ tα (v)) = α

Defines the t critical value tα(v): the point on the t-distribution with v degrees of freedom that leaves an area of α in the upper tail.

Estimating σ with s

Replace σ with s (sample standard deviation) to calculate CI.

t-CI

Used to estimate the population mean when the population standard deviation (σ) is unknown and the population is normal or the sample size is large.

Study Notes

Overview and Descriptive Statistics

Populations, Samples, and Processes

  • A Population is the entire group to be studied; N typically represents the size of a finite population.
  • An Observation is a single individual entity or measurement.
  • A Variable is a characteristic of interest about an individual or object within a population.
  • A Census involves measuring or surveying every unit in the population.
  • A Sample is a subset of the population, selected from the entire group.

Random Sample vs. Convenience Sample

  • A Random Sample is chosen so every individual or object has an equal chance of selection.
  • A Convenience Sample is selected based on ease of access.

Probability vs. Statistics

  • In a Probability Problem, population characteristics are known, and the focus is on sample-related questions.
  • In an Inferential Statistics Problem, limited knowledge about the population exists, so sample data is used to make generalizations about the population and answer related questions.

Types of Data

  • A Categorical Variable has non-numerical values that can be grouped into categories.
    • An Ordinal Variable is a categorical variable with ordered categories.
    • A Nominal Variable is a categorical variable without any specific order.
  • A Numerical Variable has numerical values.
    • A Discrete Variable is numerical with a finite or countably infinite set of possible values.
    • A Continuous Variable has possible values within an interval of numbers.

Measures of Location and Variability

Summation Notation

  • Summation notation is a concise way to express the sum of a series of terms.
  • It is denoted by the symbol Σ and represents the addition of a sequence of values.

Sample Mean

  • A set of n observations is denoted as x1, x2, ..., xn.
  • The sample mean is calculated as the sum of all observations divided by the number of observations: x̄ = (1/n) * Σxi.
  • The population mean is denoted as μ, and is a fixed constant, while the sample mean varies from sample to sample.

Measures of Variability

  • Measures of variability, along with measures of center, are essential to describe a dataset fully.

Sample Variance and Sample Standard Deviation

  • The ith deviation about the mean is xi - x̄.
  • Sample variance, denoted as s², measures the average squared deviation from the mean: s² = (1/(n – 1)) * Σ(xi - x̄)².
  • Sample standard deviation, denoted as s, is the square root of the sample variance and has the same units as the original data.
  • Population variance is denoted by σ² = (1/N) * Σ(xi - μ)², and the population standard deviation is denoted by σ = √(σ²).
  • n-1 is used as the divisor for sample variance, because xi's tend to be closer to their average x̄ than to μ, and this compensates for it.
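
A minimal sketch of these definitions in Python, assuming a small made-up dataset (the values in `data` are hypothetical and not from the lesson). It computes the deviations about the mean, the sample variance with the n – 1 divisor, and the sample standard deviation, then cross-checks against the standard library.

```python
import math
import statistics

def sample_variance(xs):
    """Sample variance s^2: squared deviations from x-bar, divided by n - 1."""
    n = len(xs)
    x_bar = sum(xs) / n
    return sum((x - x_bar) ** 2 for x in xs) / (n - 1)

# Hypothetical observations, chosen only for illustration.
data = [2.0, 4.0, 4.0, 5.0, 10.0]

x_bar = sum(data) / len(data)            # sample mean
deviations = [x - x_bar for x in data]   # ith deviation: x_i - x_bar
s2 = sample_variance(data)               # sample variance
s = math.sqrt(s2)                        # sample standard deviation

print(x_bar, deviations, s2, s)
# The standard library uses the same n - 1 divisor.
print(statistics.variance(data), statistics.stdev(data))
```

The deviations always sum to zero, which is one way to check a hand calculation.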

Probability

Sample Space and Event

  • An Experiment is an activity with at least two possible outcomes, where the result cannot be predicted with certainty.
  • A Sample Space (S) is a listing of all possible outcomes of an experiment, expressed using set notation.
  • An Event is any collection or set of outcomes from an experiment.

Set Operations

  • Complement (A'): All outcomes in the sample space S that are not in A (not A).
  • Union (A∪B): All outcomes in A or B or both (A or B).
  • Intersection (A∩B): All outcomes in both A and B (A and B).
  • Disjoint or Mutually Exclusive: A ∩ B = {} (empty set); no elements in common. P(Ø) = 0.
  • Exhaustive: A ∪ B = S; includes all outcomes of the sample space.
  • Mutually Exclusive and Exhaustive: A ∩ B = {} and A ∪ B = S; no elements in common and includes all outcomes of the sample space.

Axioms, Interpretations, and Properties of Probability

  • Relative Frequency of Occurrence: The number of times an event occurs divided by the total number of experiment repetitions.
  • Probability of an Event A, P(A): The limiting relative frequency, or the proportion of time the event A occurs in the long run.
  • Axioms of Probability:
    • P(A) ≥ 0
    • P(S) = 1, where S is the sample space.
    • For any infinite collection of disjoint events A1, A2, ..., P(A1 ∪ A2 ∪ ...) = Σ P(Ai).
  • For any event A, P(A) ≤ 1.
  • If A and B are disjoint events, then P(A ∩ B) = 0.
  • For any events A and B, P(A ∪ B) = P(A) + P(B) – P(A ∩ B).
  • If A and B are disjoint events, then P(A ∪ B) = P(A) + P(B).
  • Equally Likely Outcome Experiment: When all outcomes have an equal chance, the probability of event A is the number of outcomes in A divided by the total outcomes in the sample space S.
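
The equally likely outcome rule and the addition rule can be checked by enumeration. The sketch below assumes the fair four-sided die tossed twice from the quiz, with A = "at least one 4" and B = "the sum is 5" as in the die-toss flashcard; the helper name `prob` is hypothetical.

```python
from fractions import Fraction
from itertools import product

# Sample space: all ordered pairs from two tosses of a fair four-sided die.
S = set(product(range(1, 5), repeat=2))

def prob(event):
    """Equally likely outcomes: |event| / |S|."""
    return Fraction(len(event), len(S))

A = {o for o in S if 4 in o}         # at least one 4
B = {o for o in S if sum(o) == 5}    # sum of the two tosses is 5

p_union_direct = prob(A | B)
p_union_rule = prob(A) + prob(B) - prob(A & B)   # general addition rule

print(len(S), prob(A), prob(B), prob(A & B), p_union_direct, p_union_rule)
```

Using `Fraction` keeps the probabilities exact, so the two ways of computing P(A ∪ B) can be compared with `==`.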

Conditional Probability

  • The conditional probability of A given B is P(A|B) = P(A ∩ B) / P(B), where P(B) > 0.

Independence

  • A and B are independent if and only if P(A|B) = P(A).
  • If A and B are independent, then A' and B, A and B', and A' and B' are also independent.
  • If A, B are independent, P(A ∩ B) = P(A) * P(B).
  • A1, A2, ..., An are mutually independent if for every k = 2, 3, ..., n and every subset of indices i1, i2, ..., ik, P(Ai₁ ∩ Ai₂ ∩ ... ∩ Aik) = P(Ai₁) * P(Ai₂) * ... * P(Aik).
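
As an illustration of conditional probability and independence, the sketch below again assumes two tosses of a fair four-sided die. The events and the helper names `cond_prob` and `independent` are hypothetical choices for this example.

```python
from fractions import Fraction
from itertools import product

S = set(product(range(1, 5), repeat=2))  # two tosses of a fair four-sided die

def prob(event):
    return Fraction(len(event), len(S))

def cond_prob(a, b):
    """P(A | B) = P(A ∩ B) / P(B); requires P(B) > 0."""
    if not b:
        raise ValueError("P(B) must be positive")
    return prob(a & b) / prob(b)

def independent(a, b):
    """A and B are independent exactly when P(A ∩ B) = P(A) * P(B)."""
    return prob(a & b) == prob(a) * prob(b)

A = {o for o in S if o[0] % 2 == 0}   # first toss is even
B = {o for o in S if o[1] == 1}       # second toss is 1
C = {o for o in S if sum(o) == 5}     # sum is 5
D = {o for o in S if 4 in o}          # at least one 4

print(cond_prob(A, B), prob(A), independent(A, B))   # A and B: independent
print(independent(S - A, B))                         # A' and B: also independent
print(cond_prob(D, C), prob(D), independent(C, D))   # C and D: dependent
```

Knowing the sum is 5 changes the chance of seeing a 4, so C and D fail the product test, while events tied to different tosses pass it.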

Discrete Random Variables and Probability Distributions

Random Variables

  • A Random Variable assigns a unique numerical value to each outcome in a sample space.
  • A Discrete Random Variable has values from a finite set or a countably infinite sequence.
  • A Continuous Random Variable has possible values that form an interval, or a disjoint union of intervals.
  • Random variables are usually denoted by a capital letter.

Probability Distributions for Discrete Random Variables

  • The probability distribution or probability mass function (pmf) of a discrete random variable X is defined for every number x by p(x) = P(X = x) = P(all w ∈ S : X(w) = x).
  • The cumulative distribution function (cdf) F(x) of a discrete random variable X with pmf p(x) is defined for every number x by:
    • F(x) = P(X ≤ x) = Σ p(y) for all y ≤ x.
    • For any number x, F(x) represents the probability that the observed value of X will be at most x.
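    • For any two numbers a and b with a ≤ b, P(a ≤ X ≤ b) = F(b) – F(a–), where a– is the largest possible value of X strictly less than a. When X takes only integer values and a, b are integers, this is F(b) – F(a – 1).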

Expected Value and Variance

  • For a discrete random variable X with possible values D and pmf p(x), the expected or mean value (E(X) or μX) : E(X) = μX = Σx * p(x), where the sum is over all x ∈ D.
  • The expected value of any function h(X) is denoted by E[h(X)] or μh(X): E[h(X)] = Σ h(x) * p(x), where the sum is over all x ∈ D.
  • Variance of X: V(X) = Σ(x – E(X))² * p(x) = Σ x² * p(x) – [E(X)]² = E(X²) – [E(X)]², where the sum is over all x ∈ D.
  • Standard deviation (SD) of X: σX = √V(X).
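
A short sketch of the pmf, cdf, expected value, and variance formulas. It reuses the probabilities from the quiz question (p(2) = 0.1, p(4) = 0.2, p(6) = 0.3, p(8) = 0.4) and treats the outcomes as the values of a random variable X, which is an assumption made here for illustration. The helper names `cdf` and `expect` are hypothetical.

```python
import math
from fractions import Fraction

# pmf of X, stored exactly as fractions.
pmf = {
    2: Fraction(1, 10),
    4: Fraction(2, 10),
    6: Fraction(3, 10),
    8: Fraction(4, 10),
}

def cdf(x):
    """F(x) = P(X <= x): sum of p(y) over all possible values y <= x."""
    return sum((p for y, p in pmf.items() if y <= x), Fraction(0))

def expect(h=lambda x: x):
    """E[h(X)] = sum of h(x) * p(x) over all possible values x."""
    return sum(h(x) * p for x, p in pmf.items())

mean = expect()                                      # E(X)
var_def = expect(lambda x: (x - mean) ** 2)          # definition of V(X)
var_short = expect(lambda x: x ** 2) - mean ** 2     # shortcut E(X^2) - [E(X)]^2
sd = math.sqrt(var_def)                              # standard deviation

# P(4 <= X <= 6) = F(6) - F(2), since 2 is the largest value below 4.
p_4_to_6 = cdf(6) - cdf(2)

print(mean, var_def, var_short, sd, cdf(5), p_4_to_6)
```

Both variance formulas give the same value, which is a useful check when working by hand.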

Bernoulli Random Variable

  • A Bernoulli experiment: an experiment with only two possible random outcomes: success and failure.
  • Bernoulli random variable: a random variable whose only possible values are 0 and 1.

Binomial Probability Distribution

  • Binomial experiment:
    • Consists of n trials.
    • Each trial has only two outcomes: success, failure.
    • Outcomes are independent between trials.
    • Success probability, p, remains constant between trials.
  • A binomial random variable, X: the number of successes in n trials. Denoted: X ~ B(n, p).
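
The notes above do not write out the binomial pmf. The sketch below assumes the standard formula P(X = x) = C(n, x) p^x (1 – p)^(n – x), with hypothetical values n = 5 and p = 0.5 (for example, five customers who each independently choose one of two dryer types with equal probability).

```python
from math import comb

def binom_pmf(x, n, p):
    """P(X = x) for X ~ B(n, p), using the standard binomial formula."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

n, p = 5, 0.5  # hypothetical number of trials and success probability

probs = [binom_pmf(x, n, p) for x in range(n + 1)]
p_at_most_one = probs[0] + probs[1]
p_at_least_two = 1 - p_at_most_one          # complement rule
mean = sum(x * px for x, px in enumerate(probs))

print(probs)
print(p_at_most_one, p_at_least_two, mean)
```

The complement step mirrors the dryer flashcard: P(at least two) = 1 – P(at most one).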

Continuous Random Variables and Probability Distributions

Probability Density Functions

  • A continuous probability distribution describes the random variable completely and is used to compute probabilities associated with a random variable.
  • Probability density function (pdf), f(x):
    • A function defined for all real numbers (i.e., x ∈ (-∞, +∞)).
    • A smooth curve which describes the probability distribution for a continuous random variable X through area under the curve. Let a < b, P(a ≤ X ≤ b) = ∫ f(x) dx from a to b.
  • The cumulative distribution function (cdf) F(x) for a continuous rv X is defined, for every number x, by F(x) = P(X ≤ x) = ∫ f(y) dy from -∞ to x.
  • F(x) is the area under the density curve to the left of x.
    • P(X > a) = 1 – F(a)
    • P(a ≤ X ≤ b) = F(b) – F(a)
  • Expected value and variance of a continuous rv X:
    • μX = E(X) = ∫ x * f(x) dx from -∞ to +∞
    • μh(X) = E[h(X)] = ∫ h(x) * f(x) dx from -∞ to +∞
    • σX² = V(X) = ∫ (x – μX)² * f(x) dx from -∞ to +∞ = E(X²) – [E(X)]²
    • σX = √V(X)

Uniform Distribution

  • The random variable X has a uniform distribution on the interval [a, b].
  • Probability density function (pdf):
    • f(x; a, b) = 1/(b-a), when a ≤ x ≤ b
    • f(x; a, b) = 0, otherwise

The Normal Distribution

  • The Normal Distribution has two parameters: μ, σ (or μ, σ²), with −∞ < μ < +∞ and σ > 0. We write the random variable X ~ N(μ, σ²). The pdf is:
  • f(x) = 1 / (σ√(2π)) * e^((- (x-μ)^2) / (2σ²))
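
The normal pdf above can be evaluated directly and compared with Python's `statistics.NormalDist`. This is a sketch with hypothetical parameters μ = 100 and σ = 15; probabilities come from the cdf as areas under the curve.

```python
import math
from statistics import NormalDist

def normal_pdf(x, mu, sigma):
    """Normal density: 1 / (sigma * sqrt(2*pi)) * exp(-(x - mu)^2 / (2*sigma^2))."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

MU, SIGMA = 100.0, 15.0          # hypothetical parameters
X = NormalDist(mu=MU, sigma=SIGMA)

density_manual = normal_pdf(110.0, MU, SIGMA)
density_stdlib = X.pdf(110.0)

# P(85 <= X <= 115) = F(115) - F(85): within one standard deviation of the mean.
p_between = X.cdf(115.0) - X.cdf(85.0)   # about 0.6827
# P(X > 130) = 1 - F(130)
p_upper = 1 - X.cdf(130.0)

print(density_manual, density_stdlib, p_between, p_upper)
```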

Joint Probability Distributions and Random Samples

Statistic and the Distribution

  • A Population Parameter is a numeric measure of a population.
  • A Sample Statistic is a numerical characteristic of the sample.
  • A Point Estimate is a single value of the selected point estimator, computed from a given sample.

The Distribution of the Sample Mean

  • When taking a random sample X1, X2, ..., Xn from a distribution with mean μ and standard deviation σ, the sample mean x̄ has these properties:
    • E(x̄) = μ
    • V(x̄) = σ²/n and σx̄ = σ/√n.
  • Random samples from a normal distribution
    • Given the rv's X1, X2, ..., Xn from a normal distribution with mean μ and standard deviation σ, then for any n:
      • E(x̄) = μ
      • V(x̄) = σ²/n
      • x̄ ~ N(μ, σ²/n)
      • T0 ~ N(nμ, nσ²), where T0 = X1 + X2 + ... + Xn is the sample total
  • Central Limit Theorem (CLT):
    • Let the rv's X1, X2, ..., Xn be a random sample from a distribution with mean value μ and standard deviation σ. If n is sufficiently large (n > 30), then:
      • E(x̄) = μ, and V(x̄) = σ²/n
      • x̄ is approximately N(μ, σ²/n)
      • T0 is approximately N(nμ, nσ²)
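
A small simulation can make the Central Limit Theorem concrete. The sketch below assumes a Uniform(0, 1) population (μ = 0.5, σ² = 1/12), a sample size of 40, and 5000 repeated samples; all of these are arbitrary choices for illustration, and the seed is fixed only for reproducibility.

```python
import random
import statistics

random.seed(0)  # fixed seed so repeated runs give the same numbers

N = 40            # sample size (above the n > 30 rule of thumb)
REPS = 5000       # number of repeated samples (arbitrary)
MU = 0.5          # mean of Uniform(0, 1)
SIGMA2 = 1 / 12   # variance of Uniform(0, 1)

# Draw REPS samples of size N and record each sample mean.
sample_means = [
    statistics.fmean([random.random() for _ in range(N)])
    for _ in range(REPS)
]

mean_of_means = statistics.fmean(sample_means)     # should be close to MU
var_of_means = statistics.variance(sample_means)   # should be close to SIGMA2 / N

# If the sample mean is roughly normal, about 95% of the sample means
# should land within 1.96 standard errors of MU.
se = (SIGMA2 / N) ** 0.5
share_within = sum(abs(m - MU) <= 1.96 * se for m in sample_means) / REPS

print(mean_of_means, var_of_means, SIGMA2 / N, share_within)
```

The population here is far from normal, yet the sample means behave approximately like a normal distribution with variance σ²/n.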

Confidence Intervals with a Single Sample

Why do we need a Confidence Interval?

  • A point estimate alone is highly likely to be wrong.
  • A Confidence Interval is an interval used to estimate the true population mean; it is more reliable than a single point estimate.
  • Confidence Level: represents the degree of reliability and is expressed as 100(1 – α)%.
  • For a 95% confidence level, the random interval is defined as x̄ ± 1.96 (σ/√n).

Derive the Confidence Interval

  • For confidence interval derivation, assume the population is normal or n > 30 (large) and σ² is known.
  • Questions to consider:
    • What is the difference between a Random Interval and a Confidence Interval?
    • How to interpret a Confidence Interval?
    • Is a wider CI or a narrower CI better?
    • How to make a Confidence Interval narrower when the confidence level stays the same?
    • How to calculate a 99% CI for μ? Given the same sample, which is wider: a 95% CI or a 99% CI?

Two-Sided CI with Known σ: Z-CI

  • Assumptions: the population is normal or the sample size is large (n ≥ 30), and the standard deviation σ is known.
  • A two-sided 100(1 – α)% Confidence Interval (CI):
  • x̄ ± zα/2 * (σ/√n).
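
The Z-CI formula can be sketched in a few lines of Python. The inputs (sample mean 50, σ = 10, n = 100) are hypothetical, and the function name `z_confidence_interval` is not from the lesson. The critical value zα/2 comes from the standard normal inverse cdf.

```python
import math
from statistics import NormalDist

def z_confidence_interval(x_bar, sigma, n, confidence=0.95):
    """Two-sided Z-CI: x_bar ± z_{alpha/2} * sigma / sqrt(n), with sigma known."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)   # z_{alpha/2}
    margin = z * sigma / math.sqrt(n)
    return x_bar - margin, x_bar + margin

# Hypothetical sample summary.
lo95, hi95 = z_confidence_interval(x_bar=50.0, sigma=10.0, n=100, confidence=0.95)
lo99, hi99 = z_confidence_interval(x_bar=50.0, sigma=10.0, n=100, confidence=0.99)

print(round(lo95, 2), round(hi95, 2))   # roughly 48.04 and 51.96
print(round(lo99, 2), round(hi99, 2))   # wider than the 95% interval
```

With the same sample, the 99% interval is wider than the 95% interval because zα/2 grows from about 1.96 to about 2.576.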

The Width of Z-CI and Sample Size

  • Given a population with σ fixed, a 100(1 – α)% confidence interval is x̄ ± zα/2 (σ/√n), and the margin of error is m = zα/2 (σ/√n). The CI width is 2m, so the factors that may affect it are the confidence level (through zα/2) and the sample size n.
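
To see how sample size drives the margin of error, the sketch below evaluates m = zα/2 (σ/√n) and also rearranges it to n = (zα/2 σ / m)², rounded up, to find the sample size needed for a target margin. The rearrangement and the function names are additions for illustration; σ = 10 is a hypothetical value.

```python
import math
from statistics import NormalDist

def margin_of_error(sigma, n, confidence=0.95):
    """m = z_{alpha/2} * sigma / sqrt(n)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return z * sigma / math.sqrt(n)

def required_sample_size(sigma, m, confidence=0.95):
    """Smallest n with margin of error at most m: ceil((z * sigma / m)^2)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil((z * sigma / m) ** 2)

SIGMA = 10.0  # hypothetical known population standard deviation

m_100 = margin_of_error(SIGMA, 100)
m_400 = margin_of_error(SIGMA, 400)   # quadrupling n halves the margin
n_needed_95 = required_sample_size(SIGMA, m=1.0, confidence=0.95)
n_needed_99 = required_sample_size(SIGMA, m=1.0, confidence=0.99)

print(m_100, m_400, n_needed_95, n_needed_99)
```

Because n sits under a square root, halving the margin of error requires four times as many observations.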

Confidence Interval when σ is unknown: t - CI

  • In situations where σ is unknown, a t-CI is used, which involves the t distribution. The t distribution has a degrees of freedom parameter, v = n − 1.

Two-Sided t - CI with Unknown σ

  • The underlying distribution must be either normal or the sample size must be large (n ≥ 30)
  • σ² must also be unknown
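
The notes do not spell out the t-CI formula. The sketch below assumes the usual form x̄ ± tα/2(n – 1) · s/√n, with σ replaced by the sample standard deviation s. Python's standard library has no t quantile function, so the critical value is passed in from a t table; 2.262 is the table value of t0.025 with 9 degrees of freedom. The data values and the function name are hypothetical.

```python
import math
import statistics

def t_confidence_interval(data, t_crit):
    """Two-sided t-CI: x_bar ± t_crit * s / sqrt(n), where t_crit = t_{alpha/2}(n - 1)."""
    n = len(data)
    x_bar = statistics.fmean(data)
    s = statistics.stdev(data)            # sample SD (n - 1 divisor) replaces sigma
    margin = t_crit * s / math.sqrt(n)
    return x_bar - margin, x_bar + margin

# Hypothetical sample of n = 10 observations, so v = n - 1 = 9.
data = [12, 15, 11, 14, 13, 16, 12, 14, 15, 13]
T_CRIT_95_DF9 = 2.262   # t_{0.025}(9), looked up in a t table

lo, hi = t_confidence_interval(data, T_CRIT_95_DF9)
print(round(lo, 3), round(hi, 3))
```

The t critical value is larger than the corresponding z value of about 1.96, so for the same data the t-CI is wider, reflecting the extra uncertainty from estimating σ.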

Description

This quiz covers fundamental concepts in statistics, including variables, hypothesis testing, and measures of variability. It explores variable types like nominal and discrete, and delves into hypothesis testing components such as null and alternative hypotheses. It further tests understanding of statistical measures like sample mean and standard deviation.
