Normality Tests, Binomial Distribution & R code

Questions and Answers

When assessing data normality with a sample size of 30, which statistical tests are most appropriate?

  • Kolmogorov-Smirnov test
  • Skewness and Kurtosis tests (correct)
  • Shapiro-Wilk test
  • Anderson-Darling test

In the context of normality testing, how should a probability value of 0.03 be interpreted?

  • The data are approximately normally distributed.
  • The data are normally distributed.
  • The data are likely not normally distributed. (correct)
  • The normality test is inconclusive.

Which of the following is NOT a characteristic of a Bernoulli process, which underlies the Binomial distribution?

  • Trials are statistically independent.
  • The probability of outcomes remains constant across trials.
  • The number of trials must be infinite. (correct)
  • Each trial has only two possible outcomes.

Which R function is suitable for conducting the Shapiro-Wilk normality test?

  • shapiro.wilk.normality.test() (correct, option B)

In the binomial formula, what does the variable 'r' represent?

  • The number of desired occurrences. (correct, option A)

In ROIStat, where can you locate the normality tests?

  • Both A and B (correct, option D)

A company wants to determine the probability of receiving at least 40 good parts out of a shipment of 45, where the probability of a part being good is 0.9. Which R code would correctly calculate this using the dbinom or equivalent function, summing the probabilities from 40 to 45?

  • pbinom(q = 39, size = 45, prob = 0.9, lower.tail = FALSE) (correct, option A)

When predicting the percentage of values outside a specification limit, why is it preferable to use the normal distribution prediction rather than simply counting the values in the sample?

  • We want to make an inference from the sample to the population. (correct, option B)

If data contains a set of observations and x represents a specific value, what does sum(data < x)/n calculate?

  • The proportion of observations less than x in the sample. (correct, option D)

What type of data is most appropriate for analysis using the Binomial distribution?

  • Nominal data, where variables are categorical and unordered. (correct, option C)

To estimate the proportion of values in a population that are below a certain threshold x, assuming a normal distribution, which R function would you typically use?

  • pnorm(x, mean, sd) (correct, option B)

Suppose a manufacturing process has a probability of 0.75 of producing a defect-free item. If 10 items are produced, what R code using dbinom calculates the probability of exactly 8 items being defect-free?

  • dbinom(x = 8, size = 10, prob = 0.75) (correct, option C)

Given the R code sum(FlowRate$Flow < 15)/50, what does this calculate in the context of the FlowRate data frame?

  • The proportion of flow rates less than 15 in the sample of 50 (multiply by 100 for a percentage). (correct, option B)

In RStudio, which argument in the pnorm() function determines whether to calculate the probability to the left or right of a given value?

  • lower.tail (correct, option A)

What parameters are required to calculate probabilities using a normal distribution within ROIStat?

  • Mean and standard deviation (correct, option A)

A manufacturing process has a mean of 100 and a standard deviation of 10. If a part must be at least 85 units to be acceptable, what R code using pnorm would find the proportion of parts that are acceptable?

  • pnorm(85, 100, 10, lower.tail = FALSE) (correct, option D)

A quality control process measures the weight of cereal boxes. The mean weight is 20 ounces with a standard deviation of 0.5 ounces. What proportion of boxes are less than 19 ounces? Assume a normal distribution.

  • Using ROIStat, input mean = 20, standard deviation = 0.5, and point of interest = 19. (correct, option D)

What does the q parameter represent in the pnorm(q, mean, sd, lower.tail) function in RStudio?

  • The quantile (value) for which the probability is to be calculated (correct, option C)

A machine fills bags with candy. The bags are labeled as containing 500 grams. Over a long period, the machine's fills have averaged 505 grams with a standard deviation of 3 grams. Assuming the fills are normally distributed, what is the probability a randomly selected bag will contain less than 500 grams?

  • Approximately 4.78% (correct, option B)

A certain type of light bulb has an average lifespan of 1000 hours, with a standard deviation of 50 hours. If the lifespan is normally distributed, what is the probability that a randomly selected bulb will last more than 1100 hours?

  • Calculate pnorm(1100, 1000, 50, lower.tail = FALSE) (correct, option D)

A company produces bolts with a mean diameter of 5 mm and a standard deviation of 0.1 mm. Bolts are considered defective if their diameter is outside the range of 4.8 mm to 5.2 mm. Assuming a normal distribution, approximately what percentage of bolts are defective?

  • Calculate pnorm(4.8, 5, 0.1, lower.tail = TRUE) + pnorm(5.2, 5, 0.1, lower.tail = FALSE) (correct, option B)

Flashcards

Binomial Distribution

A probability distribution that describes the number of successes in a fixed number of independent trials, each with the same probability of success.

Bernoulli Process

A process where each trial has only two outcomes (success or failure), probability remains fixed, and trials are independent.

Binomial Formula Variables

p is the probability of success, q is the probability of failure (1-p), r is the number of successes desired, and n is the number of trials.

dbinom() in R

Calculates the binomial probability of exactly 'r' successes in 'n' trials, given probability 'p'.

P(X ≥ r) in Binomial

To find the probability of getting at least a certain number of successes, calculate P(X ≥ r) using binomial distribution.

Normality Tests (n < 25)

Tests used to check if data follows a normal distribution when n < 25.

Normality Tests (n ≥ 25)

Tests used to check if data follows a normal distribution when n ≥ 25, based on Skewness and Kurtosis.

Normality Test Result (≥ 0.05)

A probability ≥ 0.05 suggests the data is likely normally distributed.

Normality Test Result (< 0.05)

A probability < 0.05 suggests the data is likely NOT normally distributed.

R Normality Test Functions

Functions in R used for normality testing. Includes Anderson-Darling and Shapiro-Wilk.

Inferring Population from Sample

Using a sample's % out of spec to infer the predicted % out of spec in the population.

pnorm(x, mu, sigma)

Estimates the proportion of data points below a certain value (x) in a normally distributed population, using its mean (mu) and standard deviation (sigma).

Sample % Out of Spec Calculation

To find the actual percentage of values in a sample that are below (or above) a specific value.

Probability of Tool Life

The probability a tool lasts less than a certain time.

ROIStat

A statistical software used to calculate probabilities based on distributions.

pnorm()

A function in RStudio that calculates the cumulative probability of a normal distribution.

μ (Mean)

The average center-to-center distance between holes.

σ (Standard Deviation)

A measure of the spread of the center-to-center distances.

USL (Upper Specification Limit)

The maximum acceptable center-to-center distance.

LSL (Lower Specification Limit)

The minimum acceptable center-to-center distance.

Defect Probability

The percentage of parts that don't meet requirements.

Study Notes

Binomial Distribution

  • Can be used to describe the Binomial probability distribution and calculate probabilities
  • Relates to a discrete random variable (nominal data) and is based on the Bernoulli process

Bernoulli Process

  • Each trial or experiment has only two possible outcomes
  • The probability of any and all outcomes remains fixed over time (constant probability)
  • All trials or experiments are statistically independent

Binomial Formula

  • P(r in n trials) = [n! / (r! (n - r)!)] * [p^r] * [q^(n - r)]
  • p = probability of occurrence
  • q = 1-p = probability of failure
  • r = number of occurrences desired
  • n = number of trials

Binomial Example

  • A vendor frequently ships 2 bad parts out of 10
  • Vendor ships 50 parts
  • At least 9 parts out of 10 must be good
  • It's possible to calculate the probability of receiving what was requested

Binomial Example (Calculations Shown)

  • p = 0.80, q = 0.20, r = 45, n = 50
  • P(45 in 50) = [50! / (45! (50 - 45)!)] * [0.8^45] * [0.2^5] = 0.02953

Binomial Distribution in RStudio

  • p = 0.80, q = 0.20, r = 45, n = 50
  • Can be solved in RStudio as dbinom(x = 45, size = 50, prob = 0.8)
  • The function ro(table.dist.binomial(n = 50, p = 0.80),5) can also be used
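The worked example can be checked in base R. The following is a minimal sketch, not the course's own code: it evaluates the binomial formula by hand with choose(), compares the result with dbinom(), and, on the assumption that "at least 9 out of 10 good" means 45 or more good parts out of 50, sums the upper tail with pbinom(). The variable names are illustrative only.

```r
# Parameters from the example: 50 parts shipped, P(good part) = 0.80
n <- 50
p <- 0.80
r <- 45

# Binomial formula written out: C(n, r) * p^r * q^(n - r)
by_hand <- choose(n, r) * p^r * (1 - p)^(n - r)

# Same quantity from the built-in probability mass function
built_in <- dbinom(x = r, size = n, prob = p)

# Assumption: "at least 45 good" means P(X >= 45) = 1 - P(X <= 44)
at_least_45 <- pbinom(q = r - 1, size = n, prob = p, lower.tail = FALSE)

print(round(c(by_hand = by_hand, built_in = built_in, at_least_45 = at_least_45), 5))
```

Passing q = r - 1 with lower.tail = FALSE is what makes the tail include 45 itself, because pbinom() with lower.tail = FALSE returns P(X > q).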

Binomial Example in ROIStat

  • To calculate, first open ROI Stat
  • Then go to Distributions > Binomial
  • Enter in the value for p (π)
  • Enter in the sample size (n)
  • Select the Point (R) of Interest

Poisson Distribution

  • Can be used to describe this probability distribution and calculate probabilities

Poisson Distribution Context

  • Used for discrete random variables which can take integer (whole) values (ordinal data)
  • Examples: the number of parts produced in a 10-minute period, the number of breakdowns per shift, and the number of failures per 100 cycles

Poisson Formula

  • P(X) = (λ^X / X!) * e^(-λ)
  • P(X) = probability exactly X occurrences
  • λ = Mean number of occurrences per time interval (or unit)
  • e = 2.71828

Poisson Example

  • λ = 25 parts produced per hour
  • X = 10 parts produced in one hour

Poisson Example - Probability

  • P(10) = (25^10 / 10!) * e^(-25) = 0.000365

Poisson Distribution in RStudio

  • λ = 25 parts produced per hour
  • X = 10 parts produced in one hour
  • The calculation can be solved in code as dpois(x = 10, lambda = 25)
  • ro(table.dist.poisson(lambda = 25),5) is an alternative function to use
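As a cross-check of the Poisson example, the sketch below evaluates the formula term by term and compares it with base R's dpois(). It is only an illustration under the example's values (λ = 25, X = 10); the variable names are assumptions made here.

```r
# Values from the example: mean of 25 parts per hour, exactly 10 parts
lambda <- 25
x <- 10

# Poisson formula written out: (lambda^x / x!) * e^(-lambda)
by_hand <- (lambda^x / factorial(x)) * exp(-lambda)

# Same quantity from the built-in probability mass function
built_in <- dpois(x = x, lambda = lambda)

print(signif(c(by_hand = by_hand, built_in = built_in), 3))
```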

Poisson Example in ROIStat

  • Open ROI Stat and go to Distributions > Poisson
  • Enter the Count (λ) and select the Point (R) of Interest

Testing for a Poisson Distribution

  • Not all discrete (ratio-scale) count data conform to a Poisson distribution!

Testing in RStudio

  • Use the following function: poisson.dist.test(x = Discrete$DEFECTS)

Additional info for testing

  • If the p-value is less than 0.05, reject the null hypothesis (H0) that the data come from a Poisson distribution
  • Data are plausibly from a Poisson distribution when the p-value is 0.05 or higher (fail to reject H0)
  • If p is low, then reject H0

Results of Example of Testing

  • chi.square = 44.173, degrees of freedom = 49, p-value = 0.6624
  • alternative hypothesis: true chi.square is not equal to 49
  • sample estimates:
    • chi.square sample variance 44.172840
    • sample mean = 3.240000
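The reported statistic and degrees of freedom are consistent with a chi-square dispersion test, but the notes do not say how poisson.dist.test() computes its p-value. The sketch below therefore assumes a two-sided p-value obtained by doubling the smaller tail area, and applies the 0.05 decision rule to the reported numbers using base R only.

```r
# Values copied from the reported test output
chi_square <- 44.173
df <- 49

# Assumption: two-sided p-value, doubling the smaller chi-square tail area
lower_area <- pchisq(chi_square, df = df)
p_value <- 2 * min(lower_area, 1 - lower_area)

# Decision rule at the 0.05 level used in these notes
decision <- if (p_value < 0.05) "reject H0" else "fail to reject H0"

print(round(p_value, 4))
print(decision)
```

Under that assumption the p-value lands well above 0.05, so H0 is not rejected and the DEFECTS data are treated as plausibly Poisson.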

Ungrouped Testing

  • function is hist.ungrouped(Discrete$DEFECTS)

Testing for a Poisson Distribution In ROIStat

  • Open ROI Stat and Go to Distributions > Testing
  • Select the data and reject if the p-value is < 0.05

Definition of Probability in quality control

  • Used for monitoring the number of occurrences of a specified event in a specified inspection unit
  • Inspection units can be length, area, number of parts, volume, or time

Considering Challenge

  • For every 750 lines of code, there will be an average of 6 errors

What to Consider for The challenge

  • λ = (255 / 750)(6) = 2.04 errors expected in 255 lines
  • Equivalently, the rate is 6 / 750 = 0.008 errors per line
  • 0.008 x 255 lines = 2.04 errors

Distribution Solution

  • Produce the distribution for the relevant Poisson Distribution (λ = 2.04) with the function round.object(table.dist.poisson(2.04),4)

Finding P(0)

  • round.object(table.dist.poisson(2.04),4)

     x  p.at.x  eq.and.above  eq.and.below
     0  0.1300        1.0000        0.1300
     1  0.2653        0.8700        0.3953
     2  0.2706        0.6047        0.6659
     3  0.1840        0.3341        0.8498
     4  0.0938        0.1502        0.9437
     5  0.0383        0.0563        0.9819
     6  0.0130        0.0181        0.9950
     7  0.0038        0.0050        0.9988
     8  0.0010        0.0012        0.9997
     9  0.0002        0.0003        0.9999
    10  0.0000        0.0001        1.0000
    11  0.0000        0.0000        1.0000
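The table above comes from lolcat's table.dist.poisson(). A rough base R equivalent is sketched below for readers without that package; the data frame name is hypothetical and the column names simply mirror the lolcat output.

```r
# Scale the error rate to the 255-line unit: 6 errors per 750 lines
lambda <- (255 / 750) * 6

x <- 0:11
poisson_table <- data.frame(
  x = x,
  p.at.x = dpois(x, lambda),                                # P(X = x)
  eq.and.above = ppois(x - 1, lambda, lower.tail = FALSE),  # P(X >= x)
  eq.and.below = ppois(x, lambda)                           # P(X <= x)
)

print(round(poisson_table, 4))

# P(0): probability of no errors in 255 lines
p_zero <- poisson_table$p.at.x[1]
```

The first row gives P(0), which is simply exp(-lambda) for a Poisson distribution.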

Binomial practice activities

  • A supplier ships 50 parts to a plant, and a consistent 10% nonconforming rate exists. What is the probability of finding exactly two nonconforming parts in the 50 parts? What is the probability of finding two or fewer nonconforming parts?
  • You can use lolcat's table.dist.binomial() function with π = 0.10, n = 50 and r = 2

Solution for previous

  • ro(table.dist.binomial(n = 50, p = 0.10)[1:10,], 4)
  • The exact probability of x, or r = 2 can be obtained with the function: dbinom(x = 2,size = 50, prob = 0.1)
  • pbinom(q = 2, size = 50, prob = 0.1) calculates the probability of 2 or fewer
     x  p.at.x  eq.and.above  eq.and.below
     0  0.0052        1.0000        0.0052
     1  0.0286        0.9948        0.0338
     2  0.0779        0.9662        0.1117
     3  0.1386        0.8883        0.2503
     4  0.1809        0.7497        0.4312
     5  0.1849        0.5688        0.6161
     6  0.1541        0.3839        0.7702
     7  0.1076        0.2298        0.8779
     8  0.0643        0.1221        0.9421
     9  0.0333        0.0579        0.9755

You Try: Binomials

  • Assume a product has a documented failure rate of 0.20 after 150 hours of use. If 30 randomly selected parts from this process are placed in the field, what is the probability that 5 or fewer fail after 150 hours, and what is the probability that more than 10 fail?
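One possible way to set up the field-failure exercise (30 parts, failure probability 0.20) in base R is sketched below. It assumes the failures are independent, so the count follows a binomial distribution; the variable names are illustrative.

```r
# Exercise values: 30 parts in the field, P(failure by 150 hours) = 0.20
n <- 30
p_fail <- 0.20

# P(X <= 5): cumulative probability up to and including 5 failures
p_5_or_fewer <- pbinom(q = 5, size = n, prob = p_fail)

# P(X > 10): upper tail, which excludes 10 itself
p_over_10 <- pbinom(q = 10, size = n, prob = p_fail, lower.tail = FALSE)

print(c(p_5_or_fewer = p_5_or_fewer, p_over_10 = p_over_10))
```

Note the asymmetry: "5 or fewer" includes 5, while "over 10" starts at 11, which is exactly what lower.tail = FALSE with q = 10 returns.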

Poisson Take away

  • The number of OSHA-recordable safety accidents in a manufacturing plant has been running 4.2 accidents per 200,000 hours worked. What is the probability of having exactly two accidents in a 200,000-hour work period?

Facts:

  • Given: λ = 4.2 and X = 2

You can get the result with lolcat's table.dist.poisson() function or directly with the R dpois() function, using:

  • λ = 4.2 and X = 2

Example Solution

  • ro(table.dist.poisson(lambda = 4.2)[1:5,], 4)

     x  p.at.x  eq.and.above  eq.and.below
     0  0.0150        1.0000        0.0150
     1  0.0630        0.9850        0.0780
     2  0.1323        0.9220        0.2102
     3  0.1852        0.7898        0.3954
     4  0.1944        0.6046        0.5898
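The accident example can also be answered directly with base R's dpois(), as the notes mention. The short sketch below uses the stated values (λ = 4.2 accidents per 200,000 hours, X = 2); the variable names are assumptions made for illustration.

```r
# Mean of 4.2 recordable accidents per 200,000 hours worked
lambda <- 4.2

# P(X = 2): exactly two accidents in one 200,000-hour period
p_exactly_2 <- dpois(x = 2, lambda = lambda)

# P(X <= 2), for comparison with the cumulative column of the table
p_2_or_fewer <- ppois(q = 2, lambda = lambda)

print(round(c(p_exactly_2 = p_exactly_2, p_2_or_fewer = p_2_or_fewer), 4))
```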

Try A Poisson

  • The number of buckets of blanked saw chain cutters produced per day averages 65 (λ), and the output follows a Poisson distribution. What is the probability of producing 50 or more buckets in a day?
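A sketch of how the saw chain cutter exercise might be set up in base R follows. It assumes daily output is Poisson with λ = 65, and it uses the identity P(X ≥ 50) = 1 - P(X ≤ 49); the variable names are illustrative.

```r
# Mean daily output assumed to be Poisson with lambda = 65 buckets
lambda <- 65

# P(X >= 50) = P(X > 49), so pass q = 49 with lower.tail = FALSE
p_50_or_more <- ppois(q = 49, lambda = lambda, lower.tail = FALSE)

print(p_50_or_more)
```

Using q = 50 here would give P(X > 50) and silently drop the probability of exactly 50 buckets.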

Normal Distributions

  • A theoretical probability distribution for a continuous random variable and one of the most important distributions
  • Mean = Median = Mode and symmetrical around μ
  • Tails extend to ±∞ but never touch the horizontal axis, and areas under the curve are always predictable
  • Has a skewness (γ3) and kurtosis (γ4) of 0.00

Areas Under the Curve

  • 34.135% of the area lies between μ and 1 standard deviation on each side
  • 13.590% lies between 1 and 2 standard deviations on each side
  • 2.140% lies between 2 and 3 standard deviations on each side
  • μ ± 1 standard deviation covers 68.27% of the area
  • μ ± 2 standard deviations covers 95.45%, and μ ± 3 standard deviations covers 99.73%
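These areas can be recovered from the standard normal cumulative distribution function. The sketch below uses base R's pnorm(); the helper functions area_within() and area_band() are hypothetical names introduced only for this illustration.

```r
# Area within +/- k standard deviations of the mean
area_within <- function(k) pnorm(k) - pnorm(-k)

# Area between a and b standard deviations on one side of the mean
area_band <- function(a, b) pnorm(b) - pnorm(a)

within_pct <- round(100 * sapply(1:3, area_within), 2)
band_pct <- round(100 * c(area_band(0, 1), area_band(1, 2), area_band(2, 3)), 2)

print(within_pct)  # percent of area within 1, 2, 3 standard deviations
print(band_pct)    # percent of area in each one-sided band
```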

Z score

  • The area corresponding to any value X may be found by first converting it to a Z score
  • Z = (X – μ) / σ
  • Z is the number of standard deviation units from X to μ

Normal probability Question

  • Tool life averages 180 hours (μ) with a standard deviation (σ) of 5 hours. What is the probability that a tool lasts less than 172 hours?

Facts :

  • Z = (X – μ) / σ
  • Z = (172 - 180) / 5 = -1.60

Quick notes

  • Can be coded in RStudio with the function pnorm(q, mean, sd, lower.tail), or found under Distributions > Normal in ROI Stat
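For the tool life question (μ = 180 hours, σ = 5 hours, X = 172 hours), the Z score and the matching left-tail area can be sketched in base R as below. This is an illustration with assumed variable names, showing that the standardized and unstandardized pnorm() calls agree.

```r
# Tool life example: mean 180 hours, standard deviation 5 hours
mu <- 180
sigma <- 5
x <- 172

# Z score: number of standard deviations from x to the mean
z <- (x - mu) / sigma

# Left-tail probability, computed two equivalent ways
p_from_z <- pnorm(z)
p_direct <- pnorm(q = x, mean = mu, sd = sigma, lower.tail = TRUE)

print(c(z = z, p_from_z = round(p_from_z, 4), p_direct = round(p_direct, 4)))
```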

Follow-up Normal Example

  • The center-to-center distance between two holes has averaged (μ) 5.20 mm with a standard deviation (σ) of 0.05 mm; the USL is 5.35 mm and the LSL is 5.15 mm
  • What proportion of parts is out of spec?

Here's the solution

  • Z for the LSL = (5.15 - 5.20) / 0.05 = -1.00
  • Z for the USL = (5.35 - 5.20) / 0.05 = 3.00
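To turn the two Z scores for the hole-spacing example into a predicted proportion out of spec, the tail areas below the LSL and above the USL are added together. The base R sketch below assumes the distances are normally distributed with the stated μ and σ; the variable names are illustrative.

```r
# Hole spacing example: mean 5.20 mm, standard deviation 0.05 mm
mu <- 5.20
sigma <- 0.05
lsl <- 5.15
usl <- 5.35

# Area below the lower limit (left tail)
p_below_lsl <- pnorm(q = lsl, mean = mu, sd = sigma, lower.tail = TRUE)

# Area above the upper limit (right tail)
p_above_usl <- pnorm(q = usl, mean = mu, sd = sigma, lower.tail = FALSE)

p_out_of_spec <- p_below_lsl + p_above_usl

print(round(c(below = p_below_lsl, above = p_above_usl, total = p_out_of_spec), 4))
```

Nearly all of the predicted out-of-spec area sits below the LSL, because the process mean is only one standard deviation above it.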

How to test for Normality

  • When n < 25, use the Anderson-Darling or Shapiro-Wilk test for normality
  • When n ≥ 25, use the Skewness and Kurtosis tests (moment tests)

Testing Tips

  • Probabilities ≥ 0.05 indicate that the data are likely normally distributed; probabilities < 0.05 indicate that they are likely NOT normally distributed

Normality Tests in RStudio

  • Use anderson.darling.normality.test(), shapiro.wilk.normality.test(), or summary.continuous()
  • Or find the EDA > Normality Tests functions in ROIStat
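The functions named above come from the course's package. As a stand-in for readers without it, base R ships shapiro.test(), which runs the Shapiro-Wilk test. The sketch below swaps in that function and applies the 0.05 rule to simulated data; the sample itself and the variable names are made up for illustration.

```r
# Hypothetical small sample (n < 25), simulated so the example is self-contained
set.seed(1)
measurements <- rnorm(20, mean = 100, sd = 10)

# Shapiro-Wilk test from base R (package stats)
result <- shapiro.test(measurements)

# Decision rule from the notes: p >= 0.05 suggests the data are likely normal
conclusion <- if (result$p.value >= 0.05) {
  "likely normally distributed"
} else {
  "likely NOT normally distributed"
}

print(result$p.value)
print(conclusion)
```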

Description

This quiz covers normality tests for a sample size of 30, interpretations of probability values, and characteristics of a Bernoulli process. It also covers the R function for the Shapiro-Wilk test, the binomial formula and calculating probabilities.