The Data Driven Manager: Binomial, Poisson, and Normal Distributions - PDF

University of Colorado Boulder

Summary

This document introduces the basic concepts of data driven management, covering the binomial, Poisson, normal, and exponential distributions. It also includes practice activities designed to help readers improve their understanding of statistics.

Full Transcript

The Data Driven Manager
5 Making Decisions with Probability Distributions

The Binomial Distribution

Learning Objectives
○ Describe the Binomial probability distribution
○ Calculate probabilities using the Binomial distribution

The Binomial Distribution
The Binomial distribution relates to a discrete random variable (nominal data). The basis of this distribution is the Bernoulli process.

The Bernoulli Process
○ Each trial or experiment has only two possible outcomes
○ The probability of each outcome remains fixed over time (constant probability)
○ The trials or experiments are statistically independent

The Binomial Formula
$P(r) = \frac{n!}{r!\,(n-r)!}\, p^{r} q^{n-r}$
where
p = probability of occurrence
q = 1 - p = probability of failure
r = number of occurrences desired
n = number of trials

Binomial Example
A vendor frequently ships 2 bad parts out of 10. Suppose the vendor ships our company 50 parts. If we tell them that at least 9 parts out of 10 must be good, and nothing in their manufacturing process has changed, what is the probability that we will receive what we asked for?

p = 0.80, q = 0.20, r = 45, n = 50

Binomial Distribution in RStudio and ROIStat

Binomial Example in RStudio
p = 0.80, q = 0.20, r = 45, n = 50
> dbinom(x = 45, size = 50, prob = 0.8)
> ro(table.dist.binomial(n = 50, p = 0.80),5)

Binomial Example in ROIStat
○ Open ROI Stat
○ Go to Distributions > Binomial
○ Enter the value for p (π)
○ Enter the sample size (n)
○ Select the Point (R) of Interest

Binomial Example
What if we wanted to know the probability of getting at least 9 out of 10 good parts in the shipment of 50, that is, P(r ≥ 45)?
We would sum the following: P(45) + P(46) + P(47) + P(48) + P(49) + P(50)

Binomial Example
p = 0.80, q = 0.20, r = 45, n = 50
# With lower.tail = FALSE, pbinom gives P[X > q], so q = 44 returns P[X ≥ 45]
> pbinom(q = 44, size = 50, prob = 0.80, lower.tail = FALSE)
> ro(table.dist.binomial(n = 50, p = 0.80),5)

The Poisson Distribution

Learning Objectives
○ Describe the Poisson probability distribution
○ Calculate probabilities using the Poisson distribution

The Poisson Distribution
This probability distribution is for discrete random variables that can take integer (whole) values (ordinal data).
Examples:
○ The number of parts produced during a 10 minute period
○ The number of breakdowns per shift
○ The number of failures per 100 cycles

The Poisson Formula
$P(X) = \frac{\lambda^{X} e^{-\lambda}}{X!}$
where
P(X) = probability of exactly X occurrences
λ = mean number of occurrences per time interval (or unit)
e = 2.71828

Poisson Example
λ = 25 parts produced per hour
X = 10 parts produced in one hour
What is the probability of exactly 10?

Poisson Distribution in RStudio and ROIStat

Poisson Example in RStudio
> dpois(x = 10, lambda = 25)
> ro(table.dist.poisson(lambda = 25),5)

Poisson Example in ROIStat
○ Open ROI Stat
○ Go to Distributions > Poisson
○ Enter the value for the count (λ)
○ Select the Point (R) of Interest

Poisson Example
What is the probability of producing 18 or fewer?
λ = 25 parts produced per hour
X ≤ 18 parts produced in one hour
> ppois(q = 18, lambda = 25, lower.tail = TRUE)
> ro(table.dist.poisson(lambda = 25)[7:51,],5)

Testing for a Poisson Distribution
It should be noted that not all discrete count data (ratio scale) conform to a Poisson distribution! We must therefore ask, when presented with such a sample data set: “Is it reasonable to infer that the data were drawn from a population that may be approximated by a Poisson distribution?”
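The binomial and Poisson examples worked earlier in this section can be cross-checked against the formulas using base R alone. The following is a minimal sketch, not the course's own script: it uses only `dbinom()`, `pbinom()`, `dpois()`, and `ppois()`, and all variable names are illustrative.

```r
# Binomial example: p = 0.80 (good part), n = 50 parts, r = 45 good parts
n <- 50; p <- 0.80; r <- 45

# Exactly 45 good parts: dbinom() versus the binomial formula written out
p_exact   <- dbinom(x = r, size = n, prob = p)
p_formula <- choose(n, r) * p^r * (1 - p)^(n - r)

# At least 45 good parts: sum P(45) + ... + P(50), or take the upper tail.
# With lower.tail = FALSE, pbinom() returns P[X > q], hence q = r - 1.
p_at_least_sum  <- sum(dbinom(x = r:n, size = n, prob = p))
p_at_least_tail <- pbinom(q = r - 1, size = n, prob = p, lower.tail = FALSE)

# Poisson example: lambda = 25 parts produced per hour
lambda <- 25
pois_exact_10   <- dpois(x = 10, lambda = lambda)
pois_formula_10 <- lambda^10 * exp(-lambda) / factorial(10)
pois_at_most_18 <- ppois(q = 18, lambda = lambda, lower.tail = TRUE)

print(c(exactly_45 = p_exact, at_least_45 = p_at_least_tail))
print(c(exactly_10 = pois_exact_10, at_most_18 = pois_at_most_18))
```

Computing each probability two ways (library call and explicit formula or sum) is a quick guard against off-by-one mistakes in the tail argument.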
Testing for a Poisson Distribution
Testing in RStudio
○ poisson.dist.test(x = Discrete$DEFECTS)

Although we have not yet discussed it in full, if the p-value is less than 0.05 we reject the null hypothesis of the test (H0: the data are likely from a Poisson distribution). Remember this mantra: If p is low, reject H0.

> hist.ungrouped(Discrete$DEFECTS)

Testing for a Poisson Distribution in ROIStat
○ Open ROI Stat
○ Go to Distributions > Testing
○ Select the data
○ Reject if the p-value is < 0.05

Poisson Distribution
○ Used for monitoring the number of occurrences of a specified event in a specified inspection unit
○ Inspection units can be length, area, number of parts, volume, or time

Example - Nonconformities
Nonconformities: c = 5

Thinking Challenge
You work in a software development firm as a supervisor. For every 750 lines of code in programs written by a particular software engineer, you know that historically there will be an average of 6 errors. Assuming that this engineer has just finished writing an application containing 255 lines of code, what is the probability that this application will be error-free (i.e., have 0 errors)?

Poisson Distribution Solution: Finding λ
750 lines of code ~ 6 errors = λ
In a 255-line program, we would expect:
λ = (255 / 750)(6) = 2.04 errors
or
λ = 6 / 750 = 0.008 errors per line × 255 lines, so 0.008 × 255 = 2.04 errors

Poisson Distribution Solution: Finding P(0)
Produce the distribution for the relevant Poisson distribution (λ = 2.04) with the following command, including rounding:
> round.object(table.dist.poisson(2.04),4)
The table is on the next slide.

Discrete Probability Distributions Practice Activities

Binomial Distribution
Example: Assume a supplier has a consistent 10% nonconforming rate.
Suppose that the supplier ships 50 parts to your plant in a single lot. What is the probability of finding exactly two nonconforming parts in the 50 parts? What is the probability of finding two or fewer nonconforming parts in the 50 parts?

Binomial Distribution
Example: You can use lolcat's 'table.dist.binomial()' function considering:
π = 0.10
n = 50
r = 2

> ro(table.dist.binomial(n,p)[1:10,],4)
  x p.at.x eq.and.above eq.and.below
0 0 0.0052       1.0000       0.0052
1 1 0.0286       0.9948       0.0338
2 2 0.0779       0.9662       0.1117
3 3 0.1386       0.8883       0.2503
4 4 0.1809       0.7497       0.4312
5 5 0.1849       0.5688       0.6161
6 6 0.1541       0.3839       0.7702
7 7 0.1076       0.2298       0.8779
8 8 0.0643       0.1221       0.9421
9 9 0.0333       0.0579       0.9755

The exact probability of x, or r = 2, can be obtained with the following R function:
> dbinom(x = 2, size = 50, prob = 0.1)

The probability of 2 or fewer can be obtained with the following R function:
> pbinom(q = 2, size = 50, prob = 0.1)

Binomial Distribution
Assume that a product has a documented failure rate of 0.20 after 150 hours of use. If we were to place 30 randomly selected parts from this process in the field:
π = ______
n = ______
○ What is the probability that 5 or fewer will have failed?
○ What is the probability that exactly 5 will have failed after 150 hours?
○ What is the probability that more than 10 will have failed?
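The binomial practice questions above map onto two base R calls: `dbinom()` for "exactly" and `pbinom()` for "or fewer" and "more than". The sketch below assumes the blanks are filled as π = 0.20 and n = 30, as the problem statement implies; the variable names are illustrative and not from the slides.

```r
# Supplier example: pi = 0.10 nonconforming, n = 50 parts
p_exactly_2 <- dbinom(x = 2, size = 50, prob = 0.10)  # table row x = 2, p.at.x
p_at_most_2 <- pbinom(q = 2, size = 50, prob = 0.10)  # table row x = 2, eq.and.below

# Field-failure activity: failure rate 0.20, 30 parts placed in the field
pi_fail <- 0.20
n_parts <- 30

p_5_or_fewer <- pbinom(q = 5, size = n_parts, prob = pi_fail)
p_exactly_5  <- dbinom(x = 5, size = n_parts, prob = pi_fail)
# "More than 10" is the upper tail P[X > 10], so q = 10 with lower.tail = FALSE
p_more_than_10 <- pbinom(q = 10, size = n_parts, prob = pi_fail, lower.tail = FALSE)

print(round(c(exactly_2 = p_exactly_2, at_most_2 = p_at_most_2), 4))
print(round(c(five_or_fewer = p_5_or_fewer,
              exactly_five  = p_exactly_5,
              more_than_ten = p_more_than_10), 4))
```

Note the asymmetry in the tail argument: "5 or fewer" uses q = 5 directly, while "more than 10" also uses q = 10 because the upper tail excludes q itself.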
Poisson Distribution
Example: The number of OSHA-recordable safety accidents in a manufacturing plant has been running 4.2 accidents per 200,000 hours worked. What is the probability of having exactly two accidents in a 200,000-hour work period?
Given λ = 4.2, X = 2
P(2) = ________________

Poisson Distribution
Example: You can use lolcat's 'table.dist.poisson()' function to get the results (next slide) or directly with the R dpois() function, both demonstrated on the next slide:
λ = 4.2
X = 2

> ro(table.dist.poisson(lambda)[1:5,],4)
  x p.at.x eq.and.above eq.and.below
0 0 0.0150       1.0000       0.0150
1 1 0.0630       0.9850       0.0780
2 2 0.1323       0.9220       0.2102
3 3 0.1852       0.7898       0.3954
4 4 0.1944       0.6046       0.5898

Poisson Distribution
Example: An expeditor has been monitoring the daily production rate of blanked saw chain cutters. On average, the number of buckets per day that have been produced is 65 (λ), and the output is representative of a Poisson function. What is the probability of producing 50 buckets or more in a day?
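The Poisson practice examples, together with the earlier Thinking Challenge on lines of code, can be sketched in base R as follows. This is an illustrative outline using only `dpois()` and `ppois()`; the variable names are assumptions made for the example.

```r
# OSHA example: lambda = 4.2 accidents per 200,000 hours, exactly 2 accidents
p_two_accidents <- dpois(x = 2, lambda = 4.2)   # table row x = 2, p.at.x

# Saw chain cutter example: lambda = 65 buckets per day, P(X >= 50).
# With lower.tail = FALSE, ppois() returns P[X > q], so use q = 49.
p_50_or_more <- ppois(q = 49, lambda = 65, lower.tail = FALSE)

# Thinking Challenge: rescale 6 errors per 750 lines to a 255-line program,
# then take the probability of zero errors.
lambda_255   <- (255 / 750) * 6
p_error_free <- dpois(x = 0, lambda = lambda_255)   # equals exp(-lambda_255)

print(round(c(two_accidents = p_two_accidents,
              fifty_or_more = p_50_or_more,
              error_free    = p_error_free), 4))
```

The rescaling step matters: the Poisson mean must be expressed for the inspection unit actually being evaluated (255 lines), not the unit in which the historical rate was recorded (750 lines).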
The Normal Distribution

Learning Objectives
○ Describe the Normal probability distribution
○ Calculate probabilities using the Standard Normal distribution

The Normal Distribution
○ A theoretical probability distribution for a continuous random variable
○ One of the most important distributions because of its wide range of practical applications

The Normal Distribution
○ Mean = Median = Mode
○ Symmetrical around µ
○ Tails extend to ±∞ but never touch the horizontal axis
○ γ3 = 0.00 (skewness)
○ γ4 = 0.00 (kurtosis)
○ Areas under the curve are predictable

Areas Under the Normal Curve
○ Between µ and ±1σ: 34.135% on each side
○ Between ±1σ and ±2σ: 13.590% on each side
○ Between ±2σ and ±3σ: 2.140% on each side
○ Beyond ±3σ: 0.135% on each side
○ Within ±1σ: 68.27%; within ±2σ: 95.45%; within ±3σ: 99.73%

Area Calculations
The area corresponding to any score value may be found using a z-score, where Z is the number of standard deviation units from X to µ:
$Z = \frac{X - \mu}{\sigma}$

Normal Distribution Example 1
To date, tooling used on a particular drilling process has lasted an average of 180 hours before requiring replacement, with a standard deviation of 5 hours. What is the probability that a tool selected at random from the tool crib will last less than 172 hours before replacement is required?

[Figure: normal curve with X axis from 165 to 195 hours (Z from -3 to 3); X = 172 corresponds to Z = -1.60]

Normal Distribution in RStudio and ROIStat

Normal Distribution in RStudio
> pnorm(q, mean, sd, lower.tail)

Normal Distribution in ROIStat
○ Open ROI Stat
○ Go to Distributions > Normal
○ Enter the value for the average (μ)
○ Enter the value for the std. dev. (σ)
○ Select the Point of Interest

Normal Distribution Example 2
A stamping operation has been running consistently, punching two holes in sheet metal. The center-to-center distance between the two holes has been an average (μ) of 5.20mm, with a standard deviation (σ) of 0.05mm.

The process produces center-to-center distances that can be modeled with a normal distribution.
The specifications for these parts require a maximum, or upper (USL), limit of 5.35mm and a minimum, or lower (LSL), limit of 5.15mm. What percentage of the manufactured parts are likely to fall outside of the specifications?

[Figure: normal curve for the center-to-center distance, X axis from 5.05 to 5.35 mm (Z from -3 to 3)]

Normal Distribution in RStudio
> pnorm(q, mean, sd, lower.tail)

Normal Distribution in ROIStat
○ Open ROI Stat
○ Go to Distributions > Normal
○ Enter the value for the average (μ)
○ Enter the value for the std. dev. (σ)
○ Select the Point of Interest

Testing for Normality
○ When n < 25, use the Anderson-Darling / Shapiro-Wilk tests for normality
○ When n ≥ 25, use the Skewness Test and Kurtosis Test (Moment Tests)

Testing for Normality
○ Probabilities (p-values) ≥ 0.05 indicate that the data can be treated as normal
○ Probabilities (p-values) < 0.05 indicate that the data are NOT normal

Testing for Normality in RStudio
> anderson.darling.normality.test( )
> shapiro.wilk.normality.test( )
or
> summary.continuous( )

Testing for Normality in ROIStat
○ Open ROI Stat
○ Go to EDA > Normality Tests
OR
○ Go to Distributions > Testing

Comparing Actual Out of Spec to Predicted Out of Spec
When calculating the percent out of specification (or above / below a score value), why don’t we just count the number of values in the sample?

Comparing Actual Out of Spec to Predicted Out of Spec
Which is more correct?
The percentage in the sample you took, or what is predicted in the population based on the normal distribution (given that we tested for normality and can show that it is probable that the sample was drawn from a normal distribution)? We want to make an inference from the sample to the population!

Comparing Actual Out of Spec to Predicted Out of Spec
Sample - actual out of specification in the sample
> sum(data < x)/n
or
> sum(data > x)/n
Population - estimated out of specification in the population
> pnorm(x, mu, sigma)    # add lower.tail = FALSE for the share above x

Comparing Actual Out of Spec to Predicted Out of Spec
Example: Using the FlowRate.txt data file…
What percentage of values in the sample are < 15?
> sum(FlowRate$Flow < 15)/50    # 0.04, i.e. 4.00%

What percentage of values in the population are predicted to be < 15?
> pnorm(q = 15, mean = mean(FlowRate$Flow), sd = sd(FlowRate$Flow), lower.tail = TRUE)    # 8.08%

The Exponential Distribution

Learning Objectives
○ Describe the Exponential probability distribution
○ Calculate probabilities using the Exponential distribution

The Exponential Distribution
The exponential distribution occurs in a number of situations in the industrial environment. Time to failure often follows an exponential distribution.

Measurement from a physical process that has a restraint, such as the location of a hole from a reference edge, where the reference edge is pressed against a fixture, may follow an exponential distribution. Roundness of a shaft, measured by total indicator reading, may also follow this type of distribution.

The exponential distribution is a continuous random variable probability distribution with the form:
$f(x) = \frac{1}{\mu - X_{min}}\, e^{-(x - X_{min})/(\mu - X_{min})}, \quad x \ge X_{min}$

When Xmin = 0, the equation reduces to:
$f(x) = \frac{1}{\mu}\, e^{-x/\mu}, \quad x \ge 0$

The Exponential Distribution
The normal distribution contains an area of 50% above and 50% below µ.
With the exponential distribution, 36.8% of the area under the curve is above the average (µ) and 63.2% is below.

Applications / Observations
○ Predictions based on an exponentially distributed process often only require the µ (and sometimes Xmin) of the process.
○ For prediction purposes, finding the area under the curve beyond the time period of concern is generally the point of interest.
○ These predictions often relate to reliability issues or time-between-failure analyses.

Exponential Distribution Example 1
An in-plant study has shown that an engine control module laboratory tester is capable of operating for an average of 100 hours between breakdowns (MTBF). What is the probability that the tester will run for at least 60 successive hours without a breakdown (assuming that the time-to-failure pattern is distributed exponentially)?

[Figure: exponential curve with µ = 100, X axis starting at 0; point of interest at x = 60]

Exponential Distribution in RStudio
> pexp(q, rate, lower.tail)    # rate = 1/µ when Xmin = 0

Exponential Distribution in ROIStat
○ Open ROI Stat
○ Go to Distributions > Exponential
○ Enter the value for the average (μ)
○ Enter the value for the minimum value (Xmin)
○ Select the Point of Interest

Exponential Distribution Example 2
The distribution of time between breakdowns for a particular grinding machine is characterized by the exponential distribution. The mean time between breakdowns has been established at 50 minutes. The origin parameter (Xmin) is estimated to be 5 minutes. What is the probability of this machine running 20 minutes or less before a breakdown?

[Figure: exponential curve with µ = 50 and Xmin = 5; point of interest at x = 20]

Exponential Distribution in RStudio
> pexp(q, rate, lower.tail)
> pexp.low(q, low, mean, lower.tail)

Exponential Distribution in ROIStat
○ Open ROI Stat
○ Go to Distributions > Exponential
○ Enter the value for the average (μ)
○ Enter the value for the minimum value (Xmin)
○ Select the Point of Interest

Testing for Exponentiality
Always test for normality first!
○ When n ≤ 100, use the Shapiro-Wilk test
○ When n > 100, use the Epps and Pulley test

Testing for Exponentiality
○ Probabilities (p-values) ≥ 0.05 indicate that the data can be treated as exponential
○ Probabilities (p-values) < 0.05 indicate that the data are NOT exponential

Testing for Exponentiality in RStudio
> shapiro.wilk.exponentiality.test( )
> shapetest.exp.epps.pulley.1986( )

Testing for Exponentiality in ROIStat
○ Open ROI Stat
○ Go to Distributions > Testing > Exponential
○ Select the data to test
○ If using Shapiro Wilk or MVP, click on the ‘Start Simulation’ button

Continuous Probability Distributions Practice Activities

Normal Distribution Example
Past participants in a training program designed to upgrade the skills of production-line supervisors spent an average of 500 hours on the program, with a standard deviation of 100 hours. Assume a normal distribution.
○ What is the probability that a participant selected at random will require more than 500 hours to complete the program?
○ What is the probability that a candidate selected at random will take between 550 and 650 hours to complete the program?

Normal Distribution Example
What is the probability that a candidate selected at random will take between 550 and 650 hours to complete the program?
> pnorm(650,500,100)    # 0.9331928, 650 hours or less
> pnorm(550,500,100)    # 0.6914625, 550 hours or less
The difference between the two is the answer: 0.2417303

Normal Distribution Practice Activity
A process has typically run at a μ of 163 with a σ of 12. The specifications for the part are 169 ± 5. What is the probability that a single part selected at random from a standard lot will be out of specification, assuming that a normal distribution has been documented?
A = 17.9659%
B = 53.3207%
C = 99.7300%
D = 71.2866%
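The normal-distribution examples in this section all reduce to one or two `pnorm()` calls once the point of interest and the tail are identified. The sketch below is a hedged illustration in base R; the variable names are made up for the example, and the specification limits for the practice activity (164 and 174) are derived from the stated 169 ± 5.

```r
# Example 1: tool life, mu = 180 h, sigma = 5 h, P(X < 172)
z_172  <- (172 - 180) / 5                       # z-score of -1.60
p_tool <- pnorm(q = 172, mean = 180, sd = 5)    # same area as pnorm(z_172)

# Example 2: hole spacing, mu = 5.20 mm, sigma = 0.05 mm, LSL = 5.15, USL = 5.35
p_below_lsl <- pnorm(q = 5.15, mean = 5.20, sd = 0.05)
p_above_usl <- pnorm(q = 5.35, mean = 5.20, sd = 0.05, lower.tail = FALSE)
p_out_spec  <- p_below_lsl + p_above_usl        # both tails are out of spec

# Training example: mu = 500 h, sigma = 100 h
p_over_500   <- pnorm(q = 500, mean = 500, sd = 100, lower.tail = FALSE)
p_550_to_650 <- pnorm(650, 500, 100) - pnorm(550, 500, 100)

# Practice activity: mu = 163, sigma = 12, specification 169 +/- 5
lsl <- 169 - 5
usl <- 169 + 5
p_out_practice <- pnorm(lsl, 163, 12) + pnorm(usl, 163, 12, lower.tail = FALSE)

print(round(c(tool = p_tool, out_spec = p_out_spec,
              over_500 = p_over_500, between = p_550_to_650,
              practice = p_out_practice), 6))
```

For a two-sided specification the out-of-spec probability is the sum of the lower tail below LSL and the upper tail above USL; when the process mean is off-center, as in the practice activity, the two tails are far from equal.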
Exponential Distribution Example
A research study has shown that the time required to return a response to a bid request at an automotive supplier is exponentially distributed with a mean of 72.5 hours and an origin parameter of 25 hours. What percentage of responses are submitted in less than 48 hours?

Exponential Distribution Practice Activity
The distribution of time between breakdowns or stoppages for a particular grinding machine is characterized by the exponential distribution. The grinding machine automatically grinds the cutting edge in the gullet of a saw chain cutter. Statistically, the mean time between breakdowns has been established as 46 minutes. Also, the minimum value is estimated to be five minutes.

Exponential Distribution Practice Activity
What is the probability of this particular machine running 15 minutes or less before a breakdown?
A = 78.3564%
B = 21.6436%
C = 72.1742%
D = 27.8258%

Exponential Distribution Practice Activity
What is the probability of it running 60 minutes or more before a failure occurs?
A = 72.8651%
B = 26.1463%
C = 73.8537%
D = 27.1349%
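The exponential examples can be approximated with base R's `pexp()`. The sketch below rests on one assumption that should be checked against the course material: when an origin parameter Xmin is given, the time beyond Xmin is treated as exponential with mean µ - Xmin. This reading reproduces the answer choices listed in the practice activity. `pexp_shifted()` is a hypothetical helper written for this illustration; it is not lolcat's `pexp.low()`.

```r
# Example 1: MTBF mu = 100 h, no origin shift, P(T >= 60)
p_run_60 <- pexp(q = 60, rate = 1 / 100, lower.tail = FALSE)  # equals exp(-0.6)

# Hypothetical helper (an assumption, not a lolcat function):
# exponential with origin xmin and overall mean mu, i.e. scale = mu - xmin.
pexp_shifted <- function(q, xmin, mu, lower.tail = TRUE) {
  stopifnot(mu > xmin)
  pexp(q = q - xmin, rate = 1 / (mu - xmin), lower.tail = lower.tail)
}

# Example 2: mu = 50 min, Xmin = 5 min, P(X <= 20)
p_example_2 <- pexp_shifted(q = 20, xmin = 5, mu = 50)

# Bid-response example: mu = 72.5 h, Xmin = 25 h, P(X < 48)
p_bid_48 <- pexp_shifted(q = 48, xmin = 25, mu = 72.5)

# Practice activity: mu = 46 min, Xmin = 5 min
p_15_or_less <- pexp_shifted(q = 15, xmin = 5, mu = 46)
p_60_or_more <- pexp_shifted(q = 60, xmin = 5, mu = 46, lower.tail = FALSE)

print(round(c(run_60 = p_run_60, example_2 = p_example_2, bid_48 = p_bid_48,
              le_15 = p_15_or_less, ge_60 = p_60_or_more), 6))
```

Because the exponential distribution is continuous, "less than 48" and "48 or less" give the same probability, so the strict inequality in the bid-response question needs no adjustment.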
