EDA CO1 PDF
Mapúa University
Engr. Reginald Verdida MSc
Summary
This document is a presentation about probability distributions. It covers discrete probability distributions such as binomial, hypergeometric, geometric, negative binomial, and Poisson distributions. It also covers continuous probability distributions like normal, uniform, exponential, and gamma distributions. The document also contains various examples, problems, and formulas related to these distributions.
Full Transcript
An experiment that can result in different outcomes, even though it is repeated in the same manner every time, is called a random experiment.

Sample spaces: The set of all possible outcomes of a random experiment is called the sample space of the experiment. The sample space is denoted as $S$. A sample space is often defined based on the objectives of the analysis. There are two types of sample space. A sample space is discrete if it consists of a finite or countably infinite set of outcomes. A sample space is continuous if it contains an interval (either finite or infinite) of real numbers.

Event: An event is a subset of the sample space of a random experiment. We denote events as $E$. We can also describe events as combinations of existing events, such as the union $E_1 \cup E_2$, the intersection $E_1 \cap E_2$, and the complement $E'$. Venn diagrams are used to represent a sample space and events in a sample space. Two events, denoted as $E_1$ and $E_2$, are said to be mutually exclusive if $E_1 \cap E_2 = \emptyset$.

A random variable is a function that associates a real number with each element in the sample space. Notation is used to distinguish between a random variable and the real number. Notation: A random variable is denoted by an uppercase letter such as $X$. After an experiment is conducted, the measured value of the random variable is denoted by a lowercase letter, such as $x = 70$ milliamperes.

A discrete random variable is a random variable with a finite (or countably infinite) range. Examples of discrete random variables: number of scratches on a surface, proportion of defective parts among 1000 tested, number of transmitted bits received in error.

A continuous random variable is a random variable with an interval (either finite or infinite) of real numbers for its range. Examples of continuous random variables: electrical current, length, pressure, temperature, time, voltage, weight.

A probability distribution can be defined as a function that describes all possible values of a random variable as well as the associated probabilities.
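The definitions above can be made concrete with a small sketch. The experiment used here (tossing a fair coin three times) and the names `sample_space`, `number_of_heads`, and `distribution` are illustrative assumptions, not part of the original slides. The sketch treats the random variable as a function on the sample space and tabulates its probability distribution.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Hypothetical random experiment: toss a fair coin three times.
# The sample space S is the set of all 8 equally likely outcomes.
sample_space = list(product("HT", repeat=3))


def number_of_heads(outcome):
    """Random variable X: associates a real number with each outcome in S."""
    return outcome.count("H")


# Because every outcome is equally likely, P(X = x) is the fraction of
# outcomes that the random variable maps to the value x.
counts = Counter(number_of_heads(outcome) for outcome in sample_space)
distribution = {
    x: Fraction(count, len(sample_space)) for x, count in sorted(counts.items())
}

for x, probability in distribution.items():
    print(f"P(X = {x}) = {probability}")
```

Exact fractions are used instead of floats so that the probabilities can be checked to sum to exactly 1.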
A discrete probability distribution is a type of probability distribution that shows all possible values of a discrete random variable along with the associated probabilities. In other words, a discrete probability distribution gives the likelihood of occurrence of each possible value of a discrete random variable. A discrete probability distribution must satisfy two conditions:

(1) $0 \le P(X = x) \le 1$
(2) $\sum_x P(X = x) = 1$

Types of discrete probability distributions: binomial, hypergeometric, geometric, negative binomial, and Poisson.

The binomial distribution is the discrete probability distribution of the number of successes in $n$ independent Bernoulli trials, where each trial has only two possible results, success or failure.

$$P(X = x) = \binom{n}{x} p^x q^{n-x}$$

Where:
$P(X = x)$ = probability of $x$ successes
$n$ = number of Bernoulli trials
$p$ = probability that a trial succeeds
$q = 1 - p$ = probability that a trial fails
$x$ = number of successful trials

Mean: $\mu = np$
Variance: $\sigma^2 = np(1 - p)$

A six-sided die is rolled 12 times. What is the probability of getting a 4 exactly five times?

A multiple-choice test contains 10 questions. Only one of the 4 choices for each question is correct. Find the probability that a student will answer exactly 6 questions correctly if he guesses at random on all questions.

20% of all sophomores are enrolled in the Engineering and Data Analysis (EDA) course. A sample of 50 students is taken at random. Find the probability that exactly 17 of the 50 students are taking EDA. Determine the mean and standard deviation.

The hypergeometric distribution describes the number of successes in a sequence of $n$ trials from a finite population without replacement.
$$P(X = x) = \frac{\binom{K}{x}\binom{N-K}{n-x}}{\binom{N}{n}}$$

Where:
$N$ = total number of objects in the population
$K$ = number of objects classified as successes
$N - K$ = number of objects classified as failures
$x$ = number of successes among the trials
$n$ = total number of trials

6 doctors and 19 nurses attend a small conference. All 25 names are put in a list, and 5 names are randomly picked without replacement for a raffle. What is the probability that 4 of the 5 names picked are doctors?

Suppose a large urn contains 400 red marbles and 600 blue marbles. A random sample of 10 marbles is drawn without replacement. What is the probability that exactly 3 are red?

The geometric distribution is a discrete probability distribution that describes the number of independent trials, each having two possible outcomes, needed to achieve the first success.

$$P(X = x) = q^{x-1} p$$

Cumulative geometric distribution: $P(X \le x) = 1 - q^x$
Mean: $\mu = \dfrac{1}{p}$
Variance: $\sigma^2 = \dfrac{1 - p}{p^2}$

According to research, 25% of all cars passing along a certain road on weekdays are red. What is the probability that the 5th car will be the first red car that passes through the road on a Tuesday?

A QA analyst at a tire manufacturing company determines that 2% of all tires produced in a day are defective. A random sample of 50 tires is tested. (a) What is the probability that the 5th tire selected is defective? (b) What is the probability that the first defective tire is identified among the first 20 tires? (c) How many tires would they expect to test each day until they find the first defective one?

From the latest census, about 4% of the population in a particular state works as an engineer. (a) What is the probability that the 8th person that you meet in this state is an engineer? (b) What is the probability that the first engineer you find is among the first 10 people that you meet? (c) Calculate the mean, variance, and standard deviation.
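The binomial, hypergeometric, and geometric formulas above can be checked numerically with a minimal sketch like the following. The function names are illustrative choices, not from the slides, and only Python's standard `math.comb` is used. The three worked values correspond to the die, raffle, and red-car problems, with the geometric problems read as "first success on trial x".

```python
from math import comb


def binomial_pmf(x, n, p):
    """P(X = x): exactly x successes in n Bernoulli trials, success probability p."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


def hypergeometric_pmf(x, N, K, n):
    """P(X = x): x successes in n draws without replacement from N objects,
    K of which are classified as successes."""
    return comb(K, x) * comb(N - K, n - x) / comb(N, n)


def geometric_pmf(x, p):
    """P(X = x): the first success occurs on trial x."""
    return (1 - p) ** (x - 1) * p


def geometric_cdf(x, p):
    """P(X <= x): the first success occurs within the first x trials."""
    return 1 - (1 - p) ** x


# Die rolled 12 times, a 4 appears exactly five times: n = 12, x = 5, p = 1/6.
die = binomial_pmf(5, 12, 1 / 6)

# Raffle: N = 25 names, K = 6 doctors, n = 5 draws, x = 4 doctors.
doctors = hypergeometric_pmf(4, 25, 6, 5)

# Red cars: p = 0.25, first red car is the 5th car.
red_car = geometric_pmf(5, 0.25)

# Tires: p = 0.02, first defective tire found within the first 20 tires.
first_defect_in_20 = geometric_cdf(20, 0.02)

print(f"die: {die:.4f}")
print(f"doctors: {doctors:.6f}")
print(f"red car: {red_car:.4f}")
print(f"first defect within 20 tires: {first_defect_in_20:.4f}")
```

The same functions apply to the remaining problems by changing the arguments, for example `binomial_pmf(6, 10, 0.25)` for the multiple-choice test.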
Whereas the binomial distribution is defined as the number of successes in $n$ Bernoulli trials, the negative binomial distribution is defined as the number of Bernoulli trials needed until the $r$th success.

$$P(X = n) = \binom{n-1}{r-1} p^r q^{n-r}$$

Mean: $\mu = \dfrac{r}{p}$
Variance: $\sigma^2 = \dfrac{r(1 - p)}{p^2}$

A data analyst conducting telephone surveys must get 10 or more completed surveys before their job is finished. On each randomly dialed number, there is a 9% chance of reaching an adult who will complete the survey. (a) What is the probability that the 3rd completed survey will occur on the 10th call? (b) What is the probability that the analyst will not have finished the job after 50 calls?

Suppose we inspect a sequence of tire samples from the tire company, and each tire is classified as either defective or non-defective. Note that the probability of a defective tire is 2% on a given production day. (a) What is the probability that the first defective tire will be identified on the 27th inspection? (b) What is the probability that the 3rd defective tire will be identified on the 76th inspection?

An oil company conducts a geological study that indicates that an exploratory oil well should have a 20% chance of striking oil. (a) What is the probability that the third strike comes on the seventh well that was drilled? (b) What are the mean and variance of the number of wells that must be drilled if the oil company wants to set up three producing wells?

The Poisson distribution is a discrete probability distribution that gives the probability of a number of independent events occurring in a fixed interval of time.

$$P(X = x) = \frac{\mu^x e^{-\mu}}{x!}$$

Cumulative Poisson distribution:

$$P(X \le n) = e^{-\mu} \sum_{x=0}^{n} \frac{\mu^x}{x!}$$

An online shop receives an average of 12 orders per day. (a) What is the probability that the business will receive exactly 8 orders in a given day?

A startup receives, on average, 7 text messages in a 3-hour period. (a) What is the probability that the startup will receive exactly 9 text messages in a 3-hour period?
(b) What is the probability that the startup will receive exactly 24 text messages in 8 hours?

A small business, on average, has 8 calls per hour. (a) What is the probability that the business will receive exactly 3 calls in 1 hour? (b) What is the probability that the business will receive, at most, 5 calls in one hour? (c) What is the probability that the business will receive more than 6 calls in one hour?

Types of continuous probability distributions: Normal, Gamma, Uniform, Weibull, Standard Normal, Lognormal, Exponential, Beta, Erlang, Joint.

[Figure: density curve $f(x)$; the probability $P(a < X < b)$ is the area under $f$ from $a$ to $b$.]

The normal distribution is a family of continuous distributions that can model many histograms of real-life data which are mound-shaped (bell-shaped) and symmetric. The distribution of a normal random variable with a mean of zero and a variance of 1 is called the standard normal distribution. If the normal distribution is not standard, we use the following formula for transformation:

$$Z = \frac{X - \mu}{\sigma}$$

Density function:

$$f(z) = \frac{1}{\sqrt{2\pi}} e^{-0.5 z^2}$$

The density function for a uniform random variable $X$ on an interval $[a, b]$ is:

$$f(x) = \frac{1}{b - a}, \quad a \le x \le b$$
$$f(x) = 0, \quad \text{elsewhere}$$

Mean and variance: $\mu = \dfrac{a + b}{2}$, $\sigma^2 = \dfrac{(b - a)^2}{12}$

In a particular bus station, the amount of time a person has to wait before a bus arrives is uniformly distributed between 1 and 25 minutes. (a) Draw the graph of $f(x)$. (b) Determine the probability density function $f(x)$. (c) What is the probability that a person must wait less than 10 minutes? (d) What is the probability that a person must wait more than 15 minutes? (e) Calculate $P(5 < x < 7)$, $P(x = 12)$, and $P(x > 25)$. (f) What is the 78th percentile?

The time a student takes to finish a statistics test is uniformly distributed between 40 and 65 minutes. (a) Write the probability density function $f(x)$. (b) What is the probability that a student finishes the exam in more than 45 minutes? (c) What is the probability that the student will take between 42 and 55 minutes to complete the test? (d) Determine the mean, variance, and standard deviation. (e) What is the 85th percentile?
(f) What is the probability that the student will take more than 50 minutes to complete a test, given that he always takes more than 40 minutes to complete any statistics test?

Exponential distribution: the random variable $X$ is the interval between successive events. The derivation of the distribution of $X$ depends only on the assumption that the events follow a Poisson process.

Density function:

$$f(x) = \lambda e^{-\lambda x} \quad \text{for } 0 \le x < \infty$$

Mean and variance: $\mu = \dfrac{1}{\lambda}$, $\sigma^2 = \dfrac{1}{\lambda^2}$

A tech company manufactures a computer that lasts, on average, 5 years. Its lifespan was determined to follow an exponential distribution. (a) Calculate the rate parameter. (b) Write the probability density function. (c) What is the probability that a computer will last less than 4 years? (d) What is the probability that a computer will last more than 10 years? (e) What is the probability that a computer will last between 4 and 7 years?

Suppose $X$ has an exponential distribution with rate parameter $\lambda = 3$. Determine the following: (a) $P(X \le 0)$ (b) $P(X \ge 2)$ (c) $P(X \le 1)$ (d) $P(1 < X < 2)$ (e) Find the value of $x$ such that $P(X < x) = 0.05$.

A generalization of the exponential distribution is the length until $r$ events occur in a Poisson process. To get the probability, note that $X$ exceeds $x$ only if fewer than $r$ events occur in $[0, x]$:

$$P(X > x) = \sum_{k=0}^{r-1} \frac{e^{-\lambda x} (\lambda x)^k}{k!}$$

Density function:

$$f(x) = \frac{\lambda^r x^{r-1} e^{-\lambda x}}{(r - 1)!} \quad \text{for } x > 0 \text{ and } r = 1, 2, \ldots$$

This probability function defines an Erlang random variable. An Erlang random variable with $r = 1$ is an exponential random variable.

The failures of the central processor units of large computer systems are often modeled as a Poisson process. Typically, failures are not caused by components wearing out but by more random failures of the large number of semiconductor circuits in the units. Assume that the units that fail are immediately repaired, and assume that the mean number of failures per hour is 0.0001. Let $X$ denote the time until four failures occur in a system. Determine the probability that $X$ exceeds 40,000 hours.

Gamma distribution: a gamma random variable $X$ has parameters $\lambda > 0$ and $r > 0$. If $r$ is an integer, $X$ has an Erlang distribution.
Density function:

$$f(x) = \frac{\lambda^r x^{r-1} e^{-\lambda x}}{\Gamma(r)} \quad \text{for } x > 0$$

The gamma function is defined by:

$$\Gamma(r) = \int_0^{\infty} x^{r-1} e^{-x} \, dx \quad \text{for } r > 0$$

Mean and variance: $\mu = \dfrac{r}{\lambda}$, $\sigma^2 = \dfrac{r}{\lambda^2}$

The time to prepare a micro-array slide for high-throughput genomics is a Poisson process with a mean of two hours per slide. What is the probability that 10 slides require more than 25 hours to prepare?

The Weibull distribution is often used to model the time until failure of many different physical systems. The parameters in the distribution provide a great deal of flexibility to model systems in which the number of failures increases with time (bearing wear), decreases with time (some semiconductors), or remains constant with time (failures caused by external shocks to the system).

A log-normal distribution is the distribution of a random variable whose logarithm follows a normal distribution. A log-normal distribution can be translated to a normal distribution and vice versa using the associated logarithmic calculations. Density function: Mean and variance:

The beta distribution is a continuous probability distribution often used to model the uncertainty about the probability of success of an experiment. Density function:
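Returning to the exponential, Erlang, and gamma examples in this section, the following is a minimal numerical sketch, not a definitive solution. The function names are illustrative. It assumes that the time until the $r$th event of a Poisson process exceeds $x$ exactly when fewer than $r$ events occur in $[0, x]$, and it reads "two hours per slide" as a rate of $\lambda = 0.5$ slides per hour.

```python
from math import exp, factorial


def exponential_cdf(x, lam):
    """P(X <= x) for an exponential random variable with rate lam."""
    return 1 - exp(-lam * x) if x > 0 else 0.0


def erlang_survival(x, r, lam):
    """P(X > x), where X is the time until the r-th event (r a positive
    integer) of a Poisson process with rate lam.

    X exceeds x exactly when fewer than r events occur in [0, x], so the
    result is a Poisson cumulative probability with mean lam * x.
    """
    mean_events = lam * x
    return sum(
        exp(-mean_events) * mean_events**k / factorial(k) for k in range(r)
    )


# Computer lifespan: mean of 5 years, so the rate is 1/5 per year.
computer_under_4_years = exponential_cdf(4, 1 / 5)

# CPU failures: rate 0.0001 per hour, time until the 4th failure > 40,000 hours.
cpu_exceeds_40000 = erlang_survival(40_000, 4, 0.0001)

# Micro-array slides (assumed rate 0.5 slides per hour): 10 slides take > 25 hours.
slides_exceed_25 = erlang_survival(25, 10, 0.5)

print(f"P(computer lasts < 4 years) = {computer_under_4_years:.4f}")
print(f"P(X > 40,000 hours) = {cpu_exceeds_40000:.4f}")
print(f"P(10 slides take > 25 hours) = {slides_exceed_25:.4f}")
```

With `r = 1`, `erlang_survival` reduces to the exponential tail $e^{-\lambda x}$, which matches the statement that an Erlang random variable with $r = 1$ is an exponential random variable.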