Math 403 Engineering Data Analysis PDF
Cabaces, Donnalyn C.Marcaida, Marjorie G.Sotto, Rodolfo Jr. C.
Summary
This document is chapter one of a textbook on engineering data analysis. It covers the methods of obtaining data and statistical terms such as population, sample, and data and its different types. The chapter also covers planning and conducting surveys, explaining different methods such as face-to-face interviews and self-administered questionnaires.
Full Transcript
MATH 403 - ENGINEERING DATA ANALYSIS

CABACES, DONNALYN C.
MARCAIDA, MARJORIE G.
SOTTO, RODOLFO JR. C.

Chapter 1
OBTAINING DATA

Introduction

Statistics may be defined as the science that deals with the collection, organization, presentation, analysis, and interpretation of data in order to draw judgments or conclusions that help in the decision-making process. The two parts of this definition correspond to the two main divisions of Statistics: Descriptive Statistics and Inferential Statistics.

Descriptive Statistics, which is referred to in the first part of the definition, deals with the procedures that organize, summarize, and describe quantitative data. It seeks merely to describe data.

Inferential Statistics, implied in the second part of the definition, deals with making a judgment or drawing a conclusion about a population based on the findings from a sample that is taken from the population.

Intended Learning Outcomes

At the end of this module, it is expected that the students will be able to:
1. Demonstrate an understanding of the different methods of obtaining data.
2. Explain the procedures in planning and conducting surveys and experiments.

Statistical Terms

Before proceeding to the discussion of the different methods of obtaining data, let us first define some statistical terms:

Population or Universe refers to the totality of objects, persons, places, or things used in a particular study: all members of a particular group of objects (items) or people (individuals) that are the subjects or respondents of a study.

Sample is any subset of a population, that is, a few members of the population.

Data are facts, figures, and information collected on some characteristics of a population or sample. These can be classified as qualitative or quantitative data.

Ungrouped (or raw) data are data which are not organized in any specific way.
They are simply the data as they were gathered.

Grouped data are raw data organized into groups or categories with corresponding frequencies. Organized in this manner, the data are referred to as a frequency distribution.

Parameter is a descriptive measure of a characteristic of a population.

Statistic is a measure of a characteristic of a sample.

Constant is a characteristic or property of a population or sample which is common to all members of the group.

Variable is a measure, characteristic, or property of a population or sample that may take a number of different values. It differentiates a particular member from the rest of the group. It is the characteristic or property that is measured, controlled, or manipulated in research. Variables differ in many respects, most notably in the role they are given in the research and in the type of measures that can be applied to them.

1.1 Methods of Data Collection

Collection of data is the first step in conducting a statistical inquiry. It refers to data gathering: a systematic method of collecting and measuring data from different sources of information in order to provide answers to relevant questions. This involves acquiring information from published literature, surveys through questionnaires or interviews, experimentation, documents and records, tests or examinations, and other forms of data-gathering instruments. The person who conducts the inquiry is the investigator, the one who helps in collecting information is an enumerator, and the information is collected from a respondent.

Data can be primary or secondary. According to Wessel, “Data collected in the process of investigation are known as primary data.” These are collected for the investigator’s use from the primary source. Secondary data, on the other hand, are collected by some other organization for its own use, but the investigator also obtains them for his or her use. According to M.M. Blair, “Secondary data are those already in existence for some other purpose than answering the question in hand.”

In the field of engineering, the three basic methods of collecting data are the retrospective study, the observational study, and the designed experiment.

A retrospective study uses the population or a sample of historical data that has been archived over some period of time. It may involve a significant amount of data, but those data may contain relatively little useful information about the problem: some of the relevant data may be missing, there may be recording or transcription errors, or other important data may not have been gathered and archived. As a result, statistical analysis of historical data may identify interesting phenomena, but solid and reliable explanations are difficult to obtain.

In an observational study, the process or population is observed and disturbed as little as possible, and the quantities of interest are recorded.

In a designed experiment, deliberate or purposeful changes are made to the controllable variables of the system or process. The resulting system output data are observed, and an inference or decision is made about which variables are responsible for the observed changes in output performance. Experiments designed with basic principles such as randomization are needed to establish cause-and-effect relationships.

Much of what we know in the engineering and physical-chemical sciences is developed through testing or experimentation. In engineering, there are problem areas with no scientific or engineering theory that is directly or completely applicable, so experimentation and observation of the resulting data is the only way to solve them. At other times there is a good underlying scientific theory to explain the phenomena of interest.
Even then, tests or experiments almost always need to be conducted to confirm the applicability and validity of the theory in a specific situation or environment. Designed experiments are very important in engineering design and development and in the improvement of manufacturing processes, where statistical thinking and statistical methods play an important role in planning, conducting, and analyzing the data (Montgomery et al., 2018).

1.2 Planning and Conducting Surveys

A survey is a method of asking respondents some well-constructed questions. It is an efficient way of collecting information, it is easy to administer, and a wide variety of information can be collected. The researcher can stay focused on the questions that are of interest and are necessary to the statistical inquiry or study.

However, surveys depend on the respondents’ honesty, motivation, memory, and ability to respond, and answers may sometimes lead to vague data. Surveys can be done through face-to-face interviews or self-administered through the use of questionnaires.

The advantages of face-to-face interviews include fewer misunderstood questions, fewer incomplete responses, higher response rates, and greater control over the environment in which the survey is administered; the researcher can also collect additional information if any of the respondents’ answers need clarifying. The disadvantages of face-to-face interviews are that they can be expensive and time-consuming and may require a large staff of trained interviewers. In addition, the responses can be biased by the appearance or attitude of the interviewer.

Self-administered surveys are less expensive than interviews. They can be administered in large numbers, do not require many interviewers, and put less pressure on respondents.
However, in self-administered surveys, the respondents are more likely to stop participating midway through the survey, and respondents cannot ask for clarification. Response rates are also lower than in personal interviews.

When designing a survey, the following steps are useful:
1. Determine the objectives of your survey: What questions do you want to answer?
2. Identify the target population and sample: Whom will you interview? Who will be the respondents? What sampling method will you use?
3. Choose an interviewing method: face-to-face interview, phone interview, self-administered paper survey, or internet survey.
4. Decide what questions you will ask, in what order, and how to phrase them.
5. Conduct the interview and collect the information.
6. Analyze the results by making graphs and drawing conclusions.

In choosing the respondents, sampling techniques are necessary. Sampling is the process of selecting units (e.g., people, organizations) from a population of interest. The sample must be representative of the target population. The target population is the entire group a researcher is interested in, that is, the group about which the researcher wishes to draw conclusions.

There are two ways of selecting a sample: non-probability sampling and probability sampling.

Non-Probability Sampling

Non-probability sampling is also called judgment or subjective sampling. This method is convenient and economical, but the inferences made based on the findings are not so reliable. The most common types of non-probability sampling are convenience sampling, purposive sampling, and quota sampling.

In convenience sampling, the researcher uses a means of obtaining information from the respondents that favors the researcher but can introduce bias among the respondents.

In purposive sampling, the selection of respondents is predetermined according to the characteristic of interest set by the researcher.
Randomization is absent in this type of sampling.

There are two types of quota sampling: proportional and non-proportional. In proportional quota sampling, the major characteristics of the population are represented by sampling a proportional amount of each. For instance, if you know the population has 40% women and 60% men and you want a total sample size of 100, you continue sampling until you reach those percentages and then stop. Non-proportional quota sampling is a bit less restrictive. In this method, a minimum number of sampled units in each category is specified, without concern for having numbers that match the proportions in the population.

Probability Sampling

In probability sampling, every member of the population is given an equal chance to be selected as part of the sample. There are several probability sampling techniques. Among these are simple random sampling, stratified sampling, and cluster sampling.

Simple Random Sampling

Simple random sampling is the basic sampling technique where a group of subjects (a sample) is selected for study from a larger group (a population). Each individual is chosen entirely by chance, and each member of the population has an equal chance of being included in the sample. Every possible sample of a given size has the same chance of selection; i.e., each member of the population is equally likely to be chosen at any stage in the sampling process.

Stratified Sampling

There may often be factors which divide the population into sub-populations (groups or strata), and the measurement of interest may vary among the different sub-populations. This has to be accounted for when a sample is selected in order to obtain a sample that is representative of the population. This is achieved by stratified sampling.

A stratified sample is obtained by taking samples from each stratum or sub-group of a population.
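As a rough illustration of the two probability sampling schemes described so far, the following Python sketch draws a simple random sample and a stratified sample. The population of 60 men and 40 women, the ID strings, and the function names are hypothetical, and allocating the sample to strata in proportion to their sizes is one common choice assumed here, not the only one.

```python
import random


def simple_random_sample(population, n, seed=None):
    """Draw n distinct members; every subset of size n is equally likely."""
    rng = random.Random(seed)
    return rng.sample(population, n)


def stratified_sample(strata, n, seed=None):
    """Draw a simple random sample from each stratum.

    strata maps a stratum label to the list of its members. Each stratum
    contributes a share of n proportional to its size (an assumption of
    this sketch); because of rounding, the total can differ slightly from n.
    """
    rng = random.Random(seed)
    total = sum(len(members) for members in strata.values())
    sample = {}
    for label, members in strata.items():
        k = round(n * len(members) / total)
        sample[label] = rng.sample(members, k)
    return sample


# Hypothetical population: 60 men and 40 women, identified by ID strings.
men = [f"M{i}" for i in range(60)]
women = [f"W{i}" for i in range(40)]

srs = simple_random_sample(men + women, 10, seed=1)
strat = stratified_sample({"men": men, "women": women}, 10, seed=1)

print(len(srs))
print({label: len(chosen) for label, chosen in strat.items()})
```

With a simple random sample of 10, the split between men and women varies from draw to draw; the stratified version fixes it at 6 and 4, matching the make-up of this hypothetical population.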
When a sample is to be taken from a population with several strata, the proportion of each stratum in the sample should be the same as in the population.

Stratified sampling techniques are generally used when the population is heterogeneous, or dissimilar, and certain homogeneous, or similar, sub-populations (strata) can be isolated. Simple random sampling is most appropriate when the entire population from which the sample is taken is homogeneous. Some reasons for using stratified sampling over simple random sampling are:
1. the cost per observation in the survey may be reduced;
2. estimates of the population parameters may be wanted for each sub-population;
3. increased accuracy at a given cost.

Cluster Sampling

Cluster sampling is a sampling technique where the entire population is divided into groups, or clusters, and a random sample of these clusters is selected. All observations in the selected clusters are included in the sample.

1.3 Planning and Conducting Experiments: Introduction to Design of Experiments

The products and processes in the engineering and scientific disciplines are mostly derived from experimentation. An experiment is a series of tests conducted in a systematic manner to increase the understanding of an existing process or to explore a new product or process. Design of Experiments, or DOE, is a tool for developing an experimentation strategy that maximizes learning using minimum resources. DOE is widely used by engineers and scientists in improving existing processes, by maximizing the yield and decreasing the variability, and in developing new products and processes. It is a technique for identifying the "vital few" factors in the most efficient manner and then directing the process to its best setting to meet the ever-increasing demand for improved quality and increased productivity.
The methodology of DOE ensures that all factors and their interactions are systematically investigated, resulting in reliable and complete information. There are five stages in the design of experiments: planning, screening, optimization, robustness testing, and verification.

1. Planning

It is important to plan the course of experimentation carefully before embarking upon the process of testing and data collection. At this stage, the objectives of the experiment or investigation are identified, and the time and resources available to achieve those objectives are assessed. Individuals from different disciplines related to the product or process should compose the team that will conduct the investigation. They are to identify possible factors to investigate and the most appropriate responses to measure. A team approach promotes synergy that gives a richer set of factors to study and thus a more complete experiment. Experiments which are carefully planned always lead to increased understanding of the product or process. Well-planned experiments are easy to execute and analyze using the available statistical software.

2. Screening

Screening experiments are used to identify the important factors that affect the process under investigation out of the large pool of potential factors. The screening process eliminates unimportant factors so that attention is focused on the key factors. Screening experiments are usually efficient designs which require few runs and focus on the vital factors rather than on interactions.

3. Optimization

After narrowing down the important factors affecting the process, the next step is to determine the best setting of these factors to achieve the objectives of the investigation. The objectives may be to increase yield, to decrease variability, or to find settings that achieve both at the same time, depending on the product or process under investigation.

4. Robustness Testing

Once the optimal settings of the factors have been determined, it is important to make the product or process insensitive to variations resulting from changes in factors that affect the process but are beyond the control of the analyst. Such factors, referred to as noise or uncontrollable factors, are likely to be experienced in the application environment. It is important to identify such sources of variation and take measures to ensure that the product or process is made robust, or insensitive, to these factors.

5. Verification

This final stage involves validation of the optimum settings by conducting a few follow-up experimental runs. This is to confirm that the process functions as expected and all objectives are achieved.

REFERENCES:

Montgomery, Douglas C., et al. (2018). Applied Statistics and Probability for Engineers, 7th ed. John Wiley & Sons (Asia) Pte Ltd.
Panopio, Felix M. (2004). Statistics with Probability. Batangas City, Philippines: Feliber Publishing House.
Rawley, Eve. Planning and Conducting Surveys. https://www.ck12.org/statistics/planning-and-conducting-surveys/lesson/Planning-and-Conducting-Surveys-ALG-I/ Date accessed: July 27, 2020.
Walpole, Ronald E., et al. (2016). Probability and Statistics for Engineers and Scientists, 9th ed. Pearson Education Inc.
Introduction to Design of Experiments. https://www.weibull.com/hotwire/issue84/hottopics84.htm Date accessed: April 15, 2020.
https://mathspace.co/learn/world-of-maths/language-and-use-of-statistics/planning-a-statistical-investigation-i-investigation-18643/investigation-statistical-inquiry-916/

CHAPTER TEST

Answer the following.
1. Explain the different methods by which data can be obtained.
2. As one of the students of the EDA class, you are tasked to conduct a survey to show which extracurricular activities the students of the College of Engineering, Architecture and Fine Arts would like to engage in during the first semester. Follow the presented steps in conducting a survey.
3. You are asked to conduct an experiment on the catapult shown in the figure. It is a table-top wooden device used in teaching design of experiments and statistical process control. The objective of the experiment is to determine the significant factors that affect the distance travelled by the ball as it is thrown by the catapult. Also, you are to establish the settings to reach 25, 50, 75, and 100 inches. The response variable is the distance, and the factors are the band height, start angle, number of rubber bands used (1 or 2), arm length, and stop angle. Explain how you are going to conduct the experiment, taking note of the stages of planning and conducting a designed experiment.

Chapter 2
PROBABILITY

Introduction

Probability is simply how likely an event is to happen. “The chance of rain today is 50%” is a statement that quantifies our belief about the possibility of rain. The likelihood of an outcome is measured by assigning a number from the interval [0, 1] or a percentage from 0 to 100%. A higher number means the event is more likely to happen than a lower number. A probability of zero (0) indicates that the outcome is impossible, while a probability of one (1) indicates that the outcome is certain to occur.

This module discusses the concept of probability for discrete sample spaces, its application, and ways of solving for the probabilities of different statistical data.

Intended Learning Outcomes

At the end of this module, it is expected that the students will be able to:
1. Understand and describe sample spaces and events for random experiments.
2. Explain the concept of probability and its application to different situations.
3. Define and illustrate the different probability rules.
4. Solve for the probability of different statistical data.

Probability

Probability is the likelihood or chance of an event occurring.

$$\text{Probability} = \frac{\text{the number of ways of achieving success}}{\text{the total number of possible outcomes}}$$

For example, the probability of flipping a coin and it being heads is 1/2, because there is 1 way of getting a head and the total number of possible outcomes is 2 (a head or a tail). We write P(heads) = 1/2.

The probability of something which is certain to happen is 1.
The probability of something which is impossible is 0.
The probability of something not happening is 1 minus the probability that it will happen.

Experiment – any process that generates a set of data.
Event – a set of possible outcomes of a probability experiment. It can consist of one outcome or more than one outcome.
Simple event – an event with one outcome.
Compound event – an event with more than one outcome.

2.1 Sample Space and Relationships among Events

The sample space is the set of all possible outcomes or results of a random experiment. It is represented by the letter S. Each outcome in the sample space is called an element of that set. An event is a subset of the sample space and is represented by the letter E. This can be illustrated in a Venn diagram. In Figure 2.1, the sample space is represented by the rectangle and the events by the circles inside the rectangle. The events A and B (in a to c) and A, B, and C (in d and e) are all subsets of the sample space S.

Figure 2.1 Venn diagrams of sample space with events (adapted from Montgomery et al., 2003)

For example, if a die is rolled, the sample space is {1, 2, 3, 4, 5, 6}. An event can be {1, 3, 5}, which is the set of odd numbers.
Similarly, when a coin is tossed twice, the sample space is {HH, HT, TH, TT}.

Difference between Sample Space and Events

As discussed at the beginning, the sample space is the set of all possible outcomes of an experiment, and an event is a subset of the sample space. Let us try to understand this with a few examples.

What happens when we toss a coin thrice? If a coin is tossed three times, we get the following combinations:

HHH, HHT, HTH, THH, TTH, THT, HTT, and TTT

All these are the outcomes of the experiment of tossing a coin three times. Hence, the sample space is the set

S = {HHH, HHT, HTH, THH, TTH, THT, HTT, TTT}

Now, suppose the event is the set of outcomes in which there are exactly two heads. The outcomes with exactly two heads are HHT, HTH, and THH; hence the event is given by

E = {HHT, HTH, THH}

We can clearly see that each element of set E is in set S, so E is a subset of S. There can be more than one event. In this case, we can have the event of getting only one tail or the event of getting only one head. If we have more than one event, we can represent these events by E1, E2, E3, etc. We can have more than one event for a sample space, but there is one and only one sample space for an event. If we have events E1, E2, E3, ..., En as all the possible subsets of the sample space, then

S = E1 ∪ E2 ∪ E3 ∪ ... ∪ En

We can understand this with the help of a simple example. Consider the experiment of rolling a die. The sample space is

S = {1, 2, 3, 4, 5, 6}

Now, if event E1 is getting an odd number as the outcome and event E2 is getting an even number as the outcome, then E1 and E2 can be represented as the following sets:

E1 = {1, 3, 5}
E2 = {2, 4, 6}

So we have

{1, 3, 5} ∪ {2, 4, 6} = {1, 2, 3, 4, 5, 6}

or

S = E1 ∪ E2

Hence, the union of events E1 and E2 is S.

Null space – a subset of the sample space that contains no elements, denoted by the symbol ∅.
It is also called the empty space.

Operations with Events

Intersection of Events

The intersection of two events A and B is denoted by the symbol A ∩ B. It is the event containing all elements that are common to A and B. This is illustrated as the shaded region in Figure 2.1 (c).

For example,
Let A = {3, 6, 9, 12, 15} and B = {1, 3, 5, 8, 12, 15, 17}; then A ∩ B = {3, 12, 15}.
Let X = {q, w, e, r, t} and Y = {a, s, d, f}; then X ∩ Y = ∅, since X and Y have no elements in common.

Mutually Exclusive Events

Two events are mutually exclusive if they have no elements in common. This is illustrated in Figure 2.1 (b), where we can see that A ∩ B = ∅.

Union of Events

The union of events A and B is the event containing all the elements that belong to A or to B or to both, and is denoted by the symbol A ∪ B. The elements of A ∪ B may be listed or defined by the rule A ∪ B = {x | x ∈ A or x ∈ B}.

For example,
Let A = {a, e, i, o, u} and B = {b, c, d, e, f}; then A ∪ B = {a, b, c, d, e, f, i, o, u}.
Let X = {1, 2, 3, 4} and Y = {3, 4, 5, 6}; then X ∪ Y = {1, 2, 3, 4, 5, 6}.

Complement of an Event

The complement of an event A with respect to S is the set of all elements of S that are not in A, and is denoted by A'. The shaded region in Figure 2.1 (e) shows (A C)'.

For example,
Consider the sample space S = {dog, cow, bird, snake, pig}.
Let A = {dog, bird, pig}; then A' = {cow, snake}.

Probability of an Event

Sample space and events play important roles in probability. Once we have the sample space and an event, we can find the probability of that event. When all outcomes in the sample space are equally likely, the probability of an event is given by the following formula:

$$\text{Probability of an event} = \frac{\text{number of elements in the event set}}{\text{number of elements in the sample space of the experiment}}$$

$$P(E) = \frac{n(E)}{n(S)}$$

where n(S) represents the number of elements in the sample space of an experiment, n(E) represents the number of elements in the event set, and P(E) represents the probability of the event.
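The counting formula for P(E) can be checked on a small case. The sketch below assumes equally likely outcomes and uses a hypothetical helper named `probability`; it rebuilds the three-toss sample space from earlier in this section, computes the probability of exactly two heads, and repeats the intersection example given above using Python sets.

```python
from fractions import Fraction
from itertools import product


def probability(event, sample_space):
    """P(E) = n(E) / n(S), valid only for equally likely outcomes."""
    event, sample_space = set(event), set(sample_space)
    if not event <= sample_space:
        raise ValueError("the event must be a subset of the sample space")
    return Fraction(len(event), len(sample_space))


# Three tosses of a coin: S has 2 * 2 * 2 = 8 outcomes.
S = {"".join(tosses) for tosses in product("HT", repeat=3)}

# Event E: exactly two heads.
E = {outcome for outcome in S if outcome.count("H") == 2}

print(sorted(E))          # ['HHT', 'HTH', 'THH']
print(probability(E, S))  # 3/8

# Set operations on events map directly onto Python set operators.
A = {3, 6, 9, 12, 15}
B = {1, 3, 5, 8, 12, 15, 17}
print(sorted(A & B))      # [3, 12, 15]
```

Using `Fraction` keeps the result exact (3/8) instead of a rounded decimal, which makes it easier to compare with hand calculations.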
When probabilities are assigned to the outcomes in a sample space, each probability must lie between 0 and 1 inclusive, and the sum of all probabilities assigned must be equal to 1. Therefore,

0 ≤ P(E) ≤ 1 and P(S) = 1

Let us try to understand this with the help of an example. If a die is tossed, the sample space is {1, 2, 3, 4, 5, 6}, which has 6 elements. If the event is the set of odd numbers on the die, then the event is {1, 3, 5}, which has 3 elements. So, the probability of getting an odd number in a single throw of a die is

$$\text{Probability} = \frac{3}{6} = \frac{1}{2}$$

2.2 Counting Rules Useful in Probability

Multiplicative Rule

Suppose you have $j$ sets of elements, $n_1$ in the first set, $n_2$ in the second set, ..., and $n_j$ in the $j$th set. Suppose you wish to form a sample of $j$ elements by taking one element from each of the $j$ sets. The number of possible samples is then

$$n_1 \cdot n_2 \cdot \ldots \cdot n_j$$

Permutation Rule

An arrangement of elements in a distinct order is called a permutation. Given a single set of $n$ distinct elements, you wish to select $k$ elements from the $n$ and arrange them within $k$ positions. The number of different permutations of the $n$ elements taken $k$ at a time is denoted $P_k^n$ and is equal to

$$P_k^n = \frac{n!}{(n-k)!}$$

Partitions Rule

Suppose a single set of $n$ distinct elements exists. You wish to partition them into $k$ sets, with the first set containing $n_1$ elements, the second containing $n_2$ elements, ..., and the $k$th set containing $n_k$ elements. The number of different partitions is

$$\frac{n!}{n_1!\, n_2! \cdots n_k!}$$

where $n_1 + n_2 + \ldots + n_k = n$. The numerator gives the permutations of the $n$ elements. The terms in the denominator remove the duplicates due to the same assignments in the $k$ sets (multinomial coefficients).

Combinations Rule

A sample of $k$ elements is to be chosen from a set of $n$ elements.
The number of different samples of $k$ elements that can be selected from $n$ is equal to

$$\binom{n}{k} = \frac{n!}{k!\,(n-k)!}$$

2.3 Rules of Probability

Before discussing the rules of probability, we state the following definitions:

Two events are mutually exclusive or disjoint if they cannot occur at the same time.
The probability that Event A occurs, given that Event B has occurred, is called a conditional probability. The conditional probability of Event A, given Event B, is denoted by the symbol P(A|B).
The complement of an event is the event not occurring. The probability that Event A will not occur is denoted by P(A').
The probability that Events A and B both occur is the probability of the intersection of A and B, denoted by P(A ∩ B). If Events A and B are mutually exclusive, P(A ∩ B) = 0.
The probability that Events A or B occur is the probability of the union of A and B, denoted by P(A ∪ B).
If the occurrence of Event A changes the probability of Event B, then Events A and B are dependent. On the other hand, if the occurrence of Event A does not change the probability of Event B, then Events A and B are independent.

Rule of Addition

Rule 1: If two events A and B are mutually exclusive, then

$$P(A \cup B) = P(A) + P(B)$$

Rule 2: If events A and B are not mutually exclusive, then

$$P(A \cup B) = P(A) + P(B) - P(A \cap B)$$

Example 1. A student goes to the library. The probability that she checks out (a) a work of fiction is 0.40, (b) a work of non-fiction is 0.30, and (c) both fiction and non-fiction is 0.20. What is the probability that the student checks out a work of fiction, non-fiction, or both?

Solution: Let F = the event that the student checks out fiction, and let N = the event that the student checks out non-fiction. Then, based on the rule of addition:

$$P(F \cup N) = P(F) + P(N) - P(F \cap N)$$

$$P(F \cup N) = 0.4 + 0.3 - 0.2 = 0.5$$

Rule of Multiplication

Rule 1: When two events A and B are independent, then

$$P(A \cap B) = P(A)\,P(B)$$

Dependent – Two outcomes are said to be dependent if knowing that one of the outcomes has occurred affects the probability that the other occurs.
Conditional probability – The conditional probability of an event B in relation to an event A is the probability that event B occurs given that event A has already occurred. It is denoted by P(B|A).

Rule 2: When two events are dependent, the probability of both occurring is

$$P(A \cap B) = P(A)\,P(B \mid A)$$

where

$$P(B \mid A) = \frac{P(A \cap B)}{P(A)}, \quad \text{provided that } P(A) \neq 0$$

Example 1. A day’s production of 850 manufactured parts contains 50 parts that do not meet customer requirements. Two parts are selected randomly without replacement from the batch. What is the probability that the second part is defective given that the first part is defective?

Solution: Let A = the event that the first part selected is defective, and let B = the event that the second part selected is defective. We require P(B|A). If the first part is defective, then prior to selecting the second part the batch contains 849 parts, of which 49 are defective; therefore P(B|A) = 49/849.

Example 2. An urn contains 6 red marbles and 4 black marbles. Two marbles are drawn without replacement from the urn. What is the probability that both of the marbles are black?

Solution: Let A = the event that the first marble is black, and let B = the event that the second marble is black. In the beginning, there are 10 marbles in the urn, 4 of which are black. Therefore, P(A) = 4/10. After the first selection, there are 9 marbles in the urn, 3 of which are black. Therefore, P(B|A) = 3/9.

$$P(A \cap B) = \left(\frac{4}{10}\right)\left(\frac{3}{9}\right) = 0.133$$

Example 3. Two cards are selected without replacement from a standard pack of 52 cards. What is the probability that they are both queens?

Solution: Let A = the event that the first card is a queen, and let B = the event that the second card is also a queen. We require P(A ∩ B).
Notice that these events are dependent because the probability that the second card is a queen depends on whether or not the first card is a queen. MATH 403- ENGINEERING DATA ANALYSIS P (A B) = P (A) P (B|A) P (A) = 1/13 and P (B|A) = 3/51 P (A B) = (1/13) (3/51) = 1/221 = 0.004525 Rule of Subtraction The probability that event A will occur is equal to 1 minus the probability that event A will not occur. 𝑃(𝐴) = 1 − 𝑃(𝐴′ ) Example 1.The probability of Bill not graduating in college is 0.8. What is the probability that Bill will not graduate from college? Solution: 𝑃(𝐴) = 1 − 0.8 = 𝟎. 𝟐 REFERENCES: Montgomery, D. C. et al. (2003). Applied Statistics and Probability for Engineers 3rd Edition. USA. John Wiley & Sons, Inc. Walpole, R. E. et al. (2016). Probability & Statistics for Engineers & Scientists 9th Edition. England. Pearson Education Limited https://math.tutorvista.com/statistics/sample-space-and-events.html https://stattrek.com/probability/probability-rules.aspx https://www.ck12.org/book/CK-12-Probability-and-Statistics-Advanced-Second- Edition/section/3.6/ MATH 403- ENGINEERING DATA ANALYSIS CHAPTER TEST Solve the following problems completely. 1. Three events are shown on the Venn diagram in the following figure: Reproduce the figure and shade the region that corresponds to each of the following events. a. A’ b. A B c. (A B) C d. (B C)’ e. (A B)’ C 2. Each of the possible five outcomes of a random experiment is equally likely. The sample space is {a, b, c, d, e}. Let A denote the event {a, b}, and let B denote the event {c, d, e}. Determine the following: a. P(A) b. P(B) c. P(A’) d. P(A B) e. P(A B) 3. If A, B, and C are mutually exclusive events with P (A) = 0.2, P(B) = 0.3, and P(C) = 0.4, determine the following probabilities: a. P(A B C) c. P(A B) e. P(A’ B’ C’) b. P(A B C) d. P[(A B) C] MATH 403- ENGINEERING DATA ANALYSIS 4. A lot of 100 semiconductor chips contains 20 that are defective. Two are selected randomly, without replacement, from the lot. a. 
What is the probability that the first one selected is defective?
b. What is the probability that the second one selected is defective given that the first one was defective?
c. What is the probability that both are defective?
d. How does the answer to part (b) change if chips selected were replaced prior to the next selection?
5. Suppose 2% of cotton fabric rolls and 3% of nylon fabric rolls contain flaws. Of the rolls used by a manufacturer, 70% are cotton and 30% are nylon. What is the probability that a randomly selected roll used by the manufacturer contains flaws?

Chapter 3 DISCRETE PROBABILITY DISTRIBUTIONS

Introduction
Many physical systems can be modelled by the same or similar random experiments and random variables. The distribution of the random variables involved in each of these common systems can be analyzed, and the result of that analysis can be used in different applications and examples. In this chapter, the analysis of several random experiments and discrete random variables that often appear in applications is discussed. A discussion of the basic sample space of the random experiment is frequently omitted and the distribution of a particular random variable is directly described.

Intended Learning Outcomes
At the end of this module, it is expected that the students will be able to:
1. Determine probabilities from probability mass functions.
2. Determine probabilities from cumulative distribution functions, and cumulative distribution functions from probability mass functions.
3. Calculate means and variances for discrete random variables.
4. Understand the assumptions for each of the discrete probability distributions presented.
5. Select an appropriate discrete probability distribution to calculate probabilities in specific applications.
6.
Calculate probabilities and determine means and variances for each of the discrete probability distributions presented.

Discrete Probability Distribution
A discrete distribution describes the probability of occurrence of each value of a discrete random variable. A discrete random variable is a random variable that has countable values, such as a list of non-negative integers. With a discrete probability distribution, each possible value of the discrete random variable can be associated with a non-zero probability. Thus, a discrete probability distribution is often presented in tabular form.

3.1 Random Variables and Their Probability Distributions

Random Variables
In probability and statistics, a random variable is a variable whose value is subject to variations due to chance (i.e. randomness, in a mathematical sense). As opposed to other mathematical variables, a random variable conceptually does not have a single, fixed value (even if unknown); rather, it can take on a set of possible different values, each with an associated probability.

A random variable’s possible values might represent the possible outcomes of a yet-to-be-performed experiment, or the possible outcomes of a past experiment whose already-existing value is uncertain (for example, as a result of incomplete information or imprecise measurements). They may also conceptually represent either the results of an “objectively” random process (such as rolling a die), or the “subjective” randomness that results from incomplete knowledge of a quantity.

Random variables can be classified as either discrete (that is, taking any of a specified list of exact values) or as continuous (taking any numerical value in an interval or collection of intervals). The mathematical function describing the possible values of a random variable and their associated probabilities is known as a probability distribution.
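As a small illustration of a probability distribution in tabular form, the sketch below stores the distribution of one roll of a fair six-sided die as a value-to-probability table. The die, the table name `die_pmf`, and the two helper functions are assumptions chosen for illustration; they do not come from the text.

```python
from fractions import Fraction

# Assumed example: one roll of a fair six-sided die.
# The table maps each possible value of the random variable to its probability.
die_pmf = {face: Fraction(1, 6) for face in range(1, 7)}


def is_valid_pmf(pmf):
    """Return True if every probability lies in (0, 1] and the total is exactly 1."""
    return all(0 < p <= 1 for p in pmf.values()) and sum(pmf.values()) == 1


def prob_of_event(pmf, event):
    """Probability that the random variable takes a value in `event` (a set of values)."""
    return sum(p for value, p in pmf.items() if value in event)


# Print the distribution as a two-column table: value, probability.
for value, p in die_pmf.items():
    print(value, p)

print("valid distribution:", is_valid_pmf(die_pmf))
print("P(even face):", prob_of_event(die_pmf, {2, 4, 6}))
```

Exact fractions are used here instead of floating-point numbers so that the check that the probabilities sum to 1 is not affected by rounding.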
Discrete Random Variables
Discrete random variables can take on either a finite or at most a countably infinite set of discrete values (for example, the integers). Their probability distribution is given by a probability mass function, which directly maps each value of the random variable to a probability. For example, the value x1 has probability p1, the value x2 has probability p2, and so on. The probabilities pi must satisfy two requirements: every probability pi is a number between 0 and 1, and the sum of all the probabilities is 1 (p1 + p2 + ⋯ + pk = 1).

Discrete Probability Distribution: This shows the probability mass function of a discrete probability distribution. The probabilities of the singletons {1}, {3}, and {7} are respectively 0.2, 0.5, 0.3. A set not containing any of these points has probability zero.

Examples of discrete random variables include the values obtained from rolling a die and the grades received on a test out of 100.

Probability Distributions for Discrete Random Variables
A discrete random variable x has a countable number of possible values. The probability distribution of a discrete random variable x lists the values and their probabilities, where value x1 has probability p1, value x2 has probability p2, and so on. Every probability pi is a number between 0 and 1, and the sum of all the probabilities is equal to 1.

Examples of discrete random variables include:
The number of eggs that a hen lays in a given day (it can’t be 2.3)
The number of people going to a given soccer match
The number of students that come to class on a given day
The number of people in line at McDonald’s on a given day and time

A discrete probability distribution can be described by a table, by a formula, or by a graph.
For example, suppose that x is a random variable that represents the number of people waiting in line at a fast-food restaurant, and that it happens to take only the values 2, 3, or 5 with probabilities 2/10, 3/10, and 5/10 respectively. This can be expressed through the function f(x) = x/10, x = 2, 3, 5, or through the table below.

x      f(x)
2      0.2
3      0.3
5      0.5

Discrete Probability Distribution: This table shows the values the discrete random variable can take on and their corresponding probabilities.

Notice that these two representations are equivalent, and that this can be represented graphically as in the probability histogram below.

Probability Histogram: This histogram displays the probability of each of the three values of the discrete random variable.

The formula, table, and probability histogram satisfy the following necessary conditions of discrete probability distributions:
1. 0 ≤ f(x) ≤ 1, i.e., the values of f(x) are probabilities, hence between 0 and 1.
2. ∑ f(x) = 1, i.e., adding the probabilities of all disjoint cases, we obtain the probability of the sample space, 1.

Sometimes, the discrete probability distribution is referred to as the probability mass function (pmf). The probability mass function has the same purpose as the probability histogram, and displays the specific probability of each value of the discrete random variable. The only difference is how it looks graphically.

Probability Mass Function: This shows the graph of a probability mass function. All the values of this function must be non-negative and sum up to 1.

Example 1. A shipment of 20 similar laptop computers to a retail outlet contains 3 that are defective. If a school makes a random purchase of 2 of these computers, find the probability distribution for the number of defectives.
Solution: Let X be a random variable whose values x are the possible numbers of defective computers purchased by the school. Then x can only take the numbers 0, 1, and 2. Now,

$f(0) = P(X = 0) = \dfrac{\binom{3}{0}\binom{17}{2}}{\binom{20}{2}} = \dfrac{136}{190} = \dfrac{68}{95}$
$f(1) = P(X = 1) = \dfrac{\binom{3}{1}\binom{17}{1}}{\binom{20}{2}} = \dfrac{51}{190}$
$f(2) = P(X = 2) = \dfrac{\binom{3}{2}\binom{17}{0}}{\binom{20}{2}} = \dfrac{3}{190}$

Thus, the probability distribution of X is

x      0       1        2
f(x)   68/95   51/190   3/190

3.2 Cumulative Distribution Functions
The cumulative distribution function of a discrete random variable is defined as:

$F(x) = P(X \le x) = \sum_{t \le x} f(t)$

That is, F(x) accumulates all of the probability less than or equal to x. The cumulative distribution function for continuous random variables is just a straightforward extension of that of the discrete case. All we need to do is replace the summation with an integral. The cumulative distribution function ("c.d.f.") of a continuous random variable X is defined as:

$F(x) = \int_{-\infty}^{x} f(t)\,dt$

for $-\infty < x < \infty$. ... c), and so forth.

Note that when X is continuous, $P(a < X \le b) = P(a < X < b) + P(X = b) = P(a < X < b)$. That is, it does not matter whether we include an endpoint of the interval or not. This is not true, though, when X is discrete.

Although the probability distribution of a continuous random variable cannot be presented in tabular form, it can be stated as a formula. Such a formula would necessarily be a function of the numerical values of the continuous random variable X and as such will be represented by the functional notation f(x). In dealing with continuous variables, f(x) is usually called the probability density function, or simply the density function of X. Since X is defined over a continuous sample space, it is possible for f(x) to have a finite number of discontinuities. However, most density functions that have practical applications in the analysis of statistical data are continuous and their graphs may take any of several forms, some of which are shown in Figure 4.1.

Figure 4.1 Typical Density Functions
Because areas will be used to represent probabilities and probabilities are positive numerical values, the density function must lie entirely above the x axis. A probability density function is constructed so that the area under its curve bounded by the x axis is equal to 1 when computed over the range of X for which f(x) is defined. Should this range of X be a finite interval, it is always possible to extend the interval to include the entire set of real numbers by defining f(x) to be zero at all points in the extended portions of the interval. In Figure 4.2, the probability that X assumes a value between a and b is equal to the shaded area under the density function between the ordinates at x = a and x = b, and from integral calculus is given by

$P(a < X < b) = \int_{a}^{b} f(x)\,dx$

Figure 4.2 P (a < X < b)

Example 1. For the density function

$f(x) = \begin{cases} \dfrac{x^2}{3}, & -1 < x < 2, \\ 0, & \text{elsewhere,} \end{cases}$

find F(x), and use it to evaluate P (0 < X ≤ 1).

Solution: For –1 < x < 2,

$F(x) = \int_{-\infty}^{x} f(t)\,dt = \int_{-1}^{x} \dfrac{t^2}{3}\,dt = \left.\dfrac{t^3}{9}\right|_{-1}^{x} = \dfrac{x^3 + 1}{9}$

Therefore,

$F(x) = \begin{cases} 0, & x < -1, \\ \dfrac{x^3 + 1}{9}, & -1 \le x < 2, \\ 1, & x \ge 2. \end{cases}$

The cumulative distribution function F(x) is expressed graphically in Figure 4.3. Now,

$P(0 < X \le 1) = F(1) - F(0) = \dfrac{2}{9} - \dfrac{1}{9} = \dfrac{1}{9}$

Figure 4.3 Continuous cumulative distribution function

4.2 Expected Values of Continuous Random Variables
Let X be a continuous random variable with range [a, b] and probability density function f(x). The expected value of X is defined by

$E(X) = \int_{a}^{b} x f(x)\,dx$

Let’s see how this compares with the formula for a discrete random variable:

$E(X) = \sum_{i=1}^{n} x_i\,p(x_i)$

The discrete formula says to take a weighted sum of the values xi of X, where the weights are the probabilities p(xi). Recall that f(x) is a probability density. Its units are prob/(unit of X). So f(x) dx represents the probability that X is in an infinitesimal range of width dx around x.
Thus we can interpret the formula for E(X) as a weighted integral of the values x of X, where the weights are the probabilities f(x) dx. As before, the expected value is also called the mean or average.

The variance of X, denoted V(X) or $\sigma^2$, is

$\sigma^2 = V(X) = \int_{-\infty}^{\infty} (x - \mu)^2 f(x)\,dx = \int_{-\infty}^{\infty} x^2 f(x)\,dx - \mu^2$

The standard deviation of X is $\sigma = \sqrt{\sigma^2}$.

Example 1. Let X ∼ uniform (0, 1). Find E(X).
Solution: Since X has a range of [0, 1] and a density of f(x) = 1:

$E(X) = \int_{0}^{1} x\,dx = \left.\dfrac{x^2}{2}\right|_{0}^{1} = \dfrac{1}{2}$

Not surprisingly, the mean is at the midpoint of the range.

Example 2. Let X have range [0, 2] and density $f(x) = \frac{3}{8}x^2$. Find E(X).

$E(X) = \int_{0}^{2} x f(x)\,dx = \int_{0}^{2} \dfrac{3}{8}x^3\,dx = \left.\dfrac{3x^4}{32}\right|_{0}^{2} = \dfrac{3}{2}$

Does it make sense that the mean of this X is in the right half of its range? Yes. Since the probability density increases as x increases over the range, the average value of x should be in the right half of the range. µ is “pulled” to the right of the midpoint 1 because there is more mass to the right.

Properties of E(X)
The properties of E(X) for continuous random variables are the same as for discrete ones:
1. If X and Y are random variables on a sample space Ω, then E(X + Y) = E(X) + E(Y).
2. If a and b are constants, then E(aX + b) = aE(X) + b.

Expectation of Functions of X
This works exactly the same as the discrete case. If h(x) is a function, then Y = h(X) is a random variable and

$E(Y) = E(h(X)) = \int_{-\infty}^{\infty} h(x) f_X(x)\,dx$

Example 1. Let X ∼ exp (λ). Find $E(X^2)$.

$E(X^2) = \int_{0}^{\infty} x^2 \lambda e^{-\lambda x}\,dx = \left[-x^2 e^{-\lambda x} - \dfrac{2x}{\lambda} e^{-\lambda x} - \dfrac{2}{\lambda^2} e^{-\lambda x}\right]_{0}^{\infty} = \dfrac{2}{\lambda^2}$

4.3 Continuous Uniform Distribution
This is the simplest continuous distribution as it is analogous to its discrete counterpart. A continuous random variable X with probability density function

$f(x) = \dfrac{1}{b - a}, \quad a \le x \le b$

is a continuous uniform random variable.
The probability density function of a continuous uniform random variable is shown below, together with the formulas for computing its mean and variance.

$\mu = E(X) = \dfrac{a + b}{2} \quad \text{and} \quad \sigma^2 = V(X) = \dfrac{(b - a)^2}{12}$

4.4 Normal Distribution
The Normal Distribution is the most important and most widely used continuous probability distribution. It is the cornerstone of the application of statistical inference in analysis of data because the distributions of several important sample statistics tend towards a Normal distribution as the sample size increases.

Empirical studies have indicated that the Normal distribution provides an adequate approximation to the distributions of many physical variables. Specific examples include meteorological data, such as temperature and rainfall, measurements on living organisms, scores on aptitude tests, physical measurements of manufactured parts, weights of contents of food packages, volumes of liquids in bottles/cans, instrumentation errors and other deviations from established norms, and so on.

The graphical appearance of the Normal distribution is a symmetrical bell-shaped curve that extends without bound in both positive and negative directions. The probability density function is given by

$f(x) = \dfrac{1}{\sigma\sqrt{2\pi}} \exp\left[-\dfrac{(x - \mu)^2}{2\sigma^2}\right], \quad -\infty < x < \infty; \quad -\infty < \mu < \infty,\ \sigma > 0$

where μ and σ are parameters. These turn out to be the mean and standard deviation, respectively, of the distribution. As a shorthand notation, we write X ~ N(μ, σ²). The curve never actually reaches the horizontal axis but gets close to it beyond about 3 standard deviations on each side of the mean.

For any Normally distributed variable:
68.3% of all values will lie between μ − σ and μ + σ (i.e. μ ± σ)
95.45% of all values will lie within μ ± 2σ
99.73% of all values will lie within μ ± 3σ

The graphs below illustrate the effect of changing the values of μ and σ on the shape of the probability density function.
Low variability (σ = 0.71) with respect to the mean gives a pointed bell-shaped curve with little spread. Variability of σ = 1.41 produces a flatter bell-shaped curve with a greater spread.

Example 1. The volume of water in commercially supplied fresh drinking water containers is approximately Normally distributed with mean 70 litres and standard deviation 0.75 litres. Estimate the proportion of containers likely to contain (i) in excess of 70.9 litres, (ii) at most 68.2 litres, (iii) less than 70.5 litres.

Solution: Let X denote the volume of water in a container, in litres. Then X ~ N(70, 0.75²), i.e. μ = 70, σ = 0.75 and Z = (X − 70)/0.75.
(i) X = 70.9; Z = (70.9 − 70)/0.75 = 1.20
P(X > 70.9) = P(Z > 1.20) = 0.1151 or 11.51%
(ii) X = 68.2; Z = (68.2 − 70)/0.75 = −2.40
P(X < 68.2) = P(Z < −2.40) = 0.0082 or 0.82%
(iii) X = 70.5; Z = (70.5 − 70)/0.75 = 0.67
P(X > 70.5) = 0.2514; P(X < 70.5) = 0.7486 or 74.86%

4.5 Normal Approximation to Binomial and Poisson Distribution

Binomial Approximation
The normal distribution can be used as an approximation to the binomial distribution. If X is a binomial random variable with parameters n and p, then

$Z = \dfrac{X - np}{\sqrt{np(1 - p)}}$

is approximately a standard normal random variable. The above equation is the formula for standardizing the random variable X. Probabilities involving X can be approximated by using a standard normal distribution. The approximation is good when n is large, specifically when np > 5 and n(1 − p) > 5. In some cases, working out a problem using the Normal distribution may be easier than using a Binomial.

Poisson Approximation
The Poisson distribution was developed as the limit of a binomial distribution as the number of trials increased to infinity; therefore, the normal distribution can also be used to approximate probabilities of a Poisson random variable. If X is a Poisson random variable with E(X) = λ and V(X) = λ, then

$Z = \dfrac{X - \lambda}{\sqrt{\lambda}}$

is approximately a standard normal random variable, and this approximation is good for λ > 5.
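The standardization used in Example 1 and in the binomial approximation can be checked numerically. The sketch below is only an illustration: the helper names (`normal_cdf`, `binom_cdf`, `binom_cdf_normal_approx`) and the binomial values n = 100, p = 0.3, k = 25 are assumptions chosen so that np > 5 and n(1 − p) > 5; they are not taken from the text.

```python
import math


def normal_cdf(x, mu=0.0, sigma=1.0):
    """P(X <= x) for X ~ N(mu, sigma^2), computed with the error function."""
    z = (x - mu) / sigma
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


# Example 1: container volume X ~ N(70, 0.75^2), in litres.
mu, sigma = 70.0, 0.75
p_over_70_9 = 1.0 - normal_cdf(70.9, mu, sigma)  # (i)   P(X > 70.9)
p_at_most_68_2 = normal_cdf(68.2, mu, sigma)     # (ii)  P(X <= 68.2)
p_under_70_5 = normal_cdf(70.5, mu, sigma)       # (iii) P(X < 70.5)


def binom_cdf(k, n, p):
    """Exact P(X <= k) for X ~ Bin(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def binom_cdf_normal_approx(k, n, p):
    """Approximate P(X <= k) with Z = (X - np) / sqrt(np(1 - p)), no continuity correction."""
    return normal_cdf(k, n * p, math.sqrt(n * p * (1 - p)))


# Hypothetical values, chosen only so that np > 5 and n(1 - p) > 5.
n, p, k = 100, 0.3, 25
exact = binom_cdf(k, n, p)
approx = binom_cdf_normal_approx(k, n, p)

print(f"(i)   {p_over_70_9:.4f}")
print(f"(ii)  {p_at_most_68_2:.4f}")
print(f"(iii) {p_under_70_5:.4f}")
print(f"exact binomial {exact:.4f}, normal approximation {approx:.4f}")
```

Part (iii) comes out slightly below the tabulated 0.7486 because the code uses the unrounded value z = 0.6667, whereas the worked solution rounds z to 0.67 before reading the table. The remaining gap between the exact binomial value and the approximation arises because a continuous curve is being used for a discrete variable.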
Continuity Correction
The binomial and Poisson distributions are discrete, whereas the normal distribution is continuous. We need to take this into account, by means of a continuity correction, when we use the normal distribution to approximate a binomial or Poisson distribution. In the discrete distribution, each probability is represented by a rectangle (right hand diagram):

When working out probabilities, we want to include whole rectangles, which is what continuity correction is all about.

Example 1. Suppose we toss a fair coin 20 times. What is the probability of getting between 9 and 11 heads?
Solution: Let X be the random variable representing the number of heads thrown. X ~ Bin (20, ½)
Since p is close to ½ (it equals ½!), we can use the normal approximation to the binomial. X is approximately N (20 × ½, 20 × ½ × ½), that is, N (10, 5).

In this diagram, the rectangles represent the binomial distribution and the curve is the normal distribution:

We want P (9 ≤ X ≤ 11), which is the red shaded area. Notice that the first rectangle starts at 8.5 and the last rectangle ends at 11.5. Using a continuity correction, therefore, our probability becomes P (8.5 < X < 11.5) in the normal distribution. Standardizing with μ = 10 and σ = √5 ≈ 2.236 gives P (−0.67 < Z < 0.67) = 0.7486 − 0.2514 = 0.4972.

4.6 Exponential Distribution
The exponential distribution obtains its name from the exponential function in the probability density function. Plots of the exponential distribution for selected values of λ are shown in Fig. 4.4. For any value of λ, the exponential distribution is quite skewed.

Figure 4.4 Probability density function of exponential random variables for selected values of λ

If the random variable X has an exponential distribution with parameter λ,

$\mu = E(X) = \dfrac{1}{\lambda} \quad \text{and} \quad \sigma^2 = V(X) = \dfrac{1}{\lambda^2}$

It is important to use consistent units in the calculation of probabilities, means, and variances involving exponential random variables. The following example illustrates unit conversions.
Example 1. In a large corporate computer network, user log-ons to the system can be modeled as a Poisson process with a mean of 25 log-ons per hour. What is the probability that there are no log-ons in an interval of 6 minutes?
Solution: Let X denote the time in hours from the start of the interval until the first log-on. Then, X has an exponential distribution with λ = 25 log-ons per hour. We are interested in the probability that X exceeds 6 minutes. Because λ is given in log-ons per hour, we express all time units in hours. That is, 6 minutes = 0.1 hour. The probability requested is shown as the shaded area under the probability density function in Fig. 4.5. Therefore,

$P(X > 0.1) = \int_{0.1}^{\infty} 25 e^{-25x}\,dx = e^{-25(0.1)} = 0.082$

Figure 4.5 Probability for the exponential distribution

In the previous example, the probability that there are no log-ons in a 6-minute interval is 0.082 regardless of the starting time of the interval. A Poisson process assumes that events occur uniformly throughout the interval of observation; that is, there is no clustering of events. If the log-ons are well modeled by a Poisson process, the probability that the first log-on after noon occurs after 12:06 P.M. is the same as the probability that the first log-on after 3:00 P.M. occurs after 3:06 P.M. And if someone logs on at 2:22 P.M., the probability the next log-on occurs after 2:28 P.M. is still 0.082.

Our starting point for observing the system does not matter. However, if there are high-use periods during the day, such as right after 8:00 A.M., followed by a period of low use, a Poisson process is not an appropriate model for log-ons and the exponential distribution is not appropriate for computing probabilities. It might be reasonable to model each of the high- and low-use periods by a separate Poisson process, employing a larger value for λ during the high-use periods and a smaller value otherwise.
Then, an exponential distribution with the corresponding value of λ can be used to calculate log-on probabilities for the high- and low-use periods.

REFERENCES:
Montgomery, D. C. et al. (2003). Applied Statistics and Probability for Engineers, 3rd Edition. USA: John Wiley & Sons, Inc.
Walpole, R. E. et al. (2016). Probability & Statistics for Engineers & Scientists, 9th Edition. England: Pearson Education Limited.
Orloff, J. and Bloom, J. 18.05 Introduction to Probability and Statistics. Spring 2014. Massachusetts Institute of Technology: MIT OpenCourseWare, https://ocw.mit.edu. License: Creative Commons BY-NC-SA.

CHAPTER TEST
Solve the following problems completely.
1. Suppose that $f(x) = e^{-x}$ for 0 < x. Determine the following probabilities:
a. P(1 < X)
b. P(1 < X < 2.5)
c. P(X = 3)
d. P(X < 4)
e. P(3 ≤ X)
2. The probability density function of the length of a metal rod is f(x) = 2 for 2.3 < x < 2.8 meters.
a. If the specifications for this process are from 2.25 to 2.75 meters, what proportion of the bars fail to meet the specifications?
b. Assume that the probability density function is f(x) = 2 for an interval of length 0.5 meters. Over what value should the density be centered to achieve the greatest proportion of bars within specifications?
3. Suppose f(x) = 0.125x for 0 < x < 4. Find the mean and variance of X.
4. Suppose the time it takes a data collection operator to fill out an electronic form for a database is uniformly distributed between 1.5 and 2.2 minutes.
a. What are the mean and variance of the time it takes the operator to fill out the form?
b. What is the probability that it will take less than two minutes to fill out the form?
c. Determine the cumulative distribution function of the time it takes to fill out the form.
5. Suppose that X is a binomial random variable with n = 200 and p = 0.4.
a. Approximate the probability that X is less than or equal to 70.
b.
Approximate the probability that X is greater than 70 and less than 90.

Chapter 5 JOINT PROBABILITY DISTRIBUTIONS

Introduction
The study of random variables and their probability distributions in the preceding sections is restricted to one-dimensional sample spaces, in that we recorded outcomes of an experiment as values assumed by a single random variable. However, it is often useful to have more than one random variable defined in a random experiment. In general, if X and Y are two random variables, the probability distribution that defines their simultaneous behaviour is called a joint probability distribution. In this chapter, we will investigate some important properties of these joint probability distributions.

Intended Learning Outcomes
At the end of this module, it is expected that the students will be able to:
1. Understand and use joint probability mass functions and joint probability density functions to calculate probabilities and calculate marginal probability distributions.
2. Understand and calculate conditional probability distributions from joint probability distributions and assess independence of random variables.
3. Calculate means and variances for linear functions of random variables and calculate probabilities for linear functions of normally distributed random variables.
4. Determine the distribution of a general function of a random variable.

5.1 JOINT PROBABILITY DISTRIBUTIONS FOR TWO RANDOM VARIABLES
In the previous section, we studied probability distributions for a single random variable. There will be situations, however, where we may find it desirable to record the simultaneous outcomes of several random variables.
For example, we might measure the amount of precipitate P and volume V of gas released from a controlled chemical experiment, giving rise to a two-dimensional sample space consisting of the outcomes (p, v), or we might be interested in the hardness H and tensile strength T of cold-drawn copper, resulting in the outcomes (h, t). In a study to determine the likelihood of success in college based on high school data, we might use a three-dimensional sample space and record for each individual his or her aptitude test score, high school rank, and grade-point average at the end of freshman year in college.

If X and Y are two discrete random variables, the probability distribution for their simultaneous occurrence can be represented by a function with values f(x, y) for any pair of values (x, y) within the range of the random variables X and Y. It is customary to refer to this function as the joint probability distribution of X and Y. Hence, in the discrete case,

f(x, y) = P(X = x, Y = y)

that is, the values f(x, y) give the probability that outcomes x and y occur at the same time. For example, if an 18-wheeler is to have its tires serviced and X represents the number of miles these tires have been driven and Y represents the number of tires that need to be replaced, then f(30000, 5) is the probability that the tires are used over 30,000 miles and the truck needs 5 new tires.

Discrete case. The function f(x, y) is a joint probability distribution or probability mass function of the discrete random variables X and Y if
1. $f(x, y) \ge 0$ for all (x, y)
2. $\sum_{x} \sum_{y} f(x, y) = 1$
3. $P(X = x, Y = y) = f(x, y)$
For any region A in the xy plane, $P[(X, Y) \in A] = \sum\sum_{A} f(x, y)$.

Just as the probability mass function of a single random variable X is assumed to be zero at all values outside the range of X, so the joint probability mass function of X and Y is assumed to be zero at values for which a probability is not specified.

Example 1.
Two ballpoint pens are selected at random from a box that contains 3 blue pens, 2 red pens, and 3 green pens. If X is the number of blue pens selected and Y is the number of red pens selected, find
a.) the joint probability function f(x, y).
b.) $P[(X, Y) \in A]$, where A is the region $\{(x, y) \mid x + y \le 1\}$.

Solution: The possible pairs of values (x, y) are (0, 0), (0, 1), (1, 0), (1, 1), (0, 2), and (2, 0). Now, f(0, 1), for example, represents the probability that a red and a green pen are selected. The total number of equally likely ways of selecting any 2 pens from the 8 is $\binom{8}{2} = 28$. The number of ways of selecting 1 red from 2 red pens and 1 green from 3 green pens is $\binom{2}{1}\binom{3}{1} = 6$. Hence, f(0, 1) = 6/28 = 3/14.

Similar calculations will yield the probabilities for the other cases, which are presented in Table 1. Note that the probabilities sum to 1. It will become clear that the joint probability distribution of Table 1 can be represented by the formula

$f(x, y) = \dfrac{\binom{3}{x}\binom{2}{y}\binom{3}{2 - x - y}}{\binom{8}{2}}$

for x = 0, 1, 2; y = 0, 1, 2; and 0 ≤ x + y ≤ 2.

(b) The probability that (X, Y) falls in the region A is

$P[(X, Y) \in A] = P(X + Y \le 1) = f(0, 0) + f(0, 1) + f(1, 0) = \dfrac{3}{28} + \dfrac{3}{14} + \dfrac{9}{28} = \dfrac{9}{14}$

Table 1. Joint Probability Distribution for Example 1

f(x, y)          x = 0    x = 1    x = 2    Row Totals
y = 0            3/28     9/28     3/28     15/28
y = 1            3/14     3/14     0        3/7
y = 2            1/28     0        0        1/28
Column Totals    5/14     15/28    3/28     1

Example 2. Suppose we toss a pair of fair, four-sided dice, in which one of the dice is RED and the other is BLACK. Let
X = the outcome on the RED die = {1, 2, 3, 4}
Y = the outcome on the BLACK die = {1, 2, 3, 4}
Find the following:
a) What is the probability that X takes on a particular value x, and Y takes on a particular value y?
b) What is P(X = x, Y = y)?

Solution: Just as in the case with one discrete random variable, in order to find the “joint probability distribution” of X and Y, we first need to define the support of X and Y.
The support of X is S1 = {1, 2, 3, 4}, and the support of Y is S2 = {1, 2, 3, 4}. Now, if we let (x, y) denote one of the possible outcomes of one toss of the pair of dice, then certainly (1, 1) is a possible outcome, as are (1, 2), (1, 3) and (1, 4). If we continue to enumerate all of the possible outcomes, we soon see that the joint support S has 16 possible outcomes:

S = {(1,1), (1,2), (1,3), (1,4), (2,1), (2,2), (2,3), (2,4), (3,1), (3,2), (3,3), (3,4), (4,1), (4,2), (4,3), (4,4)}

Now, because the dice are fair, we should expect each of the 16 possible outcomes to be equally likely. Therefore, using the classical approach to assigning probability, the probability that X equals any particular x value, and Y equals any particular y value, is 1/16. That is, for all (x, y) in the support S:

$P(X = x, Y = y) = \dfrac{1}{16}$

Because we have identified the probability for each (x, y), we have found what we call the joint probability mass function. Perhaps it is not too surprising that the joint probability mass function, which is typically denoted as f(x, y), can be defined as a formula (as we have above), or as a table. Here’s what our joint probability mass function would look like in tabular form:

f(x, y)        Black (Y)
Red (X)        1       2       3       4       fX(x)
1              1/16    1/16    1/16    1/16    4/16
2              1/16    1/16    1/16    1/16    4/16
3              1/16    1/16    1/16    1/16    4/16
4              1/16    1/16    1/16    1/16    4/16
fY(y)          4/16    4/16    4/16    4/16    1

When X and Y are continuous random variables, the joint density function f(x, y) is a surface lying above the xy plane, and $P[(X, Y) \in A]$, where A is any region in the xy plane, is equal to the volume of the right cylinder bounded by the base A and the surface.

Continuous case. The case where both variables are continuous is obtained easily by analogy with the discrete case on replacing sums by integrals. Thus, the joint probability function for the random variables X and Y (or, as it is more commonly called, the joint density function of X and Y) is defined as follows.
The function f(x, y) is a joint density function of the continuous random variables X and Y if
1. $f(x, y) \ge 0$ for all (x, y)
2. $\int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f(x, y)\,dx\,dy = 1$
3. $P[(X, Y) \in A] = \iint_{A} f(x, y)\,dx\,dy$, for any region A in the xy plane.

Example 1. A privately owned business operates both a drive-in facility and a walk-in facility. On a randomly selected day, let X and Y, respectively, be the proportions of the time that the drive-in and the walk-in facilities are in use, and suppose that the joint density function of these random variables is

$f(x, y) = \begin{cases} \dfrac{2}{5}(2x + 3y), & 0 \le x \le 1,\ 0 \le y \le 1, \\ 0, & \text{elsewhere.} \end{cases}$

Find the following:
(a) Verify the condition of a joint density function, $\int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f(x, y)\,dx\,dy = 1$.
(b) Find $P[(X, Y) \in A]$, where $A = \{(x, y) \mid 0 < x < \frac{1}{2},\ \frac{1}{4} < y < \frac{1}{2}\}$.

Solution:
(a) The integration of f(x, y) over the whole region is

$\int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f(x, y)\,dx\,dy = \int_{0}^{1} \int_{0}^{1} \dfrac{2}{5}(2x + 3y)\,dx\,dy = \int_{0}^{1} \left.\left(\dfrac{2x^2}{5} + \dfrac{6xy}{5}\right)\right|_{x=0}^{x=1} dy = \int_{0}^{1} \left(\dfrac{2}{5} + \dfrac{6y}{5}\right) dy = \left.\left(\dfrac{2y}{5} + \dfrac{3y^2}{5}\right)\right|_{0}^{1} = \dfrac{2}{5} + \dfrac{3}{5} = 1$

(b) To calculate the probability, we use

$P[(X, Y) \in A] = P\left(0 < X < \tfrac{1}{2},\ \tfrac{1}{4} < Y < \tfrac{1}{2}\right) = \int_{1/4}^{1/2} \int_{0}^{1/2} \dfrac{2}{5}(2x + 3y)\,dx\,dy = \int_{1/4}^{1/2} \left(\dfrac{1}{10} + \dfrac{3y}{5}\right) dy = \left.\left(\dfrac{y}{10} + \dfrac{3y^2}{10}\right)\right|_{1/4}^{1/2} = \dfrac{13}{160}$

... 2, Y ≤ 1); (c) P(X > Y); (d) P(X + Y = 4)

Marginal Probability Distributions. If more than one random variable is defined in a random experiment, it is important to distinguish between the joint probability distribution of X and Y and the probability distribution of each variable individually. The individual probability distribution of a random variable is referred to as its marginal probability distribution.

The marginal probability distribution of X can be determined from the joint probability distribution of X and other random variables. For example, consider discrete random variables X and Y. To determine P(X = x), we sum P(X = x, Y = y) over all points in the range of (X, Y) for which X = x. Subscripts on the probability mass functions distinguish between the random variables.
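The summation just described can be sketched for the ballpoint pen example (3 blue, 2 red, and 3 green pens, with 2 selected). The function and variable names below (`joint_pmf`, `marginal_x`, `marginal_y`) are illustrative choices and not notation from the text.

```python
from fractions import Fraction
from math import comb


def joint_pmf(x, y):
    """f(x, y) for the pen example: x blue and y red pens among the 2 selected."""
    if x < 0 or y < 0 or x + y > 2:
        return Fraction(0)
    # Ways to pick x blue, y red and (2 - x - y) green, over all ways to pick 2 of 8.
    return Fraction(comb(3, x) * comb(2, y) * comb(3, 2 - x - y), comb(8, 2))


values = range(3)  # both X and Y can take the values 0, 1, 2

# Marginal of X: for each x, sum f(x, y) over all y (and similarly for Y).
marginal_x = {x: sum(joint_pmf(x, y) for y in values) for x in values}
marginal_y = {y: sum(joint_pmf(x, y) for x in values) for y in values}

# Probability of the region A = {(x, y) | x + y <= 1}.
p_region = sum(joint_pmf(x, y) for x in values for y in values if x + y <= 1)

print("P(X = x):", marginal_x)
print("P(Y = y):", marginal_y)
print("P(X + Y <= 1):", p_region)
```

The two dictionaries should agree with the column totals and row totals of Table 1, and each should sum to 1.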
For continuous random variables, an analogous approach is used to determine marginal probability distributions. In the continuous case, an integral replaces the sum. The marginal distributions of X alone and of Y alone are

$$g(x) = \sum_{y} f(x, y) \quad \text{and} \quad h(y) = \sum_{x} f(x, y)$$

for the discrete case, and

$$g(x) = \int_{-\infty}^{\infty} f(x, y)\,dy \quad \text{and} \quad h(y) = \int_{-\infty}^{\infty} f(x, y)\,dx$$

for the continuous case. The term marginal is used here because, in the discrete case, the values of g(x) and h(y) are just the marginal totals of the respective columns and rows when the values of f(x, y) are displayed in a rectangular table.

Example 1. Show that the column and row totals of Table 1 give the marginal distribution of X alone and of Y alone.

Solution: For the random variable X, we see that

$$g(0) = f(0, 0) + f(0, 1) + f(0, 2) = \frac{3}{28} + \frac{3}{14} + \frac{1}{28} = \frac{5}{14},$$

$$g(1) = f(1, 0) + f(1, 1) + f(1, 2) = \frac{9}{28} + \frac{3}{14} + 0 = \frac{15}{28},$$

$$g(2) = f(2, 0) + f(2, 1) + f(2, 2) = \frac{3}{28} + 0 + 0 = \frac{3}{28},$$

which are just the column totals of Table 1. In a similar manner we could show that the values of h(y) are given by the row totals. In tabular form, these marginal distributions may be written as follows:

x       0       1       2
g(x)    5/14    15/28   3/28

y       0       1       2
h(y)    15/28   3/7     1/28

Example 2. Find g(x) and h(y) for the joint density function $f(x, y) = \frac{2}{5}(2x + 3y)$, $0 \le x \le 1$, $0 \le y \le 1$, of the drive-in and walk-in facility example.

Solution: By definition, we have

$$g(x) = \int_{-\infty}^{\infty} f(x, y)\,dy = \int_0^1 \frac{2}{5}(2x + 3y)\,dy = \left.\left(\frac{4xy}{5} + \frac{6y^2}{10}\right)\right|_{y=0}^{y=1} = \frac{4x + 3}{5}$$

for 0 ≤ x ≤ 1, and g(x) = 0 elsewhere. Similarly, we have

$$h(y) = \int_{-\infty}^{\infty} f(x, y)\,dx = \int_0^1 \frac{2}{5}(2x + 3y)\,dx = \frac{2(1 + 3y)}{5}$$

for 0 ≤ y ≤ 1, and h(y) = 0 elsewhere.

Practice Problem 1. A fast-food restaurant operates both a drive-through facility and a walk-in facility.
On a randomly selected day, let X and Y, respectively, be the proportions of the time that the drive-through and walk-in facilities are in use, and suppose that the joint density function of these random variables is

$$f(x, y) = \begin{cases} \frac{2}{3}(x + 2y), & 0 \le x \le 1,\ 0 \le y \le 1, \\ 0, & \text{elsewhere.} \end{cases}$$

(a) Find the marginal density of X.
(b) Find the marginal density of Y.
(c) Find the probability that the drive-through facility is busy less than one-half of the time.

Conditional Probability Distribution. A conditional probability distribution is a distribution of the form f(x, y)/g(x), which is used to compute conditional probabilities. Let X and Y be two random variables, discrete or continuous. The conditional distribution of the random variable Y given that X = x is

$$f(y \mid x) = \frac{f(x, y)}{g(x)}, \quad \text{provided } g(x) > 0.$$

Similarly, the conditional distribution of X given that Y = y is

$$f(x \mid y) = \frac{f(x, y)}{h(y)}, \quad \text{provided } h(y) > 0.$$

If we wish to find the probability that the discrete random variable X falls between a and b when it is known that the discrete variable Y = y, we evaluate P(a < X < b|Y = y) = ∑ f(x|y) a 0, x2 > 0 , x3 > 0, and f (x1 , x2 , x3 ) = 0 elsewhere. Hence

$$P(X_1 < 2,\ 1 < X_2 < 3,\ X_3 > 2) = \int_2^{\infty} \int_1^3 \int_0^2 e^{-x_1 - x_2 - x_3}\,dx_1\,dx_2\,dx_3 = (1 - e^{-2})(e^{-1} - e^{-3})e^{-2} = 0.0372$$

Practice Problem:
1. The amount of kerosene, in thousands of liters, in a tank at the beginning of any day is a random amount Y from which a random amount X is sold during that day. Suppose that the tank is not resupplied during the day so that x ≤ y, and assume that the joint density function of these variables is 2 f(x, y) = { 0 1.96 or z < −1.96 where the value 1.96 is found as $z_{0.025}$ in the table of Areas Under the Normal Curve. A value of z in the critical region prompts the statement “The value of the test statistic is significant,” which we can then translate into the user’s language.
For example, if the hypotheses are H0: μ = 12 and H1: μ ≠ 12, one might say, “The mean differs significantly from the value 12.”

The philosophy that the maximum risk of making a type I error should be controlled is the root of the pre-selection of a significance level. However, this approach does not account for values of test statistics that are “close” to the critical region. Suppose, for example, in the illustration with H0: μ = 12 versus H1: μ ≠ 12, a value of z = 1.84 is observed; strictly speaking, with α = 0.05, the value is not significant. But the risk of committing a type I error if one rejects H0 in this case could hardly be considered severe. In fact, in a two-tailed scenario, one can quantify this risk as

P = 2P(Z > 1.84 when μ = 12) = 2(0.0329) = 0.0658.

As a result, 0.0658 is the probability of obtaining a value of z as large as or larger in magnitude than 1.84 when in fact μ = 12. This is important information for the user, although the evidence against H0 is not as strong as that which would result from rejection at the α = 0.05 level.

For this reason, the P-value approach has been used extensively in applied statistics. It is designed to give an alternative, in terms of a probability, to a mere “reject” or “do not reject” conclusion. The P-value also gives important information when the z-value falls well into the ordinary critical region. For example, if z is 2.75, it is informative for the user to observe that P = 2(0.0030) = 0.0060, and thus the z-value is significant at a level considerably less than 0.05. It is important to know that under the condition of H0, a value of z = 2.75 is an extremely rare event. That is, a value at least that large in magnitude would occur only 60 times in 10,000 experiments.

A P-value is the lowest level of significance at which the observed value of the test statistic is significant. It is the smallest level of α that would lead to rejection of H0 with the given data.

8.1.3.
General Procedure for Test of Hypothesis

The following are the steps in hypothesis testing using the fixed probability of Type I error approach.

1. State the null and alternative hypotheses.
2. Determine the level of significance and the direction of the test. The direction depends on whether the alternative hypothesis is stated as a left-tailed, right-tailed, or two-tailed test.
3. Determine the appropriate statistical test based on the level of measurement of the data gathered.
4. Write the decision rule stating when to reject or not reject the null hypothesis.
5. Compute the test statistic and compare it with the critical value. The test statistic plays a vital role in rejecting or not rejecting the null hypothesis.
6. State the decision based on the computed value when compared with the critical value.
7. Draw a scientific or engineering conclusion for the given problem.

If you will be testing the hypothesis using significance testing (the P-value approach), follow these steps:

1. State the null and alternative hypotheses.
2. Determine the appropriate statistical test based on the level of measurement of the data gathered.
3. Compute the test statistic.
4. Compute the P-value based on the computed value of the test statistic.
5. State the decision based on the resulting P-value and knowledge of the scientific system.
6. Draw a scientific or engineering conclusion for the given problem.

8.2. Test on the Mean of a Normal Distribution, Variance Known

When the steps in hypothesis testing are applied to a single mean, the hypothesized value is referred to as the hypothesized mean (μ0). The null hypothesis is stated as:

H0: μ = μ0

The alternative hypothesis can be written as one of:

H1: μ ≠ μ0
H1: μ > μ0
H1: μ < μ0

The decision rule is stated as follows: reject the null hypothesis if the absolute value of the test statistic exceeds the critical value (for a one-tailed test, the test statistic must also fall in the tail named by H1). Otherwise, do not reject the null hypothesis.
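The decision rule can be sketched in Python for the three forms of the alternative hypothesis. This is only an illustration under stated assumptions: the function names `critical_value` and `reject_h0` and the `tail` labels are hypothetical, and the critical values come from the standard library's `statistics.NormalDist` rather than from the printed table of Areas Under the Normal Curve.

```python
from statistics import NormalDist

def critical_value(alpha, tail):
    """Positive z critical value: z_{alpha/2} for a two-tailed test, z_alpha otherwise."""
    area = 1 - alpha / 2 if tail == "two" else 1 - alpha
    return NormalDist().inv_cdf(area)

def reject_h0(z, alpha, tail):
    """Fixed-alpha decision rule; tail is 'two', 'right', or 'left' (assumed labels)."""
    zc = critical_value(alpha, tail)
    if tail == "two":
        return abs(z) > zc      # H1: mu != mu0
    if tail == "right":
        return z > zc           # H1: mu > mu0
    if tail == "left":
        return z < -zc          # H1: mu < mu0
    raise ValueError("tail must be 'two', 'right', or 'left'")

# The z = 1.84 illustration at alpha = 0.05, two-tailed
print(critical_value(0.05, "two"))
print(reject_h0(1.84, 0.05, "two"))
```

For α = 0.05 the two-tailed critical value agrees with the tabulated 1.96 after rounding, so z = 1.84 is not rejected, which matches the discussion of values that fall close to the critical region.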
To draw an inference on a mean in the one-population case, assuming that the entries are normally distributed and the variance is known, the Z-test is used. It can be used when the sample size is equal to or greater than 30 (n ≥ 30). The Z-statistic, $z_c$, is the test statistic used to decide whether to reject the null hypothesis in favor of the alternative hypothesis. It is computed as:

$$z_c = \frac{\bar{X} - \mu_0}{\sigma / \sqrt{n}}$$

where $\bar{X}$ is the mean computed from the gathered data, $\mu_0$ is the hypothesized mean, $\sigma$ is the population standard deviation, which is known or given, and n is the sample size.

The critical value is obtained from the z-table. For a two-tailed test, the tabular value at $1 - \alpha/2$, written symbolically as $z_{\alpha/2}$, is used. For a one-tailed test, the tabular value at $1 - \alpha$, written as $z_{\alpha}$, is used.

Figure 1. The Normal Distribution or Z-Distribution for Testing the Hypothesis H0: μ = μ0 with critical values for (a) H1: μ ≠ μ0, (b) H1: μ > μ0, (c) H1: μ < μ0

Example 1. A random sample of 100 students enrolled in a Statistics course under Professor X shows that the average grade in the midterm examination is 85%. Professor X claims that the average grade of the students in the midterm is at least 80% with a standard deviation of 16%. Is there evidence to say that the claim is correct at the 5% level of significance?

Solution:
1. H0: μ = 80%
   H1: μ > 80%
2. α = 0.05, right-tailed test
3. $z_c = \dfrac{\bar{X} - \mu_0}{\sigma / \sqrt{n}}$
4. Critical region: z > 1.645. Reject H0 if $z_c$ is greater than 1.645.
5. Computing the z-statistic:

$$z_c = \frac{\bar{X} - \mu_0}{\sigma / \sqrt{n}} = \frac{85 - 80}{16 / \sqrt{100}} = 3.125$$

6. Reject H0 since 3.125 is greater than 1.645.
7. Therefore, the Professor's claim is correct at the 5% level of significance.

Using the P-value approach, the P-value corresponding to z = 3.125 is 0.0009 using the table for Areas Under the Normal Curve. Because this is far smaller than 0.05, the evidence in favor of the alternative hypothesis H1 is stronger than the 0.05 level of significance requires.

Example 2.
A manufacturer of solar lamps claims that the mean useful life of their new product is 8 months with a standard deviation of 0.5 month. To test this claim, a random sample of 50 solar lamps was tested and found to have a mean life of 7.8 months. Test the hypothesis that μ = 8 months against the alternative hypothesis that μ ≠ 8 months using the 1% level of significance.

Solution:
1. H0: μ = 8 months
   H1: μ ≠ 8 months
2. α = 0.01, two-tailed test
3. $z_c = \dfrac{\bar{X} - \mu_0}{\sigma / \sqrt{n}}$
4. Critical region: z < −2.575 or z > 2.575. Reject H0 if $z_c < -2.575$ or $z_c > 2.575$.
5. Computing the z-statistic:

$$z_c = \frac{\bar{X} - \mu_0}{\sigma / \sqrt{n}} = \frac{7.8 - 8}{0.5 / \sqrt{50}} = -2.8284, \text{ say } -2.83$$

6. Reject H0 since −2.83 is less than −2.575.
7. Therefore, the mean useful life of the new product is not equal to 8 months. In fact, it is less than 8 months at the 1% level of significance.

Using the P-value approach and considering that this is a two-tailed test, the P-value is twice the area to the left of z = −2.83. Using the table for Areas Under the Normal Curve,

$$P = P(|z| > 2.83) = 2P(z < -2.83) = 2(0.0023) = 0.0046$$
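The two worked examples can be reproduced with a short Python sketch of the one-sample z-test. The function name `z_test` and its `tail` argument are assumptions made for illustration, and the normal areas are computed with the standard library's `statistics.NormalDist` instead of being read from the table.

```python
from math import sqrt
from statistics import NormalDist

def z_test(sample_mean, mu0, sigma, n, tail):
    """Return (z, p_value) for a one-sample z-test with known sigma.

    tail is 'right' (H1: mu > mu0), 'left' (H1: mu < mu0), or 'two' (H1: mu != mu0).
    """
    z = (sample_mean - mu0) / (sigma / sqrt(n))
    cdf = NormalDist().cdf(z)          # area to the left of z
    if tail == "right":
        p_value = 1 - cdf
    elif tail == "left":
        p_value = cdf
    elif tail == "two":
        p_value = 2 * min(cdf, 1 - cdf)
    else:
        raise ValueError("tail must be 'right', 'left', or 'two'")
    return z, p_value

# Example 1: midterm grades, right-tailed test
z1, p1 = z_test(85, 80, 16, 100, "right")
# Example 2: solar lamps, two-tailed test
z2, p2 = z_test(7.8, 8, 0.5, 50, "two")

print(f"Example 1: z = {z1:.3f}, P-value = {p1:.4f}")
print(f"Example 2: z = {z2:.4f}, P-value = {p2:.4f}")
```

Because z is not rounded to two decimals before the area is computed, the P-value for Example 2 may differ in the last digit from the table-based 0.0046; the conclusion at the 1% level is unchanged.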