EDATERMS
This document provides an overview of statistical concepts and methods, covering basic terms, data collection techniques, experimental design, and probability. It is suitable for those working in fields that require statistical knowledge, such as engineering or research.
TERMS CHAPTER 1

Statistics may be defined as the science that deals with the collection, organization, presentation, analysis, and interpretation of data in order to draw judgments or conclusions that help in the decision-making process. The two parts of this definition correspond to the two main divisions of Statistics: Descriptive Statistics and Inferential Statistics.

Descriptive Statistics deals with the procedures that organize, summarize, and describe quantitative data. It seeks merely to describe data.

Inferential Statistics deals with making a judgment or conclusion about a population based on the findings from a sample taken from that population.

Statistical Terms

Population or Universe refers to the totality of objects, persons, places, or things used in a particular study: all members of a particular group of objects (items) or people (individuals) who are the subjects or respondents of a study.

Sample is any subset of the population.

Data are facts, figures, and information collected on some characteristics of a population or sample. Data can be classified as qualitative or quantitative.

Ungrouped (or raw) data are data not organized in any specific way; they are simply the collection of data as gathered.

Grouped data are raw data organized into groups or categories with corresponding frequencies. Organized in this manner, the data are referred to as a frequency distribution.

Parameter is a descriptive measure of a characteristic of a population.

Statistic is a measure of a characteristic of a sample.

Constant is a characteristic or property of a population or sample that is common to all members of the group.

Variable is a measure, characteristic, or property of a population or sample that may take a number of different values. It differentiates a particular member from the rest of the group.
It is the characteristic or property that is measured, controlled, or manipulated in research.

Methods of Data Collection

Collection of data is the first step in conducting a statistical inquiry. It simply refers to data gathering: a systematic method of collecting and measuring data from different sources of information in order to provide answers to relevant questions. The investigator is the person who conducts the inquiry; the one who helps in collecting information is an enumerator; and information is collected from a respondent.

Data can be primary or secondary. According to Wessel, "Data collected in the process of investigation are known as primary data." These are collected by the investigator from the primary source for the investigator's own use. Secondary data, on the other hand, are collected by some other organization for its own use, but the investigator also obtains them for his use. According to M.M. Blair, "Secondary data are those already in existence for some other purpose than answering the question in hand."

In the field of engineering, the three basic methods of collecting data are the retrospective study, the observational study, and the designed experiment.

A retrospective study uses a population or sample of historical data archived over some period of time. It may involve a significant amount of data, but those data may contain relatively little useful information about the problem; some relevant data may be missing, and recording errors may be present.

In an observational study, the process or population is observed and disturbed as little as possible, and the quantities of interest are recorded. Tests or experiments are important and almost always necessary to confirm the applicability and validity of a theory in a specific situation or environment.

In a designed experiment, deliberate or purposeful changes are made in the controllable variables of the system or process.
The resulting system output data must be observed, and an inference or decision is made about which variables are responsible for the observed changes in output performance. Designed experiments are very important in engineering design and development and in the improvement of manufacturing processes, in which statistical thinking and statistical methods play an important role in planning, conducting, and analyzing the data.

Planning and Conducting Surveys

A survey is a method of asking respondents some well-constructed questions. It is an efficient way of collecting information and is easy to administer, and a wide variety of information can be collected. Surveys can be done through face-to-face interviews or self-administered through the use of questionnaires.

The advantages of face-to-face interviews include fewer misunderstood questions, fewer incomplete responses, higher response rates, and greater control over the environment. The disadvantages are that they can be expensive and time-consuming and may require a large staff of trained interviewers. In addition, the responses can be biased by the appearance or attitude of the interviewer.

Self-administered surveys are less expensive than interviews; they can be administered in large numbers, do not require many interviewers, and put less pressure on respondents. Their disadvantages are that respondents are more likely to stop participating midway through the survey and investigators cannot ask them to clarify their answers. They also have lower response rates than interviews.

When designing a survey, the following steps are useful:
1. Determine the objectives of your survey: What questions do you want to answer?
2. Identify the target population and sample: Whom will you interview? Who will be the respondents? What sampling method will you use?
3.
Choose an interviewing method: face-to-face interview, phone interview, self-administered paper survey, or internet survey.
4. Decide what questions you will ask, in what order, and how to phrase them.
5. Conduct the interview and collect the information.
6. Analyze the results by making graphs and drawing conclusions.

Sampling is the process of selecting units (e.g., people, organizations) from a population of interest. The sample must be representative of the target population, which is the entire group a researcher is interested in.

There are two ways of selecting a sample: non-probability sampling and probability sampling.

Non-probability sampling is also called judgment or subjective sampling. This method is convenient and economical, but inferences based on its findings are not as reliable. The most common types of non-probability sampling are convenience sampling, purposive sampling, and quota sampling.

In convenience sampling, the researcher obtains information from respondents in whatever way is most convenient for the researcher, which can introduce bias into the responses.

In purposive sampling, the selection of respondents is predetermined according to a characteristic of interest set by the researcher. Randomization is absent in this type of sampling.

There are two types of quota sampling: proportional and non-proportional. In proportional quota sampling, the major characteristics of the population are represented by sampling a proportional amount of each. (For instance, if you know the population has 40% women and 60% men, and you want a total sample size of 100, you continue sampling until you reach those percentages and then stop.) Non-proportional quota sampling is a bit less restrictive: a minimum number of sampled units in each category is specified, without concern for matching the proportions in the population.
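The proportional quota example above (40% women, 60% men, total sample size of 100) amounts to a small calculation; a minimal sketch in Python, where the helper name proportional_quotas is hypothetical:

```python
def proportional_quotas(proportions, total):
    """Split a total sample size according to known population proportions."""
    return {group: round(p * total) for group, p in proportions.items()}

# The 40% women / 60% men example from the text, with a sample of 100
quotas = proportional_quotas({"women": 0.40, "men": 0.60}, total=100)
print(quotas)  # {'women': 40, 'men': 60}
```

Sampling would then continue until each group's quota is filled, at which point no further members of that group are accepted.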
In probability sampling, every member of the population is given an equal chance to be selected as part of the sample. There are several probability sampling techniques; among these are simple random sampling, stratified sampling, and cluster sampling.

Simple random sampling is the basic sampling technique, in which a group of subjects (a sample) is selected for study from a larger group (a population). Each individual is chosen entirely by chance, and each member of the population has an equal chance of being included in the sample. Simple random sampling is most appropriate when the entire population from which the sample is taken is homogeneous.

Stratified sampling is obtained by taking samples from each stratum, or sub-group, of a population. When a sample is to be taken from a population with several strata, the proportion of each stratum in the sample should be the same as in the population. Stratified sampling techniques are generally used when the population is heterogeneous, or dissimilar, and certain homogeneous, or similar, sub-populations (strata) can be isolated. Some reasons for using stratified sampling over simple random sampling are:
1. the cost per observation in the survey may be reduced;
2. estimates of the population parameters may be wanted for each sub-population;
3. increased accuracy at a given cost.

Cluster sampling is a sampling technique in which the entire population is divided into groups, or clusters, and a random sample of these clusters is selected. All observations in the selected clusters are included in the sample.

Planning and Conducting Experiments: Introduction to Design of Experiments

There are five stages in the design of experiments: planning, screening, optimization, robustness testing, and verification.

1. Planning. At this stage, the objectives of the experiment or investigation are identified, and the time and resources available to achieve them are assessed.
Individuals from different disciplines related to the product or process should compose the team that will conduct the investigation. They are to identify the possible factors to investigate and the most appropriate responses to measure.

2. Screening. Screening is used to identify, out of a large pool of potential factors, the important factors that affect the process under investigation. The screening process eliminates unimportant factors so that attention can be focused on the key factors. Screening experiments are usually efficient designs that require few runs and focus on the vital factors rather than on interactions.

3. Optimization (also known as the adjusting period). After the important factors affecting the process have been narrowed down, the best settings of these factors for achieving the objectives of the investigation are determined. Depending on the product or process under investigation, the objective may be to increase yield, to decrease variability, or to find settings that achieve both at the same time.

4. Robustness testing. Once the optimal settings of the factors have been determined, it is important to make the product or process insensitive to variations resulting from changes in factors that affect the process but are beyond the control of the analyst. Such factors, referred to as noise or uncontrollable factors, are likely to be encountered in the application environment.

5. Verification. This final stage involves validating the optimum settings by conducting a few follow-up experimental runs, to confirm that the process functions as expected and all objectives are achieved.

CHAPTER 2

Probability is simply how likely an event is to happen. "The chance of rain today is 50%" is a statement that quantifies our thoughts on the possibility of rain. The likelihood of an outcome is measured by assigning a number from the interval [0, 1], or as a percentage from 0% to 100%.
A probability of zero (0) indicates that the outcome is impossible, while a probability of one (1) indicates that the outcome will inevitably occur.

An experiment is any process that generates a set of data. An event is a set of possible outcomes of a probability experiment; it can consist of one or more outcomes. A simple event is an event with one outcome; a compound event is an event with more than one outcome.

Sample Space and Relationships among Events

The sample space is the set of all possible outcomes or results of a random experiment, and is represented by the letter S. Each outcome in the sample space is called an element of that set. An event is a subset of this sample space.

The null space (or empty space) is the subset of the sample space that contains no elements; it is denoted by the symbol Ø.

The intersection of two events A and B, denoted by A ∩ B, is the event containing all elements that are common to A and B. Events are mutually exclusive if they have no elements in common; the intersection of mutually exclusive events is the null space.

The union of two events A and B, denoted by A ∪ B, is the event containing all the elements that belong to A, to B, or to both.

The complement of an event A with respect to S, denoted by A', is the set of all elements of S that are not in A.

n(S) represents the number of elements in the sample space of an experiment; n(E) represents the number of elements in the event set; and P(E) represents the probability of an event.

Rules of Probability

Two events are mutually exclusive, or disjoint, if they cannot occur at the same time. The probability that Event A occurs, given that Event B has occurred, is called a conditional probability and is denoted by P(A | B). The complement of an event is the event not occurring.
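The event relationships above can be illustrated with Python's built-in sets, using P(E) = n(E)/n(S) for equally likely outcomes; the die-roll sample space is an assumption for illustration:

```python
S = {1, 2, 3, 4, 5, 6}   # sample space: one roll of a fair die
A = {2, 4, 6}            # event: roll an even number
B = {4, 5, 6}            # event: roll a number greater than 3

print(A & B)   # intersection A ∩ B: {4, 6}
print(A | B)   # union A ∪ B: {2, 4, 5, 6}
print(S - A)   # complement A': {1, 3, 5}

def P(E):
    """P(E) = n(E) / n(S), valid when all outcomes are equally likely."""
    return len(E) / len(S)

print(P(A))            # 0.5
print(P(A & B) / P(B)) # conditional probability P(A | B) = 2/3
```

Note that A and A' are mutually exclusive: A & (S - A) is the empty set, so P(A ∩ A') = 0, matching the rule below.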
The probability that Event A will not occur is denoted by P(A'). The probability that Events A and B both occur is the probability of the intersection of A and B, denoted by P(A ∩ B); if Events A and B are mutually exclusive, P(A ∩ B) = 0. The probability that Event A or Event B occurs is the probability of the union of A and B. If the occurrence of Event A changes the probability of Event B, then Events A and B are dependent; if it does not, they are independent.

Dependent events - two outcomes are dependent if knowing that one of the outcomes has occurred affects the probability that the other occurs.

Conditional probability - the conditional probability of an event B in relation to an event A is the probability that event B occurs given that event A has already occurred.

Rule of subtraction - the probability that event A will occur is equal to 1 minus the probability that event A will not occur: P(A) = 1 - P(A').

Permutation rule - used when order matters, since a permutation is an arrangement of elements in a distinct order.

Combination rule - used when order does not matter.

CHAPTER 3

A discrete probability distribution describes the probability of occurrence of each value of a discrete random variable. A discrete random variable is a random variable that has countable values, such as a list of non-negative integers, so its distribution can be given by the listing method.

Random Variables and Their Probability Distributions

A random variable is a variable whose value is subject to variations due to chance (i.e., randomness, in a mathematical sense). As opposed to other mathematical variables, a random variable conceptually does not have a single, fixed value (even if unknown); rather, it can take on a set of possible different values, each with an associated probability.
Discrete random variables can take on either a finite or at most a countably infinite set of discrete values (for example, the integers). Their probability distribution is given by a probability mass function, which directly maps each value of the random variable to a probability. For example, a discrete probability distribution might assign the singletons {1}, {3}, and {7} the probabilities 0.2, 0.5, and 0.3 respectively; a set containing none of these points then has probability zero.

Probability distributions for discrete random variables can be displayed as a formula, in a table, or in a graph. Examples of discrete random variables include:
the number of eggs that a hen lays in a given day (it can't be 2.3);
the number of people going to a given soccer match;
the number of students that come to class on a given day;
the number of people in line at McDonald's on a given day and time.

A probability histogram displays the probability of each value of the random variable. The probability mass function (pmf) serves the same purpose as the probability histogram, displaying the specific probability for each discrete value. All the values of a probability mass function must be non-negative and must sum to one.

Expected Values of Random Variables

The expected value of a random variable is the weighted average of all possible values that the random variable can take on. A discrete random variable has a countable number of possible values. The probability distribution of a discrete random variable X lists the values xi and their probabilities pi. The probabilities pi must satisfy two requirements:
1. Every probability pi is a number between 0 and 1.
2. The sum of the probabilities is 1: p1 + p2 + ... + pk = 1.
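Using the example distribution above (values 1, 3, and 7 with probabilities 0.2, 0.5, and 0.3), the two requirements and the expected value can be checked directly:

```python
pmf = {1: 0.2, 3: 0.5, 7: 0.3}   # P(X = x) for each value x

# Requirement 1: every probability lies between 0 and 1
assert all(0 <= p <= 1 for p in pmf.values())
# Requirement 2: the probabilities sum to 1 (allowing floating-point rounding)
assert abs(sum(pmf.values()) - 1.0) < 1e-9

# Expected value: the probability-weighted average of the values
ev = sum(x * p for x, p in pmf.items())
print(ev)   # 1*0.2 + 3*0.5 + 7*0.3 = 3.8
```

The dict doubles as a lookup table for individual probabilities, e.g. pmf.get(2, 0) returns 0 because 2 is not a possible value.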
In probability theory, the expected value (or expectation, mathematical expectation, EV, mean, or first moment) of a random variable is the weighted average of all possible values that the random variable can take on.

The Binomial Distribution

A binomial experiment is a statistical experiment that has the following properties:
1. The experiment consists of n repeated trials.
2. Each trial can result in just two possible outcomes. We call one of these outcomes a success and the other a failure.
3. The probability of success, denoted by P, is the same on every trial.
4. The trials are independent; that is, the outcome on one trial does not affect the outcome on other trials.

The following notation is helpful when we talk about binomial probability:
X: the number of successes that result from the binomial experiment.
n: the number of trials in the binomial experiment.
P: the probability of success on an individual trial.
Q: the probability of failure on an individual trial (equal to 1 - P).
n!: the factorial of n (also known as n factorial).
b(x; n, P): binomial probability - the probability that an n-trial binomial experiment results in exactly x successes when the probability of success on an individual trial is P.
nCr: the number of combinations of n things, taken r at a time.

A binomial random variable is the number of successes x in n repeated trials of a binomial experiment, and its probability distribution is called a binomial distribution. The binomial probability refers to the probability that a binomial experiment results in exactly x successes. A cumulative binomial probability refers to the probability that the binomial random variable falls within a specified range (e.g., is greater than or equal to a stated lower limit and less than or equal to a stated upper limit).

The Poisson Distribution

A Poisson distribution is the probability distribution that results from a Poisson experiment.
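The binomial probability b(x; n, P) and the cumulative binomial probability defined above can be computed directly from the combination count nCx; a minimal sketch (the n = 10, P = 0.5 example values are assumptions for illustration):

```python
import math

def binomial_pmf(x, n, p):
    """b(x; n, P): probability of exactly x successes in n independent trials."""
    return math.comb(n, x) * p**x * (1 - p)**(n - x)

def binomial_cdf(x, n, p):
    """Cumulative binomial probability P(X <= x): sum of b(k; n, P) for k = 0..x."""
    return sum(binomial_pmf(k, n, p) for k in range(x + 1))

# Example: 10 fair coin flips (p = 0.5)
print(binomial_pmf(5, 10, 0.5))   # C(10,5) / 2**10 = 252/1024, about 0.246
print(binomial_cdf(5, 10, 0.5))   # P(X <= 5), about 0.623
```

A probability for a range such as "between a and b successes inclusive" follows as binomial_cdf(b, n, p) - binomial_cdf(a - 1, n, p).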
A Poisson experiment is a statistical experiment that has the following properties:
1. The experiment results in outcomes that can be classified as successes or failures.
2. The average number of successes (μ) that occur in a specified region is known.
3. The probability that a success will occur is proportional to the size of the region.
4. The probability that a success will occur in an extremely small region is virtually zero.

Note that the specified region could take many forms; for instance, it could be a length, an area, a volume, or a period of time.

The following notation is helpful when we talk about the Poisson distribution:
e: a constant approximately equal to 2.71828 (e is the base of the natural logarithm system).
μ: the mean number of successes that occur in a specified region.
x: the actual number of successes that occur in a specified region.
P(x; μ): the Poisson probability that exactly x successes occur in a Poisson experiment when the mean number of successes is μ.

A Poisson random variable is the number of successes that result from a Poisson experiment, and its probability distribution is called a Poisson distribution. A cumulative Poisson probability refers to the probability that the Poisson random variable is greater than some specified lower limit and less than some specified upper limit.

CHAPTER 5

If X and Y are two discrete random variables, the probability distribution for their simultaneous occurrence can be represented by a function with values f(x, y) for any pair of values (x, y) within the range of the random variables X and Y. It is customary to refer to this function as the joint probability distribution of X and Y.

Continuous case - the case where both variables are continuous is obtained easily by analogy with the discrete case, replacing sums by integrals.
Marginal probability distributions - if more than one random variable is defined in a random experiment, it is important to distinguish between the joint probability distribution of X and Y and the probability distribution of each variable individually. The individual probability distribution of a random variable is referred to as its marginal probability distribution. The term marginal is used because, when the values of f(x, y) are displayed in a rectangular table, the values of g(x) and h(y) are just the marginal totals of the respective columns and rows.

Conditional probability distributions - a conditional probability distribution is a distribution of the form f(x, y)/g(x), used to compute conditional probabilities effectively.
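A minimal sketch of a joint distribution f(x, y) with its marginals g(x) and h(y) and the conditional distribution f(x, y)/g(x); the table values are assumptions for illustration:

```python
# Joint probability distribution f(x, y); the four values are assumed for illustration
f = {(0, 0): 0.10, (0, 1): 0.20,
     (1, 0): 0.30, (1, 1): 0.40}

xs = {x for x, _ in f}
ys = {y for _, y in f}

# Marginals are the row and column totals of the joint table
g = {x: sum(f[(x, y)] for y in ys) for x in xs}   # g(x): marginal of X
h = {y: sum(f[(x, y)] for x in xs) for y in ys}   # h(y): marginal of Y
print(g)   # g(0) = 0.10 + 0.20 = 0.30, g(1) = 0.30 + 0.40 = 0.70
print(h)   # h(0) = 0.40, h(1) = 0.60

# Conditional distribution of Y given X = x: f(x, y) / g(x)
cond = {(x, y): f[(x, y)] / g[x] for (x, y) in f}
print(cond[(1, 1)])   # 0.40 / 0.70, about 0.571
```

For each fixed x, the conditional probabilities f(x, y)/g(x) sum to 1 over y, confirming that each conditional distribution is itself a valid probability distribution.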