CHAPTER-1 (1).pptx
Uploaded by PowerfulMilwaukee
Chapter 1
ROLE OF STATISTICS IN ENGINEERING
Engineering Data Analysis

The field of statistics deals with the collection, presentation, analysis, and use of data to make decisions, solve problems, and design products and processes. It also provides methods and techniques for drawing conclusions about the characteristics of a large set of data points, called a population, from a smaller subset of that data, called a sample.

Because many aspects of engineering practice involve working with data, some knowledge of statistics is important to any engineer. Specifically, statistical techniques can be a powerful aid in designing new products and systems, improving existing designs, and designing, developing, and improving production processes.

Engineers apply physical and chemical laws and mathematics to design, develop, test, and supervise various products and services. They perform tests to learn how things behave under stress and at what point they might fail. As engineers perform experiments, they collect data that can be used to explain relationships better and to reveal information about the quality of the products and services they provide.

Jens Martensson

Enumerative Studies

An enumerative study focuses on obtaining information about, and taking action on, specific items contained in a frame, which is a well-defined group of physical items (e.g. sampling from a batch of product to answer the question, “Should the batch be rejected or accepted?”). The statistical inference made from the data is applied to the remaining units in the frame. The goal is not to characterize the process that produced the frame, but to decide and act on the frame.

A 100% sample of the units in the frame (i.e. a complete census) eliminates sampling uncertainty and provides a complete answer to the question posed in an enumerative study (e.g. “Does the batch contain less than 1% defective?”).
In an enumerative study, the method of choice for reducing uncertainty is to increase the sample size, which reduces the standard error. Therefore, within enumerative studies there is a rational basis for using confidence intervals, significance levels, analysis of variance, etc.

Enumerative Studies Example [figure]

Analytic Studies

The most significant problems in business involve improving performance in the future. An analytic study therefore obtains information from the system or process under study in order to take appropriate action on it (e.g. sampling from a batch of product to answer the question, “Has the process or system changed as a result of our actions?” or “Is the process consistently producing acceptable product?”). The statistical inference made from the data is applied to the process. The goal is to characterize the process that produced the frame, not to describe and act on the frame.

A 100% sample of the units in a frame will be inconclusive concerning the future performance of the process. In an analytic study, the major source of uncertainty is the dynamics of the process or system under study (i.e. the physics of the process, effects of entropy, human will, and other assignable causes). The method of choice for reducing uncertainty is to study the process over time (control charts), which increases knowledge of the cause system, reduces variability, and improves predictability. Therefore, within analytic studies there is no rational basis for using confidence intervals, significance levels, analysis of variance, etc. Increasing the sample size is of little help and can be costly.

OBTAINING DATA

INTRODUCTION

Statistics may be defined as the science that deals with the collection, organization, presentation, analysis, and interpretation of data in order to draw judgments or conclusions that help in the decision-making process.
The two parts of this definition correspond to the two main divisions of Statistics: Descriptive Statistics and Inferential Statistics.

Descriptive Statistics, referred to in the first part of the definition, deals with the procedures that organize, summarize, and describe quantitative data. It seeks merely to describe data.

Inferential Statistics, implied in the second part of the definition, deals with making a judgment or drawing a conclusion about a population based on the findings from a sample taken from that population.

Statistical Terms

Population or Universe refers to the totality of objects, persons, places, or things used in a particular study: all members of a particular group of objects (items) or people (individuals) that are the subjects or respondents of a study.

Sample is any subset of a population, that is, a few members of the population.

Data are facts, figures, and information collected on some characteristics of a population or sample. They can be classified as qualitative or quantitative.

Ungrouped (or raw) data are data that are not organized in any specific way. They are simply the collection of data as gathered.

Grouped data are raw data organized into groups or categories with corresponding frequencies. Data organized in this manner are referred to as a frequency distribution.

Parameter is a descriptive measure of a characteristic of a population.

Statistic is a measure of a characteristic of a sample.

Constant is a characteristic or property of a population or sample that is common to all members of the group.

Variable is a measure, characteristic, or property of a population or sample that may take a number of different values. It differentiates a particular member from the rest of the group. It is the characteristic or property that is measured, controlled, or manipulated in research.
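The distinction between a parameter and a statistic defined above can be illustrated with a short sketch. The population here is simulated and entirely hypothetical (1,000 part lengths in millimetres), and the seed and sample size of 30 are arbitrary assumptions chosen only for illustration.

```python
import random
import statistics

# Hypothetical population: 1,000 simulated part lengths in mm (illustrative only).
rng = random.Random(42)
population = [rng.gauss(50.0, 2.0) for _ in range(1000)]

# Parameter: a descriptive measure computed from the whole population.
mu = statistics.fmean(population)

# Statistic: the same measure computed from a sample (a subset of the population).
sample = rng.sample(population, 30)
x_bar = statistics.fmean(sample)

print(f"population mean (parameter): {mu:.3f}")
print(f"sample mean (statistic):     {x_bar:.3f}")
```

The sample mean will generally differ a little from the population mean; inferential statistics is concerned with how much trust to place in such an estimate.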
Variables differ in many respects, most notably in the role they are given in the research and in the type of measures that can be applied to them.

1. METHOD OF DATA COLLECTION

What is Data Collection?
Collection of data is the first step in conducting a statistical inquiry. It refers to data gathering: a systematic method of collecting and measuring data from different sources of information in order to provide answers to relevant questions. This involves acquiring information from published literature, surveys through questionnaires or interviews, experimentation, documents and records, tests or examinations, and other data-gathering instruments.

Types of Data
1. Primary data: data collected afresh and for the first time, and thus original in character.
2. Secondary data: data that have already been collected by someone else and have already been passed through the statistical process.

METHOD OF DATA COLLECTION: PRIMARY DATA
1. Observation
2. Interview
3. Questionnaire
4. Case Study
5. Survey

OBSERVATION METHOD
o A method under which data from the field are collected through observation by an observer who personally goes to the field.

Advantages | Disadvantages
Subjective bias eliminated | Time consuming
Current information | Limited information
Independent of respondent-related variables (for example, willingness to respond) | Unforeseen factors

TYPES OF OBSERVATION (STRUCTURED AND UNSTRUCTURED)
Structured observation: observation characterized by a defined style of recording the observed information, standardized conditions of observation, a definition of the units to be observed, and selection of the pertinent data of observation.
Example: An auditor performing inventory analysis in a store.
Unstructured observation: observation done without any of these characteristics being thought out in advance.
Example: Observing children playing with new toys.

TYPES OF OBSERVATION (PARTICIPANT AND NON-PARTICIPANT)
Participant: the observer is a member of the group being observed.
Advantages:
1. Observation of natural behaviour
2. Closeness with the group
3. Better understanding
Non-participant: the observer observes people without giving any information to them.
Advantages:
1. Objectivity and neutrality
2. More willingness of the respondent

TYPES OF OBSERVATION (CONTROLLED AND UNCONTROLLED)
Controlled: observation that takes place according to definite pre-arranged plans and an experimental procedure. It is generally done in a laboratory under controlled conditions.
Uncontrolled: observation that takes place in natural conditions. It is done to get a spontaneous picture of life and persons.

INTERVIEW METHOD
o This method of collecting data involves the presentation of oral-verbal stimuli and replies in terms of oral-verbal responses. An interview is an oral-verbal communication in which the interviewer asks the respondent questions aimed at getting the information required for the study.

TYPES OF INTERVIEW
Personal interviews: the interviewer asks questions, generally in face-to-face contact with the other person or persons.
Structured interviews: a set of pre-decided questions is used.
Unstructured interviews: no system of pre-determined questions is followed.
Focused interviews: attention is focused on a given experience of the respondent and its possible effects.
Clinical interviews: concerned with broad underlying feelings or motivations, or with the course of the individual's life experience, rather than with the effects of a specific experience as in a focused interview.
Group interviews: a group of 6 to 8 individuals is interviewed.
Qualitative and quantitative interviews: divided on the basis of subject matter, i.e. whether it is qualitative or quantitative.
Individual interviews: the interviewer meets a single person and interviews that person.
Selection interviews: done for the selection of people for certain jobs.
Depth interviews: deliberately aim to elicit unconscious as well as other types of material relating especially to personality dynamics and motivations.
Telephonic interviews: contacting respondents by telephone.

QUESTIONNAIRE METHOD
o This method of data collection is quite popular, particularly for large enquiries.
o The questionnaire is mailed to respondents, who are expected to read and understand the questions and write their replies in the space provided in the questionnaire itself. The respondents have to answer the questions on their own.

Advantages:
o Low cost even if the geographical area is very large.
o Answers are in the respondents' own words, so they are free from interviewer bias.
o Adequate time to think about answers.
o Respondents who are not easily approachable may be conveniently contacted.
o Large samples can be used, so results are more reliable.

Disadvantages:
o Low rate of return of duly filled questionnaires.
o Slowest method of data collection.
o Difficult to know whether the intended respondent filled in the form or someone else did.

CASE STUDY METHOD
o Essentially an intensive investigation of the particular unit under consideration.
Advantages:
o They are less costly and less time-consuming; they are advantageous when exposure data is expensive or hard to obtain.
o They are advantageous when studying dynamic populations in which follow-up is difficult.

Disadvantages:
o They are subject to selection bias.
o They generally do not allow calculation of incidence (absolute risk).

METHOD OF DATA COLLECTION: SECONDARY DATA

SOURCES OF DATA
o Publications of central, state, and local government
o Technical and trade journals
o Books, magazines, newspapers
o Reports and publications of industry, banks, and stock exchanges
o Reports by research scholars, universities, and economists
o Public records

FACTORS TO BE CONSIDERED BEFORE USING SECONDARY DATA
Reliability of data: who collected the data, when, and by which methods.
Suitability of data: the object, scope, and nature of the original inquiry should be studied; if the original study had a different objective, the data are not suitable for the current study.
Adequacy of data: if the level of accuracy is too low for the current study, or the data relate to a different area, the data are not adequate.

SELECTION OF PROPER METHOD FOR COLLECTION OF DATA
o Nature, scope, and object of inquiry
o Availability of funds
o Time factor
o Precision required

Frequency Distributions

The organization of data in tabular form yields frequency distributions. Data in frequency distributions may be grouped or ungrouped. Raw data are collected data that have not been organized numerically. An arrangement of raw data in ascending or descending order of magnitude is an array. In an array, any value may appear several times. The number of times a value appears in the listing is its frequency.
The relative frequency of any observation is obtained by dividing the actual frequency of the observation by the total frequency.

Classification of Data

UNGROUPED DATA: When the data set is small (n ≤ 30) or when there are few distinct values, the data may be organized without grouping.

EXAMPLE 1. A certain machine is to dispense 1.5 kilos of sodium nitrate. To determine whether it is properly adjusted to dispense 1.5 kilos, the quality control engineer weighed 30 bags of sodium nitrate, nominally 1.5 kilos each, after the machine was adjusted. The data given below refer to the net weight (in kilos) of each bag.

Solution: The data has only 5 distinct values, as listed in Table 1.1 found on the next page. Each data value is tallied to determine its frequency. A frequency distribution may also be presented in terms of relative frequency. By definition,

relative frequency = $f / \sum f$

Hence, 6/30 = 0.20 is the relative frequency of 1.46. The frequency distribution is completed in Table 1.1.

GROUPED DATA: Statistical data gathered in large masses (n > 30) can be assessed by grouping the data into different classes. The following are suggested steps in forming a frequency distribution from raw data:
1. Find the range (R). The range is the difference between the largest and smallest values.
2. Decide on a suitable number of classes. This will depend on what information the table is supposed to present. Sturges suggested the number of classes (m) as $m = 1 + 3.3 \log n$, where n is the number of cases and the logarithm is to base 10.
3. Determine the class size (c): $c = R/m$. The class size may be rounded off to the same place value as the data.
4. Find the number of observations in each class. This is the class frequency (f).

CLASS INTERVALS: Classes represent the grouping or classification. The range of values in a class is the class interval, consisting of a lower limit and an upper limit.
Whenever possible, the class intervals should be of equal width, and the ranges should be multiples of numbers that are easy to work with, such as 5, 10, or 100.

EXAMPLE 2: The following are data on the observed compressive strength in psi of 50 samples of concrete interlocking blocks.

136  92 115 118 121 137 132 120 104 125
119 115 101 129  87 108 110 133 135 126
127 103 110 126 118  82 104 137 120  95
146 126 119 119 105 132 126 118 100 113
106 125 117 102 146 129 124 113  95 148

PREPARE THE FREQUENCY DISTRIBUTION TABLE.

Solution:
Range = H - L
R = 148 - 82 = 66
m = 1 + 3.3 log 50 ≈ 6.6; use 7 classes
c = 66/7 ≈ 9.4; use c = 10, a convenient width at the same place value (ones) as the data.

The lowest value is 82. It is convenient to start with 80 as the lower limit of the first class. 80 + 10 = 90 is the lower limit of the second class. The number of observed values tallied in each class is the class frequency. The relative frequency of each class is also obtained and presented in Table 1.2.

CLASS MARKS AND CLASS BOUNDARIES

The midpoint of the class interval is the class mark. It is half the sum of the lower and upper limits of a class. The point halfway between successive classes, that is, the dividing point between them, is the class boundary. The upper class boundary of the first class is the dividing point between the first class and the second class, and it is also the lower class boundary of the second class.

Thus in Table 1.2, ½ (89 + 90) = 89.5 is the upper class boundary of the first class and the lower class boundary of the second class. The class mark of the first class is ½ (80 + 89) = 84.5. For the succeeding classes, the class mark may be obtained by adding c = 10, since the classes have equal widths.
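The steps of Example 2 can be reproduced with a short sketch. This is only an illustration under the choices made in the solution above: the class size c = 10 and the starting lower limit of 80 are taken from the text rather than computed, and the variable names are assumptions.

```python
import math

# Compressive strengths (psi) of the 50 concrete interlocking blocks in Example 2.
strengths = [
    136, 92, 115, 118, 121, 137, 132, 120, 104, 125,
    119, 115, 101, 129, 87, 108, 110, 133, 135, 126,
    127, 103, 110, 126, 118, 82, 104, 137, 120, 95,
    146, 126, 119, 119, 105, 132, 126, 118, 100, 113,
    106, 125, 117, 102, 146, 129, 124, 113, 95, 148,
]

n = len(strengths)
data_range = max(strengths) - min(strengths)   # step 1: R = H - L
m = round(1 + 3.3 * math.log10(n))             # step 2: Sturges' suggestion
c = 10             # step 3: 66/7 is about 9.4; the text chooses the convenient width 10
first_lower = 80   # convenient starting limit chosen in the text

# Step 4: tally the class frequencies, then add boundaries, marks, relative frequencies.
table = []
for k in range(m):
    lower = first_lower + k * c
    upper = lower + c - 1
    f = sum(lower <= x <= upper for x in strengths)
    table.append({
        "class": f"{lower}-{upper}",
        "boundaries": (lower - 0.5, upper + 0.5),
        "mark": (lower + upper) / 2,
        "f": f,
        "rel_f": f / n,
    })

for row in table:
    print(row["class"], row["boundaries"], row["mark"], row["f"], f"{row['rel_f']:.2f}")
```

Each printed row gives the class limits, class boundaries, class mark, frequency, and relative frequency for one class.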
Classes | Class Boundaries | Class Marks
80-89 | 79.5-89.5 | 84.5
90-99 | 89.5-99.5 | 94.5
100-109 | 99.5-109.5 | 104.5
110-119 | 109.5-119.5 | 114.5
120-129 | 119.5-129.5 | 124.5
130-139 | 129.5-139.5 | 134.5
140-149 | 139.5-149.5 | 144.5

MEASURES OF CENTRAL TENDENCY: MEAN, MEDIAN AND MODE

MEAN

The arithmetic mean, or simply the mean, is the overall average. If the data represent the entire population, the mean of the values is referred to as the population mean, $\mu$. This mean is a quantitative measure describing a characteristic of a population, and therefore it is a parameter. If the data constitute a sample drawn from a population, the mean is referred to as the sample mean, $\bar{x}$, which is a statistic.

If there are n observations with numerical values $x_1, x_2, \ldots, x_n$, then the sample mean is given by

$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$

The population mean of N values is given by

$\mu = \frac{\sum_{i=1}^{N} x_i}{N}$

If the values $x_1, x_2, \ldots, x_m$ occur $f_1, f_2, \ldots, f_m$ times, with $n = \sum_{i=1}^{m} f_i$, then

$\bar{x} = \frac{\sum_{i=1}^{m} f_i x_i}{n}$

Example: The following data represent the time in seconds for 9 glued samples to dry and attain their bond strength: 3.6, 2.5, 3.1, 4.3, 2.4, 2.9, 2.5, 4.1 and 3.4. Calculate the mean.

$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{28.8}{9} = 3.2$ seconds

Example: Find the mean weight of sodium nitrate in Example 1.

$\bar{x} = \frac{\sum_{i=1}^{m} f_i x_i}{n} = \frac{44.81}{30} \approx 1.49$ kilos

MEAN OF GROUPED DATA

If the data in a frequency table consist of n observations in m classes with class marks $x_1, x_2, \ldots, x_m$ and class frequencies $f_1, f_2, \ldots, f_m$, then the sample mean is

$\bar{x} = \frac{\sum_{i=1}^{m} f_i x_i}{n}$

Example: Find the mean of the observed compressive strength in psi in Example 2.
Classes | $f_i$ | $x_i$ (Class Mark) | $f_i x_i$
80-89 | 2 | 84.5 | 169
90-99 | 3 | 94.5 | 283.5
100-109 | 9 | 104.5 | 940.5
110-119 | 13 | 114.5 | 1488.5
120-129 | 13 | 124.5 | 1618.5
130-139 | 7 | 134.5 | 941.5
140-149 | 3 | 144.5 | 433.5
Total | 50 | 801.5 | 5875

Summing over the m = 7 classes,

$\bar{x} = \frac{\sum_{i=1}^{m} f_i x_i}{n} = \frac{5875}{50} = 117.5$ psi

MEDIAN

The median of a set of numbers in an array is either the middle value or the arithmetic mean of the two middle values. The sample median $\tilde{x}$ is used to estimate the population median $\tilde{\mu}$.

Example: For the set of numbers 1, 3, 3, 5, 6, 8, 9, 9, 10
Median = 6

Example: For the set of numbers 4, 4, 7, 9, 11, 12, 15, 18
Median = ½ (9 + 11) = 10

MEDIAN OF GROUPED DATA

$\text{Median} = L_m + \left[ \frac{\frac{n}{2} - (\sum f)_L}{f_m} \right] C$

where
$L_m$ = lower class boundary of the median class
$(\sum f)_L$ = sum of the frequencies of all classes lower than the median class
$f_m$ = frequency of the median class
$C$ = size of the median class

Example: Determine the median of the data in Example 2.

Class Boundary | f | Cumulative f
79.5-89.5 | 2 | 2
89.5-99.5 | 3 | 5
99.5-109.5 | 9 | 14
109.5-119.5 | 13 | 27
119.5-129.5 | 13 | 40
129.5-139.5 | 7 | 47
139.5-149.5 | 3 | 50

$\frac{n}{2} = \frac{50}{2} = 25$, so the median is located at the 25th observation, which falls in the class 109.5-119.5.

$L_m = 109.5$, $(\sum f)_L = 14$, $f_m = 13$, $C = 10$

$\text{Median} = 109.5 + \left[ \frac{\frac{50}{2} - 14}{13} \right] (10) = 117.96$

MODE

The mode is the value that occurs with the greatest frequency.
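The measures introduced so far can be cross-checked with a short sketch. It uses the drying times from the mean example and the class boundaries and frequencies of Example 2. The helper names `grouped_mean` and `grouped_median` are hypothetical, and the raw-data measures rely on Python's standard `statistics` module rather than on any method prescribed by the slides.

```python
import statistics

# Drying times (seconds) of the 9 glued samples from the mean example.
times = [3.6, 2.5, 3.1, 4.3, 2.4, 2.9, 2.5, 4.1, 3.4]
mean_t = statistics.fmean(times)        # about 3.2 seconds
median_t = statistics.median(times)     # middle value of the sorted array
modes_t = statistics.multimode(times)   # value(s) with the greatest frequency

# Example 2 as (lower boundary, upper boundary, frequency) triples.
classes = [
    (79.5, 89.5, 2), (89.5, 99.5, 3), (99.5, 109.5, 9), (109.5, 119.5, 13),
    (119.5, 129.5, 13), (129.5, 139.5, 7), (139.5, 149.5, 3),
]

def grouped_mean(classes):
    """Sum of f_i * x_i over n, where x_i is the class mark (midpoint)."""
    n = sum(f for _, _, f in classes)
    return sum(f * (lo + hi) / 2 for lo, hi, f in classes) / n

def grouped_median(classes):
    """L_m + [(n/2 - cumulative f below the median class) / f_m] * C."""
    n = sum(f for _, _, f in classes)
    below = 0  # cumulative frequency of the classes below the current one
    for lo, hi, f in classes:
        if below + f >= n / 2:      # first class that reaches the n/2-th observation
            return lo + (n / 2 - below) / f * (hi - lo)
        below += f
    raise ValueError("empty frequency table")

print(round(mean_t, 2), median_t, modes_t)
print(grouped_mean(classes), round(grouped_median(classes), 2))
```

For the drying times, the value 2.5 appears twice and every other value once, so it is the only mode.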
Example: For the set of numbers 3, 3, 5, 7, 9, 10, 11, 10, 11, 12, 9, 18, 9
Mode = 9 (unimodal)

Example: For the set of values 2.2, 3.1, 4.1, 4.1, 5.4, 5.4, 5.4, 5.4, 6.2, 7.7, 7.7, 8.5, 8.5, 8.5, 9.3
Mode = 5.4 and 8.5 (bimodal)

MODE OF GROUPED DATA

$\text{Mode} = L_{mo} + \left[ \frac{d_1}{d_1 + d_2} \right] C$

where
$L_{mo}$ = lower boundary of the modal class
$d_1$ = difference between the frequency of the modal class and that of the class preceding it
$d_2$ = difference between the frequency of the modal class and that of the class following it
$C$ = size of the modal class

Example: Determine the mode of the data in Example 2.

Classes | f
80-89 | 2
90-99 | 3
100-109 | 9
110-119 | 13
120-129 | 13
130-139 | 7
140-149 | 3

Two classes share the highest frequency (13), so the formula is applied to each.

Class (110-119):
$L_{mo} = 109.5$, $d_1 = 13 - 9 = 4$, $d_2 = 13 - 13 = 0$, $C = 10$
$\text{Mode}_{110-119} = 109.5 + \left[ \frac{4}{4 + 0} \right] (10) = 119.5$

Class (120-129):
$L_{mo} = 119.5$, $d_1 = 13 - 13 = 0$, $d_2 = 13 - 7 = 6$, $C = 10$
$\text{Mode}_{120-129} = 119.5 + \left[ \frac{0}{0 + 6} \right] (10) = 119.5$

2. PLANNING AND CONDUCTING SURVEYS

A survey is a method of asking respondents some well-constructed questions. It is an efficient way of collecting information, it is easy to administer, and it can collect a wide variety of information. The researcher can stay focused on the questions that are of interest and that are necessary for the statistical inquiry or study.

However, surveys depend on the respondents' honesty, motivation, memory, and ability to respond. Sometimes answers may lead to vague data. Surveys can be done through face-to-face interviews or self-administered through the use of questionnaires.
The advantages of face-to-face interviews include fewer misunderstood questions, fewer incomplete responses, higher response rates, and greater control over the environment in which the survey is administered; also, the researcher can collect additional information if any of the respondents' answers need clarifying. The disadvantages of face-to-face interviews are that they can be expensive and time-consuming and may require a large staff of trained interviewers. In addition, the response can be biased by the appearance or attitude of the interviewer.

Self-administered surveys are less expensive than interviews. They can be administered in large numbers, do not require many interviewers, and put less pressure on respondents. However, in self-administered surveys the respondents are more likely to stop participating midway through the survey, and they cannot ask for questions to be clarified. Response rates are also lower than in personal interviews.

Designing a Survey

When designing a survey, the following steps are useful:
1. Determine the objectives of your survey: What questions do you want to answer?
2. Identify the target population sample: Whom will you interview? Who will be the respondents? What sampling method will you use?
3. Choose an interviewing method: face-to-face interview, phone interview, self-administered paper survey, or internet survey.
4. Decide what questions you will ask, in what order, and how to phrase them. (This is important if there is more than one piece of information you are looking for.)
5. Conduct the interview and collect the information.
6. Analyze the results by making graphs and drawing conclusions.

In choosing the respondents, sampling techniques are necessary. Sampling is the process of selecting units (e.g., people, organizations) from a population of interest. The sample must be representative of the target population.
The target population is the entire group a researcher is interested in: the group about which the researcher wishes to draw conclusions.

There are two ways of selecting a sample: non-probability sampling and probability sampling.

Non-probability Sampling

Non-probability sampling is also called judgment or subjective sampling. This method is convenient and economical, but the inferences based on its findings are less reliable. The most common types of non-probability sampling are convenience sampling, purposive sampling, and quota sampling.

Types of Non-probability Sampling

In convenience sampling, the researcher obtains information from whichever respondents are easiest to reach. This favors the researcher but can bias the results.

In purposive sampling, the selection of respondents is predetermined according to the characteristic of interest specified by the researcher. Randomization is absent in this type of sampling.

There are two types of quota sampling: proportional and non-proportional. In proportional quota sampling, the major characteristics of the population are represented by sampling a proportional amount of each. For instance, if you know the population has 40% women and 60% men, and you want a total sample size of 100, you continue sampling until you reach those percentages and then stop. Non-proportional quota sampling is a bit less restrictive. In this method, a minimum number of sampled units in each category is specified, without concern for matching the proportions in the population.

Probability Sampling

In probability sampling, every member of the population has a known chance of being selected as part of the sample; in simple random sampling that chance is equal for all members. There are several probability sampling techniques. Among these are simple random sampling, stratified sampling, and cluster sampling.
Types of Probability Sampling

Simple Random Sampling

Simple random sampling is the basic sampling technique, in which a group of subjects (a sample) is selected for study from a larger group (a population). Each individual is chosen entirely by chance, and each member of the population has an equal chance of being included in the sample. Every possible sample of a given size has the same chance of selection; i.e. each member of the population is equally likely to be chosen at any stage in the sampling process.

Stratified Sampling

There may often be factors that divide the population into sub-populations (groups or strata), and the measurement of interest may vary among the different sub-populations. This has to be accounted for when a sample is selected, in order to obtain a sample that is representative of the population. This is achieved by stratified sampling.

A stratified sample is obtained by taking samples from each stratum or sub-group of a population. When a sample is to be taken from a population with several strata, the proportion of each stratum in the sample should be the same as in the population.

Stratified sampling techniques are generally used when the population is heterogeneous, or dissimilar, but certain homogeneous, or similar, sub-populations (strata) can be isolated. Simple random sampling is most appropriate when the entire population from which the sample is taken is homogeneous. Some reasons for using stratified sampling over simple random sampling are:
1. the cost per observation in the survey may be reduced;
2. estimates of the population parameters may be wanted for each sub-population;
3. increased accuracy at a given cost.

Cluster Sampling

Cluster sampling is a sampling technique in which the entire population is divided into groups, or clusters, and a random sample of these clusters is selected. All observations in the selected clusters are included in the sample.
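The three probability sampling techniques described above can be sketched as follows. Everything in this block is a hypothetical illustration: the population of 200 students, the grade-level counts, the clusters of 25, and the function names are assumptions, and the stratified version uses simple proportional allocation with rounding.

```python
import random

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Hypothetical population: 200 students, each tagged with a grade level (the stratum).
grades = [9] * 80 + [10] * 60 + [11] * 40 + [12] * 20
population = [(f"S{i:03d}", grade) for i, grade in enumerate(grades)]

def simple_random_sample(units, k, rng):
    """Every unit has the same chance of selection; sampling is without replacement."""
    return rng.sample(units, k)

def stratified_sample(units, k, key, rng):
    """Proportional allocation: each stratum's share of the sample matches its share
    of the population. Rounding can make the total differ from k in general."""
    strata = {}
    for unit in units:
        strata.setdefault(key(unit), []).append(unit)
    sample = []
    for members in strata.values():
        share = round(k * len(members) / len(units))
        sample.extend(rng.sample(members, share))
    return sample

def cluster_sample(clusters, n_clusters, rng):
    """Pick whole clusters at random and keep every unit in the chosen clusters."""
    chosen = rng.sample(list(clusters), n_clusters)
    return [unit for name in chosen for unit in clusters[name]]

# Hypothetical clusters: 8 homerooms of 25 consecutive students each.
clusters = {f"room{j}": population[25 * j: 25 * (j + 1)] for j in range(8)}

srs = simple_random_sample(population, 20, rng)
strat = stratified_sample(population, 20, key=lambda unit: unit[1], rng=rng)
clus = cluster_sample(clusters, 2, rng)

per_grade = {g: sum(1 for _, grade in strat if grade == g) for g in (9, 10, 11, 12)}
print(len(srs), per_grade, len(clus))
```

With a sample of 20 from grade counts of 80, 60, 40, and 20, proportional allocation draws 8, 6, 4, and 2 students respectively, mirroring the 40%, 30%, 20%, and 10% shares in the population.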
Constructing a Survey

Example: Martha wants to construct a survey that shows which sports students at her school like to play the most.

a) List the goal of the survey.
The goal of the survey is to find the answer to the question: “Which sports do students at Martha's school like to play the most?”

b) What population sample should she interview?
A sample of the population would include a random sample of the student population in Martha's school. A good strategy would be to randomly select students (using dice or a random number generator) as they walk into an all-school assembly.

c) How should she administer the survey?
Face-to-face interviews are a good choice in this case. Interviews will be easy to conduct since the survey consists of only one question, which can be quickly answered and recorded, and asking the question face to face will help reduce non-response bias.

d) Create a data collection sheet that she can use to record her results.
In order to collect the data for this simple survey, Martha can design a data collection sheet such as the one below:

Sports | Tally
Baseball |
Basketball |
Football |
Soccer |
Volleyball |
Swimming |
 |
 |

This is a good, simple data collection sheet because:
o Plenty of space is left for the tally marks.
o Only one question is being asked.
o Many possibilities are included, but space is left at the bottom in case students give answers that Martha didn't think of.
o The answer from each interviewee can be quickly collected, and then the data collector can move on to the next person.

Once the data has been collected, suitable graphs can be made to display the results.

Example: Raoul wants to construct a survey that shows how many hours per week the average student at his school works.

a) List the goal of the survey.
The goal of the survey is to find the answer to the question “How many hours per week do you work?”

b) What population sample will he interview?
Raoul suspects that older students might work more hours per week than younger students. He decides that a stratified sample of the student population would be appropriate in this case. The strata are grade levels 9th through 12th. He would need to find out what proportion of the students in his school are in each grade level, and then include the same proportions in his sample.

c) How would he administer the survey?
Face-to-face interviews are a good choice in this case since the survey consists of two short questions, which can be quickly answered and recorded.

d) Create a data collection sheet that Raoul can use to record his results.
In order to collect the data for this survey, Raoul designed the data collection sheet shown below:

Grade Level | Number of Hours Worked | Total Number of Students
9th grade | |
10th grade | |
11th grade | |
12th grade | |

This data collection sheet allows Raoul to write down the actual numbers of hours worked per week by students, as opposed to just collecting tally marks for several categories.

Display, Analyze, and Interpret Statistical Survey Data

In the previous section we considered two examples of surveys you might conduct in your school. The first one was designed to find the sport that students like to play the most. The second survey was designed to find out how many hours per week students worked.

For the first survey, students' choices fit neatly into separate categories. Appropriate ways to display the data might be a pie chart or a bar graph. Let's revisit this example. In the first example, Martha interviewed 112 students and obtained the following results.

a) Make a bar graph of the results showing the percentage of students in each category.
To make a bar graph, we list the sport categories on the x-axis and let the percentage of students be represented by the y-axis. To find the percentage of students in each category, we divide the number of students in each category by the total number of students surveyed:

Sports | Tally | Percentage
Baseball | 31 | 31/112 ≈ 28%
Basketball | 17 | 17/112 ≈ 15%
Football | 14 | 14/112 = 12.5%
Soccer | 28 | 28/112 = 25%
Volleyball | 9 | 9/112 ≈ 8%
Swimming | 8 | 8/112 ≈ 7%
Gymnastics | 3 | 3/112 ≈ 2.7%
Fencing | 2 | 2/112 ≈ 2%
Total | 112 |

Now we can make a graph where the height of each bar represents the percentage of students in each category: [figure: bar graph]

b) Make a pie chart of the collected information, showing the percentage of students in each category.
To make a pie chart, we find the percentage of the students in each category by dividing the number of students in each category by the total, as in part a. The central angle of each slice of the pie is found by multiplying the percentage of students in each category by 360 degrees (the total number of degrees in a circle). To draw a pie chart by hand, you can use a protractor to measure the central angles that you find for each category.

Sports | Tally | Percentage | Central Angle (degrees)
Baseball | 31 | 31/112 ≈ 28% | 0.28(360) ≈ 101
Basketball | 17 | 17/112 ≈ 15% | 0.15(360) = 54
Football | 14 | 14/112 = 12.5% | 0.125(360) = 45
Soccer | 28 | 28/112 = 25% | 0.25(360) = 90
Volleyball | 9 | 9/112 ≈ 8% | 0.08(360) ≈ 29
Swimming | 8 | 8/112 ≈ 7% | 0.07(360) ≈ 25
Gymnastics | 3 | 3/112 ≈ 2.7% | 0.027(360) ≈ 10
Fencing | 2 | 2/112 ≈ 2% | 0.02(360) ≈ 7
Total | 112 | |

Here is the pie chart that represents the percentage of students in each category: [figure: pie chart]

3. PLANNING AND CONDUCTING EXPERIMENTS: INTRODUCTION TO DESIGN OF EXPERIMENTS (DOE)

The products and processes in the engineering and scientific disciplines are mostly derived from experimentation. An experiment is a series of tests conducted in a systematic manner to increase the understanding of an existing process or to explore a new product or process. Design of Experiments, or DOE, is a tool for developing an experimentation strategy that maximizes learning using minimum resources.
Design of Experiments is widely used by engineers and scientists to improve existing processes by maximizing yield and decreasing variability, or to develop new products and processes. It is a technique for identifying the "vital few" factors in the most efficient manner and then directing the process to its best setting to meet the ever-increasing demand for improved quality and increased productivity.

The methodology of DOE ensures that all factors and their interactions are systematically investigated, resulting in reliable and complete information. There are five stages to be carried out for the design of experiments. These are planning, screening, optimization, robustness testing and verification.

Planning
It is important to carefully plan the course of experimentation before embarking upon the process of testing and data collection. This stage involves identifying the objectives of the experiment or investigation and assessing the time and resources available to achieve those objectives. Individuals from different disciplines related to the product or process should form the team that will conduct the investigation. They are to identify possible factors to investigate and the most appropriate responses to measure. A team approach promotes synergy that gives a richer set of factors to study and thus a more complete experiment. Carefully planned experiments lead to increased understanding of the product or process. Well-planned experiments are easy to execute and analyze using the available statistical software.

Screening
Screening experiments are used to identify the important factors that affect the process under investigation out of the large pool of potential factors. The screening process eliminates unimportant factors so that attention is focused on the key factors.
Screening experiments are usually efficient designs that require few runs and focus on the vital factors rather than on interactions.

Optimization
After narrowing down the important factors affecting the process, the next step is to determine the best settings of these factors to achieve the objectives of the investigation. The objectives may be to increase yield, to decrease variability, or to find settings that achieve both at the same time, depending on the product or process under investigation.

Robustness Testing
Once the optimal settings of the factors have been determined, it is important to make the product or process insensitive to variations resulting from changes in factors that affect the process but are beyond the control of the analyst. Such factors are referred to as noise or uncontrollable factors and are likely to be experienced in the application environment. It is important to identify such sources of variation and take measures to ensure that the product or process is made robust, or insensitive, to these factors.

Verification
This final stage involves validation of the optimum settings by conducting a few follow-up experimental runs. This is to confirm that the process functions as expected and all objectives are achieved.

Thank You!

SEATWORK NO. 1

Exercise 1. The following are the observed gasoline consumption values, in miles per gallon, of 40 cars. Arrange the data in a frequency distribution. Construct a table of the class limits, class boundaries and class marks of the frequency distribution. Determine the mean, median and mode.

Exercise 2. The following data come from a survey of university students who were asked how many books they read per year. Determine the mean, median and mode.
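The quantities asked for in the seatwork (class limits, class boundaries, class marks, mean, median and mode) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the exercise data are not reproduced in this transcript, so the list `mpg` below holds hypothetical placeholder values, and the lowest class limit (20), the class width (5) and the helper name `frequency_table` are likewise assumptions chosen for the example.

```python
from statistics import mean, median, multimode

# Hypothetical placeholder values (NOT the 40 observations from Exercise 1).
mpg = [24, 25, 27, 27, 28, 30, 31, 31, 31, 35]


def frequency_table(values, lowest_limit, class_width, unit=1):
    """Build a grouped frequency distribution.

    Each row is (lower limit, upper limit, lower boundary,
    upper boundary, class mark, frequency). `unit` is the
    measurement precision of the data (1 for whole numbers).
    """
    rows = []
    lower = lowest_limit
    while lower <= max(values):
        upper = lower + class_width - unit            # upper class limit
        frequency = sum(1 for v in values if lower <= v <= upper)
        rows.append((
            lower,
            upper,
            lower - unit / 2,                         # lower class boundary
            upper + unit / 2,                         # upper class boundary
            (lower + upper) / 2,                      # class mark (midpoint)
            frequency,
        ))
        lower += class_width
    return rows


# Assumed class layout: classes 20-24, 25-29, 30-34, 35-39.
table = frequency_table(mpg, lowest_limit=20, class_width=5)

print("limits  boundaries  mark  frequency")
for lo, hi, lb, ub, mark, freq in table:
    print(f"{lo}-{hi}   {lb}-{ub}   {mark}   {freq}")

# Measures of central tendency computed from the raw (ungrouped) values.
print("mean:  ", mean(mpg))
print("median:", median(mpg))
print("mode:  ", multimode(mpg))   # list, since data may have several modes
```

Replacing `mpg` with the actual observations, and choosing a class width suited to their range, gives the table requested in Exercise 1. The same three calls to `mean`, `median` and `multimode` apply directly to the data in Exercise 2.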