Engineering Data Analysis Module 1 PDF
Summary
This is a learning module on engineering data analysis, focusing on obtaining data and different data collection methods. It introduces descriptive and inferential statistics, along with examples and activities.
Full Transcript
ES209 Engineering Data Analysis
Module No. 01
Topic: OBTAINING DATA
Period: Week no. 01
Date:

OBTAINING DATA

Introduction
Hello dear young engineers! Welcome to this module on Engineering Data Analysis. This module will help you understand how to obtain data: the methods available, and how to plan and conduct data collection. But first, what is statistics?

Statistics may be defined as the science that deals with the collection, organization, presentation, analysis, and interpretation of data in order to be able to draw judgments or conclusions that help in the decision-making process. The two parts of this definition correspond to the two main divisions of Statistics: Descriptive Statistics and Inferential Statistics.

Descriptive Statistics, which is referred to in the first part of the definition, deals with the procedures that organize, summarize and describe quantitative data. It seeks merely to describe data.

Inferential Statistics, implied in the second part of the definition, deals with making a judgment or a conclusion about a population based on the findings from a sample that is taken from the population.

Objective/Intended Learning Outcomes
At the end of this module, you are expected to:
1. Demonstrate understanding of the different methods of obtaining data.
2. Explain the procedures for planning and conducting surveys and experiments.

WHAT IS DATA?
▪ “Data are values of qualitative or quantitative variables, belonging to a set of items.”

Set of items: sometimes called the population; the set of objects you are interested in.
Variables: a measurement or characteristic of an item.
Qualitative: a categorical variable, that is, a variable that is not numerical. It describes data that fits into categories.
Examples:
Eye colors (categories include: blue, green, brown, hazel).
States (categories include: Florida, New Jersey, Washington).
Dog breeds (categories include: Alaskan Malamute, German Shepherd, Siberian Husky, Shih tzu).
Quantitative: a measurement variable or numerical variable.
Examples: counts, percents, or numbers.

Practice Activity 01
Determine whether each of the following is a qualitative or a quantitative variable. Write QLV if it is a qualitative variable and QTV if it is a quantitative variable.
1. High school Grade Point Average (e.g. 4.0, 3.2, 2.1).
2. Number of pets owned (e.g. 1, 2, 4).
3. How many cousins you have (e.g. 0, 12, 22).
4. Your race (e.g. Asian, Latino, black).
5. Party affiliation (e.g. Republican, Democrat, Independent).
What have you observed with the first three statements? How about statements four and five?
The general rule of thumb: if you can add it, it is quantitative; if you cannot add it, it is qualitative.

WHAT DO DATA LOOK LIKE?
Figure 1: Image
Figure 2: Music
Figure 3: News
Figure 4: Excel File

Practice Activity 02
Give five (5) examples of data you use in daily life.
___________________________
___________________________
___________________________
___________________________
___________________________

METHODS OF DATA COLLECTION
Data collection is the process of gathering and measuring information on variables of interest, in an established systematic fashion that enables one to answer stated research questions, test hypotheses, and evaluate outcomes.

TYPES OF DATA
PRIMARY DATA: data which are collected fresh and for the first time, and thus happen to be original in character.
SECONDARY DATA: data which have been collected by someone else and which have already been passed through the statistical process.

METHODS OF DATA COLLECTION | Primary Data

1. Observation
A method under which data from the field is collected with the help of observation by the observer or by personally going to the field.
Advantages:
1. Subjective bias eliminated.
2. Current information.
3. Independent of respondent's variable.
Disadvantages:
1. Time consuming.
2. Limited information.
3. Unforeseen factors.

Types of Observation
1a. Structured Observation: when observation is done by characterizing the style of recording the observed information, standardized conditions of observation, the definition of the units to be observed, and selection of pertinent data of observation.
Example: An auditor performing inventory analysis in a store.
1b. Unstructured Observation: when observation is done without any planning before the observation.
Example: Observing children playing with new toys.
1c. Participant: when the observer is a member of the group he is observing.
Advantages:
1. Observation of natural behavior.
2. Closeness with the group.
3. Better understanding.
1d. Non-participant: when the observer is observing people without giving any information to them.
Advantages:
1. Objectivity and neutrality.
2. More willingness of the respondent.
1e. Uncontrolled: when the observation takes place in natural conditions. It is done to get a spontaneous picture of life and persons.
1f. Controlled: when the observation takes place according to definite pre-arranged plans involving an experimental procedure. It is generally done in the laboratory under controlled conditions.

2. Interview
This method of collecting data involves presentation of oral-verbal stimuli and replies in terms of oral-verbal responses. The interview method is an oral-verbal communication in which the interviewer asks the respondent questions aimed at getting the information required for the study.

Types of Interview
1. Personal interviews: The interviewer asks questions, generally in face-to-face contact with the other person or persons.
2. Structured interviews: A set of pre-decided questions is used.
3. Unstructured interviews: No system of pre-determined questions is followed.
4. Focused interviews: Attention is focused on the given experience of the respondent and its possible effects.
5. Clinical interviews: Concerned with broad underlying feelings or motivations or with the course of an individual's life experience, rather than with the effects of the specific experience, as in the case of a focused interview.
6. Group interviews: A group of 6 to 8 individuals is interviewed.
7. Qualitative and quantitative interviews: Divided on the basis of subject matter, i.e. whether qualitative or quantitative.
8. Individual interviews: The interviewer meets a single person and interviews him.
9. Selection interviews: Done for the selection of people for certain jobs.
10. Depth interviews: Deliberately aim to elicit unconscious as well as other types of material relating especially to personality dynamics and motivations.
11. Telephonic interviews: Contacting samples by telephone.

3. Questionnaire
This method of data collection is quite popular, particularly in the case of big enquiries. The questionnaire is mailed to respondents, who are expected to read and understand the questions and write down the reply in the space meant for the purpose in the questionnaire itself. The respondents have to answer the questions on their own.
Advantages:
1. Low cost even if the geographical area is too large.
2. Answers are in the respondents' own words, so they are free from interviewer bias.
3. Adequate time to think for answers.
4. Non-approachable respondents may be conveniently contacted.
5. Large samples can be used, so results are more reliable.
Disadvantages:
1. Low rate of return of duly filled questionnaires.
2. Slowest method of data collection.
3. Difficult to know whether the expected respondent filled in the form or someone else did.

4. Case Study
Essentially an intensive investigation of the particular unit under consideration.
Advantages:
1. They are less costly and less time-consuming; they are advantageous when exposure data is expensive or hard to obtain.
2. They are advantageous when studying dynamic populations in which follow-up is difficult.
Disadvantages:
1. They are subject to selection bias.
2. They generally do not allow calculation of incidence (absolute risk).

5. Survey
One of the common methods of diagnosing and solving social problems is undertaking surveys.
Advantages:
1. Relatively easy to administer.
2. Can be developed in less time (compared to other data-collection methods).
3. Cost-effective, but cost depends on survey mode.
Disadvantages:
1. Respondents may not feel encouraged to provide accurate, honest answers.
2. Surveys with closed-ended questions may have a lower validity rate than other question types.
3. Data errors due to question non-responses may exist.

Practice Activity 03
Define the following:
Registration Method
Experimentation Method

METHODS OF DATA COLLECTION | Secondary Data
Sources of Data:
Publications of central, state, and local governments
Technical and trade journals
Books, magazines, newspapers
Reports & publications of industry, banks, stock exchanges
Reports by research scholars, universities, economists
Public records

Factors to be considered before using Secondary Data
Reliability of data: Who collected the data, when, by which methods, at what time, etc.
Suitability of data: The object, scope, and nature of the original inquiry should be studied; if the original study had a different objective, then the data are not suitable for the current study.
Adequacy of data: Consider the level of accuracy and the area covered; if these differ from what the current study needs, then the data are not adequate for the study.

Factors to consider when choosing a Data Collection Method
There are various factors to consider when choosing a data collection method. As such, the researcher must judiciously select the method or methods for his own study, keeping in view the following factors:

Nature, scope and object of inquiry
This constitutes the most important factor affecting the choice of a particular method. The method selected should be such that it suits the type of enquiry that is to be conducted by the researcher.
This factor is also important in deciding whether the data already available (secondary data) are to be used or the data not yet available (primary data) are to be collected.

Availability of funds
The availability of funds for the research project determines to a large extent the method to be used for the collection of data. When funds at the disposal of the researcher are very limited, he will have to select a comparatively cheaper method which may not be as efficient and effective as some other costly method. Finance, in fact, is a big constraint in practice and the researcher has to act within this limitation.

Time factor
Availability of time has also to be taken into account in deciding a particular method of data collection. Some methods take relatively more time, whereas with others the data can be collected in a comparatively shorter duration. The time at the disposal of the researcher, thus, affects the selection of the method by which the data are to be collected.

Precision required
Precision required is yet another important factor to be considered at the time of selecting the method of collection of data.

Designing a Survey
Surveys can take different forms. They can be used to ask only one question or they can ask a series of questions. We can use surveys to test out people's opinions or to test a hypothesis. When designing a survey, the following steps are useful:
1. Determine the goal of your survey: What question do you want to answer?
2. Identify the sample population: Whom will you interview?
3. Choose an interviewing method: face-to-face interview, phone interview, self-administered paper survey, or internet survey.
4. Decide what questions you will ask in what order, and how to phrase them. (This is important if there is more than one piece of information you are looking for.)
5. Conduct the interview and collect the information.
6. Analyze the results by making graphs and drawing conclusions.
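Steps 2 and 6 in the list above can be sketched in a few lines of Python. This is only a minimal illustration, not part of the original module: the roster of student ID numbers, the sample size of 30, and the recorded answers are made-up placeholders, and `draw_sample` and `tally` are hypothetical helper names.

```python
import random
from collections import Counter


def draw_sample(roster, sample_size, seed=None):
    """Step 2: pick a simple random sample (without replacement) from the roster."""
    rng = random.Random(seed)  # seeded generator so the draw can be repeated
    return rng.sample(roster, sample_size)


def tally(responses):
    """Step 6: count how often each answer was given, most common first."""
    return Counter(responses).most_common()


# Hypothetical roster: 500 student ID numbers (placeholder data).
roster = list(range(1, 501))
sample = draw_sample(roster, sample_size=30, seed=1)

# Hypothetical answers recorded during the interviews (placeholder data).
responses = ["basketball", "volleyball", "basketball", "badminton", "basketball"]
for answer, count in tally(responses):
    print(answer, count)
```

Drawing the sample with a random number generator, rather than picking whoever is convenient, gives every person on the roster the same chance of being interviewed.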
Example: Martha wants to construct a survey that shows which sports students at her school like to play the most.
Step 1: List the goal of the survey.
Step 2: What population should she interview?
Step 3: How should she administer the survey?
Step 4: Create a data collection sheet that she can use to record her results.

Step 1: GOAL
The goal of the survey is to find the answer to the question: “Which sports do students at Martha’s school like to play the most?”

Step 2: POPULATION
The population is the students at Martha’s school, and the survey should use a random sample of that population. A good strategy would be to randomly select students (using dice or a random number generator) as they walk into an all-school assembly.

Step 3: METHODS
Face-to-face interviews are a good choice in this case. Interviews will be easy to conduct since the survey consists of only one question which can be quickly answered and recorded, and asking the question face to face will help eliminate non-response bias.

Step 4: DATA

Basis of Conducting Experiment
1. With an experiment, the researcher is trying to learn something new about the world, an explanation of 'why' something happens.
2. The experiment must maintain internal and external validity, or the results will be useless.
3. When designing an experiment, a researcher must follow all of the steps of the scientific method, from making sure that the hypothesis is valid and testable, to using controls and statistical tests.

Introduction to Design of Experiments (DOE)
What is the Scientific Method?
Do you remember learning about this back in high school or junior high even? What were those steps again?
Decide what phenomenon you wish to investigate. Specify how you can manipulate the factor and hold all other conditions fixed, to ensure that these extraneous conditions aren't influencing the response you plan to measure. Then measure your chosen response variable at several (at least two) settings of the factor under study.
If changing the factor causes the phenomenon to change, then you conclude that there is indeed a cause-and-effect relationship at work.

How many factors are involved when you do an experiment? Some say two; perhaps this is a comparative experiment? Perhaps there is a treatment group and a control group? If you have a treatment group and a control group then, in this case, you probably only have one factor with two levels.

How many of you have baked a cake? What are the factors involved to ensure a successful cake? Factors might include preheating the oven, baking time, ingredients, amount of moisture, baking temperature, etc. What else? You probably follow a recipe, so there are many additional factors that control the ingredients, i.e., a mixture. In other words, someone did the experiment in advance! What parts of the recipe did they vary to make the recipe a success? Probably many factors: temperature and moisture, various ratios of ingredients, and the presence or absence of many additives. Now, should one keep all the factors involved in the experiment at a constant level and just vary one to see what would happen? This is a strategy that works but is not very efficient. This is one of the concepts that we will address in this course.

“All experiments are designed experiments, it is just that some are poorly designed and some are well-designed.”
What is your thought about this quote, young engineers?

Engineering Experiments
If we had infinite time and resource budgets, there probably wouldn't be a big fuss made over designing experiments. In production and quality control we want to control the error and learn as much as we can about the process or the underlying theory with the resources at hand.
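Learning as much as possible from limited resources comes down to choosing which runs to perform. Returning to the cake example, the sketch below lists the runs for two layouts: varying one factor at a time from a baseline, and a full factorial layout that runs every combination of levels. The full factorial layout is brought in here as an assumption for illustration; the module itself does not name it. The factor names and levels are made up, and both functions are hypothetical helpers.

```python
from itertools import product

# Hypothetical factors and levels for the cake example (placeholder values).
factors = {
    "temperature_C": [160, 180],
    "baking_time_min": [30, 40],
    "moisture": ["low", "high"],
}


def one_factor_at_a_time(factors):
    """Baseline run plus one run per non-baseline level of each factor."""
    baseline = {name: levels[0] for name, levels in factors.items()}
    runs = [dict(baseline)]
    for name, levels in factors.items():
        for level in levels[1:]:
            run = dict(baseline)
            run[name] = level  # change only this factor, keep the rest at baseline
            runs.append(run)
    return runs


def full_factorial(factors):
    """Every combination of factor levels."""
    names = list(factors)
    return [dict(zip(names, combo)) for combo in product(*factors.values())]


ofat_runs = one_factor_at_a_time(factors)
factorial_runs = full_factorial(factors)
print(len(ofat_runs), "one-factor-at-a-time runs")
print(len(factorial_runs), "full factorial runs")
```

With three two-level factors this gives 4 one-factor-at-a-time runs and 8 factorial runs. The factorial layout costs more runs here, but every run contributes to estimating every factor's effect, and it can show whether the factors interact, which the one-factor-at-a-time runs cannot.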
From an engineering perspective, we're trying to use experimentation for the following purposes:
reduce time to design/develop new products & processes
improve performance of existing processes
improve reliability and performance of products
achieve product & process robustness
perform an evaluation of materials, design alternatives, setting component & system tolerances, etc.

We always want to fine-tune or improve the process. In today's global world this drive for competitiveness affects all of us, both as consumers and as producers.

Robustness is a concept that enters into statistics at several points. At the analysis stage, robustness refers to a technique that isn't overly influenced by bad data. Even if there is an outlier or bad data, you still want to get the right answer. Regardless of who or what is involved in the process, it is still going to work.

Every experimental design has inputs. Back to the cake baking example: we have our ingredients such as flour, sugar, milk, eggs, etc. Regardless of the quality of these ingredients, we still want our cake to come out successfully. In every experiment there are inputs and, in addition, there are factors (such as time of baking, temperature, the geometry of the cake pan, etc.), some of which you can control and others that you can't control. The experimenter must think about factors that affect the outcome. We also talk about the output and the yield or the response to your experiment. For the cake, the output might be measured as texture, height, size, or flavor.

The Basic Principles of DOE

Randomization
This is an essential component of any experiment that is going to have validity. If you are doing a comparative experiment where you have two treatments, a treatment and a control, for instance, you need to include in your experimental process the assignment of those treatments by some random process. An experiment includes experimental units.
You need to have a deliberate process to eliminate potential biases from the conclusions, and random assignment is a critical step.

Replication

Blocking
Blocking is a technique to include other factors in our experiment which contribute to undesirable variation. Much of the focus in this class will be to creatively use various blocking techniques to control sources of variation that will reduce error variance. For example, in human studies, the gender of the subjects is often an important factor. Age is another factor affecting the response. Age and gender are often considered nuisance factors which contribute to the variability and make it difficult to assess the systematic effects of treatment. By using these as blocking factors, you can avoid biases that might occur due to differences in the allocation of subjects to the treatments, and you can account for some of the noise in the experiment. We want the unknown error variance at the end of the experiment to be as small as possible. Our goal is usually to find out something about a treatment factor (or a factor of primary interest), but in addition to this, we want to include any blocking factors that will explain variation.

Multi-factor Designs

Confounding
Confounding is something that is usually considered bad! Here is an example. Let's say we are doing a medical study with drugs A and B. We put 10 subjects on drug A and 10 on drug B. If we categorize our subjects by gender, how should we allocate our drugs to our subjects? Let's make it easy and say that there are 10 male and 10 female subjects. A balanced way of doing this study would be to put five males on drug A and five males on drug B, five females on drug A and five females on drug B. This is a perfectly balanced experiment such that if there is a difference between males and females, at least it will equally influence the results from drug A and the results from drug B.
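The balanced allocation described above (five males and five females on each drug) can be combined with random assignment by shuffling the subjects within each gender block and then splitting each block evenly between the drugs. The sketch below is one way to do that, under stated assumptions: the 20 subject labels are made up, and `assign_within_blocks` is a hypothetical helper, not something defined in this module.

```python
import random


def assign_within_blocks(blocks, treatments, seed=None):
    """Randomly assign treatments so each block is split evenly among them.

    blocks: dict mapping a block label (e.g. gender) to a list of subjects.
    treatments: list of treatment labels (e.g. ["A", "B"]).
    Returns a dict mapping each subject to (block label, treatment).
    """
    rng = random.Random(seed)
    assignment = {}
    for label, subjects in blocks.items():
        if len(subjects) % len(treatments) != 0:
            raise ValueError("block size must be a multiple of the number of treatments")
        shuffled = subjects[:]  # copy so the caller's list is left untouched
        rng.shuffle(shuffled)   # the randomization happens inside each block
        per_treatment = len(shuffled) // len(treatments)
        for i, subject in enumerate(shuffled):
            assignment[subject] = (label, treatments[i // per_treatment])
    return assignment


# Hypothetical subjects: 10 males (M1..M10) and 10 females (F1..F10).
blocks = {
    "male": [f"M{i}" for i in range(1, 11)],
    "female": [f"F{i}" for i in range(1, 11)],
}
assignment = assign_within_blocks(blocks, ["A", "B"], seed=42)

# Count subjects in each (gender, drug) cell; every cell should hold 5.
cells = {}
for block_label, drug in assignment.values():
    cells[(block_label, drug)] = cells.get((block_label, drug), 0) + 1
print(cells)
```

Which individual receives which drug is still decided by chance, but because the shuffle is done separately for males and females, gender cannot end up lined up with one drug.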
An alternative scenario might occur if patients were randomly assigned treatments as they came in the door. At the end of the study, the researchers might realize that drug A had only been given to the male subjects and drug B had only been given to the female subjects. We would call this design totally confounded. This refers to the fact that if you analyze the difference between the average response of the subjects on A and the average response of the subjects on B, this is exactly the same as the difference between the average response of males and the average response of females. You would not have any reliable conclusion from this study at all. The difference between the two drugs A and B might just as well be due to the gender of the subjects, since the two factors are totally confounded.

Confounding is something we typically want to avoid, but when we are building complex experiments we can sometimes use confounding to our advantage. We will confound things we are not interested in, in order to have more efficient experiments for the things we are interested in. This will come up in multiple-factor experiments later on. We may be interested in main effects but not interactions, so we will confound the interactions in order to reduce the sample size, and thus the cost of the experiment, but still have good information on the main effects.

Steps for Planning, Conducting and Analyzing an Experiment
The practical steps needed for planning and conducting an experiment include: recognizing the goal of the experiment, choice of factors, choice of response, choice of the design, analysis, and then drawing conclusions. This pretty much covers the steps involved in the scientific method.
1. Recognition and statement of the problem
2. Choice of factors, levels, and ranges
3. Selection of the response variable(s)
4. Choice of design
5. Conducting the experiment
6. Statistical analysis
7. Drawing conclusions, and making recommendations
What this topic will deal with primarily is the choice of design.
This focus includes all the related issues about how we handle these factors in conducting our experiments.

Factors
We usually talk about "treatment" factors, which are the factors of primary interest to you. In addition to treatment factors, there are nuisance factors which are not your primary focus, but you have to deal with them. Sometimes these are called blocking factors, mainly because we will try to block these factors to prevent them from influencing the results.
There are other ways that we can categorize factors:

Experimental vs. Classification Factors
Experimental Factors: These are factors that you can specify (and set the levels) and then assign at random as the treatment to the experimental units. Examples would be temperature, level of an additive, fertilizer amount per acre, etc.
Classification Factors: These can't be changed or assigned; they come as labels on the experimental units. The age and sex of the participants are classification factors which can't be changed or randomly assigned. But you can select individuals from these groups randomly.

Quantitative vs. Qualitative Factors
Quantitative Factors: You can assign any specified level of a quantitative factor. Examples: percent or pH level of a chemical.
Qualitative Factors: These factors have categories which are different types. Examples might be species of a plant or animal, a brand in the marketing field, gender; these are not ordered or continuous but are arranged perhaps in sets.

You are finally done with Module 1! Hop on for more exciting and challenging activities in Module 2!

References
Dodge, Y.; Cox, D.; Commenges, D.; Davidson, A.; Solomon, P.; and Wilson, S. (Eds.). The Oxford Dictionary of Statistical Terms, 6th Edition. New York: Oxford University Press, 2006.
Beyer, W. H. CRC Standard Mathematical Tables, 31st ed. Boca Raton, FL: CRC Press, pp. 536 and 571, 2002.
Agresti, A. (1990). Categorical Data Analysis. John Wiley and Sons, New York.
Kotz, S.; et al., eds. (2006). Encyclopedia of Statistical Sciences. Wiley.
Lindstrom, D. (2010). Schaum’s Easy Outline of Statistics, Second Edition (Schaum’s Easy Outlines). McGraw-Hill Education.
Selection of appropriate method for data collection in research methodology tutorial. Wisdom Jobs. (n.d.). Retrieved September 4, 2022, from https://www.wisdomjobs.com/e-university/research-methodology-tutorial-355/selection-of-appropriate-method-for-data-collection-11495.html
Lesson 1: Introduction to design of experiments: Stat 503. PennState: Statistics Online Courses. (n.d.). Retrieved September 4, 2022, from https://online.stat.psu.edu/stat503/lesson/1