Engineering Data Analysis Module 1 PDF
Polytechnic University of the Philippines
Dr. Robert G. de Luna, PECE
Summary
This document is a module on engineering data analysis, covering basic concepts, data collection methods, and sampling techniques. It includes sections on questionnaires, interviews, schedules, observation, rating scales, and more.
10/15/2023
Engineering Data Analysis
Module 1: Basic Concepts, Obtaining Data, and Data Collection and Sampling Techniques
Dr. Robert G. de Luna, PECE
Associate Professor

Instrumentation

Data Collection
It is the process of gathering and measuring information on variables of interest in an established, systematic fashion that enables one to answer stated research questions, test hypotheses, and evaluate outcomes.

Tools for Data Collection
- Questionnaires
- Interviews
- Schedules
- Observation
- Rating Scales

Questionnaires
It is a research instrument consisting of a series of questions for the purpose of gathering information from respondents. It is a device for securing answers to questions by using a form which the respondents fill in themselves.

Interview
It is a two-way method which permits an exchange of ideas and information.

Schedules
It is the tool or instrument used to collect data from the respondents while an interview is conducted. It contains questions, statements, and blank spaces/tables for recording the respondents' answers.

Observation
It is described as a method to observe and describe the behavior of a subject. It is a way of collecting relevant information and data by observing.

Rating Scale
It is a term applied to an instrument used to express opinion or judgment regarding some situation, object, or character. It consists of close-ended questions along with a set of categories as options for respondents. It helps gain information on both qualitative and quantitative attributes.

Sampling and Instrumentation

Survey
It is a research method used for collecting data from a predefined group of respondents to gain information and insights into various topics of interest.
Surveys can have multiple purposes, and researchers can conduct them in many ways depending on the methodology chosen and the study's goal.

Two Methods to Conduct a Survey
- Census Method (Parametric)
- Sampling Method (Non-parametric)

Investigator: The person who plans and conducts the statistical investigation, independently or with the help of others.
Respondent: The person who answers or responds to the set of questions.
Enumerator: The person who collects data by conducting an enquiry or an investigation.
Element: The individual participant or object on which the measurement is taken. It is the unit of study. It may be a person, but it could also be any object of interest.
Population: The total collection of elements about which we wish to make some inferences.
Sample Frame: The listing of all population elements from which the sample will be drawn.

Census Method
It is the procedure of systematically calculating, acquiring, and recording information about the members of a given population. It deals with the investigation of the entire population.

Sampling Method
It is the selection of a subset of individuals from within a statistical population to estimate characteristics of the whole population. Here a small group is selected as representative of the whole population.

Why Use Sampling?
Sampling provides:
- Lower cost
- Greater speed
- Greater accuracy
- Availability of elements
What is a Valid Sample?
A valid sample should be:
- Accurate
- Precise

Accuracy: how close a computed or measured value is to the true value.
Precision or Reproducibility: how close a computed or measured value is to previously computed or measured values.
Inaccuracy or Bias: a systematic deviation from the actual value.
Imprecision or Uncertainty: the magnitude of scatter.

Accuracy
It refers to how close or far off a given set of measurements is from the true value.

Precision
It refers to how close together or dispersed the measurements are relative to each other.

[Figure: targets illustrating A. Inaccurate and Imprecise; B. Accurate and Imprecise; C. Inaccurate and Precise; D. Accurate and Precise]

Methods of Sampling
Probability Sampling: the selection of a sample from a population based on the principle of randomization, or chance. It is a sampling method in which all members of the population have an equal chance of participating in the study.
Nonprobability Sampling: a sampling method in which not all members of the population have an equal chance of participating in the study.

Element Selection | Probability | Nonprobability
Unrestricted | Simple Random | Convenience
Restricted | Complex Random: Systematic, Cluster, Stratified | Purposive: Judgment, Quota, Snowball

Probability Sampling
- Simple Random
- Systematic
- Cluster
- Stratified

Simple Random Sampling
It is an entirely random method of selecting the sample. The sample is selected from a population in such a manner that all members have an equal chance of being selected.
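A minimal Python sketch of simple random sampling, assuming the population is available as a list (for example, the sample frame). `random.Random.sample` draws without replacement, so every member has the same chance of selection and no one is picked twice. The 500 numbered IDs, the sample size of 20, and the seed are illustrative values only, not data from the module.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n distinct members; every member has the same chance of being chosen."""
    rng = random.Random(seed)         # seeded generator so a draw can be reproduced
    return rng.sample(population, n)  # sampling without replacement

# Hypothetical sample frame: 500 numbered IDs
population = list(range(1, 501))
sample = simple_random_sample(population, 20, seed=42)
print(sorted(sample))
```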
Systematic Sampling
Sample members from a larger population are selected according to a random starting point but with a fixed, periodic interval. This interval, called the sampling interval, is calculated by dividing the population size by the desired sample size.

Cluster Sampling
The population is divided into smaller groups known as clusters, and then a random selection of these clusters is sampled. This method of probability sampling is often used to study large populations, particularly those that are widely geographically dispersed.

Stratified Random Sampling
It is a method of sampling that involves dividing a population into smaller groups called strata. The groups or strata are organized based on the shared characteristics or attributes of the members in the group. The process of classifying the population into groups is called stratification.

Nonprobability Sampling
Methods: Convenience, Judgment, Quota, Snowball.
Reasons for choosing nonprobability sampling: no need to generalize, limited objectives, feasibility, time, and cost.
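Returning to the three restricted probability methods described above (systematic, cluster, and stratified), the sketch below shows one way to implement each with Python's standard library. The function names, the hypothetical sites used as clusters, and the year-level strata are assumptions for illustration. Stratified allocation here is proportional to stratum size; because each share is rounded, the total can differ slightly from n when sizes do not divide evenly.

```python
import random

def systematic_sample(population, n, rng):
    """Random start, then every k-th member; k = N // n is the sampling interval."""
    k = len(population) // n          # sampling interval: population size / sample size
    start = rng.randrange(k)          # random starting point within the first interval
    return [population[start + i * k] for i in range(n)]

def cluster_sample(clusters, num_clusters, rng):
    """Randomly choose whole clusters and keep every member of the chosen ones."""
    chosen = rng.sample(sorted(clusters), num_clusters)
    return [member for name in chosen for member in clusters[name]]

def stratified_sample(strata, n, rng):
    """Proportional allocation, then a simple random sample within each stratum."""
    total = sum(len(members) for members in strata.values())
    sample = []
    for members in strata.values():
        share = round(n * len(members) / total)    # stratum's share of the sample
        sample.extend(rng.sample(members, share))  # random draw inside the stratum
    return sample

rng = random.Random(7)               # fixed seed so the example is reproducible
population = list(range(1, 101))     # hypothetical population of 100 numbered units

# Hypothetical clusters: 5 sites with 20 units each
clusters = {f"Site {i + 1}": list(range(i * 20 + 1, i * 20 + 21)) for i in range(5)}

# Hypothetical strata: year levels of different sizes (40, 30, 30)
strata = {
    "1st year": list(range(1, 41)),
    "2nd year": list(range(41, 71)),
    "3rd year": list(range(71, 101)),
}

systematic = systematic_sample(population, 10, rng)
clustered = cluster_sample(clusters, 2, rng)
stratified = stratified_sample(strata, 10, rng)
print(systematic)
print(clustered)
print(stratified)
```

Systematic sampling needs only one random number (the start), which is why it is convenient for long ordered lists; cluster sampling keeps whole groups, which suits geographically dispersed populations.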
Convenience Sampling
It is a sampling method where research data are collected from a conveniently available pool of respondents. It is the most used sampling technique because it is prompt, uncomplicated, and economical.

Purposive Sampling
It is also known as judgmental, selective, or subjective sampling. It is a form of non-probability sampling in which researchers rely on their own judgment when choosing members of the population to participate in their surveys.
- Judgment Sampling: a non-probability sampling technique where the researcher selects units to be sampled based on his own existing knowledge or professional judgment.
- Quota Sampling: a non-probabilistic version of stratified sampling, in which participants are chosen non-randomly until a set quota for each subgroup is filled.

Snowball Sampling
It refers to a nonprobability sampling technique in which a researcher begins with a small population of known individuals and expands the sample by asking those initial participants to identify others that should participate in the study.

Steps in Sampling Design
1. What is the target population?
2. What are the parameters of interest?
3. What is the sampling frame?
4. What is the appropriate sampling method?
5. What size sample is needed?

Sample Size
Samples should be as large as a researcher can obtain with a reasonable expenditure of time and energy. The recommended minimum numbers of subjects for the following types of studies are:
- Descriptive Study: minimum of 50
- Correlational Study: minimum of 100
- Experimental and Causal-Comparative Studies: minimum of 30 in each group

Determination of Sample Size
1. Define population size.
2. Designate your margin of error.
3. Determine your confidence level.
4. Predict expected variance.
5. Finalize your sample size.

Z-scores for the most common confidence levels are:
- 80% confidence => 1.28 z-score
- 85% confidence => 1.44 z-score
- 90% confidence => 1.65 z-score
- 95% confidence => 1.96 z-score
- 99% confidence => 2.58 z-score

Here's how the calculations work out for a voice recognition technology example in an office of 500 people, with a 95% confidence level and 5% margin of error: Sample Size = 217.

When to Use Larger Sample Sizes?
Factors that call for a larger sample:
- Population variance
- Number of data subgroups
- Desired precision
- Confidence level
- Small error range

Data
Data are factual information (such as measurements or statistics) used as a basis for reasoning, discussion, or calculation.
Data layout: one variable per column. DATA: the answers to questions or measurements from the experiment.
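The sample size determination steps listed earlier give the result (Sample Size = 217) but not the formula. One common formula that reproduces it, assumed here, is Cochran's formula with a finite population correction, using p = 0.5 as the most conservative guess for the expected variance p(1 - p): $n_0 = \frac{z^2 p(1-p)}{e^2}$ and $n = \frac{n_0}{1 + n_0/N}$. The function name and parameter names below are illustrative.

```python
# Z-scores for common confidence levels, as listed in the module
Z_SCORES = {0.80: 1.28, 0.85: 1.44, 0.90: 1.65, 0.95: 1.96, 0.99: 2.58}

def sample_size(population_size, margin_of_error, confidence, p=0.5):
    """Cochran's formula with a finite population correction (assumed method)."""
    z = Z_SCORES[confidence]                       # step 3: confidence level -> z-score
    variance = p * (1 - p)                         # step 4: expected variance; p = 0.5 is the worst case
    n0 = z ** 2 * variance / margin_of_error ** 2  # sample size for a very large population
    return n0 / (1 + n0 / population_size)         # step 1: correct for the finite population N

n = sample_size(population_size=500, margin_of_error=0.05, confidence=0.95)
print(round(n))
```

With N = 500, e = 0.05, and 95% confidence, n0 = 384.16 and the corrected value is about 217.2. Rounding to the nearest whole number gives the module's 217; the more conservative habit of rounding up would give 218.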
VARIABLE: a measurement which varies between subjects, e.g., height or gender. Data layout: one row per subject.

Variables
These are any characteristics, numbers, or quantities that can be measured or counted. A variable may also be called a data item.

Two General Types of Data
- Quantitative Data: data that can be counted or measured in numerical values.
- Qualitative Data: data that describe qualities or characteristics.

Two General Types of Qualitative Data
- Nominal Data: also called the nominal scale. Data are classified without a natural order or rank. Examples of nominal data are religion, gender, country, etc.
- Ordinal Data: also called the ordinal scale. Data are classified with a natural order or rank. Examples of ordinal data are educational level, satisfaction level, etc.

Two General Types of Quantitative Data
- Continuous Data: data that can be measured on an infinite scale and can take any value within a range. Examples of continuous data are height, weight, temperature, length, etc.
- Discrete Data: data that can be measured on a finite scale and can take only certain values. Examples of discrete data are the number of students in a class, the number of patients in the hospital, etc.

What data types relate to the following questions?
Q1: What is your favourite subject?
   Maths / English / Science / Art / French / Statistics => Nominal Data
Q2: Gender: Male / Female => Nominal Data
Q3: I consider myself to be good at mathematics: Strongly Disagree / Disagree / Not Sure / Agree / Strongly Agree => Ordinal Data
Q4: Score in a Math Exam: score between 0% and 100% => Discrete Data

Statistics
It is the discipline that concerns the collection, organization, analysis, interpretation, and presentation of data. It is used to describe and draw conclusions about the data.

Two Major Types of Statistics
1. Descriptive Statistics
A set of techniques and methods used in data analysis to summarize, organize, and describe the main features or characteristics of a dataset. They help researchers and analysts gain insights into the central tendencies, variability, and distribution of their data without making inferences or drawing conclusions about populations.
2. Inferential Statistics
It is used to draw inferences or conclusions about the characteristics of a population from samples. It allows researchers to make predictions, test hypotheses, and draw generalizations about the larger population from which the sample was drawn.
Statistical Inference: the process through which inferences about a population are made based on certain statistics calculated from a sample of data drawn from that population.

Descriptive Statistics Calculation

Measures of Central Tendency
These are statistical values or indicators that describe the central or typical value of a dataset. They provide a way to summarize the "center" of a distribution of data points and help us understand where most of the data values are located.
- Mean: the arithmetic average of a dataset, calculated by summing all values and dividing by the number of observations.
- Median: the middle value of a dataset when it is ordered from smallest to largest. It is less affected by extreme outliers than the mean.
- Mode: the most frequently occurring value(s) in a dataset.

Measures of Variability
These are also known as measures of dispersion or spread. They are statistical values that provide information about the extent to which data points in a dataset deviate from the central tendency (e.g., mean, median). They quantify the degree of spread, scatter, or dispersion of data points within a distribution, helping to understand how diverse or consistent the data are.
- Range: the difference between the maximum and minimum values in a dataset.
- Variance: a measure of how data points vary from the mean, calculated by averaging the squared differences between each data point and the mean (for a sample, the sum of squared differences is divided by n - 1).
- Standard Deviation: the square root of the variance, which gives a measure of the typical distance between the data points and the mean.
- Interquartile Range (IQR): the range between the first quartile (Q1) and the third quartile (Q3) of the data, which represents the middle 50% of the data and is less sensitive to outliers than the range.

Measures of Distribution Shape
These are statistical tools used to describe the form or pattern of a dataset's distribution, specifically focusing on how data points are spread or clustered across different values within the distribution. These measures help researchers and analysts understand the shape, symmetry, skewness, and tailedness of a dataset's distribution, which can provide insights into the underlying data patterns.
- Skewness: a measure of the asymmetry of the data distribution. Positive skew indicates a longer tail on the right side, while negative skew indicates a longer tail on the left side.
- Kurtosis: a measure of the "tailedness" of the data distribution. High kurtosis indicates heavier tails (often with a sharper peak), while low kurtosis indicates lighter tails and a flatter distribution.

Frequency Distributions
These are tabular or graphical representations of data that show the number of times each value or category occurs in a dataset. They summarize the data by displaying how often each unique data point or category appears, providing a way to understand the distribution and patterns within the dataset. Frequency distributions are commonly used in statistics and data analysis to gain insights into data sets, especially when dealing with categorical or discrete data.
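The shape measures and frequency distributions above can be sketched with Python's standard library. This version uses the population (moment) formulas for skewness and excess kurtosis; spreadsheet functions such as Excel's SKEW and KURT apply small-sample adjustments, so their values will differ somewhat. The age data are from the module's worked example; the function names are illustrative.

```python
from collections import Counter

def central_moments(data):
    """Mean and 2nd, 3rd, 4th central moments (population form, divide by n)."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n
    m3 = sum((x - mean) ** 3 for x in data) / n
    m4 = sum((x - mean) ** 4 for x in data) / n
    return mean, m2, m3, m4

def skewness(data):
    """Positive: longer right tail; negative: longer left tail; 0: symmetric."""
    _, m2, m3, _ = central_moments(data)
    return m3 / m2 ** 1.5

def excess_kurtosis(data):
    """Kurtosis minus 3, so a normal distribution scores about 0; positive means heavier tails."""
    _, m2, _, m4 = central_moments(data)
    return m4 / m2 ** 2 - 3

def frequency_table(data):
    """Count of each distinct value, ordered from lowest to highest value."""
    return sorted(Counter(data).items())

ages = [17, 22, 18, 21, 18, 22, 18, 22, 19, 18]  # module's worked example
for value, count in frequency_table(ages):
    print(value, count)
print("skewness:", round(skewness(ages), 3))
print("excess kurtosis:", round(excess_kurtosis(ages), 3))
```

The frequency table reproduces the module's age groupings and occurrences (18 appears 4 times), and the positive skewness is consistent with the mean (19.5) lying above the median (18.5).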
Types of frequency distributions:
- Histograms: a graphical representation of data that displays the frequency of values within specified intervals or bins.
- Frequency Tables: a tabular summary of data that shows the count or frequency of values in various categories or intervals.

Inferential Statistics Calculation

Hypothesis Testing
It involves comparing sample statistics to population parameters to determine if observed differences or relationships are statistically significant. Z- or t-tests are used to determine whether there is a significant difference between the means of two random samples.

Analysis of Variance (ANOVA)
It involves partitioning the total variation in the data into different components, including the variation between groups and within groups. It is used to test whether there are significant differences between group means. Calculations include the F-statistic and its associated p-value.

Chi-Square Test
It is used for categorical data analysis. The calculations involve comparing observed frequencies and expected frequencies in contingency tables to determine whether there is a significant association between variables.

Regression Analysis
It involves estimating the coefficients of a regression model, including the intercept and slopes. The most common method for estimating coefficients in linear regression is the least squares method, which minimizes the sum of squared residuals (the vertical distances between data points and the regression line).

Descriptive Statistics
Mean
It is the average of a data set.
- μ is the average of the population: $\mu = \frac{\sum_{i=1}^{N} X_i}{N}$
- x̄ is the average of the sample: $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$

Worked example (ages of 10 sampled respondents):
Sample No. | 1  | 2  | 3  | 4  | 5  | 6  | 7  | 8  | 9  | 10
Age        | 17 | 22 | 18 | 21 | 18 | 22 | 18 | 22 | 19 | 18

n = 10, Mean = 195 / 10 = 19.5

Median
It is the value separating the higher half from the lower half of a data sample, a population, or a probability distribution. For a dataset, it may be thought of as "the middle" value. The data must first be ordered from lowest to highest, with $x_k$ denoting the k-th ordered value.
- n odd: $\tilde{x} = x_{(n+1)/2}$
- n even: $\tilde{x} = \frac{x_{n/2} + x_{n/2+1}}{2}$

Ages from lowest to highest: 17, 18, 18, 18, 18, 19, 21, 22, 22, 22
n = 10 is even, so Median = (x_5 + x_6) / 2 = (18 + 19) / 2 = 18.5

Mode
It is the value that appears most frequently in a data set.
Age groupings | 17 | 18 | 19 | 21 | 22
Occurrence    | 1  | 4  | 1  | 1  | 3
Mode = 18

Range
It is the difference between the largest and smallest values, i.e., the result of subtracting the sample minimum from the sample maximum.
RANGE = Max(X) - Min(X) = 22 - 17 = 5

Variance
It is a measure of how far a set of numbers is spread out from their average value. A higher variance indicates that data points are more spread out from the mean, implying greater variability in the dataset.
The higher the variance, the greater the variability or spread of scores. A lower variance suggests that data points are closer to the mean, indicating less variability.
- Population variance: $\sigma^2 = \frac{\sum_{i=1}^{N}(X_i - \mu)^2}{N}$
- Sample variance: $s^2 = \frac{\sum_{i=1}^{n}(X_i - \bar{X})^2}{n - 1}$

For the same 10-respondent age data (n = 10, Mean = 19.5, Median = 18.5, Mode = 18, Range = 5), the squared deviations from the mean sum to 36.5, so $s^2 = 36.5 / 9 \approx 4.05556$.
- Variance is sensitive to outliers or extreme values in the dataset, as the squared differences can be substantial.
- Variance is expressed in squared units of the original data, which can be less interpretable. For a more interpretable measure, the standard deviation is often used, as it has the same units as the original data.

Standard Deviation
It is a measure of the dispersion of a dataset relative to its mean and is calculated as the square root of the variance.
- Population: $\sigma = \sqrt{\frac{\sum_{i=1}^{N}(X_i - \mu)^2}{N}}$
- Sample: $s = \sqrt{\frac{\sum_{i=1}^{n}(X_i - \bar{X})^2}{n - 1}}$

For the age data, $s = \sqrt{4.05556} \approx 2.01384$.
- A higher standard deviation indicates greater variability or spread of data points from the mean.
- A lower standard deviation suggests that data points are closer to the mean, indicating less variability.
- Because it is the square root of the variance, the standard deviation is also built on squared differences and remains sensitive to outliers; its advantage over the variance is that it is expressed in the same units as the data.
- Standard deviation is often used in conjunction with the mean to describe the central tendency and spread of data in a dataset.
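The worked example's values can be checked with Python's built-in `statistics` module. `variance` and `stdev` use the n - 1 (sample) denominator, matching $s^2$ and $s$ above, while `pvariance` uses N and corresponds to $\sigma^2$. This is a sketch for checking the hand calculation, not part of the module's method.

```python
import statistics

# Ages of the 10 respondents in the module's worked example
ages = [17, 22, 18, 21, 18, 22, 18, 22, 19, 18]

mean = statistics.mean(ages)                 # sum of values / n
median = statistics.median(ages)             # n is even: average of the two middle values
mode = statistics.mode(ages)                 # most frequently occurring value
value_range = max(ages) - min(ages)          # RANGE = Max(X) - Min(X)
sample_var = statistics.variance(ages)       # s^2, divides by n - 1
sample_sd = statistics.stdev(ages)           # s, square root of s^2
population_var = statistics.pvariance(ages)  # sigma^2, divides by N (if the 10 were the whole population)

print("Mean:", mean)
print("Median:", median)
print("Mode:", mode)
print("Range:", value_range)
print("Variance:", round(sample_var, 5))
print("Standard Deviation:", round(sample_sd, 5))
print("Population variance:", population_var)
```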
Quartile
It is a type of quantile which divides the number of data points into four parts, or quarters, of more-or-less equal size. The data must be ordered from smallest to largest to compute quartiles.
For the same 10-respondent age data: First Quartile (Q1) = 18.00, Second Quartile (Q2) = 18.50, Third Quartile (Q3) = 22.00, IQR = Q3 - Q1 = 4.00.

Exercise
A survey was given to ECE students to find out how many hours per week they would listen to a student-run radio station. The sample responses were separated by gender. Determine the mean, range, variance, and standard deviation of each group.
Group A (Female) | Group B (Male)
15 | 30
25 | 15
12 | 21
7  | 12
3  | 26
32 | 20
17 | 5
16 | 24
9  | 18
24 | 10

Coefficient of Determination
It is a statistical measurement that examines how differences in one variable can be explained by the difference in a second variable when predicting the outcome of a given event. It is commonly written R^2; in simple linear regression it equals the square of the Pearson correlation coefficient r.
X and Y table (values as extracted; the original layout is not recoverable): 1 4 3 6 5 10 5 12 1 13 2 3 4 3 6 8

Pearson Coefficient of Correlation
[Figure: scatter plots of Y against X illustrating R = 1.00, R = 0.18, R = 0.85, and R = -0.92]

Correlation Coefficient Value | Relationship
-0.3 to +0.3 | Weak
-0.5 to -0.3 or 0.3 to 0.5 | Moderate
-0.9 to -0.5 or 0.5 to 0.9 | Strong
-1.0 to -0.9 or 0.9 to 1.0 | Very Strong
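The quartiles, the radio-station exercise, and the correlation strength table can be checked with a short Python sketch. Two assumptions are labeled in the comments: `statistics.quantiles` with `method="exclusive"` (quartile p at position (n + 1)p) reproduces the module's Q1 = 18.00 and Q3 = 22.00, whereas the inclusive method would give Q3 = 21.75; and a value exactly on a boundary of the strength table (such as 0.5) is assigned to the stronger label. The group values are read row by row from the exercise table, and the function names are illustrative.

```python
import math
import statistics

def quartiles(data):
    """Q1, Q2, Q3; 'exclusive' places quartile p at position (n + 1) * p (assumed method)."""
    return statistics.quantiles(data, n=4, method="exclusive")

def describe(data):
    """Mean, range, sample variance, and sample standard deviation of one group."""
    return {
        "mean": statistics.mean(data),
        "range": max(data) - min(data),
        "variance": statistics.variance(data),
        "std_dev": statistics.stdev(data),
    }

def pearson_r(x, y):
    """Pearson r: sum of cross-deviations over sqrt of the product of squared deviations."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def strength(r):
    """Label r using the module's table; boundary values go to the stronger label (assumption)."""
    a = abs(r)
    if a < 0.3:
        return "Weak"
    if a < 0.5:
        return "Moderate"
    if a < 0.9:
        return "Strong"
    return "Very Strong"

ages = [17, 22, 18, 21, 18, 22, 18, 22, 19, 18]
q1, q2, q3 = quartiles(ages)
print("Q1, Q2, Q3:", q1, q2, q3, "IQR:", q3 - q1)

# Weekly listening hours from the exercise table
group_a = [15, 25, 12, 7, 3, 32, 17, 16, 9, 24]   # Group A (Female)
group_b = [30, 15, 21, 12, 26, 20, 5, 24, 18, 10]  # Group B (Male)
print("Group A:", describe(group_a))
print("Group B:", describe(group_b))

print(strength(0.52981))  # the module's exercise answer
```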
Find the value of the correlation coefficient from the following table: [table not recoverable from the extraction]
Answer: r = 0.52981 (Strong Positive Correlation)

Inferential Statistics
It can be used to support or refute theories, determine associations between variables, determine whether findings are significant, and decide whether we can generalize from our sample to the entire population.
Inferential statistics provide:
- Tests for Difference: to test whether a significant difference exists between groups.
- Tests for Relationship: to test whether a significant relationship exists between a dependent variable (Y) and an independent variable (X).

Hypothesis Testing
It is a fundamental and widely used technique in statistics to make decisions, draw conclusions, or assess the validity of statements or claims about a population based on sample data. It is a structured and systematic approach for evaluating hypotheses or research questions in a scientific and statistical manner.

Key Steps in Hypothesis Testing
A. Formulate Hypotheses
- Null Hypothesis (H0): This is the default or status quo hypothesis. It typically represents a statement of no effect, no difference, or no relationship in the population.
- Alternative Hypothesis (Ha): This is the hypothesis that the researcher is trying to support or demonstrate. It often represents the research question or the statement that there is an effect, difference, or relationship in the population.
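The inferential calculations named in this module (a t-test for a difference in means, one-way ANOVA, the chi-square test, and least-squares regression) can be sketched with the standard library. The sketch computes only the test statistics; turning them into p-values needs the t, F, or chi-square distribution, for example via `scipy.stats.ttest_ind`, `f_oneway`, `chi2_contingency` (which by default applies Yates' continuity correction to 2 x 2 tables), and `linregress` in SciPy. The t statistic assumes equal variances (pooled), which is one of several variants, and the function names are illustrative.

```python
import math
import statistics

def pooled_t(sample1, sample2):
    """Two-sample t statistic with pooled variance (equal-variance assumption)."""
    n1, n2 = len(sample1), len(sample2)
    v1, v2 = statistics.variance(sample1), statistics.variance(sample2)
    pooled = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
    std_error = math.sqrt(pooled * (1 / n1 + 1 / n2))
    return (statistics.mean(sample1) - statistics.mean(sample2)) / std_error

def anova_f(*groups):
    """One-way ANOVA F: between-group mean square over within-group mean square."""
    values = [x for g in groups for x in g]
    grand_mean = statistics.mean(values)
    means = [statistics.mean(g) for g in groups]
    k, n = len(groups), len(values)
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

def chi_square(table):
    """Pearson chi-square statistic: sum of (observed - expected)^2 / expected."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    stat = 0.0
    for row, r_total in zip(table, row_totals):
        for observed, c_total in zip(row, col_totals):
            expected = r_total * c_total / grand_total  # expected count if no association
            stat += (observed - expected) ** 2 / expected
    return stat

def least_squares(x, y):
    """Intercept and slope that minimise the sum of squared vertical residuals."""
    mx, my = statistics.mean(x), statistics.mean(y)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    return my - slope * mx, slope

# H0: Group A and Group B listen for the same mean number of hours (exercise data)
group_a = [15, 25, 12, 7, 3, 32, 17, 16, 9, 24]
group_b = [30, 15, 21, 12, 26, 20, 5, 24, 18, 10]
print("t =", round(pooled_t(group_a, group_b), 4))
```

Group A's mean (16.0 hours) is below Group B's (18.1 hours), so t is negative; whether the difference is significant depends on comparing t with the critical value for n1 + n2 - 2 = 18 degrees of freedom, which this sketch does not compute.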