W1_Statistical Evaluation of Data PDF
Summary
This document, likely lecture notes, introduces chemometrics and statistical evaluation of data. It covers concepts and calculations related to different types of errors (gross, systematic, random), measures of central tendency (mean), measures of spread (variance, standard deviation), standard error, relative standard deviation and provides examples.
Full Transcript
BSEN40860 Chemometrics 1: Statistical Evaluation of Data (module overview). Lecturer: Junli Xu

SOURCES OF ERROR

Gross error: caused, for example, by an instrumental breakdown such as a power failure, a lamp failing, severe contamination of the specimen, or a simple mislabeling of a specimen (in which the bottle's contents are not as recorded on the label). The presence of gross errors renders an experiment useless. The most easily applied remedy is to repeat the experiment; however, these errors can be quite difficult to detect, especially if no replicate measurements have been made.

Systematic error arises from imperfections in an experimental procedure, leading to a bias in the data, i.e., the errors all lie in the same direction for all measurements (the values are all too high or all too low). Such errors can arise from a poorly calibrated instrument or from the incorrect use of volumetric glassware, and they can be either constant or proportional.

Random error (commonly referred to as noise) produces results that are spread about the average value; the greater the degree of randomness, the larger the spread. Random errors are typically ones we have no control over, such as electrical noise in a transducer. These errors affect the precision, or reproducibility, of the experimental results. The goal is to have small random errors that lead to good precision in our measurements.

SOME COMMON TERMS

Accuracy: an experiment with small systematic error is said to be accurate, i.e., the measurements obtained are close to the true values.

Precision: an experiment with small random errors is said to be precise, i.e., the measurements have a small spread of values.

Within-run: a set of measurements made in succession in the same laboratory using the same equipment.

Between-run: a set of measurements made at different times, possibly in different laboratories and under different circumstances.
Repeatability: a measure of within-run precision.

Reproducibility: a measure of between-run precision.

The arithmetic mean, usually denoted $\bar{x}$, is a measure of the average or central tendency of a set of data. It is calculated by summing the data and dividing the sum by the number of values, $n$:

$$\bar{x} = \frac{\sum x_i}{n}$$

The variance, a measure of the spread of a set of data, is related to the precision of the data: the larger the variance, the larger the spread and the lower the precision.

$$s^2 = \frac{\sum (x_i - \bar{x})^2}{n}$$

The standard deviation of a set of data, usually given the symbol $s$, is the square root of the variance. The difference between the two is that the standard deviation has the same units as the data, whereas the variance is in units squared. For example, if the data are measured in meters (m), the standard deviation is in m and the variance in m².

$$s = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n}}$$

When making analytical measurements of a quantity $x$, for example the concentration of lead in drinking water, all the results obtained will contain some random error. The standard error of the mean is a measure of the error in the final answer:

$$s_M = \frac{s}{\sqrt{n}}$$

Present your result as $\bar{x} \pm \dfrac{s}{\sqrt{n}}$.

The relative standard deviation (or coefficient of variation), a dimensionless quantity often expressed as a percentage, is a measure of the relative error, or noise, in the data:

$$\mathrm{RSD} = \frac{s}{\bar{x}}$$

A plot of the normal distribution shows that approximately 68% of the data lie within $\pm 1s$ of the mean, 95% within $\pm 2s$, and 99.7% within $\pm 3s$.

The confidence interval is the range within which we can reasonably assume the true value lies; the extreme values of this range are called the confidence limits. The term "confidence" implies that we can assert a result with a given degree of confidence, i.e., a certain probability.
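The summary statistics above can be computed directly from the definitions. A minimal sketch in Python, using hypothetical replicate measurements (the values are illustrative, not from the notes), and dividing by $n$ exactly as the formulas here do:

```python
import math

# Hypothetical replicate measurements (illustrative values only)
x = [10.1, 10.3, 9.9, 10.2, 10.0]

n = len(x)
mean = sum(x) / n                                   # arithmetic mean
variance = sum((xi - mean) ** 2 for xi in x) / n    # variance as defined above (divide by n)
s = math.sqrt(variance)                             # standard deviation, same units as the data
std_error = s / math.sqrt(n)                        # standard error of the mean
rsd = s / mean                                      # relative standard deviation

print(f"result: {mean:.3f} +/- {std_error:.3f}")
print(f"RSD = {100 * rsd:.2f} %")
```

Note that many libraries (e.g. `statistics.stdev` in the Python standard library) divide by $n-1$ rather than $n$, so their values will differ slightly from this definition.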
SIGNIFICANCE TESTING

One can visually estimate that two methods produce similar results, but without a statistical test such a judgment is purely empirical. We could use the empirical statement "there is no difference between the two methods," but this conveys no quantification of the results. If we employ a significance test, we can report that "there is no significant difference between the two methods."

Null hypothesis: the term "null" implies that there is no difference between the observed and known values other than that which can be attributed to random variation. The null hypothesis is rejected if the probability of the observed difference occurring by chance is less than 1 in 20 (i.e., 0.05, or 5%). In that case, the difference is said to be significant at the 0.05 (or 5%) level. This also means there is a 1 in 20 chance of drawing an incorrect conclusion from the test results.

Significance testing falls into two main sections: testing for accuracy (using the Student's t-test) and testing for precision (using the F-test).

THE F-TEST FOR COMPARISON OF VARIANCE (PRECISION)

The F-test is a very simple ratio of two sample variances (the squared standard deviations):

$$F = \frac{s_1^2}{s_2^2}$$

The two-tailed version tests against the alternative that the variances are not equal. The one-tailed version tests in one direction only, i.e., that the variance of the first population is either greater than or less than (but not both) that of the second.

Degrees of freedom = $n - 1$ for each variance.

If the calculated F value is smaller than the critical value of F, the null hypothesis is accepted and there is no significant difference in the precision of the two sample sets.

Brightspace exercise: to test that the precision of the two routes is the same, we use the F-test:

$$F = \frac{1.58}{1.25} = 1.26$$

As we are testing for a significant difference in the precision of the two routes, the two-tailed critical value of F is required.
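The exercise arithmetic can be sketched in a couple of lines; the variances 1.58 and 1.25 are taken from the exercise above, and the larger variance is placed in the numerator by convention so that $F \ge 1$:

```python
# Variances (squared standard deviations) of the two synthetic routes,
# taken from the exercise above
var_route_1 = 1.58
var_route_2 = 1.25

# Convention: larger variance in the numerator, so F >= 1
F = max(var_route_1, var_route_2) / min(var_route_1, var_route_2)
print(f"F = {F:.2f}")   # F = 1.26
```

The calculated F is then compared against a tabulated two-tailed critical value for the appropriate degrees of freedom.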
In this case, at the 95% significance level, with 5 degrees of freedom for both the numerator and the denominator, the critical value of F is 7.146. As the calculated value (1.26) is smaller than the critical value, the null hypothesis is accepted: there is no significant difference in the precision of the two synthetic routes.

THE STUDENT'S T-TEST

This test is employed to estimate whether an experimental mean $\bar{x}$ differs significantly from the true value of the mean, $\mu$:

$$t = \frac{\bar{x} - \mu}{s/\sqrt{n}}$$

If the calculated value of t (without regard to sign) exceeds a certain critical value (defined by the required confidence limit and the number of degrees of freedom), the null hypothesis is rejected.

There are three major uses for the t-test:
1. Comparison of a sample mean with a certified value
2. Comparison of the means of two samples
3. Comparison of the means of two methods with different samples (the paired t-test)

Comparison of a sample mean with a certified value

A common situation is one in which we wish to test the accuracy of an analytical method by comparing its results with the accepted or true value of an available reference sample. The use of the test is illustrated in the following example:

$\bar{x} = 85$ (obtained from 10 replicate test objects)
$s = 0.6$
$\mu = 83$ (the accepted or true value of the reference material)

$$t = \frac{\bar{x} - \mu}{s/\sqrt{n}} = \frac{85 - 83}{0.6/\sqrt{10}} = 10.5$$

Comparing the calculated value of t with the critical value, the null hypothesis is rejected: there is a significant difference between the experimentally determined mean and the reference value.

Comparison of the means of two samples

As there are two standard deviations, we must first calculate the pooled estimate of the standard deviation, which is based on the individual standard deviations.
$$s^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$$

$$t = \frac{\bar{x}_1 - \bar{x}_2}{s\left(\frac{1}{n_1} + \frac{1}{n_2}\right)^{1/2}}$$

Comparison of the means of two methods with different samples (paired t-test)

The type of test used for this kind of data is known as the paired t-test. As the name implies, test objects or specimens are treated in pairs for the two methods under observation: each specimen is analyzed twice, once by each method. Instead of calculating the mean of method one and the mean of method two, we calculate the difference between the two methods for each sample, and use the resulting data to compute the mean of the differences, $\bar{x}_d$, and the standard deviation of the differences, $s_d$. The use of the paired t-test is illustrated using the data shown below as an example.

$$t = \frac{\bar{x}_d}{s_d/\sqrt{n}}$$

ANALYSIS OF VARIANCE

Analysis of variance (ANOVA) is a useful technique for comparing more than two methods or treatments. The variation in the sample responses (treatments) is used to decide whether the sample treatment effect is significant. The null hypothesis in this case is that the sample (treatment) means are not different, i.e., that they come from the same population of sample means. The variance in the data can therefore be assessed in two ways, namely between the sample means and within the samples.

For example, one may wish to compare the results from four laboratories, or perhaps to evaluate three different methods performed in the same laboratory. With inter-laboratory data, there is clearly variation between the laboratories (between treatment means) and within each laboratory's samples. ANOVA is used in practice to separate the between-laboratories variation (the treatment variation) from the random within-sample variation.
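Before turning to the ANOVA example, the three t-test variants above can be sketched in Python. This is a minimal stdlib-only sketch; the list data in the test calls are hypothetical, while the one-sample numbers (85, 0.6, 10, 83) come from the worked example above:

```python
import math

def t_one_sample(x_bar, s, n, mu):
    """t for comparing a sample mean with a certified value mu."""
    return (x_bar - mu) / (s / math.sqrt(n))

def t_two_sample(x1, x2):
    """t for comparing the means of two samples via the pooled standard deviation."""
    n1, n2 = len(x1), len(x2)
    m1, m2 = sum(x1) / n1, sum(x2) / n2
    v1 = sum((v - m1) ** 2 for v in x1) / (n1 - 1)   # sample variances (n - 1)
    v2 = sum((v - m2) ** 2 for v in x2) / (n2 - 1)
    s = math.sqrt(((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2))
    return (m1 - m2) / (s * math.sqrt(1 / n1 + 1 / n2))

def t_paired(a, b):
    """Paired t: per-specimen differences, then t = mean(d) / (s_d / sqrt(n))."""
    d = [ai - bi for ai, bi in zip(a, b)]
    n = len(d)
    m = sum(d) / n
    s_d = math.sqrt(sum((di - m) ** 2 for di in d) / (n - 1))
    return m / (s_d / math.sqrt(n))

# Worked example from the notes: x_bar = 85, s = 0.6, n = 10, mu = 83
print(round(t_one_sample(85, 0.6, 10, 83), 1))   # 10.5
```

Each t value is then compared against the critical value for the relevant degrees of freedom, as described above.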
A chemist wishes to evaluate four different extraction procedures that can be used to determine an organic compound in river water (the quantitative determination is obtained using ultraviolet [UV] absorbance spectroscopy). To achieve this goal, the analyst prepares a test solution of the organic compound in river water and performs each of the four extraction procedures in replicate; in this case, there are three replicates for each extraction procedure. The quantitative data are shown below.

From the data we can see that the mean values obtained for each extraction procedure are different; however, we have not yet included an estimate of the effect of random error, which may cause variation between the sample means. ANOVA is used to test whether the differences between the extraction procedures are simply due to random error.

Step 1: Calculate the within-sample mean square.
Step 2: Calculate the between-sample mean square.
Step 3: Compute the F-statistic.
Step 4: Compute the p-value.

This is known as a mean square because it is a sum of squared terms (SS) divided by the number of degrees of freedom. The within-sample estimate has 8 degrees of freedom (independent values that can vary in the dataset): each sample (treatment) contributes 2 degrees of freedom and there are four samples (treatments). The sum of squared terms can be recovered by multiplying the mean square (MS) by the number of degrees of freedom. The between-treatment variation is calculated in the same manner as the within-treatment variation.

The p-value represents the probability of observing the test statistic (in this case, the F-statistic) under the null hypothesis.
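Steps 1–3 can be sketched for the layout described above (four procedures, three replicates each, so 8 within-sample and 3 between-sample degrees of freedom). The absorbance values below are hypothetical, since the lecture's actual data table is not reproduced in this transcript:

```python
# Hypothetical absorbances: four extraction procedures x three replicates
# (illustrative numbers only; not the lecture's actual data)
groups = [
    [0.90, 0.92, 0.91],
    [0.85, 0.87, 0.86],
    [0.95, 0.93, 0.94],
    [0.88, 0.89, 0.90],
]

k = len(groups)                          # number of treatments (4)
n = len(groups[0])                       # replicates per treatment (3)
grand_mean = sum(sum(g) for g in groups) / (k * n)
group_means = [sum(g) / n for g in groups]

# Step 1: within-sample mean square, k*(n-1) = 8 degrees of freedom
ss_within = sum((v - m) ** 2 for g, m in zip(groups, group_means) for v in g)
ms_within = ss_within / (k * (n - 1))

# Step 2: between-sample mean square, k-1 = 3 degrees of freedom
ss_between = n * sum((m - grand_mean) ** 2 for m in group_means)
ms_between = ss_between / (k - 1)

# Step 3: F-statistic, ratio of between- to within-sample mean squares
F = ms_between / ms_within
print(f"MS_within = {ms_within:.5f}, MS_between = {ms_between:.5f}, F = {F:.1f}")
```

Step 4, the p-value, requires the F distribution itself; in practice one would look up the critical value in a table or use a statistics library rather than compute it by hand.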
A small p-value (typically ≤ 0.05) indicates strong evidence against the null hypothesis, leading to its rejection; this suggests that there are significant differences between the group means. Conversely, a large p-value suggests insufficient evidence to reject the null hypothesis, implying that the group means may not differ significantly.

- High F-statistic: a larger ratio of between-group variance to within-group variance, which typically corresponds to a small p-value, suggesting significant differences among the group means.
- Low F-statistic: the group means are similar, leading to a larger p-value and insufficient evidence to reject the null hypothesis.

Brightspace exercise: the arsenic content of coal taken from different parts of a ship's hold, with five sampling points and four aliquots (specimens) taken at each point. We wish to determine whether there is a statistically significant difference between the sampling error and the error of the analytical method.