CHM121 Topic II: Evaluation of Analytical Data PDF

Summary

This document provides an overview of the evaluation and analysis of analytical data in chemistry, specifically focused on measures of central tendency and precision. The material covers concepts like mean, median, mode, standard deviation, relative standard deviation, and the roles of replicates. Topics include handling outliers, systematic and random errors, and their role in experimental data.

Full Transcript

CHM121 Topic II: Evaluation of Analytical Data

“It is impossible to perform a chemical analysis that is totally free of errors, or uncertainties. All we can hope is to minimize these errors and to estimate their size with acceptable accuracy.”

2nd Semester 2023-2024

Errors in Chemical Analysis

- Measurements invariably involve errors and uncertainties, which combine to produce a scatter of results (sources include the experimenter and faulty calibrations or standardizations).
- Measurement uncertainties can never be completely eliminated, so the true value of any quantity is always unknown; we can only estimate the “true value”.
- The magnitude of the error can be evaluated: we can define limits within which the true value of a measured quantity lies with a given level of probability.
- Data of unknown quality are worthless; to address this, estimate the reliability of the data.

Assessing reliability of data

1. Experiments designed to reveal the presence of errors can be performed.
2. Standards of known composition can be analyzed and the results compared with the known composition.
3. Equipment calibration maintains accuracy, standardization, and repeatability in measurements, assuring reliable benchmarks and results.
4. Statistical tests can be applied to the data.

Measures of Central Tendency

- Chemists usually carry three to five replicates (portions) of a sample through an analytical procedure, to improve reliability and to obtain information about the variability of results.
- Individual results from a set of measurements are seldom the same, so a “best” estimate is taken as the central value for the set.
- Ways to justify doing replicates:
  a. The central value of a set should be more reliable than any of the individual results.
  b. Analysis of the variation in the data allows us to estimate the uncertainty associated with the central value.

1. Mean, $\bar{x}$
- Also called the arithmetic mean or average.
- The quantity obtained by dividing the sum of the replicate measurements ($x_i$) by the number of measurements ($N$) in the set.
Mathematically speaking:

$$\bar{x} = \frac{x_1 + x_2 + x_3 + x_4 + \dots + x_N}{N}$$

2. Median, M
- The middle value of a set of results arranged in order of increasing or decreasing magnitude.
- Odd number of results: take the middle value.
- Even number of results: take the mean of the two middle values.

Example 1
Calculate the mean and the median for the following data: 20.3, 19.4, 19.8, 20.1, 19.6, 19.5
Arranged as: 19.4, 19.5, 19.6, 19.8, 20.1, 20.3
Mean = 19.8
Median = 19.7
Ideally, the mean and the median are identical; they often differ when the number of measurements in the set is small.

The median is used advantageously when a set of data contains an outlier. An outlier can have a significant effect on the mean but a lesser effect on the median.
- 20.5, 20.8, 20.6, 20.4, 21.8 => mean = 20.8, median = 20.6; without the outlier, mean = 20.6
- 20.5, 20.8, 20.6, 20.4, 18.8 => mean = 20.2, median = 20.5; without the outlier, mean = 20.6

3. Mode
- The value that occurs most frequently in a set of determinations.
- Example: 20.5, 20.8, 20.6, 20.4, 20.8, 20.3, 20.8 (mode = 20.8)

Precision
- Describes the reproducibility of the measurements.
- Tells how close the results are to one another, provided that they are obtained in exactly the same way.
- Deals with repeatability (within-run) and reproducibility (between-run).
- Three terms are widely used to describe the precision of a set of replicate data:
  - standard deviation
  - variance
  - coefficient of variation
All these terms are functions of how much an individual result $x_i$ differs from the mean, that is, of the deviation from the mean:

$$d_i = |x_i - \bar{x}|$$

Accuracy
- Indicates the closeness of a measurement to its true or accepted value, $x_t$, and is expressed by the error.
- Is expressed in terms of:
  a. absolute error, $E = x_i - x_t$
  b. relative error, $E_r$:

$$E_r = \frac{x_i - x_t}{x_t} \times 100\% \quad \text{(in percent)}$$

$$E_r = \frac{x_i - x_t}{x_t} \times 1000 \quad \text{(in parts per thousand, ppt)}$$

- Signs are kept to show whether the experimental result is smaller or larger than the accepted value.

Illustration
[Figure: four target diagrams in which X marks the true value and the dots mark the replicates. For each, tell whether accuracy and precision are high or low. Panels: low accuracy, low precision; low accuracy, high precision; high accuracy, low precision; high accuracy, high precision.]

Example 2
Determine the relative error (in % and ppt) and the absolute error for the mean in Example 1, given that the true value is 20.0.
Data: 20.3, 19.4, 19.8, 20.1, 19.6, 19.5; mean = 19.8
Absolute error, E = 19.8 - 20.0 = -0.2
Relative error, Er = (-0.2/20.0) x 100 = -1%
Relative error, Er = (-0.2/20.0) x 1000 = -10 ppt

Types of Errors in Experimental Data

1. Random / Indeterminate Error
- Unpredictably high or low. Causes data to be scattered more or less symmetrically around a mean value.
- Example: changes in temperature or pressure.
- Reflected by the precision.

2. Systematic / Determinate Error
- Causes the mean of a set of data to differ from the accepted value. Causes the results in a series of replicate measurements to be all high or all low.
- Has a definite value and an assignable cause, and is of the same magnitude for replicate measurements.
- Example: improper shielding and grounding of an instrument, or an error in the calibration or preparation of standards.
- Reflected by the accuracy.

3. Gross Error
- Occurs only occasionally, is often large, and may cause a result to be either high or low.
- Leads to outliers: results that obviously differ significantly from the rest of the replicate measurements.
- Example: entering a wrong value into a calculation or misreading a balance.

Sources of Systematic Errors

a. Instrumental error
- Caused by imperfections in measuring devices and instabilities in their power supplies.
- glassware used at temperatures that differ from its calibration temperature
- distortion of container walls
- errors in the original calibration
- contaminants on the inner surfaces of the containers

b. Methodic error
- Arises from non-ideal chemical or physical behavior of analytical systems.
- slowness of some reactions
- incompleteness of a reaction
- instability of some species
- nonspecificity of the reagents
- possible occurrence of side reactions (interferences)

c. Personal error
- Results from the carelessness, inattention, or personal limitations of the experimenter.
- estimating the level of a liquid between two scale divisions
- judging the color of the solution at the end point of a titration
- Persons who make measurements must guard against personal bias to preserve the integrity of the collected data.
- Of the three types of systematic error encountered in a chemical analysis, methodic errors are usually the most difficult to identify and correct.

Detection of Systematic Error

a. Systematic instrument errors are usually found and corrected by calibration. Periodic calibration of equipment is always desirable because the response of most instruments changes with time as a result of wear, corrosion, and mistreatment.
b. Systematic personal errors can be minimized by care and self-discipline. A good habit is to check instrument readings, notebook entries, and calculations systematically.
c. Systematic methodic errors: bias inherent in an analytical method is difficult to detect.

One or more of the following steps can be used to recognize and adjust for a systematic error in an analytical method:

a. Analysis of standard samples
- The analysis of standard reference materials, SRM (materials that contain one or more analytes at exactly known concentration levels).
- Standard materials can be purchased or sometimes prepared by synthesis. Unfortunately, synthesis is often impossible, or so difficult and time consuming that this approach is not practical.

Standard Reference Materials (SRM)
- SRMs can be purchased from a number of governmental and industrial sources (e.g. the National Institute of Standards and Technology, NIST, which offers over 900 SRMs).
- The concentration of an SRM has been determined in one of three ways:
  - through analysis by a previously validated reference method,
  - through analysis by two or more independent, reliable measurement methods,
  - through analysis by a network of cooperating laboratories that are technically competent and thoroughly knowledgeable about the material being tested.

b. Independent analysis
- A second independent and reliable analytical method is used in parallel with the method being evaluated.
- It should differ as much as possible from the method under study.
- This minimizes the possibility that some common factor in the sample has the same effect on both methods.

c. Blank determination
- Useful for detecting certain types of constant errors.
- All steps of the analysis are performed in the absence of the sample.
- The results from the blank are then applied as a correction to the sample measurements.
- This reveals errors due to interfering contaminants from the reagents and vessels employed in the analysis.
- This also allows the analyst to correct titration data for the volume of reagent needed to cause an indicator to change color at the end point.

d. Variation in sample size
- Can detect constant errors (as the size of the measurement increases, the effect of a constant error decreases).

Effect of Systematic Errors on Results

a. Constant errors
- The magnitude is essentially constant regardless of the size of the quantity measured.
- The absolute error is constant with sample size, while the relative error varies as the sample size changes.
- Become more serious as the sample size decreases.

b. Proportional errors
- The magnitude depends on the size of the sample.
- The absolute error varies with sample size, while the relative error is unaffected by a change in sample size.
- Due to the presence of interfering contaminants.

Random Errors
- Arise when a system of measurement is extended to its maximum sensitivity. This type of error is caused by the many uncontrollable variables that are an inevitable part of every physical or chemical measurement.
- The accumulated effect of the individual indeterminate uncertainties, however, causes replicate measurements to fluctuate randomly around the mean of the set.

Sources of Random Errors
The frequency distribution of a set of results or measurements can be plotted in different forms of graphs.
[Figure: frequency distributions for measurements containing (a) 4 random uncertainties, (b) 10 random uncertainties, (c) a very large number of random uncertainties.]
Figure 6-3 (A) A histogram showing the distribution of the 50 results in calibrating a 10-mL pipet. (B) A Gaussian curve for data having the same mean and standard deviation as the data in the histogram.

Statistical Treatment of Random Errors
- sample: a finite number of experimental observations; a tiny fraction of the infinite number of observations that could be made.
- population or universe: the theoretical infinite number of data.
- population mean, μ: the true mean of the population; in the absence of any systematic error, this is also the true value of the measured quantity.
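As a rough illustration of the frequency distributions for 4 and 10 random uncertainties mentioned under Sources of Random Errors, the sketch below enumerates every combination of n equal-sized uncertainties. It assumes, purely for illustration, that each uncertainty shifts the result by +U or -U with equal probability; that model and the function name `deviation_frequencies` are assumptions made here, not part of the lecture.

```python
from collections import Counter
from itertools import product


def deviation_frequencies(n_uncertainties):
    """Count how often each net deviation (in units of U) occurs.

    Assumption: every uncertainty contributes either -U or +U,
    and all combinations are equally likely.
    """
    counts = Counter(
        sum(combo) for combo in product((-1, 1), repeat=n_uncertainties)
    )
    return dict(sorted(counts.items()))


if __name__ == "__main__":
    for n in (4, 10):
        freqs = deviation_frequencies(n)
        peak = max(freqs.values())
        print(f"{n} uncertainties, {sum(freqs.values())} combinations")
        for deviation, count in freqs.items():
            # Scale each bar so the most frequent deviation is 40 characters wide.
            bar = "#" * max(1, round(40 * count / peak))
            print(f"  {deviation:+3d} U  {bar}  {count}")
```

With more uncertainties the text histogram fills in toward the bell shape of a Gaussian curve, and small net deviations are always the most frequent.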
The population mean is given by:

$$\mu = \frac{\sum_{i=1}^{N} x_i}{N} \quad \text{where } N \to \infty$$

- sample mean, $\bar{x}$: the mean of a limited sample drawn from the population of data.

$$\bar{x} = \frac{\sum_{i=1}^{N} x_i}{N} \quad \text{where } N \text{ is finite}$$

Measures of Precision
- population standard deviation, σ: a measure of the precision of a population of data, given by:

$$\sigma = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}}$$

- sample standard deviation, s: a measure of the precision of a sample of data, given by:

$$s = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \bar{x})^2}{N - 1}}$$

- standard deviation of the mean, S or $s_m$:

$$s_m = \frac{s}{\sqrt{N}}$$

Other ways of expressing precision:

1. Variance, $s^2$: simply the square of the standard deviation.

$$s^2 = \frac{\sum_{i=1}^{N} (x_i - \bar{x})^2}{N - 1}$$

2. Relative standard deviation, RSD, and coefficient of variation, CV:

$$\text{RSD} = \frac{s}{\bar{x}} \times 1000 \text{ ppt}$$

$$\text{CV} = \frac{s}{\bar{x}} \times 100\%$$

3. Spread or range, w: the difference between the largest and the smallest value in the set of data.

$$w = \text{highest value} - \text{lowest value}$$

Example 3
The following results were obtained in the replicate determination of the lead content of a blood sample: 0.752, 0.756, 0.752, 0.751, and 0.760 ppm Pb. Calculate the mean, standard deviation, standard deviation of the mean, and relative standard deviation.
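The quantities requested in Example 3 follow directly from the definitions of the sample mean, s, the standard deviation of the mean, and RSD. A minimal Python sketch using only the standard library is shown here; the helper name `precision_summary` and the dictionary keys are illustrative choices, not something defined in the lecture.

```python
import math
import statistics


def precision_summary(values):
    """Return the central value and common precision measures for replicates."""
    n = len(values)
    mean = statistics.mean(values)
    s = statistics.stdev(values)  # sample standard deviation (N - 1 denominator)
    return {
        "mean": mean,
        "s": s,
        "variance": s ** 2,
        "s_mean": s / math.sqrt(n),   # standard deviation of the mean
        "rsd_ppt": s / mean * 1000,   # relative standard deviation, ppt
        "cv_percent": s / mean * 100, # coefficient of variation, %
        "range": max(values) - min(values),
    }


lead_ppm = [0.752, 0.756, 0.752, 0.751, 0.760]  # ppm Pb, from Example 3
summary = precision_summary(lead_ppm)
for name, value in summary.items():
    print(f"{name:>10}: {value:.6f}")
```

Rounded to three decimal places, the mean, s, and standard deviation of the mean come out as 0.754, 0.004, and 0.002 ppm Pb, and the RSD is about 4.996 ppt.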
Mean = 0.754 ppm Pb
Standard deviation = 0.004 ppm Pb
Standard deviation of the mean = 0.002 ppm Pb
Relative standard deviation (RSD) = 4.996 ppt

Reliability of s as a Measure of Precision
- Most of the statistical tests described are based on sample standard deviations, and the probability that the results of these tests are correct improves as the reliability of s becomes greater.
- Uncertainty in the calculated value of s decreases as N increases. When N is greater than 20, s and σ can be assumed to be identical for all practical purposes.

Pooling Data to Improve the Reliability of s
- Data from a series of similar samples accumulated over time can often be pooled to provide an estimate of s that is superior to the value for any individual subset.
- The pooled standard deviation is given by:

$$s_{pooled} = \sqrt{\frac{\sum_{i=1}^{N_1} (x_i - \bar{x}_1)^2 + \sum_{j=1}^{N_2} (x_j - \bar{x}_2)^2 + \dots}{N_1 + N_2 + N_3 + \dots - N_T}}$$

where:
N1 = number of data in set 1
N2 = number of data in set 2
NT = number of data sets that are pooled
N1 + N2 + … - NT = degrees of freedom

Example: S pooled
Glucose levels are routinely monitored in patients suffering from diabetes. The glucose concentrations in a patient with mildly elevated glucose levels were determined in different months by a spectrophotometric analytical method. The patient was placed on a low-sugar diet to reduce the glucose levels. The following results were obtained during a study to determine the effectiveness of the diet.
Calculate a pooled estimate of the standard deviation for the method.

$$s_{pooled} = \sqrt{\frac{\sum_{i=1}^{N_1} (x_i - \bar{x}_1)^2 + \sum_{j=1}^{N_2} (x_j - \bar{x}_2)^2 + \dots}{N_1 + N_2 + N_3 + \dots - N_T}}$$

The four monthly data sets contain 7, 5, 5, and 7 results, with sums of squared deviations of 1687.43, 1182.80, 1086.80, and 2950.86:

$$s_{pooled} = \sqrt{\frac{1687.43 + 1182.80 + 1086.80 + 2950.86}{7 + 5 + 5 + 7 - 4}} = \sqrt{\frac{6907.89}{24 - 4}} \approx 18.58$$

The Confidence Limit
- The exact value of the mean, μ, for a population of data can never be determined, because such a determination requires that an infinite number of measurements be made. Statistical theory, however, allows us to set limits around an experimentally determined mean within which the population mean lies with a given degree of probability. These limits are called confidence limits, and the interval they define is known as the confidence interval, CI.

$$\text{CI for } \mu = \bar{x} \pm \frac{ts}{\sqrt{N}}$$

Example 4
From the same set of data in Example 3, replicate determination of the lead content of a blood sample: 0.752, 0.756, 0.752, 0.751, and 0.760 ppm Pb.
Mean = 0.754
Standard deviation = 0.004
Standard deviation of the mean = 0.002
Relative standard deviation (RSD) = 4.996 ppt
Calculate the 95% confidence interval.

With t = 2.78 (95% confidence, N - 1 = 4 degrees of freedom) and s = 0.00377:

$$\text{Confidence limit} = 0.754 \pm \frac{2.78 \times 0.00377}{\sqrt{5}} = 0.754 \pm 0.005$$

Confidence interval = 0.750 to 0.759 (computed from the unrounded mean and limit)

Detection of Gross Error
- When a set of data contains an outlying result that appears to differ excessively from the average, the decision must be made whether to retain or reject it.
It is an unfortunate fact that no universal rule can be invoked to settle the question of retention or rejection.

The Q-test
- A simple, widely used statistical test. $Q_{exp}$ is the absolute value of the difference between the questionable result $x_q$ and its nearest neighbor $x_n$ (with the results arranged in increasing or decreasing order), divided by the range or spread, w, of the entire set.

$$Q_{exp} = \frac{|x_q - x_n|}{w}$$

Example 5
Apply the Q-test to the following set of data and determine whether the outlying result is retained or rejected at the 95% level: 41.27, 41.71, 41.84, 41.78
Arranged in order: 41.27 (x_q), 41.71 (x_n), 41.78, 41.84
w, range = 41.84 - 41.27 = 0.57
Q_exp = |41.27 - 41.71| / 0.57 = 0.772
Q_exp (0.772) < Q_crit (0.829), so the result is retained.
So, mean = 41.65 and median = 41.75. The reported value is 41.75.
What if the outlier were to be rejected? The reported value would be the mean without the outlier, 41.78.

Recommendation for Treatment of Outliers:
- Reexamine carefully all data relating to the outlying result to see if a gross error could have affected its value.
- If possible, estimate the precision that can reasonably be expected from the procedure, to be sure that the outlying result actually is questionable.
- If more data cannot be secured, apply the Q-test to the existing set to see whether the doubtful result should be retained or rejected on statistical grounds.
- If the Q-test indicates retention, consider reporting the median of the set rather than the mean. The median has the great virtue of allowing inclusion of all data in a set without undue influence from an outlying value. In addition, the median of a normally distributed set containing 3 measurements provides a better estimate of the correct value than the mean of the set after an outlying value has been discarded.

Least-Squares Method (A tool for calibration plots)
- Most analytical methods are based on experimentally determined calibration plots (curves) in which a measured quantity, y, is plotted as a function of x.
The ordinate is the dependent variable and the abscissa is the independent variable. As is typical and desirable, the plot approximates a straight line. Note, however, that indeterminate errors may arise along the way, and consequently not all data fall exactly on the same line.

Assumptions:
1. There is actually a linear relationship between the measured variable, y, and the analyte concentration, x. Recall the equation of a line: y = mx + b, where b = y-intercept (the value of y when x is zero) and m = slope of the line.
2. Any deviation of the individual points from the straight line results from error in the measurement of y; that is, it is assumed that there is no error in the x values of the points.

R² or coefficient of determination
- Measures the fraction of the observed variation in y that is explained by the linear relationship.

Application of Statistics to Data Treatment and Evaluation
Experimentalists use statistical calculations to sharpen their judgments concerning the quality of experimental measurements. The most common applications of statistics to analytical chemistry include:
a. establishing confidence limits for the mean of a set of replicate data.
b. determining the number of replications required to decrease the confidence limit for a mean for a given level of confidence.
c. determining at a given probability whether an experimental mean is different from the accepted value for the quantity being measured.
d. determining at a given probability level whether two experimental means are different.
e. determining at a given probability level whether the precision of two sets of measurements differs.
f. deciding whether an outlier is probably the result of a gross error and should be discarded in calculating a mean.
g. defining and estimating detection limits.
h. treating calibration data.
i. quality control of analytical data and of industrial products.

Exercises 1
1. Consider the following set of replicate results: 0.0902, 0.0980, 0.0956, 0.1000, 0.0925
Calculate the:
a. mean
b. median
c. standard deviation
d. coefficient of variation
e. standard deviation of the mean
f. 95% confidence limit
(Observe the rules on significant figures.)

Exercises 2
2. Apply the Q-test to the following set of data and determine whether the outlying result is retained or rejected at 95% probability. What is then the reported value?
7.290, 7.284, 7.388, 7.292

Exercises 3
3. The sulfate ion concentration in natural water can be determined by measuring the turbidity that results when an excess of BaCl2 is added to a measured quantity of the sample. A turbidimeter, the instrument used for this analysis, was calibrated with a series of standard Na2SO4 solutions. The following data were obtained for the calibration:

Cx, mg SO4^2-/L     Turbidimeter reading, R
 0.00               0.06
 5.00               1.48
10.00               2.28
15.00               3.98
20.00               4.61

Assume that a linear relationship exists between the instrument reading and concentration.
a. Derive the equation of the line that describes the relationship between Cx and R.
b. What is the concentration of sulfate when the turbidimeter reading was 3.67?

Answers:
a. y = 0.232x + 0.162, R² = 0.9834
[Figure: calibration plot of the five standards with the fitted line.]
b. x = (y - 0.162)/0.232 = (3.67 - 0.162)/0.232 = 15.12

R², the coefficient of determination
- A better measure of how well a line fits the data.
- The goodness of fit is judged by the number of 9's.
- Three 9's (0.999) or better represents an excellent fit.

References
- D.A. Skoog, D.M. West, F.J. Holler, and S.R. Crouch, Fundamentals of Analytical Chemistry, 9th ed., Thomson Learning Asia, Singapore, 2014.
- Supplemental Notes
- Web references
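For the calibration data in Exercise 3, the least-squares slope, intercept, and R² can be checked with a short script. This is only a sketch using the standard textbook sums of squares (Sxx, Syy, Sxy); the function name `least_squares_line` and the variable names are chosen here for illustration.

```python
def least_squares_line(x, y):
    """Fit y = m*x + b by ordinary least squares and return (m, b, R^2).

    Assumes all error is in y, as stated in the least-squares assumptions.
    """
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    s_xx = sum((xi - x_mean) ** 2 for xi in x)
    s_yy = sum((yi - y_mean) ** 2 for yi in y)
    s_xy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    slope = s_xy / s_xx
    intercept = y_mean - slope * x_mean
    r_squared = s_xy ** 2 / (s_xx * s_yy)
    return slope, intercept, r_squared


conc = [0.00, 5.00, 10.00, 15.00, 20.00]   # Cx, mg sulfate per L
reading = [0.06, 1.48, 2.28, 3.98, 4.61]   # turbidimeter reading, R

m, b, r2 = least_squares_line(conc, reading)
unknown = (3.67 - b) / m  # invert the line for a reading of 3.67

print(f"y = {m:.3f}x + {b:.3f}, R^2 = {r2:.4f}")
print(f"concentration at R = 3.67: {unknown:.2f} mg/L")
```

The script reproduces the fitted line y = 0.232x + 0.162 with R² = 0.9834 and a sulfate concentration of about 15.12 mg/L for a reading of 3.67.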
