Basic Measurement Statistics 2023 PDF
Vrije Universiteit Amsterdam
2023
Govert W. Somsen
Summary
These notes cover basic measurement statistics, including confidence intervals and significance tests. They also cover alcohol control, determination of alcohol, and quantitative analysis of ethanol. The data provided comes from a 24th November 2023 MAP lecture at Vrije Universiteit Amsterdam.
Full Transcript
Basic measurement statistics
confidence intervals, significance tests

Govert W. Somsen
Division BioAnalytical Chemistry, Vrije Universiteit Amsterdam
[email protected]
MAP, 24 November 2023

alcohol control
? analysis

determination of alcohol
head space sampling followed by gas chromatography
[figure labels: air + vapor (head space); diluted blood]
quantitative analysis of ethanol by HS-GC
• gold standard for determination of ethanol in blood
• peak height or area is proportional to the ethanol concentration

alcohol limit
legal limit for the alcohol concentration in blood when in traffic: 0.5 mg/ml = 0.5‰
Result (mg/ml). Decision: fine or no fine?
Person 1: 0.61
Person 2: 0.55
Person 3: 0.51
Person 3 (5 times): 0.49; 0.53; 0.52; 0.51; 0.53
average = 0.516; standard deviation = 0.017
how to deal with uncertainty in measurements?

analytical measurements
an analytical measurement is an estimate of the true value
[figure labels: T (true value); Xi (measured value)]
• measurement result (observed value) = Xi
• true value = T
error = Xi - T

errors in analytical measurements
• gross errors
  to be avoided
• systematic errors = bias
  - consistent magnitude and direction
  - method bias
  - laboratory bias
  can be estimated by comparison with a reference and corrected for
• random errors
  - many causes and sources (mostly unknown)
  - variable magnitude and direction
  - always occur (unavoidable)
  cannot be prevented, but their effect can be reduced by repeating/averaging

random error gives fluctuation of measurement values
a proper analytical method has no bias
an infinite number of repeated measurements gives a distribution of measured values (xi)
[figure: frequency versus xi, with T and μ (mean of infinite measurement values) marked]
when random errors only (no bias): µ = T

measurement values are normally distributed
$$y = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$$
population of measurements with mean µ and standard deviation σ

normal distribution of measurement values
95.45% of all possible measurement values lie between (µ-2σ) and (µ+2σ)
the chance that a measurement value lies between (µ-2σ) and (µ+2σ) is
95.45%

average of measurements
we estimate µ by calculating the average $\bar{x}$ of a number of measurements (sample):
average of a sample:
$$\bar{x} = \frac{\sum x_i}{n}$$
n = number of measurements (sample size)
because of random error, $\bar{x}$ is not a fully accurate estimate of µ (= T)
we need to take the effect of random error into account!

standard deviation of measurements
the standard deviation σ is a measure of the spread caused by the analytical method
sometimes the standard deviation σ of an analytical method is known
IF NOT, we estimate σ by calculating the standard deviation s of the sample:
standard deviation of sample (s):
$$s = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n-1}}$$
n = number of measurements (sample size)
relative standard deviation:
$$\mathrm{RSD} = \frac{s}{\bar{x}}$$
coefficient of variation (CV): CV (%) = 100 ∙ RSD
variance:
$$s^2 = \frac{\sum (x_i - \bar{x})^2}{n-1}$$

confidence interval
we express the uncertainty in $\bar{x}$ by giving a confidence interval (CI)
the width of the CI reflects the uncertainty in the estimate of the true value (T) by $\bar{x}$
$$\mathrm{CI} = \bar{x} \pm z\,\frac{\sigma_x}{\sqrt{n}}$$
$\bar{x}$ is the average of the measurements
n is the number of measurements; increasing n makes the CI smaller
σx is the standard deviation of the analytical method
z determines the degree of certainty (confidence)

z value   certainty
1.65      90%
1.96      95%
2.58      99%
3.29      99.9%

example: with a chance of 95% (5% uncertainty) the true value T lies within the interval $\bar{x} \pm 1.96\,\sigma_x/\sqrt{n}$

standard confidence levels
[figure: two normal curves; 95% of the values lie between µ-1.96σ and µ+1.96σ, 99% between µ-2.58σ and µ+2.58σ]

extra uncertainty when σ is estimated by s
the standard deviation σ of the analytical method often is not known
then we estimate σ by calculating the standard deviation s of the sample
this causes an extra uncertainty in the confidence interval: we cannot replace σ by s just like that
the extra uncertainty is accounted for by replacing z by t:
$$\mathrm{CI} = \bar{x} \pm t\,\frac{s_x}{\sqrt{n}}, \qquad s = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n-1}}$$
t follows from the so-called t distribution
the value of t depends on the degrees of freedom and the selected certainty (e.g.
95% or 99%)
the number of degrees of freedom = n-1

t-table (certainty per number of degrees of freedom)
df (n-1)   95%     99%     99.9%
1          12.71   63.66   636.6
2          4.30    9.92    31.6
3          3.18    5.84    12.9
4          2.78    4.60    8.61
5          2.57    4.03    6.87
6          2.45    3.71    5.96
10         2.23    3.17    4.59
t increases with higher certainty
t decreases with more measurements
for very large n: t = z

CI calculation
The Na+ concentration of a urine sample has been measured 6 times. Results: 102, 97, 99, 98, 101, 106 mM.
Give the 95% and 99% confidence intervals for the Na+ concentration.
average Na+ concentration $\bar{x}$ = 100.5 mM
standard deviation s = 3.27 mM
number of measurements n = 6; degrees of freedom = n-1 = 6-1 = 5
t value for 95% = 2.57; t value for 99% = 4.03
CI95% = 100.5 ± 2.57·(3.27/√6) = 100.5 ± 3.4 mM
CI99% = 100.5 ± 4.03·(3.27/√6) = 100.5 ± 5.4 mM

procedure for calculating and reporting CIs
when the standard deviation σ of the analytical method is known
$$\mathrm{CI} = \bar{x} \pm z\,\frac{\sigma_x}{\sqrt{n}}$$
1. choose confidence level (e.g. 95%, 99% or 99.9%);
2. find the value for z (from z table);
3. calculate $\bar{x}$ from measured values (xi);
4. calculate z(σx/√n);
5. report interval (incl. confidence level and number of measurements).
when the standard deviation σ of the analytical method is unknown
$$\mathrm{CI} = \bar{x} \pm t\,\frac{s_x}{\sqrt{n}}$$
1. choose confidence level (e.g. 95%, 99% or 99.9%);
2. determine the degrees of freedom (= n-1);
3. find the value for t (from t table);
4. calculate $\bar{x}$ and $s_x$ from measured values (xi);
5. calculate t(sx/√n);
6. report interval (incl. confidence level and number of measurements).

comparison of measurement results
answering analytical questions
comparison of measurement result with a reference value
- does the tablet contain the stated amount of paracetamol of 500 mg?
- does the pH meter give the correct value?
comparison of two measurement results
- do two soil samples have the same iron content?
- does the glucose blood level differ between two patients?
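The two CI procedures above (σ known, σ unknown) can be sketched in a few lines of Python. This is a minimal illustration, not part of the lecture: the function name `confidence_interval` and the dictionary `T_TABLE` are hypothetical, only a few rows of the t-table are included, and z is taken from the standard library's normal distribution rather than from the z table.

```python
import math
import statistics

# A few two-sided t values from the t-table: {degrees of freedom: {confidence: t}}
T_TABLE = {
    2: {0.95: 4.30, 0.99: 9.92},
    3: {0.95: 3.18, 0.99: 5.84},
    4: {0.95: 2.78, 0.99: 4.60},
    5: {0.95: 2.57, 0.99: 4.03},
    6: {0.95: 2.45, 0.99: 3.71},
    10: {0.95: 2.23, 0.99: 3.17},
}


def confidence_interval(values, confidence=0.95, sigma=None):
    """Return (mean, half_width) of the two-sided CI for the mean."""
    n = len(values)
    mean = statistics.mean(values)
    if sigma is not None:
        # sigma of the method is known: use z from the standard normal distribution
        z = statistics.NormalDist().inv_cdf(0.5 + confidence / 2)
        return mean, z * sigma / math.sqrt(n)
    # sigma unknown: estimate it by s (n-1 in the denominator) and replace z by t
    s = statistics.stdev(values)
    t = T_TABLE[n - 1][confidence]
    return mean, t * s / math.sqrt(n)


# Na+ example: six measurements in mM, sigma of the method unknown
sodium = [102, 97, 99, 98, 101, 106]
for level in (0.95, 0.99):
    mean, half_width = confidence_interval(sodium, level)
    print(f"CI{level:.0%} = {mean:.1f} ± {half_width:.1f} mM (n = {len(sodium)})")
```

Passing `sigma=...` switches to the z-based interval; a sample size whose degrees of freedom are not in `T_TABLE` raises a `KeyError`, which is deliberate in this sketch so that no t value is silently guessed.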
comparison of measurement results with each other or with a reference value is called testing

significance testing
Problem: when are two values really different?
• measurement results follow a normal distribution
• a difference may be the result of random errors only
• the difference should be large enough to exclude the influence of random errors at a certain confidence
• include the risk of being wrong in the conclusion
the ratio of the difference (Δ) between the tested values and the standard deviation (σ) of this difference is our test parameter
we test the difference at a chosen standard confidence level: 95%, 99% or 99.9%

comparison of mean with reference value (σ known): z-test
z-test examines the difference between measurement average and a reference value (a) when σ of the method is known
procedure
- calculate test parameter:
$$z_\Delta = \frac{\bar{x} - a}{\sigma_x/\sqrt{n}}$$
- find limiting z value for required confidence level from z table
- when |zΔ| ≤ zlimit: no significant difference
- when |zΔ| > zlimit: significant difference

zlimit   confidence
1.65     90%
1.96     95%
2.58     99%
3.29     99.9%

comparison of mean with reference value (σ known): z-test
example
- a tablet should contain 25.0 mg caffeine
- the content is determined 4 times with an existing method (σ = 1.0 mg)
- results: 25.6; 24.8; 26.1; 26.0 mg
- question: does the content deviate from the norm?
(with 95% confidence)
test parameter:
$$z_\Delta = \frac{\bar{x} - a}{\sigma_x/\sqrt{n}}$$
a = 25.0; n = 4; $\bar{x}$ = 25.62; σx = 1.0
zΔ = (25.62 - 25.0)/(1.0/√4) = 1.24
zlimit (95%) = 1.96 (from z-table)
|zΔ| ≤ zlimit: caffeine content does not significantly differ from the norm

comparison of mean with reference value (σ unknown): t-test
t-test examines the difference between measurement average and a reference value (a) when σ of the method is unknown and estimated by s
procedure
- calculate test parameter:
$$t_\Delta = \frac{\bar{x} - a}{s_x/\sqrt{n}}$$
- find limiting t-value (df = n-1) for required confidence level from t-table
- when |tΔ| ≤ tlimit: no significant difference
- when |tΔ| > tlimit: significant difference

comparison of mean with reference value (σ unknown): t-test
example
- a tablet should contain 25.0 mg caffeine
- the content is determined 5 times with a new analytical method
- results: 25.8; 27.6; 25.7; 25.2; 26.3 mg
- question: does the tablet deviate from the norm? (95% confidence)
test parameter:
$$t_\Delta = \frac{\bar{x} - a}{s_x/\sqrt{n}}$$
a = 25.0; n = 5 (df = 4); $\bar{x}$ = 26.12; sx = 0.91
tΔ = (26.12 - 25.0)/(0.91/√5) = 2.75
tlimit (95%, 4 df) = 2.78 (from t-table)
|tΔ| ≤ tlimit: caffeine content does not significantly differ from the norm

comparison of means (σ's known): z-test
z-test examines the difference between two measurement averages when σ of the method(s) is known
variance (σ²) of Δ:
$$\sigma_\Delta^2 = \frac{\sigma_{x_1}^2}{n_1} + \frac{\sigma_{x_2}^2}{n_2}$$
procedure
- calculate test parameter:
$$z_\Delta = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{\sigma_{x_1}^2}{n_1} + \dfrac{\sigma_{x_2}^2}{n_2}}}$$
- find limiting z-value for required confidence level from z-table
- when |zΔ| ≤ zlimit: no significant difference
- when |zΔ| > zlimit: significant difference

zlimit   confidence
1.65     90%
1.96     95%
2.58     99%
3.29     99.9%

comparison of means (σ's known): z-test
example
Serum uric acid levels (mg/dL) are determined for individuals with and without Down's syndrome.
The σ for the method used for persons with Down's is 1.0 mg/dL.
The σ for the method used for persons without Down's is 1.2 mg/dL.
Results:
with Down's: n = 12; σ = 1.0 mg/dL; $\bar{x}$ = 4.5 mg/dL
without Down's: n = 15; σ = 1.2 mg/dL; $\bar{x}$ = 3.4 mg/dL
Is there a significant difference in serum uric acid level between persons with and without Down's (95% confidence)?
test parameter:
$$z_\Delta = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{\sigma_{x_1}^2}{n_1} + \dfrac{\sigma_{x_2}^2}{n_2}}}$$
σΔ² = ((1.0)²/12) + ((1.2)²/15) = 0.179
σΔ = √(0.179) = 0.423
zΔ = (4.5 - 3.4)/0.423 = 2.60
zlimit (95%) = 1.96 (from z table)
|zΔ| > zlimit: significant difference in serum uric acid level of persons with and without Down's

comparison of means (σ's unknown): t-test
t-test examines the difference between two averages measured with the same method when σ of the method is unknown
calculate the pooled variance sp²:
$$s_p^2 = \frac{(n_1 - 1)\,s_1^2 + (n_2 - 1)\,s_2^2}{n_1 + n_2 - 2}$$
variance of Δ:
$$s_\Delta^2 = s_{\bar{x}_1}^2 + s_{\bar{x}_2}^2 = \frac{s_p^2}{n_1} + \frac{s_p^2}{n_2}$$
procedure
- calculate test parameter:
$$t_\Delta = \frac{\bar{x}_1 - \bar{x}_2}{s_\Delta} = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_p^2}{n_1} + \dfrac{s_p^2}{n_2}}}$$
- find limiting t-value (df = n1+n2-2) for required confidence level from t-table
- when |tΔ| ≤ tlimit: no significant difference
- when |tΔ| > tlimit: significant difference

comparison of means (σ unknown): t-test
example
For two solutions (A and B) the sodium chloride concentration (mg/l) is determined using the same method.
Is there a significant difference between the sodium chloride concentrations of A and B (99% confidence)?
Results:
solution A: n = 5; $\bar{x}$ = 26.06; s = 1.00
solution B: n = 7; $\bar{x}$ = 24.46; s = 0.72
test parameter:
$$t_\Delta = \frac{\bar{x}_1 - \bar{x}_2}{s_\Delta} = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_p^2}{n_1} + \dfrac{s_p^2}{n_2}}}$$
sp² = ((5-1)·(1.00)² + (7-1)·(0.72)²)/(5+7-2) = 0.71
sΔ² = (0.71/5) + (0.71/7) = 0.243
sΔ = 0.493
tΔ = (26.06 - 24.46)/0.493 = 3.24
tlimit (99%, 10 df) = 3.17 (from t-table)
|tΔ| > tlimit: significant difference in sodium chloride concentrations of solutions A and B

summary significance tests
- comparison of mean with reference value, σ known (z-test):
$$z_\Delta = \frac{\bar{x} - a}{\sigma_x/\sqrt{n}}$$
- comparison of mean with reference value, σ unknown (t-test):
$$t_\Delta = \frac{\bar{x} - a}{s_x/\sqrt{n}}$$
- comparison of two means, σ known (z-test):
$$z_\Delta = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{\sigma_{x_1}^2}{n_1} + \dfrac{\sigma_{x_2}^2}{n_2}}}$$
- comparison of two means using the same method, σ unknown (t-test):
$$t_\Delta = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_p^2}{n_1} + \dfrac{s_p^2}{n_2}}}, \qquad s_p^2 = \frac{(n_1 - 1)\,s_1^2 + (n_2 - 1)\,s_2^2}{n_1 + n_2 - 2}$$

next
Today: Exercises Significance testing
Friday 30 November at 10:00: Short Quiz 7 (Canvas)
Friday 30 November at 13:30: Lecture Error propagation & Quantitative analysis
Exercises Error propagation & Quantitative analysis
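The four significance tests summarised above can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the lecturer's code: all function names (`z_limit`, `one_sample_statistic`, `two_sample_z`, `pooled_t`, `significant`) are hypothetical, the limiting z value is computed with the standard library instead of read from the z table, and the limiting t values are the ones quoted from the t-table.

```python
import math
import statistics


def z_limit(confidence):
    """Two-sided limiting z value for a confidence level such as 0.95."""
    return statistics.NormalDist().inv_cdf(0.5 + confidence / 2)


def one_sample_statistic(values, reference, sigma=None):
    """z (sigma known) or t (sigma estimated by s) for mean versus reference."""
    n = len(values)
    spread = sigma if sigma is not None else statistics.stdev(values)
    return (statistics.mean(values) - reference) / (spread / math.sqrt(n))


def two_sample_z(mean1, sigma1, n1, mean2, sigma2, n2):
    """z for the difference of two means when both sigmas are known."""
    return (mean1 - mean2) / math.sqrt(sigma1**2 / n1 + sigma2**2 / n2)


def pooled_t(mean1, s1, n1, mean2, s2, n2):
    """Return (t, degrees of freedom) using the pooled variance."""
    sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)
    t = (mean1 - mean2) / math.sqrt(sp2 / n1 + sp2 / n2)
    return t, n1 + n2 - 2


def significant(statistic, limit):
    """A difference is significant when |statistic| exceeds the limit."""
    return abs(statistic) > limit


# Caffeine tablet, sigma known (1.0 mg), reference 25.0 mg, 95% confidence
z1 = one_sample_statistic([25.6, 24.8, 26.1, 26.0], 25.0, sigma=1.0)
print(f"z = {z1:.2f}, significant: {significant(z1, z_limit(0.95))}")

# Caffeine tablet, sigma unknown; t limit 2.78 (95%, 4 df) from the t-table
t1 = one_sample_statistic([25.8, 27.6, 25.7, 25.2, 26.3], 25.0)
print(f"t = {t1:.2f}, significant: {significant(t1, 2.78)}")

# Serum uric acid, both sigmas known, 95% confidence
z2 = two_sample_z(4.5, 1.0, 12, 3.4, 1.2, 15)
print(f"z = {z2:.2f}, significant: {significant(z2, z_limit(0.95))}")

# Sodium chloride in solutions A and B; t limit 3.17 (99%, 10 df) from the t-table
t2, df2 = pooled_t(26.06, 1.00, 5, 24.46, 0.72, 7)
print(f"t = {t2:.2f} (df = {df2}), significant: {significant(t2, 3.17)}")
```

The statistics are computed from unrounded intermediate values, so the last digit can differ from a hand calculation that rounds the mean or standard deviation first.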