
MAP 2023- Basic measurement statistics.pdf

Full Transcript

Basic measurement statistics: confidence intervals and significance tests
Govert W. Somsen, Division BioAnalytical Chemistry, Vrije Universiteit Amsterdam
[email protected]
MAP, 24 November 2023

Alcohol control: analysis

Determination of alcohol: head space sampling followed by gas chromatography (HS-GC)
[Figure: vial containing diluted blood and air + vapor (head space)]
- quantitative analysis of ethanol by HS-GC
- gold standard for determination of ethanol in blood
- peak height or area is proportional to the ethanol concentration

Alcohol limit

Legal limit for the alcohol concentration in blood when in traffic: 0.5 mg/ml = 0.5‰

| Person | Result (mg/ml) | Decision: fine or no fine? |
|---|---|---|
| Person 1 | 0.61 | ? |
| Person 2 | 0.55 | ? |
| Person 3 | 0.51 | ? |
| Person 3 (5 times) | 0.49; 0.53; 0.52; 0.51; 0.53 (average = 0.516, standard deviation = 0.017) | ? |

How to deal with uncertainty in measurements?

Analytical measurements

An analytical measurement is an estimate of the true value T.
- measurement result (observed value) = $x_i$
- true value = T
- error = $x_i - T$

Errors in analytical measurements

- gross errors: to be avoided
- systematic errors = bias
  - consistent magnitude and direction
  - method bias
  - laboratory bias
  - can be estimated by comparison with a reference and corrected for
- random errors
  - many causes and sources (mostly unknown)
  - variable magnitude and direction
  - always occur (unavoidable)
  - cannot be prevented, but their effect can be reduced by repeating/averaging

Random error gives fluctuation of measurement values

- a proper analytical method has no bias
- an infinite number of repeated measurements gives a distribution of measured values ($x_i$)
- µ = mean of the infinite number of measurement values
- when there are random errors only (no bias): µ = T
[Figure: frequency versus $x_i$, with T and µ marked]

Measurement values are normally distributed

$$y = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$$

This describes a population of measurements with mean µ and standard deviation σ.

Normal distribution of measurement values

- 95.45% of all possible measurement values lie between (µ-2σ) and (µ+2σ)
- equivalently, the chance that a single measurement value lies between (µ-2σ) and (µ+2σ) is 95.45%

Average of measurements

We estimate µ by calculating the average $\bar{x}$ of a number of measurements (sample):

$$\bar{x} = \frac{\sum_i x_i}{n}$$

n = number of measurements (sample size)
- because of random error, $\bar{x}$ is not a fully accurate estimate of µ (= T)
- we need to take the effect of random error into account

Standard deviation of measurements

- the standard deviation σ is a measure of the spread caused by the analytical method
- sometimes the standard deviation σ of an analytical method is known
- if not, we estimate σ by calculating the standard deviation s of the sample

Standard deviation of sample (s):

$$s = \sqrt{\frac{\sum_i (x_i - \bar{x})^2}{n-1}}$$

Variance:

$$s^2 = \frac{\sum_i (x_i - \bar{x})^2}{n-1}$$

Relative standard deviation: $\mathrm{RSD} = s/\bar{x}$
Coefficient of variation: CV (%) = 100 ∙ RSD
n = number of measurements (sample size)

Confidence interval

We express the uncertainty in $\bar{x}$ by giving a confidence interval (CI). The width of the CI reflects the uncertainty in the estimate of the true value (T) by $\bar{x}$.

$$\mathrm{CI} = \bar{x} \pm z\,\frac{\sigma_x}{\sqrt{n}}$$

- $\bar{x}$ is the average of the measurements
- n is the number of measurements; increasing n makes the CI smaller
- $\sigma_x$ is the standard deviation of the analytical method
- z determines the degree of certainty (confidence)

| z value | certainty |
|---|---|
| 1.65 | 90% |
| 1.96 | 95% |
| 2.58 | 99% |
| 3.29 | 99.9% |

Example: with a chance of 95% (5% uncertainty) the true value T lies within the interval $\bar{x} \pm 1.96\,\sigma_x/\sqrt{n}$.

Standard confidence levels
[Figure: two normal curves; 95% of the area lies between µ-1.96σ and µ+1.96σ, 99% between µ-2.58σ and µ+2.58σ]

Extra uncertainty when σ is estimated by s

- the standard deviation σ of the analytical method is often not known
- then we estimate σ by calculating the standard deviation s of the sample: $s = \sqrt{\sum_i (x_i - \bar{x})^2/(n-1)}$
- this causes an extra uncertainty in the confidence interval: we cannot simply replace σ by s
- the extra uncertainty is accounted for by replacing z by t:

$$\mathrm{CI} = \bar{x} \pm t\,\frac{s_x}{\sqrt{n}}$$

- t follows from the so-called t distribution
- the value of t depends on the degrees of freedom and the selected certainty (e.g. 95% or 99%)
- the number of degrees of freedom = n-1

t-table

| degrees of freedom (n-1) | 95% | 99% | 99.9% |
|---|---|---|---|
| 1 | 12.71 | 63.66 | 636.6 |
| 2 | 4.30 | 9.92 | 31.6 |
| 3 | 3.18 | 5.84 | 12.9 |
| 4 | 2.78 | 4.60 | 8.61 |
| 5 | 2.57 | 4.03 | 6.87 |
| 6 | 2.45 | 3.71 | 5.96 |
| 10 | 2.23 | 3.17 | 4.59 |

- t increases with higher certainty
- t decreases with more measurements
- for very large n: t = z

CI calculation

From a urine sample the Na+ concentration has been measured 6 times. Results: 102, 97, 99, 98, 101, 106 mM. Give the 95% and 99% confidence intervals for the Na+ concentration.

- average Na+ concentration $\bar{x}$ = 100.5 mM
- standard deviation s = 3.27 mM
- number of measurements n = 6; degrees of freedom = n-1 = 6-1 = 5
- t value for 95% = 2.57; t value for 99% = 4.03
- CI95% = 100.5 ± 2.57·(3.27/√6) = 100.5 ± 3.4 mM
- CI99% = 100.5 ± 4.03·(3.27/√6) = 100.5 ± 5.4 mM

Procedure for calculating and reporting CIs

When the standard deviation σ of the analytical method is known ($\mathrm{CI} = \bar{x} \pm z\,\sigma_x/\sqrt{n}$):
1. choose the confidence level (e.g. 95%, 99% or 99.9%);
2. find the value for z (from the z table);
3. calculate $\bar{x}$ from the measured values ($x_i$);
4. calculate $z(\sigma_x/\sqrt{n})$;
5. report the interval (incl. confidence level and number of measurements).

When the standard deviation σ of the analytical method is unknown ($\mathrm{CI} = \bar{x} \pm t\,s_x/\sqrt{n}$):
1. choose the confidence level (e.g. 95%, 99% or 99.9%);
2. determine the degrees of freedom (= n-1);
3. find the value for t (from the t table);
4. calculate $\bar{x}$ and $s_x$ from the measured values ($x_i$);
5. calculate $t(s_x/\sqrt{n})$;
6. report the interval (incl. confidence level and number of measurements).

Comparison of measurement results: answering analytical questions

Comparison of a measurement result with a reference value
- does the tablet contain the stated amount of paracetamol of 500 mg?
- does the pH meter give the correct value?

Comparison of two measurement results
- do two soil samples have the same iron content?
- does the glucose blood level differ between two patients?
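As a recap of the confidence-interval procedure for unknown σ, the Na+ example above can be reproduced with a short Python sketch. This is only an illustration under stated assumptions: the function name `confidence_interval` and the lookup dictionary `T_TABLE` are hypothetical, and the t values are simply copied from a few rows of the lecture's t-table rather than computed from the t distribution.

```python
import math
import statistics

# Two-sided t values copied from the lecture's t-table, keyed by
# (degrees of freedom, confidence level in %). Only a few rows are included.
T_TABLE = {
    (4, 95): 2.78, (4, 99): 4.60,
    (5, 95): 2.57, (5, 99): 4.03,
    (10, 95): 2.23, (10, 99): 3.17,
}

def confidence_interval(values, confidence=95):
    """Return (mean, half_width) of the CI when sigma is estimated by s."""
    n = len(values)
    mean = statistics.mean(values)
    s = statistics.stdev(values)        # sample standard deviation, divisor n-1
    t = T_TABLE[(n - 1, confidence)]    # raises KeyError if the row is missing
    return mean, t * s / math.sqrt(n)

sodium = [102, 97, 99, 98, 101, 106]    # mM, from the urine example
for level in (95, 99):
    mean, half_width = confidence_interval(sodium, level)
    print(f"CI{level}% = {mean:.1f} ± {half_width:.1f} mM (n = {len(sodium)})")
```

Run on the six Na+ results, this gives 100.5 ± 3.4 mM (95%) and 100.5 ± 5.4 mM (99%), matching the worked example. The report line includes the confidence level and the number of measurements, as the procedure requires.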
Comparison of measurement results with each other or with a reference value is called testing.

Significance testing

Problem: when are two values really different?
- measurement results follow a normal distribution
- a difference may be the result of random errors only
- the difference should be large enough to exclude the influence of random errors at a certain confidence
- include the risk of being wrong in the conclusion

The ratio of the difference (Δ) between the tested values and the standard deviation ($\sigma_\Delta$) of this difference is our test parameter:

$$\frac{\Delta}{\sigma_\Delta}$$

We test the difference at a chosen standard confidence level: 95%, 99% or 99.9%.

Comparison of mean with reference value (σ known): z-test

The z-test examines the difference between the measurement average and a reference value (a) when σ of the method is known.

Procedure
- calculate the test parameter:

$$z_\Delta = \frac{\bar{x} - a}{\sigma_x/\sqrt{n}}$$

- find the limiting z value for the required confidence level from the z table
- when $|z_\Delta| \le z_{limit}$: no significant difference
- when $|z_\Delta| > z_{limit}$: significant difference

| $z_{limit}$ | confidence |
|---|---|
| 1.65 | 90% |
| 1.96 | 95% |
| 2.58 | 99% |
| 3.29 | 99.9% |

Comparison of mean with reference value (σ known): z-test, example

- a tablet should contain 25.0 mg caffeine
- the content is determined 4 times with an existing method (σ = 1.0 mg)
- results: 25.6; 24.8; 26.1; 26.0 mg
- question: does the content deviate from the norm (with 95% confidence)?

Test parameter: $z_\Delta = (\bar{x} - a)/(\sigma_x/\sqrt{n})$, with a = 25.0, n = 4, $\bar{x}$ = 25.62, $\sigma_x$ = 1.0
- $z_\Delta$ = (25.62 - 25.0)/(1.0/√4) = 1.24
- $z_{limit}$ (95%) = 1.96 (from the z table)
- $z_\Delta \le z_{limit}$: the caffeine content does not differ significantly from the norm

Comparison of mean with reference value (σ unknown): t-test

The t-test examines the difference between the measurement average and a reference value (a) when σ of the method is unknown and estimated by s.

Procedure
- calculate the test parameter:

$$t_\Delta = \frac{\bar{x} - a}{s_x/\sqrt{n}}$$

- find the limiting t value (df = n-1) for the required confidence level from the t table
- when $|t_\Delta| \le t_{limit}$: no significant difference
- when $|t_\Delta| > t_{limit}$: significant difference

Comparison of mean with reference value (σ unknown): t-test, example

- a tablet should contain 25.0 mg caffeine
- the content is determined 5 times with a new analytical method
- results: 25.8; 27.6; 25.7; 25.2; 26.3 mg
- question: does the tablet deviate from the norm (95% confidence)?

Test parameter: $t_\Delta = (\bar{x} - a)/(s_x/\sqrt{n})$, with a = 25.0, n = 5 (df = 4), $\bar{x}$ = 26.12, $s_x$ = 0.915
- $t_\Delta$ = (26.12 - 25.0)/(0.915/√5) = 2.74
- $t_{limit}$ (95%, 4 df) = 2.78 (from the t table)
- $t_\Delta \le t_{limit}$: the caffeine content does not differ significantly from the norm

Comparison of means (σ's known): z-test

The z-test examines the difference between two measurement averages when σ of the method(s) is known.

Variance ($\sigma^2$) of Δ:

$$\sigma_\Delta^2 = \frac{\sigma_{x_1}^2}{n_1} + \frac{\sigma_{x_2}^2}{n_2}$$

Procedure
- calculate the test parameter:

$$z_\Delta = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{\sigma_{x_1}^2}{n_1} + \dfrac{\sigma_{x_2}^2}{n_2}}}$$

- find the limiting z value for the required confidence level from the z table
- when $|z_\Delta| \le z_{limit}$: no significant difference
- when $|z_\Delta| > z_{limit}$: significant difference

| $z_{limit}$ | confidence |
|---|---|
| 1.65 | 90% |
| 1.96 | 95% |
| 2.58 | 99% |
| 3.29 | 99.9% |

Comparison of means (σ's known): z-test, example

Serum uric acid levels (mg/dL) are determined for individuals with and without Down's syndrome. The σ for the method used for persons with Down's is 1.0 mg/dL. The σ for the method used for persons without Down's is 1.2 mg/dL.
Results:

| | with Down's | without Down's |
|---|---|---|
| n | 12 | 15 |
| σ (mg/dL) | 1.0 | 1.2 |
| $\bar{x}$ (mg/dL) | 4.5 | 3.4 |

Is there a significant difference in serum uric acid level between persons with and without Down's (95% confidence)?

Test parameter:

$$z_\Delta = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{\sigma_{x_1}^2}{n_1} + \dfrac{\sigma_{x_2}^2}{n_2}}}$$

- $\sigma_\Delta^2$ = ((1.0)²/12) + ((1.2)²/15) = 0.179
- $\sigma_\Delta$ = √(0.179) = 0.423
- $z_\Delta$ = (4.5 - 3.4)/0.423 = 2.60
- $z_{limit}$ (95%) = 1.96 (from the z table)
- $z_\Delta > z_{limit}$: significant difference in serum uric acid level between persons with and without Down's

Comparison of means (σ's unknown): t-test

The t-test examines the difference between two averages measured with the same method when σ of the method is unknown.

Calculate the pooled variance $s_p^2$:

$$s_p^2 = \frac{(n_1 - 1)\,s_1^2 + (n_2 - 1)\,s_2^2}{n_1 + n_2 - 2}$$

Variance of Δ:

$$s_\Delta^2 = s_{\bar{x}_1}^2 + s_{\bar{x}_2}^2 = \frac{s_p^2}{n_1} + \frac{s_p^2}{n_2}$$

Procedure
- calculate the test parameter:

$$t_\Delta = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_p^2}{n_1} + \dfrac{s_p^2}{n_2}}}$$

- find the limiting t value (df = n1+n2-2) for the required confidence level from the t table
- when $|t_\Delta| \le t_{limit}$: no significant difference
- when $|t_\Delta| > t_{limit}$: significant difference

Comparison of means (σ unknown): t-test, example

For two solutions (A and B) the sodium chloride concentration (mg/l) is determined using the same method. Is there a significant difference between the sodium chloride concentrations of A and B (99% confidence)?
Results:

| | solution A | solution B |
|---|---|---|
| n | 5 | 7 |
| $\bar{x}$ (mg/l) | 26.06 | 24.46 |
| s (mg/l) | 1.00 | 0.72 |

Test parameter:

$$t_\Delta = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_p^2}{n_1} + \dfrac{s_p^2}{n_2}}}$$

$$s_p^2 = \frac{(5 - 1)\cdot(1.00)^2 + (7 - 1)\cdot(0.72)^2}{5 + 7 - 2} = 0.71$$

- $s_\Delta^2$ = (0.71/5) + (0.71/7) = 0.243
- $s_\Delta$ = 0.494
- $t_\Delta$ = (26.06 - 24.46)/0.494 = 3.24
- $t_{limit}$ (99%, 10 df) = 3.17 (from the t table)
- $t_\Delta > t_{limit}$: significant difference between the sodium chloride concentrations of solutions A and B

Summary significance tests

| test | σ | test parameter |
|---|---|---|
| comparison of mean with reference value | known (z test) | $z_\Delta = \dfrac{\bar{x} - a}{\sigma_x/\sqrt{n}}$ |
| comparison of mean with reference value | unknown (t test) | $t_\Delta = \dfrac{\bar{x} - a}{s_x/\sqrt{n}}$ |
| comparison of two means | known (z test) | $z_\Delta = \dfrac{\bar{x}_1 - \bar{x}_2}{\sqrt{\sigma_{x_1}^2/n_1 + \sigma_{x_2}^2/n_2}}$ |
| comparison of two means using the same method | unknown (t test) | $t_\Delta = \dfrac{\bar{x}_1 - \bar{x}_2}{\sqrt{s_p^2/n_1 + s_p^2/n_2}}$, with $s_p^2 = \dfrac{(n_1 - 1)\,s_1^2 + (n_2 - 1)\,s_2^2}{n_1 + n_2 - 2}$ |

Next

- Today: Exercises Significance testing
- Friday 30 November at 10:00: Short Quiz 7 (Canvas)
- Friday 30 November at 13:30: Lecture Error propagation & Quantitative analysis; Exercises Error propagation & Quantitative analysis
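The four test parameters from the summary of significance tests can be sketched in Python as follows. This is a minimal illustration, not part of the lecture: all function names are hypothetical, only the standard library is used, and the limiting z and t values still have to be looked up in the tables. Because the code does not round intermediate results, some values differ slightly from the slide arithmetic (for example 1.25 instead of 1.24 for the first caffeine example, and about 3.24 for the sodium chloride example). For the second caffeine example, the five listed results average to 26.12, which gives t ≈ 2.74. The conclusions of all four examples are unchanged.

```python
import math
import statistics

def z_one_sample(values, reference, sigma):
    """z for a mean versus a reference value; sigma of the method is known."""
    n = len(values)
    return (statistics.mean(values) - reference) / (sigma / math.sqrt(n))

def t_one_sample(values, reference):
    """t for a mean versus a reference value; sigma estimated by s (df = n-1)."""
    n = len(values)
    s = statistics.stdev(values)  # sample standard deviation, divisor n-1
    return (statistics.mean(values) - reference) / (s / math.sqrt(n))

def z_two_means(mean1, sigma1, n1, mean2, sigma2, n2):
    """z for the difference of two means; both sigmas are known."""
    return (mean1 - mean2) / math.sqrt(sigma1**2 / n1 + sigma2**2 / n2)

def t_two_means(mean1, s1, n1, mean2, s2, n2):
    """t for the difference of two means, same method (df = n1+n2-2)."""
    pooled_var = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)
    return (mean1 - mean2) / math.sqrt(pooled_var / n1 + pooled_var / n2)

def is_significant(test_value, limit):
    """Two-sided decision: significant when |test value| exceeds the limit."""
    return abs(test_value) > limit

# Worked examples from the lecture; limits are taken from the z and t tables.
examples = [
    ("caffeine, z-test", z_one_sample([25.6, 24.8, 26.1, 26.0], 25.0, 1.0), 1.96),
    ("caffeine, t-test", t_one_sample([25.8, 27.6, 25.7, 25.2, 26.3], 25.0), 2.78),
    ("uric acid, z-test", z_two_means(4.5, 1.0, 12, 3.4, 1.2, 15), 1.96),
    ("NaCl, t-test", t_two_means(26.06, 1.00, 5, 24.46, 0.72, 7), 3.17),
]
for name, value, limit in examples:
    verdict = "significant" if is_significant(value, limit) else "not significant"
    print(f"{name}: test value = {value:.2f}, limit = {limit}, {verdict}")
```

Taking the absolute value in `is_significant` makes the decision two-sided, so it does not matter which of the two means is entered first.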
