Analysis of Biological Data: Class 1 Introduction (2023)
Document Details
Vrije Universiteit Brussel
2023
Bram Vanschoenwinkel
Summary
This document is a course introduction to the analysis of biological data. It discusses the importance of evidence-based reasoning in science and critiques pseudoscience and superstition. The introduction includes examples and warns against common fallacies in evaluating evidence, such as anecdotal evidence and false causality.
Full Transcript
Analysis of Biological Data
Class 1: a brief recap of basic concepts
Prof. dr Bram Vanschoenwinkel

Teacher information
Bram Vanschoenwinkel: Prof. in Ecology at VUB (October 2013)
- Community ecology lab
- Spatial models, habitat selection, connectivity
- Biodiversity, ecosystem services
Island model systems: ponds, wetlands, inselbergs
Research website: www.insularecology.com

Part 1: Scientia vincit tenebras?!

An inconvenient truth...

1. Science used to be a select club of experts...
Friedrich Schiller, Alexander & Wilhelm von Humboldt, Johann Wolfgang von Goethe

2. Now every other large town on the planet has a 'university' and delivers PhDs
Source: WHED. Dates mark when the number of universities in the world doubled.

3. Not everything was worthy of publication
Darwin waited more than 20 years before publishing the Origin of Species. You published if you had a discovery. Now there are crappy journals willing to publish everything.
https://www.nature.com/news/the-pressure-to-publish-pushes-down-quality-1.19887

4. Public distrust in science is on the rise...
There is always a "scientist" willing to sell his soul to support a ridiculous claim. Media fail to correctly understand, or deliberately miscite, scientific publications. Scientists oversell their results. Criminal fraud occurs among scientists.

5. Politicians disregard scientific evidence
Politicians are often scientifically illiterate or choose to disregard scientifically proven facts based on ideology. The Flemish minister of "Agriculture & Environment" is a lawyer. In politics it is easy to get a job you are not remotely qualified for.

Prof. Diederik Stapel
Dutch award-winning social psychologist who manipulated and fabricated data: 130 papers, of which at least 20 contain fake data. The papers were retracted by journals and he was suspended from his university. "I wanted too much, too fast, in a system with few checks and balances, where people work alone. I took the wrong turn."

What can we do?
- Actively teach critical thinking in schools and universities
- Expose pseudoscience and attack superstition!
- Put pressure on politicians to implement evidence-based policy
- Scientists must expose and excommunicate frauds
- We should not train more scientists but more capable scientists

A little exercise...
1. Do you believe vaccinating children holds serious health risks? A. Yes B. No
2. Do you believe consumption of GMO crops results in health risks for humans? A. Yes B. No
3. Do you believe life on earth evolved via natural selection? A. Yes B. No
4. Do you believe humans are responsible for global warming? A. Yes B. No

Is there room for improvement? Are these answers based on evidence? What is evidence?
"People almost invariably arrive at their beliefs not on the basis of proof but on the basis of what they find attractive." - Blaise Pascal (1623-1662), De l'art de persuader

Not all things people believe are supported by evidence
"You can make good luck potions out of the body parts of albino humans"
"Rhino horn will enhance your libido"
Human society is still plagued with superstitious claims and misbeliefs.

Not all things people believe are supported by evidence
"Homeopathy will help you cure cancer"
"Electricity will help you to lose weight"
Human society is still plagued with superstitious claims and misbeliefs.

Not all things people believe are supported by evidence
"AIDS can be cured by eating vegetables"
"Jews, African people and gypsies belong to an inferior race"

Often "inconvenient" evidence is ignored
"The global climate is not changing"
"The climate is not changing because we just had a cold winter"
"Ok, it is changing, but people are not responsible"
Politicians, demagogues, large corporations, conspiracy theorists and religious fundamentalists often make statements that are not supported by evidence.

In science, what you believe is irrelevant; what matters is what the evidence says. Without solid evidence, you cannot make a statement that has any value.

A. Science: knowledge based on rigorous, reproducible tests using the scientific method (real evidence)
B. Pseudoscience: a claim, belief or practice which is incorrectly presented as scientific, but does not adhere to a valid scientific method, cannot be reliably tested, or otherwise lacks scientific status (no evidence)
C. Superstition: the belief in supernatural causality (that one event causes another without any natural process linking the two events), such as astrology, religion, omens, witchcraft and prophecies, that contradicts natural science (no evidence)

In this course we will try to figure out how to evaluate evidence.

1. Does vaccinating children hold serious health risks?
A. All evidence indicates it is true B. Most evidence indicates it is true C. All evidence indicates it is false D. Most evidence indicates it is false E. There is mixed evidence F. I have no idea

2. Does consumption of GMO crops result in health risks for humans? (same answer options)

3. Did life on earth evolve via natural selection? (same answer options)

4. Are humans responsible for global warming? (same answer options)

5. Is an invisible almighty alien monster present in this room? (same answer options)

6. Is there a God? (same answer options)
Many people believe blindly without evidence
"Tell people there's an invisible man in the sky who created the universe, and the vast majority will believe you. Tell them the paint is wet, and they have to touch it to be sure." - George Carlin

The issue of untestable hypotheses
The God hypothesis is an untestable hypothesis and therefore worthless from a scientific point of view. There is no proof that supreme beings exist, but we cannot design experiments to prove they don't exist. We can, however, scientifically disprove specific supernatural claims with evidence. A scientist cannot test hypotheses about things that cannot be measured or detected; about such entities we are by definition agnostic. Processes or entities which are untestable (e.g. because they cannot be measured by definition) are not part of the domain of science.

Real evidence or unreliable evidence?
"I know a woman who went to a holy place in Lourdes and after she returned she recovered from a chronic disease, therefore this must really be a holy place"
"I have seen fossils that look just like animals that are alive today, therefore there is no evolution"
"People induce mutations in crops to make them resistant to pests. Mutations can lead to cancer. Therefore, by eating these crops I am at higher risk of getting cancer"
"Have you seen the ocean? It is endless. Therefore, to think that we could exhaust it by overfishing is simply ridiculous"

Real evidence or unreliable evidence?
"I know a woman who went to a holy place in Lourdes..." - anecdote, not a replicated test
"I have seen fossils that look just like animals that are alive today..." - cherry picking
"People induce mutations in crops... by eating these crops I am at higher risk of getting cancer" - false causality
"Have you seen the ocean? It is endless..." - false assumption, false causality

Real evidence or unreliable evidence?
"Evolution is the result of random mutations. Have you seen the complexity of the human eye? Such a complex organ cannot possibly evolve due to random events!"
"I read in a journal / an expert said that climate change could be the result of a geological process, therefore we must not take action to cut carbon emissions."
"An Australian aboriginal who inadvertently farted while hiking in the outback suddenly noticed that this was followed by a full solar eclipse. From that day onward his tribe suspected him of having magical powers."
"An English trader who inadvertently invested 1 million pounds in an internet start-up company noticed that this was followed by a stock 'bubble' and a huge profit. From that day onward he got a big promotion and his colleagues suspected him of having superior intelligence."
Real evidence or unreliable evidence?
"Evolution is the result of random mutations. Have you seen the complexity of the human eye?..." - false initial assumption
"I read in a journal / an expert said that climate change could be the result of a geological process..." - cherry picking evidence / appeal to authority
"An Australian aboriginal who inadvertently farted... his tribe suspected him of having magical powers" - false causality
"An English trader who inadvertently invested 1 million pounds... his colleagues suspected him of having superior intelligence" - false causality (as above)

Alternative facts

The power of ignorant/biased opinion makers

Scientific debunking: the study was retracted... but the story persists
- 12 children with autistic symptoms; 8 of the 12 children developed the symptoms shortly after their vaccination
- Why retracted? The speculative nature of the conclusions; the study was financially supported by lawyers; there were no controls (children without autism); data were manipulated

Fear mongering. Appealing to authority. Meta-analysis: 1800 studies found no effects.

It is our mission as scientists to promote skeptical thinking and fight superstition and pseudoscience. Goal of this course: learn to perform surveys, design experiments and perform analyses to look for solid evidence for the existence and importance of processes in nature.

Part 2: What's on the menu?
Analysis of Biological Data, Prof. dr Bram Vanschoenwinkel

Aim of the course:
1. have a working knowledge of the different types of basic and more advanced statistical approaches available to analyse biological data
2. be able to choose appropriate statistical methods to analyse biological data
3. be able to perform statistical analyses in R
4. be able to correctly interpret the results of statistical analyses
5. have basic knowledge about experimental design, the experimental method and research ethics
Ultimately this should prepare you to independently analyse data during your MSc thesis (or PhD thesis) and during your later professional career.

What is in this course...
Included: inferential probabilistic statistics (using P values); information-theory-based alternatives (e.g. using AIC); plus a small taste of Bayesian approaches and machine learning.

A rough guide on how to fail this course miserably
1. Don't attend the classes, much of it is stuff you've seen already.
2. Don't finish the practical sessions in class, you'll figure it out later on your own as you've always done.
3. Write down a few relevant keywords and hope for the best during the oral exam.
4. Don't bother understanding the background or the underlying mathematics: "if it is smaller than 0.05, we're good, right?"
5. Preferably give a simple answer with one possible solution.
6. Be flexible with definitions (predictor, response, term, level, factor, parameter, ...). It's all data, right?
7. Seduce a stats-savvy colleague to work on that annoying statistical report with you and ace it!
8. Never ask questions, otherwise the prof will discover you're an idiot.
Potential handbooks
Basic (BSc, MSc level): Discovering Statistics Using R - Andy Field, Jeremy Miles, Zoe Field, 2012
Advanced (MSc, PhD level): Biostatistical Design and Analysis Using R - Murray Logan, 2010

Useful books and extra information
Numerical Ecology with R - Daniel Borcard, 2011
Numerical Ecology - Legendre and Legendre, 2012
Nassim Nicholas Taleb's books: Fooled by Randomness / The Black Swan

Background knowledge required?
1. At least one university-level course in mathematics
2. At least one (introductory) course in statistics
3. Good knowledge of MS Excel (particularly data manipulation and plotting basic graphs)
If this is not the case, then extra self-study will be required in order to be able to follow and pass this class. Make sure you understand the foundations (the first classes) or you risk getting lost in the later classes. Make sure you understand everything before the Christmas break.

Extra information: useful websites
www.statsmethods.net - Quick-R: all you need to know to get started in R
www.discoveringstatistics.com - Statistics Hell
https://chat.openai.com/ - ChatGPT (use for coding tips!)

Replacement of traditional packages by one statistical platform with full functionality.

Structure of the examination
1. Practical assignment: pairs of students analyse a small dataset, test a limited set of hypotheses, and write a short scientific report plus a one-page paper critique.
2. Oral examination online with a short (10 min) written preparation: two questions, plus an oral evaluation of the scientific report.
One overall grade / 20. Don't memorize formulae and derivations; try to understand the procedure. Evaluation will be based mainly on insight, but understanding statistics requires some memorization.

Part 3: Getting ready for lift-off
1. Why do biologists need statistics?
2. Quantifying variation in Nature
3. Measures of centrality and variance
4. Foundations of statistical inference

1. Why do biologists need statistics?

Why do we need statistics?
Studies of single individuals, habitats or experimental trials are usually not sufficient; nothing can reliably be concluded from them. Many observations are needed in order to generalize patterns, but observations are subject to variation. Results (observations, experimental results) are based on a limited amount of data (a sample), which should be representative: we need to generalize to the real world (the population). Is the variation we see meaningful, or is it simply due to chance, e.g. a sampling artefact?

Why do we need statistics?
Biology is more than simply collecting information, e.g. describing which animals are present in the Serengeti or which proteins are present in the blood of a fluke. It is what you do with this information that counts! Research with many independent observations is, in general, more valuable than information from one site or sample.

Why do we need statistics?
"Not my problem: I'll collect the data and find a statistician later to help me." A researcher should understand the entire research process, not outsource the most critical part: analysis and interpretation!

Why do we need statistics?
Statistical analysis can be considered beautiful because it is an objective measure of truth. It does, however, involve subjective decisions, which should be stated explicitly! We construct predictive, validated models of scientific theories which can help us do the right thing and prevent ineffective or harmful interventions.
Why do we need statistics?
Incorrect analyses can lead to:
1. damage to the environment (e.g. toxicity tests)
2. loss of human lives (e.g. drug development trials)
3. squandering of resources (time, money)
4. reduced value of discoveries
5. public distrust in science
6. a slower rate of scientific progress

The scientific method
Definition: the systematic observation, measurement and experimentation, and the formulation, testing, modification and retesting of hypotheses. Aristotle (4th century BC) already believed in deriving general principles from observations.

The scientific method
Scientists test the reliability of claims by subjecting them to systematic investigation using some form of the scientific method. A theory is judged on different criteria:
1. explanatory power (e.g. past events)
2. predictive power (e.g. future events)
3. parsimony (Ockham's razor)
4. falsifiability (Karl Popper)
5. repeatability of empirical results (under controlled conditions)
6. (scientific consensus among scientists)
Example of a 'busted' scientific theory: medieval alchemists claimed it was possible to transform lead into gold.

The scientific method
Examples of theories/phenomena that could not be confirmed based on the scientific method: vampires. No scientific proof: telepathy, homeopathy, Jesus appearances on toast, fortune telling, mind reading, creation. Still no proof: extraterrestrial life, negative effects of GMOs on humans.

1. Explanatory power
Example: evapotranspiration is a good predictor of tree species richness in a certain area; with R² = 0.60, 60% of the variation can be explained. A way to measure how good a fit is, is the coefficient of determination:
R² = 1 - (residual sum of squares / total sum of squares)
where a sum of squares SS = sum of squared distances = Σ (prediction - observation)². The residuals (the scatter) are what is not explained; the total sum of squares measures how much variation there was before fitting the model. The residuals become smaller if the model fits well.

2. Predictive power
Is a model calibrated on a certain dataset able to predict patterns in another, independent dataset? In a predictive model you use the available data to estimate the model parameters; in a modeling context this can be considered calibration. But if you then calculate the accuracy of this model based on the same data, you do not measure the predictive power of the model. To do this you need to test the accuracy of the model on an independent dataset not used for calibration. Avoid circular reasoning! Example: divide the dataset into a training dataset for model calibration and a testing dataset for validation. [Figure: water level (cm) in a temporary pond over time, split into a calibration dataset (calibrated based on the past) and a validation dataset (used to predict the future).]
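To make the calibration/validation idea concrete, here is a minimal R sketch on simulated data; the evapotranspiration-richness relationship and all numbers are invented purely for illustration:

set.seed(1)
evapo <- runif(100, 500, 1500)                      # invented predictor values
richness <- 0.05 * evapo + rnorm(100, sd = 15)      # invented response with noise
dat <- data.frame(evapo, richness)
train <- dat[1:70, ]                                # calibration (training) dataset
test <- dat[71:100, ]                               # validation (testing) dataset
model <- lm(richness ~ evapo, data = train)
summary(model)$r.squared                            # explanatory power on the training data
pred <- predict(model, newdata = test)
1 - sum((test$richness - pred)^2) /
    sum((test$richness - mean(test$richness))^2)    # R2 = 1 - residual SS / total SS, on independent data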
3. Parsimony
Ockham's razor: if there are many explanations, the simplest one is to be preferred. Attributed to William of Ockham (1287-1347); he did not invent the concept, but lived by it. Examples: information criteria (e.g. Akaike's information criterion); maximum parsimony trees in phylogenetics.

4. Falsification principle
Karl Popper (Sir Karl Raimund Popper, 1902-1994) distinguished between science and pseudoscience as follows: what is unfalsifiable is classified as unscientific, and the practice of declaring an unfalsifiable theory to be scientifically true is pseudoscience. E.g. the existence of an invisible supreme being cannot be falsified. Example: the theory that all swans are white can be falsified by observing one black swan. A theory can be (provisionally) accepted as the truth if it has not been falsified. Falsification still forms the basis of modern experimental research, e.g. rejecting a null hypothesis.

4. Falsification principle
H0, the null hypothesis: there is no difference, there is no effect. HA, the alternative hypothesis: there is a difference, there is an effect. Instead of trying to confirm the alternative hypothesis, it is mathematically much easier to falsify the null hypothesis! We will try to show that a certain pattern is unlikely assuming the null hypothesis is true, in which case H0 will be rejected.

5. Repeatability
For a theory to be 'true', the same results should appear when an experiment is reproduced. That is why the methods section of a paper should include sufficient detail. Lack of reproducibility has revealed instances of scientific fraud in the past.

Good scientists are skeptics
Skepticism: any questioning attitude towards knowledge, facts, or opinions/beliefs stated as facts, or doubt regarding claims that are taken for granted elsewhere. Epistemology: the field that studies knowledge, with regard to its methods, validity and scope. A skeptic believes that empirical investigation of reality leads to the truth.

Steps in scientific investigations
1. Defining the aims and approach of the study: a priori definition of the goals of the study, based on the actual knowledge gaps in the field of research, i.e. defining the research hypothesis (not a posteriori!); defining the methodology for the analysis.
2. Data collection: sampling design, experimental design.
3. Data processing: digitizing the data (into a spreadsheet or input table of a statistical software); graphical inspection of the data; elimination/correction of outliers or improbable data; replacing unlikely data by "missing values".
4. Choosing a statistical method (technique, test): inspired by the initial goals, and adapted to the kind of variables at hand; starting with simple techniques and gradually applying more sophisticated techniques if warranted; checking all conditions for the application of a given statistical method.
5. Reporting and discussing results: presenting results in graphs and tables in the most illustrative way; discussing the results (critical appraisal, situating them within the scientific literature); drawing conclusions and possible applications/perspectives for the future.

2. Quantifying variation in Nature

Analyzing variation in nature
Variation in nature is shaped by ecological and evolutionary processes, e.g. niche filtering and natural selection. These include both deterministic and stochastic processes. There are different sources of information, each of which has specific advantages and disadvantages. In order to draw reliable conclusions, the information must be processed and analysed using a correct statistical procedure. Statistics allow us to assess to what extent observed patterns in nature or in experiments are real or due to chance. But watch out: bad design, bias and experimental mistakes cannot be fixed post hoc using statistics! [Figure: colour polymorphism in sea shells, which results from the interplay of ecological and evolutionary processes.]
Variation in composition / abundances
Species in a community; alleles or traits in a population.

Chemical variation
DNA or RNA sequences; genomic, transcriptomic and proteomic information; isotopic signatures.

Cases, variables, data
(Note from the example figure: measuring the same rabbits repeatedly, or rabbits kept stacked on top of each other in the same place, is practical but ignores natural variation; such measurements are not independent and are not real replicates = pseudoreplication.)
Cases (sampling/experimental units): subjects, e.g. individuals in an experiment; objects, e.g. samples, replicates.
Variables: characteristics/traits that are measured/observed in a number of cases, e.g. pH, salinity, body size, mortality.
Data: the numerical values obtained for the variables under study, e.g. the weights of 100 rabbits, water quality in 200 lakes.
In a data table, lines = cases and columns = variables.

Types of variables
Independent variables = predictor variables: the variables which we think are a cause (X1, X2, X3, ...).
Dependent variables = the response variable, the outcome variable: the variable we expect to depend on the independent variables (Y1, Y2, Y3, ...).
Response = dependent variable; predictor = independent variable.

Types of variables
1. Categorical variables:
- Binary: two categories, e.g. (yes, no), (0, 1) or (HIV positive, HIV negative)
- Nominal: more than two categories, e.g. (a, b, c, d, ...) or (fox, badger, rat); in codes like (1, 10, 5, 2, 3) the 5 is not larger than the 2 or the 1, it is just a category label
- Ordinal: (level 1, level 2, level 3); the order matters, not the quantitative difference
2. Continuous variables: e.g. (1, 1.23, 15, 6.01, 7.81, ...); here 15 is 15 times larger than 1, so quantitative differences matter.
- Discrete: a numerical variable that can only take on certain values, e.g. integers (1, 2, 3, 4, 5) with no decimals; used for counting animals, for example
- Scale: a continuous variable that can take on all values within a certain range
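As a small, hypothetical illustration of how these variable types map onto R's data structures (all values invented):

sex <- factor(c("male", "female", "female"))            # binary / nominal categorical variable
species <- factor(c("fox", "badger", "rat"))            # nominal variable with more than two categories
level <- factor(c("level 1", "level 3", "level 2"),
                levels = c("level 1", "level 2", "level 3"),
                ordered = TRUE)                         # ordinal: the order matters
count <- c(1L, 4L, 0L)                                  # discrete variable (integer counts)
body_size <- c(1.23, 6.01, 7.81)                        # continuous scale variable
str(data.frame(sex, species, level, count, body_size))  # lines = cases, columns = variables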
How do we obtain data?
Field measurements, surveys: sacrifice control in favour of realism; gradients are not always independent.
Laboratory experiments: sacrifice realism in favour of control; orthogonal treatments (e.g. full factorial designs); artificial environment, so risk of artefacts.
Field experiments: can sometimes combine both realism and control.
Databases (e.g. IUCN, GenBank): often lack detailed, important background information; coarse resolution.

Sources of information
If you perform a simplified laboratory experiment, you can test the potential validity of a theory. The results, however, do not necessarily imply that the same processes occur in nature. Laboratory experiment: reductionist approach; control, standardised conditions; straightforward; lack of realism (the experiment doesn't capture the full complexity of the environment, so we are not sure the results apply to field conditions). Possible conclusion: if you add phosphorus, the diversity of planktonic organisms that can survive in an aquarium experiment increases.

Sources of information
If you perform an experiment under semi-natural conditions, you can test the potential validity of a theory. The results, however, do not necessarily imply that the same processes occur in nature. Field experiment: reductionist approach; control, standardised conditions; straightforward; more realistic! Possible conclusion: we found a hump-shaped relationship between productivity and diversity in artificial ponds.

Sources of information
If you study a lake you can make statements about this lake. Case study: realistic; holistic approach; conclusions limited to the studied system! [Figure: Lake Louise, Canadian Rockies.] Possible conclusion: productivity in this lake is quite low, which might account for its high biodiversity.

Sources of information
If you study 100 ponds in the Norwegian arctic you can draw conclusions about these ponds. Possible conclusion: nutrient load promotes biodiversity in Norwegian arctic ponds.

Sources of information
If you do a meta-analysis and jointly investigate the results of many studies on lakes and ponds around the world, you can draw general conclusions about overall patterns. Experimental confirmation leads to stronger conclusions and helps to link patterns to processes. Example conclusion: overall, nutrient load in lakes and ponds first seems to promote diversity, but can reduce diversity when concentrations are too high.

Sources of information
If you study a process in the fruit fly you can make statements about the process in this model species (be careful with extrapolation). Model species: realistic; conclusions limited to the studied species. Possible conclusion: inhibition of the IPTY receptor in the fruit fly results in growth defects. Even though such results could reveal a general process (the purpose of a model species), this does not mean the same process is also relevant in other organisms!

3. Measures of centrality and variance
A measure of centrality (mean/median) summarises which single number is representative of the information:
m = x̄ = (Σ xi) / n

Foundations of statistical inference
Identification of a problem: green ponds are smelly and don't contain much life. Develop a theory: diversity and ecosystem services in ponds are driven by nutrients. Develop a hypothesis: if we add too much phosphate, ponds should become green. Null hypothesis (H0): adding phosphate does not impact water transparency. To test hypotheses we need to gather information (experiments, surveys, modeling) and measure variables.

Descriptive vs. inferential statistics
Descriptive statistics: ordering data; presenting data in graphs and tables; summarising data in statistical parameters (mean, standard deviation, correlation coefficient, ...).
Inferential statistics: determining the confidence interval of statistical parameters; testing hypotheses; drawing conclusions.

Central measures: the mean
"A statistic is a number that summarizes variation." The arithmetic mean m is a very simple statistic summarizing the information in a dataset with n observations:
m = x̄ = (Σ xi) / n, with n = the number of observations
It takes all observations into account, but is sensitive to outliers, and it only works for distributions with one hump: it is not very useful for multimodal or asymmetrical distributions.

Central measures: the median
The numerical value that splits a distribution into two halves: arrange the values from lowest to highest and pick the middle one. If there is an even number of observations, there is no single middle value: take the mean of the two middle values. Synonym: percentile 50 (P50). Useful for unimodal distributions, both symmetrical and asymmetrical (skewed); it only depends on the middle value(s) and is not sensitive to outliers (unlike the mean), but it offers limited possibilities for further statistical analysis. The median and the mean coincide in symmetrical distributions.
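A quick R sketch of the contrast between mean and median; the numbers are invented, with one extreme outlier added to show the mean's sensitivity:

x <- c(3.4, 5.5, 5.5, 7.5, 8, 9)
mean(x)              # arithmetic mean: uses all observations
median(x)            # P50: the middle value(s)
x_out <- c(x, 1000)  # add a single extreme outlier
mean(x_out)          # shifts strongly towards the outlier
median(x_out)        # barely changes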
Variance and standard deviation
Measures of spread or dispersion!
Variance: s² = Σ (xi − x̄)² / (n − 1)
Each term is a data point minus the mean (how far the point lies from the centre), squared to make it positive; n − 1 is the degrees of freedom. If the sum is large, many points lie far from the centre; if the sum is small, many points lie close to the centre.
Standard deviation (s, SD): s = √[Σ (xi − x̄)² / (n − 1)] = √[(Σ xi² − (Σ xi)²/n) / (n − 1)]
The SD and the variance tell you something about how well the mean represents the data. Note: we divide by n − 1 instead of n (Bessel's correction).

Landmarks of a distribution: percentiles
A percentile is the value of a variable below which a certain percentage of the observations fall; so the 20th percentile is the value below which 20 percent of the observations may be found. General formulation: Pk = the k(n + 1)/100-th value in the ordered series.
P25 (Q1): the first quartile; P50 (Q2): the second quartile (= median); P75 (Q3): the third quartile. P75 − P25 = the interquartile distance or interquartile range (IQR).

Box (and whisker) plots
A box plot is a graphical summary of a distribution. It displays distributions without making assumptions about the underlying distribution (non-parametric: not based on means and standard deviations) and indicates the degree of dispersion and skewness in the data. [Figure: box plot anatomy: the box runs from P25 to P75 (box length = IQR = P75 − P25, containing 50% of the data) with the median inside; the whiskers mark the minimum and maximum values that are not outliers; outliers are data at more than 1.5 box-lengths below P25 or above P75.]

Box (and whisker) plots
Avoid using boxplots; they are the lazy graphic of the superficial scientist, obscuring all sorts of useful info! Using them in your exam task is a mortal sin.

Mean ± standard deviation or standard error
It is better to plot means with either their standard deviation or their standard error. But what is the difference between these two concepts? The standard error is smaller than the standard deviation.
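A short R sketch tying together the spread measures and percentile landmarks defined above (simulated data; the distribution parameters are arbitrary):

set.seed(2)
x <- rnorm(100, mean = 50, sd = 10)
var(x)                                   # s^2, computed with Bessel's correction (n - 1)
sum((x - mean(x))^2) / (length(x) - 1)   # the variance formula written out by hand
sd(x)                                    # s = the square root of the variance
quantile(x, c(0.25, 0.50, 0.75))         # P25 (Q1), P50 (median), P75 (Q3)
IQR(x)                                   # interquartile range = P75 - P25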
Populations vs. samples
Population: the complete group of objects; can be finite or infinite. E.g. the salmon population in the North Atlantic, the human population, the pigeons in Brussels.
Sample: the part of a population that will be examined, with the purpose of drawing conclusions that are valid for the population. E.g. 200 salmon from two populations in the Atlantic, the full genomes of 100 people, DNA from feathers of 85 pigeons collected in Brussels.
The purpose of statistics is to draw conclusions which are valid for the entire population.

Standard error
Statistical parameters (such as the mean) are calculated on sample data and estimate the true population value of that parameter. Different samples taken from the same population may yield slightly different values for that statistical parameter. This dispersion can be quantified by the standard deviation of the calculated values of the statistical parameter in repeated sampling. The standard error tells you how good the sample is: it is the standard deviation between all the subsamples; if it is low, the subsamples are representative. (Subsamples are needed to compute a standard error directly.)

Standard error
The standard deviation of a statistical parameter resulting from resampling from the same population is called the standard error of that parameter. The dispersion (i.e. the spread) of the values of the statistical parameter in repeated sampling is a measure of the precision of the estimation of the true population value. Standard error = the standard deviation of the means of k different subsamples.

Standard error (example)
Sample: mean m and standard deviation s. Population: mean µ and standard deviation σ. Population: N = 30000, µ = 0, σ = 1. Take 3000 samples of size n = 10 and look at the distribution of the means: the average of the 3000 estimations of the population mean = 0.012; the standard deviation of the 3000 estimations of the population mean = 0.315 = the standard error of the mean.

Standard error (example)
If you calculate the mean based on different subsamples of the population, the standard deviation of these means is the standard error of the mean. Same population (N = 30000, µ = 0, σ = 1), but now 600 samples of size n = 50: the average of the 600 estimations of the population mean = 0.008; the standard deviation of the 600 estimations of the population mean = 0.149 = the standard error of the mean.

Standard error – a mathematical shortcut!
Usually we are not able to perform repeated sampling in order to get information about the dispersion of a statistical parameter. Hence we will calculate it, even from a single experiment without repeats, from the available data. For datasets without subsamples it can be shown that the standard error of the mean can be estimated in an unbiased way as:
sm = s / √n
It is small if the number of observations is high. If we randomly take one sample out of the 3000 samples of size n = 10 (see the previous example), with standard deviation s = 1.02, then the standard error sm (or SE) = 1.02 / √10 = 0.322. All statistical parameters (means, standard deviations, correlation coefficients, regression coefficients, etc.) estimated from a sample are bound to a standard error: they estimate the true population value of that parameter up to a certain precision only.
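The repeated-sampling example above can be reproduced in a few lines of R; this sketch mirrors the N = 30000, n = 10 setup (the random seed is arbitrary):

set.seed(42)
population <- rnorm(30000, mean = 0, sd = 1)            # N = 30000, mu = 0, sigma = 1
means <- replicate(3000, mean(sample(population, 10)))  # 3000 samples of size n = 10
sd(means)                 # empirical standard error of the mean, close to 0.316
1 / sqrt(10)              # theoretical sigma / sqrt(n)
one_sample <- sample(population, 10)
sd(one_sample) / sqrt(10) # the shortcut s / sqrt(n), from a single sample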
4. Foundations of statistical inference

A thought experiment!
Can licking a frog improve your exam results? A case study with Incilius alvarius. (Note: the course lecturer cannot be held responsible for idiots who decide to test this hypothesis on themselves.)

Why do we need statistics?
Elias licked a frog and did well on the exam. Joren did not lick a frog and scored lower. No frog licking = the control. We need enough frog lickers and controls to be sure that the response is representative of what is to be expected for students in general, and we need to be able to exclude the possibility that the difference is meaningless, e.g. that it happened by accident because of the limited number of replicates we drew from the statistical population!

Why do we need statistics?
- Large variance within the groups: less sure about the effect (too much statistical noise).
- A small difference between the means of the groups: less sure about the effect (small effect size).
- Limited replication in each group: less sure about the effect (too little power).

Probability – not all observations are equally likely!
[Photo caption: this photo represents the bell curve of women's and men's heights. It was created in 1994 by Linda Strausbaugh, professor of molecular and cell biology at the University of Connecticut.]

The normal distribution (Gauss curve)
Formulation: y = (1 / (s√(2π))) · e^(−(x − m)² / (2s²))
The curve integrates to a surface area of 1 (0.5 on each side of the mean), which makes it easy to work with probabilities; it can exist for any type of data, and it has an inflection point on each side of the mean. Johann Carl Friedrich Gauss (1777-1855) was one of the world's most influential mathematicians (princeps mathematicorum).

Standard normal distribution
The normal distribution with m = 0 and s = 1, N(0,1). You can approach this distribution by subtracting the mean from your data and dividing by the standard deviation:
z = (x − x̄) / s (the standard deviation score, SDS, or z-score)
The surface area under the curve = 1, which makes it easier to calculate probabilities or standardize tests: the surface under the curve beyond a given z-score (or equivalent test statistic) is the probability of having an observation with such a z-score in your dataset.

Intelligence in humans
Human IQ is (close to) normally distributed, N(100,15): we attribute a value of 100 to the mean, and IQ tests are constructed so that the SD is 15. What is the probability that there will be a human with an IQ higher than mine?

How likely is it that people are smarter than you?
For the actual IQ distribution N(100,15) the surface under the curve must be calculated; the probabilities are unknown. For the standard normal distribution N(0,1) the surface under the curve = 1 and the probabilities are found in tables. So transform IQ to z-scores by subtracting the mean and dividing by the SD: z = (x − x̄) / s. (Note: the exact probabilities will also depend on how much replication you have!)

Assigning a probability to an observation!
A table gives the values of the surface under the right tail of the standard normal distribution (the total surface area under the curve = 1 = the total probability; the z-score is on the horizontal axis, with its second decimal place along the columns). The surface under the curve above (to the right of) a given z-value is the probability of encountering this z-score in the population. How likely is it that someone has an IQ higher than 120? z = (120 − 100)/15 = 1.33. In the table we find, for z = 1.33, P = 0.09: if you have an IQ of 120, 9% of people are at least as intelligent as you! (Note: we are going to use the same trick to check how likely a null hypothesis is!)
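In R the table lookup can be replaced by pnorm; a sketch of the IQ example above:

z <- (120 - 100) / 15                                 # z = 1.33
pnorm(z, lower.tail = FALSE)                          # P(Z > 1.33), about 0.09
pnorm(120, mean = 100, sd = 15, lower.tail = FALSE)   # the same, without standardizing first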
A quick recap!
We have used the standard normal curve to check how likely an observation is in a population. We can also use the standard normal curve to check how likely a certain property is given a certain assumption (e.g. how likely is it that frog licking will improve your exam performance?). Such a property is the test statistic: it captures how likely a certain effect is under the null hypothesis (i.e. the assumption that there is no effect, there is no difference, ...). When the absolute value of a test statistic is large, H0 is less likely! If a high value of a test statistic is unlikely under the null hypothesis, the alternative hypothesis ("there is an effect") is likely to be true.

Part 4: Ready for launch
Let's start testing!
1. The t test family
1.1 The unpaired t-test for independent samples: one categorical predictor with two levels and one continuous response.

Test statistic
A number made by statisticians that is large when the null hypothesis is unlikely and small when the null hypothesis is likely! A P value tells us how likely a test statistic is under the null hypothesis. Examples of test statistics: t, F, chi². Historically, scientists had to look up their values in printed tables. (We reject for very large, and for some statistics very small, values; some test statistics, such as F and chi², cannot be negative.)

Test statistic
A test statistic measures the deviation from the null hypothesis (= no effect): the larger the absolute value of the test statistic, the less likely it is that the null hypothesis is true. Example: the test statistic t can be used to test the difference between the means of two groups. In the formula you can see that it is simply the difference between the means, corrected for s and n in each group:
t = (m1 − m2) / √(s1²/n1 + s2²/n2)
Low variance makes the absolute t larger; large n makes the absolute t larger. The t distribution is comparable to the standard normal distribution N(0,1) if the sample size is large (> 30), so then we can use the z-score table to assess probabilities! We use a probability distribution to assess the chance of finding a test statistic with a certain value, while assuming that H0 is true. This is the general structure of a test statistic: if you understand this fundamental principle, you understand the principle of most commonly used statistical tests; it is similar for other commonly used test statistics (t, F, z, ...).

Probability curve – z-score table
A table with the values of the surface under the right tail of the standard normal distribution N(0,1) (mean 0, standard deviation 1): the probability of finding a value at least as extreme as a given z.

Example: cold tolerance in a grass
The CO2 uptake of 12 plants (Echinochloa crus-galli) was measured; half of them were chilled (cold shock) the night before the experiment, half were not. Source: the CO2 dataset from R's datasets package.
Null hypothesis (H0): the treatment has no effect on CO2 uptake.
Alternative hypothesis (Ha): the treatment has an effect on CO2 uptake.
library(datasets)
data(package = "datasets")
data(CO2)
Reference: Potvin, C., Lechowicz, M. J. and Tardif, S. (1990) "The statistical analysis of ecophysiological response curves obtained from experiments involving repeated measures", Ecology, 71, 1389-1400.

Cold tolerance in a grass
It looks like there might be a difference, but how do we know if it is likely to be real? We create a test statistic and check whether it is likely under the null hypothesis.
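A sketch of that step on the CO2 data: group summaries and the t statistic computed by hand. Note that, like the slides, this pools all uptake measurements and ignores the repeated-measures structure of the dataset; the object names are mine:

library(datasets)
data(CO2)
chilled <- CO2$uptake[CO2$Treatment == "chilled"]
nonchilled <- CO2$uptake[CO2$Treatment == "nonchilled"]
m1 <- mean(chilled); m2 <- mean(nonchilled)
s1 <- sd(chilled); s2 <- sd(nonchilled)
n1 <- length(chilled); n2 <- length(nonchilled)
(m1 - m2) / sqrt(s1^2 / n1 + s2^2 / n2)   # t = difference in means / error of the difference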
Plotting the group means and standard deviations of CO2 uptake by treatment in R:
library(dplyr)
library(ggplot2)
data_msd <- CO2 %>%
  group_by(Treatment) %>%
  summarise_at(vars(uptake), list(mean = mean, sd = sd)) %>%
  as.data.frame()
# Create a bar plot for means and standard deviations
ggplot(data_msd, aes(x = Treatment, y = mean, fill = Treatment)) +
  geom_bar(stat = "identity", position = "dodge", color = "black") +
  geom_errorbar(aes(ymin = mean - sd, ymax = mean + sd),
                position = position_dodge(0.9), width = 0.2, color = "black") +
  labs(title = "Mean and Standard Deviation of CO2 Uptake by Treatment",
       x = "Treatment", y = "CO2 Uptake", fill = "Treatment") +
  theme(text = element_text(size = 14),          # Set text size for all elements
        axis.title = element_text(size = 16),    # Set axis title font size
        legend.title = element_text(size = 16),  # Set legend title font size
        legend.text = element_text(size = 14))   # Set legend text font size

The t test statistic
Large samples (n1 AND n2 > 30; the ordinary t test for independent samples):
t = (m1 − m2) / √(s1²/n1 + s2²/n2)
The numerator is the difference between the means = the effect size; the denominator is the error of that difference.
Small samples (n1 OR n2 < 30; the Student t test for independent samples): with really small samples the formula changes a bit,
t = (m1 − m2) / (s · √(1/n1 + 1/n2)), with s = √[((n1 − 1)s1² + (n2 − 1)s2²) / (n1 + n2 − 2)]

The t test statistic
t = (m1 − m2) / √(s1²/n1 + s2²/n2)
If the effect size is large, t is larger. If the variances are small, t is larger. If the number of replicates is large, t is larger. If the test statistic is large, the null hypothesis is less likely and there is probably a real effect. (Note: this is the formula for the unpaired, independent t test.)

Calculating a P value
If we assume the plants don't differ in their CO2 uptake, the mean and SD of the uptake will be the same in both groups: they have the same distribution (m1 ± s²1 = m2 ± s²2). We can calculate the range of test statistics that is likely under the null hypothesis by simply 'drawing' virtual plants from this distribution, putting them in two groups (n1 chilled plant uptakes and n2 control plant uptakes) and calculating the t test statistic, and then repeating this. We assume the plants have similar CO2 uptake rates; this yields the range of possible test statistics under the null hypothesis.

Calculating a P value
We also calculate the actual observed test statistic based on the CO2 uptake rates measured in our experiment, in the plants that were chilled and in a group of control plants that were not. We then compare the observed test statistic in the dataset to the range of test statistics that can be found under the null hypothesis.
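The 'drawing virtual plants' procedure can be sketched as a permutation test, a close relative of the approach described above: under H0 the group labels are exchangeable, so reshuffling them generates the range of test statistics expected when there is no effect (object names are mine):

library(datasets)
data(CO2)
chilled <- CO2$uptake[CO2$Treatment == "chilled"]
nonchilled <- CO2$uptake[CO2$Treatment == "nonchilled"]
n1 <- length(chilled); n2 <- length(nonchilled)
t_stat <- function(a, b) (mean(a) - mean(b)) / sqrt(var(a)/length(a) + var(b)/length(b))
t_obs <- t_stat(chilled, nonchilled)      # the observed test statistic
pooled <- c(chilled, nonchilled)          # one common distribution under H0
set.seed(3)
t_null <- replicate(9999, {
  shuffled <- sample(pooled)              # draw 'virtual plants'
  t_stat(shuffled[1:n1], shuffled[(n1 + 1):(n1 + n2)])
})
mean(abs(t_null) >= abs(t_obs))           # two-sided P: how extreme is t_obs under H0?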
Likely and unlikely test statistics
[Figure: the distribution of test statistics expected when the null hypothesis is true (centred on 0, since H0 means no difference). Test statistics far out in the tails are unlikely when H0 is true; the further the observed test statistic lies from the centre, the less probable H0 becomes. We pick a threshold, e.g. 5%: the rejection zone is the area beyond the point that matches the 5% tail surface area.] Note: we may decide to reject the null hypothesis if our test statistic from the experiment is more extreme than 95% of the test statistics that could be found under the null hypothesis.

Perform the independent t test in R
t_test_result <- t.test(chilled_data$uptake, nonchilled_data$uptake)
t_test_result
For large samples, |t| > 1.96 means the result is significant (p < 0.05). The test statistic t follows a t-Student distribution with n1 + n2 − 2 degrees of freedom; when the variances are unequal, the Welch test uses a t-Student distribution with
df' = (s1²/n1 + s2²/n2)² / [(1/(n1 − 1)) · (s1²/n1)² + (1/(n2 − 1)) · (s2²/n2)²]
degrees of freedom.

Transformations
Before applying any transformation, it is important to ensure that the data do not contain zero or negative values, as logarithms and square roots are undefined for such values. If your data contain zeros or negatives, you might need to add a constant to the data to make it positive before applying the transformation (shift the data: add a small number to all of the values). After transforming the data, you can then assess the normality of the transformed values (e.g. with the Shapiro-Wilk test, histograms, QQ plots). Keep in mind that while transformations can help achieve approximate normality, they might not always result in perfectly normal data. Additionally, transformations should be chosen based on the scientific context and the nature of the data you are working with.

[Figure: a histogram of the raw data, and a histogram of the log-transformed data; in this case the log transformation does not lead to a more normal distribution.]

Skewness (symmetry)
Symmetrical distribution: mean = median, skewness = 0. Positive skewness (mean > median): log or SQRT transformation. Negative skewness (mean < median): X² or X³ transformation.

Problematic distributions
Bimodal and multimodal distributions cannot be normalised using transformations! Check for a potential underlying error: e.g. were these data measured in two species instead of one, or in males and females? [Figure: a bimodal frequency distribution of body size (mm).] A bimodal distribution is problematic because the mean will not be representative. Can the data be separated into two groups? If not, use non-parametric tests.
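A sketch of the transformation workflow described above, on invented right-skewed data:

set.seed(7)
x <- rlnorm(200, meanlog = 2, sdlog = 0.8)   # invented, positively skewed data
shapiro.test(x)                              # Shapiro-Wilk on the raw data: normality rejected
x_log <- log(x + 1)                          # add a constant first if zeros are possible
shapiro.test(x_log)                          # re-assess normality after the transformation
hist(x_log)                                  # graphical checks
qqnorm(x_log); qqline(x_log)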
1.5 Non-parametric alternative for a t test: the Mann-Whitney Wilcoxon test

Parametric statistics
The data adhere to an 'idealistic' probability distribution which has parameters (such as a mean and an SD). This adherence is reflected in assumptions: the data or residuals have to be normally distributed, the variances in the groups have to be equal, ... If these assumptions are met, then parametric methods are generally much more powerful than non-parametric methods. Examples: Pearson correlation, ANOVA, GLM.

Non-parametric statistics
Non-parametric statistical methods make no assumptions about the distribution of the data, but are usually less powerful than parametric methods. Examples: Spearman correlation, Kruskal-Wallis test, Mann-Whitney Wilcoxon test. Flexible non-parametric frameworks comparable to GLM are lacking.

Non-parametric alternatives
There are only simple non-parametric alternatives for relatively simple parametric techniques such as correlation, t-tests and ANOVA; not for ANCOVA or multiple regression, although complex alternatives are being developed for more complex models. If the parametric assumptions are met you can choose which technique to use (parametric or non-parametric); in that case you should choose the parametric technique, because it has more statistical power to detect patterns! Transformation to ranks results in loss of information and lower power.

Mann-Whitney Wilcoxon test
Also known as the Wilcoxon test or the Mann-Whitney U test (MWW test). It is the non-parametric equivalent of the unpaired t-test and allows testing for differences in a response variable between two groups without having to know the underlying distribution. The test assumes that one distribution is a horizontal shift of the other; this is the main assumption: the shape of the distribution is the same in both groups. Null hypothesis H0: Group A = Group B (the distributions are the same).

Mann-Whitney Wilcoxon test
The test uses ranks, not the original data; the test is performed on the ranks. As an example we can take a sample from both groups, sort the data while noting from which group each observation is derived, and then replace the data values by their ranks. We can then sum all the ranks from group A: W = 1 + 3 + 4 + 7 = 15. If some values are present more than once, a mean rank should be assigned:
Rank:  1    2.5  2.5  4    5  6
Value: 3.4  5.5  5.5  7.5  8  9
The test statistic is going to be large if all the values of A lie to the right of the values of group B, and vice versa. We will reject H0 if W is too large or too small. The P value is not calculated using the classic t distribution.

CO2 uptake data
wilcox.test(chilled_data$uptake, nonchilled_data$uptake)

Wilcoxon rank sum test with continuity correction
data: chilled_data$uptake and nonchilled_data$uptake
W = 576.5, p-value = 0.006358
alternative hypothesis: true location shift is not equal to 0

Note: it is also possible to run this test based on the similar U statistic, which is a slight variation of W.

Conclusions
A P value gives an indication of how likely a certain effect would be under the null hypothesis. Specifically, it gives the total probability of a t test statistic at least as extreme as the one that you found in your dataset, given that there is actually no effect (H0). A P value is small when the effect is consistent, meaning the difference is found consistently across your replicates. However, it does not tell you how large an effect is; for this you have to look at estimates of effect size (see later)! For a t test a good measure of effect size is simply the difference between the means of the two groups. The assumptions of tests are important: when they are violated you typically cannot use the test. Try a transformation first; if that fails, use a non-parametric alternative.

Take home
Statistics are necessary to make sense of data, but the design of the experiment and the quality of the data are absolutely essential to do good science. "To consult the statistician after an experiment is finished is often merely to ask him to conduct a post mortem examination. He can perhaps say what the experiment died of." - Ronald Fisher

Take home
It is important to think a priori about whether your hypothesis is clearly directional or whether both options (a positive or a negative effect) are possible. Picking a one-sided test over a two-sided test (when justified) can make the difference between a significant and a non-significant result! The standard option in statistical tests is typically two-sided.
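A sketch of the two-sided vs one-sided choice using t.test's alternative argument; the direction "chilling reduces uptake" is assumed here purely for illustration:

library(datasets)
data(CO2)
chilled <- CO2$uptake[CO2$Treatment == "chilled"]
nonchilled <- CO2$uptake[CO2$Treatment == "nonchilled"]
t.test(chilled, nonchilled)$p.value                        # default: two-sided
t.test(chilled, nonchilled, alternative = "less")$p.value  # one-sided: chilled < nonchilled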
Exam questions
Explain the differences between science, pseudoscience and superstition, giving a biological example of each. Can you explain exactly what the purpose of statistics is? What is the difference between an SD and an SE, and how is each derived? What is the difference between a one-sided and a two-sided test? Give a clear biological example for each of these tests. When applied to the same data, will a one-sided test more likely yield a significant result than a two-sided test?

Exam questions
Give the exact definition of a P value and explain how you derive it from a probability distribution. What does that probability curve represent? What is a test statistic and what does it measure? Does a low P value always indicate a large effect? What are your options when the assumptions of the t test are violated? Is two-sidedness specific to the t test, or is it also possible for e.g. ANOVA, which relies on the F statistic? What is the difference between a Type I and a Type II error, and which of the two is 'worse'? Why do you think we chose an alpha of 5% as a standard? What was Aristotle's main contribution to scientific thinking? What is the difference between a variable, a factor, a parameter, a level, a coefficient, a case, ...? What would be the downside of relying on a stricter alpha when performing significance tests?

Exam questions
When both are possible, which test is more likely to generate significant results: a one-sided or a two-sided test? What is the general principle of a test statistic, and how can we make statements about whether the patterns we observed are indeed 'real' or due to chance? What is the difference between a standard deviation and a standard error? How is it possible that most analyses in the published literature report standard errors even though an experiment or survey has been performed only once? What are residuals and what type of information do they provide? What is the difference between science, pseudoscience and superstition? Explain the different sources of evidence scientists can use to verify theories.

Extra information

Measurement error
Measuring variables in nature always results in error.
Accuracy: the degree of closeness of a measurement or observation to its actual (true) value; difficult to estimate if the true value is unknown.
Precision: the degree to which repeated measurements under identical conditions give the same result (reproducibility or repeatability).
Validity: the combination of accuracy and precision; does an instrument actually measure what it is supposed to measure?
Reliability: can an instrument be interpreted consistently across different situations?
[Figures: a target hit with high accuracy but low precision, and one hit with low accuracy but high precision.]
Exercise: apply these concepts to a device that measures chlorophyll in water.
Types of variables – an alternative classification
- Categorical variable: non-numerical data, classification features (synonyms: qualitative variable, attribute, nominal variable; sometimes binary). Examples: gender, civil status, religion, birth place, species, type of medication.
- Continuous variable: numerical data, the result of a measurement; real numbers, an infinite number of values (synonym: scale variable). Examples: length, weight, volume, surface, temperature, pH, time, angles, ...
- Discrete variable: numerical data, the result of counting; integer numbers, a finite number of values (synonym: discontinuous variable). Examples: number of children in a family, number of segments in a worm, number of deciduous teeth, ...
- Ordinal variable: a limited number of values with a natural ordering; differences between values have no meaning (synonym: ordinal scale variable). Examples: social class, levels of aggressivity, educational level, political parties from left to right.
Note: an ordinal variable can be considered an in-between type of a categorical and a continuous variable.

Probability curve – z-score table
The surface under N(0,1) to the right of a given z-value:
P1 --> z = -2.3263
P2.5 --> z = -1.9600
P3 --> z = -1.8808
P5 --> z = -1.6449
P10 --> z = -1.2816
P25 --> z = -0.6745
P50 --> z = 0
P75 --> z = +0.6745
P90 --> z = +1.2816
P95 --> z = +1.6449
P97 --> z = +1.8808
P97.5 --> z = +1.9600
P99 --> z = +2.3263
We can use the standard normal curve (surface area = 1) as a probability curve: the surface to the right of z is the cumulative probability of measuring a value at least as extreme as z. For example, +0.6745 is the z-value above which 25% of the data are located.

Null hypothesis and the principle of hypothesis testing
In science we have to formulate and test hypotheses. Examples: males are larger than females; administration of a particular drug will lower people's cholesterol; global temperature is higher now than 20 years ago; there are inter-specific differences in anti-predator behaviour. The null hypothesis (H0) typically proposes a general or default position, such as: there is no difference; there is no effect; there is no association; there is no increase or decrease. Hypothesis testing works by collecting data and measuring how probable the outcome is, assuming that the null hypothesis is true. If the outcome is very improbable (usually defined as occurring less than 5% of the time), then the experimenter concludes that the null hypothesis is false, i.e. there is a significant deviation from the null hypothesis.

13. Type I vs. Type II errors
A very strict, conservative test will not easily reject the null hypothesis; this means it has a low risk of Type I errors (e.g. the Shapiro-Wilk test)... but a higher risk of Type II errors. A more liberal test will quite readily reject the null hypothesis and has a higher risk of Type I errors (e.g. the Kolmogorov-Smirnov test)... but a lower risk of Type II errors. There is a trade-off between the two errors: you cannot optimise both. When statisticians develop a new test they investigate whether it has a desirable Type I / Type II error ratio. The power of a test (1 − beta) can be calculated a priori (but only if you already have data)!

13. Type I vs. Type II errors
Example: a Type I error is the probability of rejecting a null hypothesis that is true (= α); a false positive. E.g. you say a condor is sick while it is healthy. The lead (Pb) concentration in the blood of healthy Californian condors is normally distributed with a mean of 180 and an SD of 20.
A. Given that condors with Pb levels over 225 are diagnosed as not healthy, what is the probability of a Type I error (i.e. considering a healthy condor as sick)?
z = (225 - 180)/20 = 2.25; the corresponding tail area is 0.0122, i.e. a 1.2% probability of a Type I error.
B. At what level (in excess of 180) should condors be diagnosed as not healthy if you want the probability of a Type I error to be 2%?
2% in the tail corresponds to a z-score of 2.05; 2.05 × 20 = 41; 180 + 41 = 221.
Lead poisoning via ingested bullets from animal carcasses is a health threat in the endangered Californian condor.
202
13. Type I vs. Type II errors
Example: a Type II error is the probability of accepting a null hypothesis that is false (= β); a false negative. E.g. you say a condor is healthy but it is sick.
Given that sick condors have a mean Pb level of 300 with a standard deviation of 30 (= Ha):
A. If only condors with a Pb level over 225 are diagnosed as predisposed to get sick, what is the probability of a Type II error (accepting H0 = the condor is healthy, while it is sick)?
z = (225 - 300)/30 = -2.5, which corresponds to a tail area of 0.0062, i.e. a 0.6% probability of a Type II error.
B. Above which Pb level should you diagnose condors as sick if you want the probability of a Type II error to be 1%?
1% in the tail corresponds to a z-score of 2.33 (or -2.33); -2.33 × 30 = -70; 300 - 70 = 230.
203
Extra background: the P value
The p-value is a fundamental concept in hypothesis testing and statistical inference.
It is the probability of observing a test statistic as extreme as or more extreme than the one calculated from the sample data, assuming that the null hypothesis is true.
In other words, the p-value helps you determine whether the observed data are consistent with a specific hypothesis or deviate significantly from what you would expect under that hypothesis.
The evidence concerning your hypothesis is captured by the test statistic, and the P value is derived from it.
204
Extra background: the P value
By focusing on the probability of values more extreme than the observed one, the p-value helps you assess how surprising or unusual your data are, assuming that the null hypothesis is true.
If the observed data fall in the tail of the distribution (i.e., the extreme values), this suggests that the null hypothesis might not be a good explanation for the observed data, and you might have evidence to support the alternative hypothesis.
205
Extra background: the P value
These are some incorrect definitions invented by students:
The chance that your results are random
The probability that your result is real
A number – which when smaller than 5% – means that you can reject the null hypothesis
The probability that what you find is due to random processes
It is a critical threshold that we use to determine whether something is significant.
The probability distribution gives you the mean and the SD of your result.
Inventing your own definition of a P value is a recipe for disaster. Not knowing in great detail what this quantity is, and how it is actually defined and derived, is a mortal sin on the exam.
206
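The tail areas in the condor example above can be reproduced in R with the base functions pnorm() and qnorm(); a minimal sketch using the numbers from the slides:

```r
# Tail areas under the normal curve, reproducing the condor example
# (healthy condors: mean 180, SD 20; sick condors: mean 300, SD 30).

# Type I error: P(Pb > 225) for a healthy condor
pnorm(225, mean = 180, sd = 20, lower.tail = FALSE)   # 0.0122 -> 1.2%

# Cut-off giving a 2% Type I error
qnorm(0.02, mean = 180, sd = 20, lower.tail = FALSE)  # ~221

# Type II error: P(Pb < 225) for a sick condor
pnorm(225, mean = 300, sd = 30)                       # 0.0062 -> 0.6%

# Cut-off giving a 1% Type II error
qnorm(0.01, mean = 300, sd = 30)                      # ~230

# The z-score table (slide 199) works the same way, e.g.:
qnorm(0.25, lower.tail = FALSE)    # z above which 25% of the data lie: 0.6745
pnorm(2.3263, lower.tail = FALSE)  # right-tail surface: ~0.01
```

This is exactly the logic behind a P value: pnorm() returns the surface under the probability curve beyond an observed value, and qnorm() inverts that relationship to find a critical cut-off.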
P values – historically provided in tables
(Figure: printed table of the surface under N(0,1) above, i.e. to the right of, a given z-value.)
207
t-Student one-sample test: m <> µ
Comparison of a sample mean m with the population value, or a theoretical value, μ.
Each time there is only one set of data, whose mean value has to be compared to a reference value = "one-sample test".
Examples:
comparison of the mean blood pressure of a group of athletes with the population average
comparison of the mean Pb level in fish exposed to Pb pollution with a given threshold value
comparison of the mean radioactivity measured with a cheap Geiger counter with the real value measured with a high-precision device
208
t-Student one-sample test: m <> µ
Comparison of a sample mean m with the population value, or theoretical value, μ.
If H0 is true, i.e. the sample with mean m is taken from a population with mean μ, then the test statistic for a "one-sample" test is:

t = (m − µ) / (s / √n)

Numerator: expresses the difference between the sample mean and the theoretical value.
Denominator: the error of the difference in the numerator. Since μ is a theoretical value, the error of μ is supposed to be zero and only the error of m has to be considered.
For large samples (n > 30): under H0 the quantity t has a standard normal distribution → N(0,1) table.
For small samples (n ≤ 30): under H0 the quantity t has a t-Student distribution with ν = n − 1 degrees of freedom → t-Student table.
209
Statistical hypothesis testing – example
Data: height of 50 adult male gorillas in Virunga National Park (in cm).
Parameters: m = 177.5, s = 5.0, n = 50.
Hypothesis: the mean stature of male gorillas in this park is higher than the average height reported for gorillas (µ = 176.0).
H0: there is no difference in height between male gorillas in the park and the general population.
Are male gorillas in the Virunga National Park unusually large?
210
Statistical hypothesis testing – example
The test statistic for this test is t = (m − µ) / (s / √n) = (177.5 − 176.0) / (5.0 / √50) = 2.12.
This t-value expresses the difference between m and µ, taking into account the standard error of m.
It can be shown that under the given conditions of the example the t-values are distributed according to the standard normal distribution N(0,1), so we assume t = z.
From the table of N(0,1) we see that to the right of z = 2.12 the surface under N(0,1) is 0.0170 (or 1.7%).
Adopting an a priori defined significance level of α = 0.05 (or equivalently 5%):
1.7% < 5%, therefore we reject H0 and consider the difference between m and µ statistically significant at the alpha level of 5%.
(Figure: N(0,1) curve with the surface area of 0.017 to the right of z = 2.12 shaded.)
211
One-sided testing
A one-sided test is a statistical hypothesis test in which the values for which we can reject the null hypothesis are located in one tail of the probability distribution.
Also referred to as a one-tailed test of significance; the hypothesis is "bigger/smaller", "increasing/decreasing", …
In a one-sided test, there is an a priori hypothesis concerning the direction of the difference.
The critical test statistic for the significance cut-off corresponds to a surface area of α percent at one tail of the probability distribution.
Examples: is the sample mean larger than a theoretical value? Are males larger than females? Is there a significant decrease in a trend?
(Figure: one-sided rejection region for α = 5%; the acceptance region and the critical/rejection region along the test-statistic axis, with the deviation under H0 indicated.)
212
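A minimal sketch in R of the gorilla calculation, using only the summary statistics given on the slide (with the raw measurements one would simply call t.test(x, mu = 176, alternative = "greater")):

```r
# One-sample test of the gorilla example, from the slide's summary statistics:
# m = 177.5, s = 5.0, n = 50, theoretical value mu = 176.0
m <- 177.5; s <- 5.0; n <- 50; mu <- 176.0

t_stat <- (m - mu) / (s / sqrt(n))         # test statistic: 2.12

# Large sample (n > 30): approximate t by z, as on the slide (one-sided P)
pnorm(t_stat, lower.tail = FALSE)          # 0.017 -> reject H0 at alpha = 0.05

# The exact Student t with n - 1 = 49 degrees of freedom gives nearly the same P
pt(t_stat, df = n - 1, lower.tail = FALSE) # ~0.02
```

Note how this is a one-sided test: the hypothesis specifies a direction (larger than µ), so the whole 5% rejection region sits in the right tail.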
Two-sided testing
A two-sided test is a statistical hypothesis test in which the values for which we can reject the null hypothesis are located in both tails of the probability distribution.
A two-sided test is also referred to as a two-tailed test of significance.
Possible hypotheses are: there is "no difference", "no relation", … (without assumptions about the direction of the difference).
In a two-sided test, there is no a priori hypothesis concerning the direction of the difference (most tests are two-sided!). E.g. is the size of males different from the size of females?
The critical test statistics for the significance cut-off correspond to surface areas of α/2 percent at both tails of the probability distribution.
(Figure: two-sided rejection region for α = 5%; critical/rejection regions in both tails, around the acceptance region, along the test-statistic axis.)
213
Comparison of 2 means m1 and m2 in independent samples
Small samples: n1 OR n2 < 30 (t-Student test)

t = (m1 − m2) / ( s · √(1/n1 + 1/n2) )   with   s = √( [ (n1 − 1)·s1² + (n2 − 1)·s2² ] / (n1 + n2 − 2) )

Under H0 the statistical quantity t has a t-Student distribution.
If │t│ > the critical value at n1 + n2 − 2 degrees of freedom, then there is less than 5% chance that the difference from H0 is due to randomness; H0 is rejected and it is concluded that the difference between m1 and m2 is significant at the 5% level.
Note: for small sample sizes, R automatically uses a distribution slightly different from the standard normal: the Student t distribution.
214
t-Student distribution vs. standard normal distribution
The tails of the t-distribution are thicker and extend out further than those of the Z distribution. This indicates that for a given confidence level, t-scores are larger than Z scores.
As n increases, the shape of the t-distribution begins to resemble a normal distribution and the t-scores become smaller. As n approaches 30, the t-score associated with a given confidence level approaches the Z score for that confidence level.
At large sample sizes the standard normal Z distribution is a good approximation for t.
The t-Student distribution is useful for small sample sizes!
215
(Figure: t-Student table with one-sided and two-sided percentages by degrees of freedom df, alongside the two-sample t formula.)
William Sealy Gosset (1876-1937)
Employee of the Guinness Brewery
Developed a cheap way to monitor the quality of stout
Published under the name Student
216
Background: principle of the MWW test
Ranking the data: combine the data from both groups and rank them in ascending order, regardless of which group they belong to. Ties (equal values) get the average rank.
Calculating the sum of ranks: compute the sum of ranks for each group; these sums are used to calculate the Mann-Whitney W statistic.
Calculating the W statistic: the W statistic is the smaller of the two sums of ranks. It represents the number of times a randomly selected observation from the first group has a smaller rank than a randomly selected observation from the second group.
Calculating the expected W under the null hypothesis: if the null hypothesis is true (both groups come from the same population or have the same distribution), the W statistic is expected to be approximately equal for both groups. The expected value of W can be calculated under this assumption.
Calculating the test statistic: the test statistic is calculated by comparing the observed W statistic to the expected W under the null hypothesis.
For large sample sizes the test statistic follows an approximately normal distribution, allowing the calculation of a p-value.
Interpreting the P-value: the calculated p-value represents the probability of obtaining the observed W statistic (or a more extreme value) under the null hypothesis. If the p-value is small (typically less than a chosen significance level, such as 0.05), the null hypothesis is rejected, indicating that there is evidence of a significant difference between the two groups.
217
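To tie the last sections together, here is a minimal sketch in R comparing two small independent samples with the pooled two-sample t test and with the Mann-Whitney-Wilcoxon test; the data are simulated purely for illustration.

```r
# Hypothetical example: body sizes of males and females (simulated data).
set.seed(1)
males   <- rnorm(12, mean = 52, sd = 4)   # n1 = 12 (small sample)
females <- rnorm(10, mean = 48, sd = 4)   # n2 = 10

# Pooled two-sample t test (the formula above): var.equal = TRUE uses the
# pooled s and n1 + n2 - 2 degrees of freedom; two-sided by default.
t.test(males, females, var.equal = TRUE)

# One-sided alternative, if there is an a priori directional hypothesis:
t.test(males, females, var.equal = TRUE, alternative = "greater")

# Mann-Whitney-Wilcoxon test: the rank-based alternative when the
# assumptions of the t test are violated; reports W and a p-value.
wilcox.test(males, females)
```

Because the MWW test uses only ranks, it sacrifices some power when the t-test assumptions do hold, which is the Type I / Type II trade-off discussed earlier in another guise.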