ENV330 Lectures on Experimental Design

Summary

These lecture notes cover experimental design methodology, including various types of designs, factors, and levels. Topics include completely randomized designs, randomized block designs, matched pairs, and more complex designs like split-plots. The notes provide examples and explanations of different design approaches. Key concepts like blocking and covariates are discussed in detail.

Full Transcript


ENV330 LECTURE 3: MORE ON BASIC EXPERIMENTAL DESIGNS

TYPES OF EXPERIMENTAL DESIGNS
1. Is the predictor variable (or variables) consciously altered? (Manipulative vs. observational)
2. Are experimental units randomly assigned to treatments?
3. What data will we have about the responses and explanatory variables? (e.g., categorical factors, quantitative responses)
These characteristics determine both the general analytical approach and the types of inferences that can be drawn.

RECALL: 5 COMPONENTS OF (MOST) EXPERIMENTAL DESIGNS
- controlled conditions
- experimental controls
- randomization
- replication
- blocking

CONTROLLED EXPERIMENTS WITH CATEGORICAL EXPLANATORY VARIABLES (FACTORS)
- completely randomized designs (1 factor)
- randomized block designs (1 factor)
- pairs designs (1 factor)
- 2+ factor designs
- subsets & spatial arrays
  - randomized blocks
  - Latin square
  - split plot
- covariates
- (repeated measures)

1. COMPLETELY RANDOMIZED DESIGN (FACTOR = DIET)
- Half of the mice are randomly assigned to Diet A (control) and half to Diet B; compare means & variance.
- What if we suspect that the sex of the mouse might affect the results?

2. RANDOMIZED BLOCK DESIGN (FACTOR = DIET; BLOCK = SEX)
- Sex is the blocking variable: males are randomly assigned to Diet A or Diet B, and females are separately randomly assigned to Diet A or Diet B.
- Compare means & variance between diets within each block.

MATCHED PAIRS ARE BLOCKED DESIGNS
- Each experimental unit receives Diet A and Diet B, and the results are compared.
- Before/after comparison (e.g., BACI) using the same experimental units
- Can be thought of as a type of blocking: each individual is a "block" receiving 2 treatments (Diet A and Diet B)
- Controls for individual variability

BACI (BEFORE, AFTER, CONTROL, IMPACT) DESIGNS

EXPERIMENTAL VS BLOCKING FACTORS
- Experimental factors: factors we IMPOSE on the experimental units to determine the causal effects of explanatory variables on our variable of interest
- Blocking factors: factors that affect experimental units and contribute to variability among them; things we try to control for in our experimental design

FACTORS VS LEVELS
- Factor = a categorical variable that the experimenter manipulates
- Levels = the categories within a factor
- e.g., the depicted experiment has 2 factors (water treatment; inoculation) with 2 levels each (water: stressed/well-watered; inoculation: inoculated/non-inoculated) and 1 blocking factor (sex)

MULTIPLE EFFECTS: POTENTIAL OUTCOMES
Suppose you test the effects of 2 factors on a variable. What kinds of results can you get?
- Neither factor has an effect.
- Factor 1 has an effect; Factor 2 doesn't.
- Factor 1 doesn't have an effect; Factor 2 does.
- Both factors have an additive effect.
- Both factors have an interactive effect.

MULTIPLE EFFECTS: POTENTIAL OUTCOMES
- Suppose we want to look at the effect of diet AND exercise on the expression of some marker of diabetes (Marker X) in mice:
  - two different diets
  - two different exercise regimes

MULTIPLE EFFECTS: POTENTIAL OUTCOMES
- Neither diet (A & B) nor exercise (30 min/day on an exercise wheel vs. no exercise) had an effect on Marker X.
- Diet B lowered Marker X compared to Diet A, but exercise had no effect.
- Exercise lowered Marker X compared to no exercise, but diet had no effect.
- Mice that exercised AND ate Diet B had lower Marker X than all other categories of mice. (ADDITIVE)
- Mice on Diet B showed lower Marker X if they exercised, but mice on Diet A showed lower Marker X if they did NOT exercise. (INTERACTION)

[Figure: five panels plotting Marker X for Diet A and Diet B under no exercise vs. exercise: neither diet nor exercise has an effect; diet has an effect; exercise has an effect; diet & exercise have additive effects; diet and exercise have interactive effects]

Example write-up of an interaction: "Mice on Diet A showed higher expression of Marker X with exercise; however, this pattern was reversed in mice on Diet B (Fig 1)."

SPATIAL ARRAYS: RANDOMIZED BLOCKS
- If our experimental units are spatially distributed, we use blocking to account for spatial heterogeneity.
- Suppose you are studying the effects of fertilizer on seed production in 2 rice plant varieties. The experimental unit is an AREA (a plot).

SPATIAL ARRAYS: RANDOMIZED BLOCKS
- 2 factors: variety (A) & fertilizer (B)
- Factor A: 2 levels (A1, A2); Factor B: 3 levels (B1, B2, B3)
- 6 combinations need to be tested: A1B1, A1B2, A1B3, A2B1, A2B2, A2B3
- [Figure: map of the study field, which is pretty heterogeneous!]
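The diet-and-exercise outcomes listed earlier (no effect, a single main effect, additive effects, an interaction) can be told apart from the four cell means of a 2 x 2 design. The sketch below is a minimal Python illustration, not the course's analysis method; the function name `factorial_effects` and all cell means are hypothetical. Main effects are differences between marginal means, and the interaction is the difference between the exercise effect on Diet B and the exercise effect on Diet A, which is zero when the effects are purely additive.

```python
def factorial_effects(means):
    """Main effects and interaction for a 2 x 2 design.

    `means` maps (diet, exercise) -> mean Marker X in that cell,
    with diet in {"A", "B"} and exercise in {"NoEx", "Ex"}.
    """
    a_no, a_ex = means[("A", "NoEx")], means[("A", "Ex")]
    b_no, b_ex = means[("B", "NoEx")], means[("B", "Ex")]
    diet = (b_no + b_ex) / 2 - (a_no + a_ex) / 2        # Diet B minus Diet A
    exercise = (a_ex + b_ex) / 2 - (a_no + b_no) / 2    # exercise minus no exercise
    # Interaction: how much the exercise effect changes between diets.
    interaction = (b_ex - b_no) - (a_ex - a_no)
    return {"diet": diet, "exercise": exercise, "interaction": interaction}


# Hypothetical cell means (not data from the lecture).
additive = {("A", "NoEx"): 10, ("A", "Ex"): 8, ("B", "NoEx"): 7, ("B", "Ex"): 5}
crossing = {("A", "NoEx"): 6, ("A", "Ex"): 9, ("B", "NoEx"): 9, ("B", "Ex"): 6}

print(factorial_effects(additive))  # {'diet': -3.0, 'exercise': -2.0, 'interaction': 0}
print(factorial_effects(crossing))  # {'diet': 0.0, 'exercise': 0.0, 'interaction': -6}
```

In the crossing example both main effects are zero even though exercise clearly matters (it raises Marker X on Diet A and lowers it on Diet B), which is why interactions must be examined rather than main effects alone. A formal test would use a two-way ANOVA on the replicate-level data, not just the cell means.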
SPATIAL ARRAYS: RANDOMIZED BLOCKS
Field layout (4 blocks, one per row; every treatment is randomly allocated within each block):
Block 1: A1B1 A2B1 A1B2 A1B3 A2B3 A2B2
Block 2: A1B2 A1B3 A1B1 A2B2 A2B1 A2B3
Block 3: A2B2 A1B1 A2B1 A1B2 A2B3 A1B3
Block 4: A1B3 A2B3 A1B2 A2B1 A1B1 A2B2
- 4 blocks in which every treatment is randomly allocated
- randomized block = a spatial unit that includes all treatments & represents one replicate of each of those treatments

RCBD = RANDOMIZED COMPLETE BLOCK DESIGN
- All treatments are assigned once within each block = randomized complete block design (RCBD).
- Blocks:
  - are placed to ensure more variation AMONG blocks than WITHIN blocks
  - should be small enough to ensure that they are relatively homogeneous
  - should be large enough to ensure that the treatments within blocks are well separated in space
- This is applicable to LOTS of designs (not just in a semi-natural setting): imagine a greenhouse, where temperature and sun exposure are not completely uniform.
- Align your blocks ACROSS a gradient so that they are INTERNALLY HOMOGENEOUS. [Figure: correct vs. incorrect block orientation relative to a gradient, using the same 4-block layout]
- Gradients are not always unidirectional and/or linear: whenever you have a spatial area that you think is internally homogeneous AND different from other internally homogeneous areas, those areas should be blocks.

VARIATIONS: LATIN SQUARE
- Every row has all the treatments; every column has all the treatments (each treatment appears exactly once in each row and each column).
- Has to be a square grid.
- Works well if you think you have a couple of gradients (e.g., one running along the rows and a second running along the columns).
[Figure: Latin square with one gradient across the rows and a second gradient across the columns]

SPLIT-PLOT DESIGNS
- Usually a 2-factor design
- One of the factors is "easy" to change or vary.
- One of the factors is "hard" to change or vary.
- Suppose we have 4 fields and want to look at the effects of irrigation & fertilizer on crop yield. WHICH DESIGN MAKES MORE SENSE?
  - A = Irrigation (Method 1 vs. Method 2), B = Fertilizer (Fertilizer 1 vs. Fertilizer 2)
  - OR A = Fertilizer (Fertilizer 1 vs. Fertilizer 2), B = Irrigation (Method 1 vs. Method 2)

Suppose you want to study the effects of different levels of CO2 and different soil temperatures on plant growth... how would you set up your greenhouses?
- 2 factors: CO2 level (ambient or enhanced) and soil temperature (ambient, heated to 25 °C, heated to 30 °C)
- "hard" factor = CO2 level; "easy" factor = soil temperature
- Hard factor = main (whole) plot; easy factor = split plot (subplot)
- Split-plot designs can be complex!

BLOCKS CAN BE ANY FACTOR THAT SHOWS INTERNAL HOMOGENEITY
- plots of habitat
- a room in a greenhouse
- a group of aquarium tanks
- day of the week
- litter or nest mates
- measurements taken on instrument X
- data collected by technician Y
Statistically, the variation AMONG blocks is removed from the experimental error term in an ANOVA, increasing the precision of the experiment.

THE VALUE OF BLOCKING: SUMMARY
- controls for heterogeneity
- "Block what you can; randomize what you cannot."
- A randomized complete block design (RCBD) is a better design than a completely randomized design (CRD) if there is more heterogeneity among blocks than within blocks.
- An RCBD requires fewer replicates than a CRD to achieve the same power. (Why? Less unexplained variability: the among-block variation no longer inflates the error term.)

COVARIATES
- A covariate is a "nuisance" variable that is continuous (numeric), NOT a factor (recall that factors have LEVELS).
- It doesn't vary in a way that makes it amenable to blocking (not in neat, homogeneous groups).
- e.g., organic matter in soil might vary all over the field and may influence our fertilizer results.

Covariates statistically control for individual variation
- Suppose your response variable is % of leaf covered by tar spot.
- % coverage might be correlated with leaf size (the smaller the leaf, the more likely that a single tar spot takes up a large proportion of the leaf).
- Instead, "size of diseased area" is the dependent variable, and size of non-diseased area can be one of the predictor variables (a covariate).

Covariates are useful in controlling for "starting conditions"
- e.g., testing two different exercise regimes for building muscle mass
- Initial body composition likely influences how much muscle you can gain in a month.
- You can treat initial muscle mass as a covariate.

REPEATED MEASURES DESIGNS
- We often sample the same individual, same site, etc. over time.
- This is statistically awkward to deal with because of autocorrelation: whatever the measurement was last time is going to influence the measurement this time (are the samples independent?).
- We'll talk about this later!
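The three layouts described in this lecture (a randomized complete block design, a Latin square, and a split plot) differ mainly in where the randomization happens. The Python sketch below generates each one under stated assumptions: the helper names are hypothetical, the 4 blocks of 6 rice treatments follow the field example, and the 4 x 4 square and the use of 2 greenhouses per CO2 level are illustrative choices not taken from the notes.

```python
import itertools
import random


def rcbd_layout(treatments, n_blocks, seed=None):
    """RCBD: every treatment appears once per block, in a new random order per block."""
    rng = random.Random(seed)
    layout = []
    for _ in range(n_blocks):
        block = list(treatments)
        rng.shuffle(block)
        layout.append(block)
    return layout


def latin_square(treatments, seed=None):
    """Latin square: each treatment once in every row and every column.

    Built from a cyclic square whose rows and columns are then shuffled;
    this keeps the Latin property but does not sample all squares uniformly.
    """
    rng = random.Random(seed)
    k = len(treatments)
    base = [[treatments[(r + c) % k] for c in range(k)] for r in range(k)]
    rows, cols = list(range(k)), list(range(k))
    rng.shuffle(rows)
    rng.shuffle(cols)
    return [[base[r][c] for c in cols] for r in rows]


def split_plot_layout(whole_levels, sub_levels, reps, seed=None):
    """Split plot: randomize the hard factor to whole plots (e.g., greenhouses),
    then randomize the easy factor to subplots within each whole plot."""
    rng = random.Random(seed)
    whole_plots = [lvl for lvl in whole_levels for _ in range(reps)]
    rng.shuffle(whole_plots)
    layout = []
    for lvl in whole_plots:
        subplots = list(sub_levels)
        rng.shuffle(subplots)
        layout.append((lvl, subplots))
    return layout


# 2 varieties x 3 fertilizers = A1B1 ... A2B3
treatments = [f"A{a}B{b}" for a, b in itertools.product((1, 2), (1, 2, 3))]
fields = rcbd_layout(treatments, n_blocks=4, seed=42)
square = latin_square(["T1", "T2", "T3", "T4"], seed=42)
greenhouses = split_plot_layout(
    ["ambient CO2", "enhanced CO2"], ["ambient soil", "25 C soil", "30 C soil"], reps=2, seed=42
)

for i, block in enumerate(fields, start=1):
    print(f"Block {i}:", " ".join(block))
```

Printing `fields` gives one line per block, like the 4-block field layout in the notes; a different `seed` gives a new randomization. Randomizing at two levels is what makes the split plot's analysis different: the CO2 effect has to be judged against variation among whole plots (greenhouses), not among subplots.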
- Data structures are often hierarchical:
  - fish in lakes
  - trees in forests
  - wolves in packs
  - streams in watersheds
  - birds in nests

STATISTICAL MODELS
- The dependent variable (Y) is some function of one or more independent variables (X1, X2, ...), which can be factors and/or covariates.
- Models are mathematical representations of our hypotheses.
- Independent variables (predictors; causes; hypotheses) can be:
  - factors: independent factors, control factors, blocking variables
  - covariates: independent covariates, control covariates

3 types of factors:
- Independent factors: our hypothesis (a causal relationship with the dependent variable)
- Control factors: variables (of interest) that we need to control to test the hypothesis (confounding variables); statistically, we are interested in their coefficients
- Blocking factors: variables we need to control for as well, but their values are "arbitrary" and not of interest

Crop example:
- Crop Yield = dependent variable
- Crop Variety = independent factor
- Soil Nitrogen = control covariate
- Farm = blocking variable
[Figure: diagram with Crop Variety, Soil N, and Farm each pointing to Crop Yield]

INDEPENDENT, CONTROL AND BLOCKING VARIABLES
Independent variables
- The causal variable in your hypothesis; the "variable of interest"; the focus of your hypothesis
- You are interested in specific estimates of the parameters.
- Can be a continuous variable, or a factor with levels (ordinal or nominal)
- Effect type: FIXED
Control variables
- A variable that needs to be controlled so that you can properly see the effect of the independent variable
- It's not your main hypothesis, but you still might be interested in quantifying the effects of a control variable on your dependent variable.
- Can be a covariate or a factor
- Effect type: FIXED
Blocking variables
- A factor that you need to control for, but you might not be interested in the specific estimates of the parameters
- Measured values could be swapped for other values, and you'd still be testing the same hypothesis.
- Can only be a factor
- Effect type: FIXED OR RANDOM

FIXED & RANDOM EFFECTS
Fixed factors: the factor levels are informative and are chosen by the investigator specifically because they have a unique and important meaning. Depending on the study, these COULD be:
- "Male" and "female"
- "Predator" and "prey"
- "Drug A," "Drug B," and "Drug C"
- "Control" and "treatment"
- "Before" and "after"
- variants on a predictor, for example, "high," "medium," and "low"

FIXED & RANDOM EFFECTS
Random factors: the selected factor levels can often be considered a subset from a population of levels, and the factor levels are not specifically important (we are unlikely to care about differences among factor levels). Examples:
- time intervals in a time sequence
- a collection of seed types (or some other levels) chosen randomly from a population of seed types (or some other population)
- subjects on whom repeated measurements are made

Whether a factor is fixed or random depends on the inference you wish to draw.
Study: the effects of PFOS on survival and reproduction in aquatic insects
- Species as a RANDOM factor: "I hypothesize that this pollutant will have a negative effect on many types of benthic organisms. I want to look at more than one species, to make sure I'm not accidentally picking a super-sensitive or super-resistant species, but I don't particularly care which ones I use. If I repeated the experiment, I might pick different species; I just want a representation of more than one type of insect in this study. I'll treat species as a RANDOM factor."
- Species as a FIXED factor: "I hypothesize that this pollutant will have a negative effect on many types of benthic organisms. There are some species I am specifically interested in; for example, I want to know if Hexagenia is more sensitive than Chironomus, and there are a few other comparisons I would like to quantify. If I redid the experiment, I'd still pick the same species, because I am specifically interested in them. I'll treat species as a FIXED factor."
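To see how the three kinds of predictor enter one model, the crop example can be written as a linear model: yield = intercept + variety effect + slope x soil N + farm effect + error. The sketch below fits it by ordinary least squares with NumPy on simulated data; the sample sizes, the "true" coefficients, and the noise level are invented purely for illustration. Farm is coded here as fixed dummy variables; treating it as a random blocking factor would instead call for a mixed model (for example, statsmodels' `mixedlm`).

```python
import numpy as np

rng = np.random.default_rng(0)
n_farms, plots_per_variety = 3, 10
n = n_farms * 2 * plots_per_variety  # 60 plots in total

# Simulated (hypothetical) predictors.
farm = np.repeat(np.arange(n_farms), 2 * plots_per_variety)             # blocking variable
variety_b = np.tile(np.repeat([0.0, 1.0], plots_per_variety), n_farms)  # independent factor (A = 0, B = 1)
soil_n = rng.uniform(1.0, 5.0, size=n)                                  # control covariate

# Made-up "true" effects, used only to generate the illustration data.
farm_shift = np.array([0.0, 0.5, -0.7])[farm]
crop_yield = 2.0 + 1.5 * variety_b + 0.8 * soil_n + farm_shift + rng.normal(0.0, 0.1, size=n)

# Design matrix: intercept, variety dummy, soil N, and farm dummies (farm 0 = reference).
X = np.column_stack([
    np.ones(n),
    variety_b,
    soil_n,
    (farm == 1).astype(float),
    (farm == 2).astype(float),
])
coef, *_ = np.linalg.lstsq(X, crop_yield, rcond=None)
intercept, variety_effect, soil_n_slope, farm_1_shift, farm_2_shift = coef
print(f"variety B - A: {variety_effect:.2f}, soil N slope: {soil_n_slope:.2f}")
```

Because the data were simulated with a variety effect of 1.5 and a soil N slope of 0.8, the estimates should land close to those values. With real data, the variety coefficient tests the hypothesis, the soil N coefficient quantifies the control covariate, and the farm coefficients are usually not of interest in themselves.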
FIXED VS RANDOM FACTORS
Fixed factors
- You are specifically interested in the effects of these factors; they are part of your causal hypothesis (the point of the study).
- The categories/ranges are a "complete set" of what you want to draw inferences about for this study.
- If you repeated the experiment, you would use the exact same factors and levels/ranges.
- It doesn't matter how many levels are in each factor (at least two, or it's not a variable).
Random factors
- You are trying to control for non-independence arising from a nested or hierarchical structure.
- The categories are a subset of things that could be sampled; they are not exhaustive.
- If you repeated the experiment, you wouldn't necessarily use the same factors/ranges.
- Ideally, random factors should have several levels: you are estimating your "mean slope" for your fixed effects from these random factors, so the more you have, the better (rule of thumb: at least 5).

SUMMARY
- Most hypothesis testing seeks to establish causal relationships between independent variables and dependent variables.
- To understand the causal relationships, we need to design experiments that allow us to detect the specific contribution of each variable to the response (resolve confounding relationships).
- Blocking is a way of controlling for "background" heterogeneity (confounding variables that we are not specifically interested in modeling).
- Statistical inference is improved if we correctly designate explanatory variables as random or fixed effects.

ENV330 LECTURE 2: FUNDAMENTALS OF EXPERIMENTAL DESIGN

INTRO TO EXPERIMENTAL DESIGN
- what is an experiment?
- basic experimental designs

RECALL: CAUSALITY
- Understanding causal relationships is a main goal of scientific inquiry.

RECALL: LURKING & CONFOUNDING VARIABLES
- Lurking variables: X appears to influence Y, but only because both X & Y are influenced by Z (there is no real relationship between X & Y).
- Confounding variables: X influences Y, Z influences Y, and Z & X are ALSO causally linked.
- Collinear variables: Z and X influence Y and there is a causal link between Z & X, but Z and X cannot be separated from each other either statistically or experimentally (e.g., you cannot hold Z constant while X varies).
- (Spurious relationships: completely random "noise" that looks like a signal.)
How would you diagram these relationships?

3 LEVELS OF CAUSALITY
- Association: seeing patterns. Y is correlated with X: P(Y | X)
- Intervention: "What if...?" Manipulation, intervention, experimentation: P(Y | do(X))
- Counterfactuals: "What if X HADN'T...?" Thought experiments

WHAT IS AN EXPERIMENT?
- Wikipedia: "a procedure carried out to support, refute, or validate a hypothesis. Experiments provide insight into cause-and-effect by demonstrating what outcome occurs when a particular factor is manipulated."

WHAT IS A CONTROLLED OR PLANNED EXPERIMENT?
- carefully controlled conditions
- able to control factors that might be confounding variables
- conducted in a lab, a greenhouse, a mesocosm, etc. (environmental realism?)

5 COMPONENTS OF (MOST) EXPERIMENTAL DESIGNS
- controlled conditions
- experimental controls
- randomization
- replication
- blocking

1. CONTROLLED CONDITIONS
- Ideally, only the conditions of interest differ among experimental units.
- e.g., if testing the effects of fertilizers, use the same soil, same pots, same light, same water, same temperature, same plant variety, etc.
CONTROLLING CONDITIONS
- Variable controls: try to reduce variation, even if it's affecting all the treatments equally! e.g., build a fence structure to minimize fluctuations in wind affecting an entire experimental plot.
- Procedural controls: reduce measurement error (e.g., taking measurements twice and averaging the readings).

[Figure: a continuum of study settings]
- "Proof of concept": an early test of the hypothesis; isolated, contained; simple design; highly controlled; not very realistic
- Extending the hypothesis to the real world: multiple effects, interactions; can't control as many variables, but more reflective of "real life"
- "Real world": multiple effects, interactions, contingencies (weather, "year effects"); least control; logistically complex; greatest realism
- We often move from left to right along this continuum as we learn more about the system.

2. EXPERIMENTAL CONTROLS
- experimental units that DON'T receive the treatment of interest
- identical to treated units EXCEPT for the treatment
- compared to treated units
- experimental unit = the smallest entity to which a treatment is applied

CONTROLS HELP ESTABLISH CAUSALITY: P(Y | do(X))
[Figure: treated vs. control comparison]

POSITIVE & NEGATIVE CONTROLS
- Negative controls: a group that experiences a treatment that ISN'T expected to produce results (in most cases, receives NO treatment).
- Positive controls: a group that experiences a treatment that is KNOWN to produce results similar to those predicted by your experimental hypothesis.
- (Placebo: a false treatment, where the subject THINKS a treatment was applied; makes sense in the context of behaviourally influenced results, perceptions, etc.)
Example (fertilizer trial):
- negative control: carrier solution only
- positive control: carrier solution + known fertilizer
- treatment: carrier solution + test fertilizer

3. RANDOMIZATION
- "evens out" the effects of confounding variables
- avoids bias in the data
- 2 main types: random sampling; random assignment

RANDOM ASSIGNMENT
- how subjects are assigned by the researcher to treatments
- because individuals show variability, random assignment ensures that different characteristics are represented (equally, on average) in the treatment and control groups

4. REPLICATION
- independent units of a treatment, treated identically
- necessary for the response to be representative of the population of interest

REPLICATION
- How many replicates?
- absolute minimum = 3 (need to express some variation around the mean!)
- need to know something about the variation in the system (how variable? what kind of distribution?)
- a pilot study/power analysis is useful here!

5. BLOCKING
- If there are variables that are known or suspected to affect the response variable, group experimental units into blocks based on these variables, and randomize units within each block to treatment groups.
- e.g., consider an experiment that looks at the effect of study environment (library, one's own room, outdoors) on academic performance:
  - one approach is a completely randomized design: recruit volunteers and randomly assign them to 1 of 3 treatments
  - but what if we suspect that gender might have an effect on the results?

SPATIAL BLOCKING
- This field is affected by slope aspect, drainage, etc.
- Create internally homogeneous blocks and randomly apply treatments within them.

BLOCKING VS EXPLANATORY VARIABLES
- Explanatory variables (factors) are conditions we impose on our experimental units (e.g., fertilizer, where people study, etc.)
- Blocking variables are (potentially confounding) characteristics that the experimental units come with (e.g., slope and aspect; gender) and that we would like to control for.
- Blocking is like stratified sampling (except in an experimental setting): we make decisions about our random assignments to treatment groups (as opposed to completely random assignment).

ARE UNPLANNED EXPERIMENTS REALLY EXPERIMENTS?
- Unplanned, observational, mensurative "experiments": are they really experiments?
- Controlled conditions? Controls? Replicates? Randomization? Blocking?
- Sampling designs are really key to designing good observational studies.

It might not be an experiment if...
- the conditions aren't controlled
- it isn't replicated (e.g., case studies; only one reference site)
- it isn't randomized (e.g., organisms exposed to a condition were not randomly assigned)
[Figure: clear-cut patches in a forest] Can these clear-cut patches be considered replicates? Are they randomized?
Interpreting results from applying experimental analysis to unplanned experiments can be tricky!

IN SUM: AN EXPERIMENT HAS...
- carefully controlled conditions (ideally, the only thing that differs among the experimental units is the condition of interest)
- experimental controls: experimental units with known conditions that are compared to treated units
- randomization: experimental units are assigned to treatments randomly to avoid bias
- replication: multiple experimental units are assigned to treatments to be representative of the population of interest
- blocking: if a variable is known to affect the outcome, treatments are applied within blocks based on that characteristic

PART II: BASIC EXPERIMENTAL DESIGNS
GROUP 1: ANOVA-TYPE DESIGNS
- completely randomized designs
- randomized block designs
- pairs designs:
  - pre- and post-treatment, same experimental units
  - randomized treatments, same experimental units
  - matched pairs

QUESTION: HOW DOES DIET B (UNKNOWN) COMPARE TO DIET A (KNOWN) IN CONTROLLING DIABETES IN MICE?

1. COMPLETELY RANDOMIZED DESIGN
- Half of the mice are randomly assigned to Diet A and half to Diet B; compare means & variance.
- What if we suspect that the sex of the mouse might affect the results?

2. RANDOMIZED BLOCK DESIGN
- Sex = blocking variable: males are randomly assigned to Diet A or Diet B, and so are females; compare means & variance within each block.

MATCHED PAIRS DESIGN: PRE & POST
- before/after comparison using the same experimental units
- each unit receives Diet A, then Diet B, and the results are compared; the order of diets is NOT randomised

MATCHED PAIRS DESIGN (SAME UNITS)
- compare 2 treatments using the same experimental units
- each unit receives both diets, and the order of diets (A then B, or B then A) is randomised; compare results
- (if you think that order might have an effect, it might be better to pair similar subjects)

MATCHED PAIRS (SIMILAR UNITS)
- sort your experimental units into similar pairs
- within each pair, randomly assign one unit to Diet A and the other to Diet B

2+ FACTORS
- What if you want to look at multiple factors affecting your dependent variable?
- e.g., the effect of diet AND exercise on diabetes in mice?
- the effect of fertilizer AND soil pH on plant growth?
- Need to look at main effects AND interaction effects.

COMMON GARDEN EXPERIMENTS
- Seeds are collected from different populations across a geographic gradient and grown in a "common garden" to see whether specific traits are local adaptations.
- If there's no genetic difference among the populations, plants from all populations grow equally well within any given garden.
- If there's a genetic difference, plants growing in the same garden (under the same conditions) will grow differently, depending on the population they came from.

MAIN EFFECTS PLOTS
- Dependent (response) variable = plant height
- Independent (predictor; explanatory) variables: origin of population (from low elevation, from mid elevation) and garden location (low, mid)
[Figure: plant height in the Low and Mid gardens for low- and mid-elevation populations, in five panels: no effect of origin or garden; effect of origin but not garden; effect of garden but not origin; effect of origin AND garden; origin X garden interaction]

INTERACTION: ORIGIN X GARDEN
- The response to environment (garden) depends on genotype (population of origin): a G X E interaction.
- The "best" genotype/phenotype depends on the environment.

MAIN EFFECTS PLOTS
- Dependent (response) variable = plant height
- Independent (predictor; explanatory) variables: soil pH (4 different levels) and species (species 1, species 2)
[Figure: plant height vs. soil pH for two species, in panels:]
- Soil pH doesn't affect plant growth in either species.
- Soil pH doesn't affect plant growth in either species, but the species differ in height.
- Both species grow taller at higher soil pH.
- Both species grow taller at higher soil pH, AND the species differ in height consistently over that pH range.

INTERACTION: SPECIES X SOIL PH
- The response to soil pH depends on the species: species 1 grows better at low pH; species 2 grows better at higher pH.
- This phenomenon underlies the structuring of ecological communities: potential community members will vary over site conditions.

EXAMPLE: EFFECT OF ADDING NUTRIENTS ON PARASITE LOAD IN AMPHIBIANS (TREMATODE PARASITE RIBEIROIA ONDATRAE)
[Figure: diagram linking nutrients, algae, snails, parasite load in snails, and parasite load in tadpoles]
- 2 hypothesized pathways:
  - more nutrients → more algae → higher snail density (high infection rate among snails) → more parasites in tadpoles
  - more nutrients → more algae → reduced snail mortality (increased snail size & vigor) → higher parasite loads in snails → more parasites in tadpoles
- 2-factor experiment:
  - 2 levels of nutrients (ambient & elevated)
  - 3 levels of parasite eggs (none, low, high)
  - 36 mesocosms (6 replicates of each of the 6 treatment combinations) containing snails, tadpoles, zooplankton & algae
- Both factors (initial parasite level and nutrient level) affected tadpole infections.
- Which mechanism/pathway? There was evidence for more snails AND evidence for higher loads per snail: both causal pathways are important!

ASSIGNMENT #1
- Locate a green space that you can access (e.g., the soccer field on campus).
- Design a controlled experiment testing one of:
  - the efficacy of fertilizers on plant growth
  - the efficacy of a pesticide in controlling an invasive species
  - the effect of mowing on root development
- Due September 29th at 11:59 pm (submit on Quercus)

ENV330: EXPERIMENTAL DESIGN IN ENVIRONMENTAL SCIENCE

LAND ACKNOWLEDGMENT
We wish to acknowledge this land on which the University of Toronto operates. For thousands of years, it has been the traditional land of the Huron-Wendat, the Seneca, and most recently, the Mississaugas of the Credit River. Today, this meeting place is still the home to many Indigenous people from across Turtle Island and we are grateful to have the opportunity to work on this land. https://native-land.ca/

STRUCTURE OF THE COURSE
- Synchronous meetings: lectures + activities
- Some asynchronous lecture content; short reading assignments
- 3 assignments based on field exercises
- 1 group project
- Final exam

ASSESSMENT SCHEME: ASSIGNMENTS + EXAM
- The final exam (40%) is a 3h IN PERSON exam.

BIOMONITORING OF SAWMILL CREEK USING THE CABIN PROTOCOL
- 3h field day
- 2 groups (Oct 6th and Oct 20th)

MAJOR THEMES
- Sampling Approaches and Treatment Designs
- Statistical Foundations of Good Designs
- Collecting, Exploring and Displaying Data
- Contaminants in the Environment

MAJOR THEMES
Statistical & Analytical Methods
- Observational vs. Manipulative Experiments
- Randomized & Non-Randomized Designs
- Sampling Methods
- Types of Errors
- Classification & Regression Trees
- Linear Models & Repeated Measure Designs
- Ordination Methods
Environmental Problems/Questions
- Baseline Monitoring
- Environmental Assessment
- Bioassessment Models
- Reference Condition
- Spatial & Temporal Variability
- Contaminants in the Environment
- Ecotoxicology
- Environmental Health & Epidemiology
Improving Statistical Inference & Open Science
- Problems with NHST
- Causality & Path Analysis
- Frequentist vs. Bayesian Inference
- Sensitivity, Specificity & Positive Predictive Value
- Likelihood models, AICs
- The Open Science Movement
- Meta-Analysis

WHAT IS UNIQUE ABOUT EXPERIMENTAL DESIGN IN ENVIRONMENTAL SCIENCE?
What IS environmental science?
- interdisciplinary
- assessment of the impact of human activities on integrated biotic and abiotic systems
- oriented towards solving anthropogenic problems
- Environmental science is rooted in systems thinking.

How do we DO environmental science?
- We need to tease apart the effects of human activities on systems from the natural variation in those systems.
- We need to identify stressors and responses... but that's hard!!
  - Environmental systems are complex, hierarchical, and interdependent.
  - Anthropogenic effects (signal) and natural variation (noise) vary over multiple spatial and temporal scales, and are affected by multiple factors.

WHAT DO ENVIRONMENTAL SCIENTISTS DO?
- monitoring studies
- research
- environmental impact assessments

Monitoring ("watching an environmental resource"):
- estimates current status
- detects trends
- usually long term/large scale
- observational studies
Research (answers a specific question):
- estimates a specific parameter
- seeks causal pathways/relationships
- short/long term
- observational OR manipulative studies

Impact assessment:
- Predictive (prospective): describing the probable environmental impacts of a proposed activity (risk assessment)
- Postdictive (retrospective): aimed at quantifying the actual impacts of an activity (retrospective risk assessment)

Some research studies are purely descriptive (not testing a hypothesis)... but meaningful descriptive studies often LEAD TO hypothesis tests.

Two main approaches to testing hypotheses:
- controlled experiments (randomized, controlled trials)
- correlational studies (also called observational or mensurative studies)

OBSERVATIONAL STUDIES
- "natural experiments", "mensurative studies", "correlational studies"
- Nothing is "done" to the system; it's only observed.
- Can be purely descriptive (e.g., monitoring).
- Can address a research question by exploiting the natural variation (spatial; temporal) in the data to suggest causal relationships.
- Simple to design and conduct, which reduces the number of potential errors in design.
- "Real world" effects: examining existing conditions.
- Susceptible to confounding factors and reverse causation.

MANIPULATIVE STUDIES ("CONTROLLED EXPERIMENTS")
- The system is manipulated: all or part is subjected to specific treatments (including a control).
- Designed to address a research question by controlling variation in other factors to identify and describe causal relationships.
- Randomized, controlled experiments are an effective way to test predictions about causal relationships:
  - experimental units are randomly assigned to treatments
  - other variables are controlled
  - treatments are replicated
- Can be more complicated/expensive to design and set up.
- Changing one factor may have unintended effects: e.g., applying herbicide to kill some plants but not others may also affect insect populations, kill soil biota, and change growth parameters.
- Manipulation may not be realistic: e.g., does cutting trees mimic natural gap formation?
- May not be ethical or practical (e.g., intentionally introducing persistent toxic compounds??).

OBSERVATIONAL STUDIES VS. MANIPULATIVE STUDIES: WHICH IS BEST?
Trick question! Both have value in environmental science.

Scientists need to be able to:
- have enough understanding of natural patterns of spatial and temporal variability in ecosystems to define/recognize "baseline" conditions
- anticipate what will be the relevant environmental response variables (and what variables will constitute "noise")
- design and conduct appropriate studies on time and under budget
- conduct analyses that are relevant and interpretable
- communicate their findings to a wide audience of stakeholders

PART 2: CAUSALITY
- Why do we look for it?
- How do we establish it?
Above all, scientists strive to understand causality. What IS causality? Nobody really knows…

Hill’s Criteria for Causality
(Hill, A. B. (1965). "The Environment and Disease: Association or Causation?" Proc R Soc Med 58 (5): 295–300)
- Strength: A relationship is likely to be causal if the correlation coefficient is large and statistically significant.
- Consistency: A relationship is more likely to be causal if it can be replicated.
- Specificity: A relationship is likely to be causal if there is no other likely explanation.
- Sequence: A relationship is more likely to be causal if the effect always occurs after the cause.
- Dose Response: A relationship is more likely to be causal if a greater exposure to the suspected cause leads to a greater effect.
- Plausible: A relationship is more likely to be causal if there is a plausible mechanism between the cause and the effect.
- Coherence: A relationship is likely to be causal if it is compatible with related facts and theories.
- Experimental evidence: A relationship is likely to be causal if it can be verified experimentally.
- Analogous: A relationship is likely to be causal if there are proven relationships between similar causes and effects.

Controlled experiments: the best way to establish causality…?
[Figure: N fertilizer, plant N absorption, photosynthetic enzymes, carbon fixation, and growth, with the causal links among them marked “?”]
Scenario 1: fertilizer addition → nitrogen absorption → photosynthetic enzymes
Scenario 2: fertilizer addition → photosynthetic enzymes → nitrogen absorption
Scenario 3: fertilizer addition → nitrogen absorption, and fertilizer addition → photosynthetic enzymes
All 3 causal scenarios create an association between fertilizer application, increased enzymatic activity, and nitrogen absorption.

Establishing causality using correlational studies
Spatial and temporal patterns of variables can lead us to INFER a cause-effect relationship.

Cislaghi and Nimis, “Lichens, air pollution and lung cancer”. Nature, 387:463–464, 1997
- Does lichen biodiversity cause lung cancer?
- Does lung cancer affect lichen biodiversity?
- Air pollution affects BOTH lichen biodiversity and lung cancer.

Types of variables: predictor (independent), response (dependent), lurking
Types of relationships: causal, reversed, confounded/collinear, spurious

Inferring causal relationships: X (independent variable, predictor) → Y (dependent variable, response), e.g., phosphorus → eutrophication

The “Lurking Variable”
Does buying ice cream increase shark attacks? Do shark attacks inspire ice cream sales?
SEASON (L) is the LURKING variable.

Lurking variable: a variable that has a CAUSAL effect on two (or more) dependent variables, leading to an INCORRECT inference about a causal relationship between those dependent variables.
[Diagram: L affects both “P” and R; the apparent link “P” → R is crossed out]

Is air pollution a “lurking” variable?? No! A variable is “lurking” only if you aren’t aware of it or don’t consider it in your experimental design.
The “false predictor” can be used as a PROXY variable in some studies.

People who ride the bus are sicker than those who don’t. Why?
This is NOT a “lurking” variable situation:
- socioeconomic (soc/ec) status is a true predictor of bus ridership
- soc/ec status is a true predictor of health
- riding a bus exposes people to disease risk, so it is also a true predictor of health
Bus ridership and soc/ec status are CONFOUNDED.

Confounded variables: two predictor variables are CONFOUNDED if they BOTH have an effect on a dependent variable but are also related to each other. This makes it difficult to understand the causal relationship(s).
[Diagram: two confounded predictors (P, P), both affecting the response R]

Confounded variables CAN be dealt with if they can be controlled for experimentally OR if there is natural variation that can be exploited. Income and bus ridership are NOT mechanistically linked, so these collinear variables can be dealt with statistically in an observational study, or by assigning appropriate treatments in a manipulative study.

Collinearity: confounded variables that CANNOT be separated
[Diagram: high temperature and low O2, both linked to fish death]

Reverse causation
Does stress from pine bark beetle infestation make trees susceptible to fungal rot, or does fungal rot make it easier for pine bark beetles to attack a tree?

Spurious correlations
No lurking variable, just sheer coincidence!

As environmental scientists, we are trying to detect causal relationships between a stressor and a response:
- sometimes we see a putative causal agent and look for a response
- sometimes we see the response and look for the causal agent
- it is not just a binary yes/no: we want to MODEL the relationship (look for thresholds, response curves, etc.)
Causal paths can be complex… sometimes VERY complex!!

In-Class Exercise: How do we know smoking causes cancer?
- correlation does not prove causation
- however, correlation implies causation…
- there are lots of ways to collect and examine the data
- how can we tell if there is a causal relationship???

Next time: Fundamentals of Experimental Designs
FIELD DAY: dress appropriately for a walk on campus!
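The ice cream / shark attack example of a lurking variable can be made concrete with a small simulation. The sketch below is hypothetical throughout: the numbers are invented, daily temperature stands in for SEASON, and the function names (`simulate`, `partial_r`, etc.) are illustrative, not from the lecture. Both responses are generated from temperature only, never from each other, yet they end up strongly correlated. "Controlling for" temperature with a partial correlation (correlating the residuals left after regressing each response on temperature) removes the apparent relationship. This is one common way to deal statistically with a variable you have measured; it cannot help with a lurking variable you never recorded.

```python
import random


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def residuals(y, z):
    """Residuals of y after a simple least-squares regression on z."""
    n = len(y)
    my, mz = sum(y) / n, sum(z) / n
    slope = (sum((b - mz) * (a - my) for a, b in zip(y, z))
             / sum((b - mz) ** 2 for b in z))
    intercept = my - slope * mz
    return [a - (intercept + slope * b) for a, b in zip(y, z)]


def partial_r(x, y, z):
    """Correlation between x and y after removing the linear effect of z from both."""
    return pearson_r(residuals(x, z), residuals(y, z))


def simulate(n_days=2000, seed=1):
    """Hypothetical data: temperature (the lurking variable) drives both responses."""
    rng = random.Random(seed)
    temp = [rng.uniform(0, 30) for _ in range(n_days)]               # degrees C, stand-in for SEASON
    ice_cream = [50 + 10 * t + rng.gauss(0, 40) for t in temp]       # sales, arbitrary units
    sharks = [0.1 * t + rng.gauss(0, 0.5) for t in temp]             # shark-attack index, arbitrary units
    # Note: ice_cream and sharks never depend on each other, only on temp.
    return temp, ice_cream, sharks


if __name__ == "__main__":
    temp, ice, sharks = simulate()
    print(f"raw r(ice cream, sharks)       = {pearson_r(ice, sharks):.2f}")
    print(f"partial r, controlling for temp = {partial_r(ice, sharks, temp):.2f}")
```

With these made-up settings the raw correlation comes out strongly positive while the partial correlation is close to zero, mirroring the lecture's point that SEASON, not ice cream, drives shark attacks.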
