ENV330 Lecture 3 Blocking, Split Plots, etc. PDF

ENV330 LECTURE 3
MORE ON BASIC EXPERIMENTAL DESIGNS

TYPES OF EXPERIMENTAL DESIGNS
1. Are the predictor variable(s) consciously altered? (Manipulative vs. observational)
2. Are experimental units randomly assigned to treatments?
3. What data will we have about the responses and explanatory variables? (e.g., categorical factors, quantitative responses, etc.)
These characteristics determine both the general analytical approach and the types of inferences that can be drawn.

RECALL: 5 COMPONENTS OF (MOST) EXPERIMENTAL DESIGNS
• controlled conditions
• experimental controls
• randomization
• replication
• blocking

CONTROLLED EXPERIMENTS WITH CATEGORICAL EXPLANATORY VARIABLES (FACTORS)
• completely randomized designs (1 factor)
• randomized block designs (1 factor)
• paired designs (1 factor)
• 2+ factor designs
• subsets & spatial arrays
• randomized blocks
• Latin square
• split plot
• covariates
• (repeated measures)

1. COMPLETELY RANDOMIZED DESIGN (FACTOR = DIET)
• half of the mice randomly assigned to Diet A (control)
• half randomly assigned to Diet B
• compare means & variance
What if we suspect that the sex of the mouse might affect the results?

2. RANDOMIZED BLOCK DESIGN (FACTOR = DIET; BLOCK = SEX)
• males: Diet A vs. Diet B, compare means & variance
• females: Diet A vs. Diet B, compare means & variance
• sex = blocking variable

MATCHED PAIRS ARE BLOCKED DESIGNS
Diet A vs. Diet B: compare results
• before/after comparison (BACI: Before, After, Control, Impact designs) using the same experimental units
• can be thought of as a type of blocking: each individual is a "block" receiving 2 treatments (Diet A and Diet B)
• controls for individual variability

EXPERIMENTAL VS. BLOCKING FACTORS
• Experimental factors: factors we IMPOSE on the experimental units to determine the causal effects of explanatory variables on our variable of interest
• Blocking factors: factors that affect experimental units and contribute to variability among them; things we try to control for in our experimental design

FACTORS VS. LEVELS
• Factor = categorical variable that the experimenter manipulates
• Level = one of the categories within a factor
• e.g., the depicted experiment: 2 factors (water treatment; inoculation) with 2 levels each (water: stress/well-watered; inoculation: inoculated/non-inoculated) and 1 blocking factor (sex)

MULTIPLE EFFECTS: POTENTIAL OUTCOMES
Suppose you test the effects of 2 factors on a variable. What kind of results can you have?
• Neither factor has an effect.
• Factor 1 has an effect; Factor 2 doesn't.
• Factor 1 doesn't have an effect; Factor 2 does.
• Both factors have an effect, and the effects are additive.
• The two factors have an interactive effect.

Suppose we want to look at the effect of diet AND exercise on the expression of some marker of diabetes (Marker X) in mice:
• Two different diets
• Two different exercise regimes

Possible outcomes:
• Neither diet (A vs. B) nor exercise (30 min/day on exercise wheel vs. no exercise) had an effect on marker X.
• Diet B lowered marker X compared to Diet A, but exercise had no effect.
• Exercise lowered marker X compared to no exercise, but diet had no effect.
• Diet B and exercise each lowered marker X, so mice that exercised AND ate Diet B had the lowest marker X of all groups. (ADDITIVE)
• Mice on Diet B showed lower marker X if they exercised, but mice on Diet A showed lower marker X if they did NOT exercise. (INTERACTION)

[Figure: five panels comparing Diet A and Diet B with and without exercise (No Ex, Ex): neither diet nor exercise has an effect; diet has an effect; exercise has an effect; diet & exercise have additive effects; diet and exercise have interactive effects.]

"Mice on Diet A showed higher expression of marker X with exercise; however, this pattern was reversed in mice on Diet B (Fig 1)."

SPATIAL ARRAYS: RANDOMIZED BLOCKS
• If our experimental units are spatially distributed, we use blocking to account for spatial heterogeneity
• Suppose you are studying the effects of fertilizer on seed production in 2 varieties of rice plant
• The experimental unit is an AREA
• 2 factors: plant variety (A) & fertilizer (B)
• Factor A: 2 levels (A1, A2); Factor B: 3 levels (B1, B2, B3)
• 6 combinations need to be tested: A1B1, A1B2, A1B3, A2B1, A2B2, A2B3
[Figure: the study area, annotated "pretty heterogeneous!"]

Block 1:  A1B1  A2B1  A1B2  A1B3  A2B3  A2B2
Block 2:  A1B2  A1B3  A1B1  A2B2  A2B1  A2B3
Block 3:  A2B2  A1B1  A2B1  A1B2  A2B3  A1B3
Block 4:  A1B3  A2B3  A1B2  A2B1  A1B1  A2B2

• 4 blocks, within each of which every treatment is randomly allocated
• randomized block = spatial unit that includes all treatments & represents one replicate of each of those treatments

RCBD = RANDOMIZED COMPLETE BLOCK DESIGN
• All treatments are assigned once within each block = randomized complete block design (RCBD)
• Blocks are placed to ensure more variation AMONG blocks than WITHIN blocks
• Blocks should be small enough to ensure that they are relatively homogeneous
• Blocks should be large enough to ensure that the treatments within blocks are well separated in space
• This is applicable to LOTS of designs (not just in a semi-natural setting)
• Imagine a greenhouse: temperature and sun exposure are not completely uniform
• Align your blocks ACROSS a gradient so that they are INTERNALLY HOMOGENEOUS
[Figure: the four-block layout shown in a correct and an incorrect orientation relative to the gradient.]
• Gradients are not always unidirectional and/or linear: whenever you have a spatial area that you think is internally homogeneous AND different from other internally homogeneous areas, those areas should be blocks

VARIATIONS: LATIN SQUARE
• every row has all the treatments; every column has all the treatments
• has to be a square grid
• works well if you think you have a couple of gradients
[Figure: Latin square with one gradient along one axis and a second gradient along the other.]

SPLIT-PLOT DESIGNS
• Usually a 2-factor design
• One of the factors is "easy" to change or vary
• One of the factors is "hard" to change or vary
• Suppose we have 4 fields and want to look at the effects of irrigation & fertilizer on crop yield

WHICH DESIGN MAKES MORE SENSE?
• A = Irrigation (Method 1 vs. Method 2); B = Fertilizer (Fertilizer 1 vs. Fertilizer 2)
OR
• A = Fertilizer (Fertilizer 1 vs. Fertilizer 2); B = Irrigation (Method 1 vs. Method 2)

Suppose you want to study the effects of different levels of CO2 and different soil temperatures on plant growth. How would you set up your greenhouses?
• 2 factors: CO2 level (ambient or enhanced) and soil temperature (ambient, heated to 25°, and heated to 30°)
• "hard" factor = CO2 level
• "easy" factor = soil temperature
• Hard factor = main (whole) plot
• Easy factor = split plot
Split-plot designs can be complex!

BLOCKS CAN BE ANY FACTOR THAT SHOWS INTERNAL HOMOGENEITY
• plots of habitat
• a room in a greenhouse
• group of aquarium tanks
• day of the week
• litter or nest mates
• measurements taken on instrument X
• data collected by technician Y
Statistically, the variation AMONG blocks is removed from the experimental error term in an ANOVA, increasing the precision of the experiment.

THE VALUE OF BLOCKING: SUMMARY
• controls for heterogeneity
• "block what you can; randomize what you cannot."
• A randomized complete block design (RCBD) is a better design than a completely randomized design (CRD) if there is more heterogeneity among blocks than within blocks
• An RCBD requires fewer replicates to have the same power as a CRD (Why? Less unexplained variability.)

COVARIATE
• a "nuisance" variable that is a continuous (numeric) variable, NOT a factor (recall, factors have LEVELS)
• doesn't vary in a way that makes it amenable to blocking (not in neat, homogeneous groups)
• e.g., organic matter in soil: might vary all over this field; may influence our fertilizer results

Covariates statistically control for individual variation
• suppose your response variable is % of leaf covered by tar spot
• % coverage might be correlated with leaf size (the smaller the leaf, the more likely that a single tar spot takes up a large proportion of the leaf)
• instead, "size of diseased area" is the dependent variable, and size of non-diseased area can be one of the predictor variables

Covariates are useful in controlling for "starting conditions"
• e.g., testing two different exercise regimes for building muscle mass
• initial body composition likely influences how much muscle you can gain in a month
• can treat initial muscle mass as a covariate

REPEATED MEASURES DESIGNS
• we often sample the same individual, same site, etc. over time
• statistically awkward to deal with
• problem with autocorrelation (whatever the measurement was last time is going to influence the measurement this time; are the samples independent?)
• we'll talk about this later!
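The "starting conditions" idea from the covariate slides can be sketched numerically. The block below is a minimal illustration, not the lecture's own analysis: the data are hypothetical, the names `data`, `pooled_within_slope` and `adjusted_means` are invented for this example, and the adjustment shown is the classic ANCOVA-style one (group means shifted to a common covariate value using a pooled within-group slope).

```python
from statistics import mean

# Hypothetical data (NOT from the lecture): initial muscle mass (kg) and
# one-month muscle gain (kg) for individuals on two exercise regimes.
data = {
    "Regime A": {"initial": [20, 22, 24], "gain": [2.0, 2.5, 3.0]},
    "Regime B": {"initial": [26, 28, 30], "gain": [4.0, 4.5, 5.0]},
}

def pooled_within_slope(groups):
    """Slope of gain on initial mass, pooled across groups (assumes a common slope)."""
    sxy = 0.0
    sxx = 0.0
    for g in groups.values():
        xbar, ybar = mean(g["initial"]), mean(g["gain"])
        sxy += sum((x - xbar) * (y - ybar) for x, y in zip(g["initial"], g["gain"]))
        sxx += sum((x - xbar) ** 2 for x in g["initial"])
    return sxy / sxx

def adjusted_means(groups):
    """Mean gain per group, adjusted to the grand mean of the covariate."""
    slope = pooled_within_slope(groups)
    grand_x = mean(x for g in groups.values() for x in g["initial"])
    return {
        name: mean(g["gain"]) - slope * (mean(g["initial"]) - grand_x)
        for name, g in groups.items()
    }

raw = {name: mean(g["gain"]) for name, g in data.items()}
adj = adjusted_means(data)
print("raw mean gain:     ", raw)
print("adjusted mean gain:", adj)
```

In this made-up data set the regimes differ by 2.0 kg in raw mean gain, but by only 0.5 kg once the difference in initial muscle mass is accounted for. In a properly randomized experiment the groups would rarely differ this much in the covariate; the gap is exaggerated here so that the adjustment is easy to see.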
• Data structures are often hierarchical:
  - fish in lakes
  - trees in forests
  - wolves in packs
  - streams in watersheds
  - birds in nests

Statistical Models
• The dependent variable is some function of one or more independent variables (factors and/or covariates)
• Models are mathematical representations of our hypotheses

Independent variables (predictors; causes; hypotheses), X → Y:
• Factors
  - Independent Factors
  - Control Factors
  - Blocking Variables
• Covariates
  - Independent Covariates
  - Control Covariates
[Figure: path diagrams X → Y and X1, X2 → Y.]

3 types of factors:
• Independent Factors: our hypothesis (causal relationship with dependent variable)
• Control Factors: variables (of interest) that we need to control to test the hypothesis (confounding variables); statistically, we are interested in their coefficients
• Blocking Factors: variables we need to control for as well, but their values are "arbitrary" and not of interest

Crop example:
• Crop Yield = dependent variable
• Crop Variety = independent factor
• Soil Nitrogen = control covariate
• Farm = blocking variable
[Figure: diagram relating Variety and Soil N to Crop Yield.]

Independent Variables
• The causal variable in your hypothesis; the "variable of interest"; the focus of your hypothesis
• Can be a continuous variable, or a factor with levels (ordinal or nominal)
• FIXED

Control Variables
• A variable that needs to be controlled so that you can properly see the effect of the independent variable
• You are interested in specific estimates of the parameters: it's not your main hypothesis, but you still might be interested in quantifying the effects of a control variable on your dependent variable
• Can be a covariate or a factor
• FIXED

Blocking Variables
• A factor that you need to control for, but you might not be interested in the specific estimates of the parameters
• Measured values could be swapped for other values, and you'd still be testing the same hypothesis
• Can only be a factor
• FIXED OR RANDOM

FIXED & RANDOM EFFECTS
Fixed factors: the factor levels are informative and are chosen by the investigator specifically because they have a unique and important meaning. Depending on the study, these COULD be:
• "Male" and "female"
• "Predator" and "prey"
• "Drug A," "Drug B," and "Drug C"
• "Control" and "treatment"
• "Before" and "after"
• variants on a predictor, for example, "high," "medium," and "low"

Random factors: the selected factor levels can often be considered a subset from a population of levels, and the factor levels are not specifically important. (We are unlikely to care about differences among factor levels.)
• Time intervals in a time sequence
• A collection of seed types (or some other levels) chosen randomly from a population of seed types (or some other population)
• Subjects on whom repeated measurements are made

Whether a factor is fixed or random depends on the inference you wish to draw
Study: the effects of PFOS on survival and reproduction in aquatic insects
• Species as a RANDOM factor: I hypothesize that this pollutant will have a negative effect on many types of benthic organisms. I want to look at more than one species, to make sure I'm not accidentally picking a super-sensitive or super-resistant species, but I don't particularly care which ones I use. If I repeated the experiment, I might pick different species; I just want a representation of more than one type of insect in this study. I'll treat species as a RANDOM factor.
• Species as a FIXED factor: I hypothesize that this pollutant will have a negative effect on many types of benthic organisms. There are some species I am specifically interested in: for example, I want to know if Hexagenia is more sensitive than Chironomus, and there are a few other comparisons I would like to quantify. If I redid the experiment, I'd still pick the same species, because I am specifically interested in them. I'll treat species as a FIXED factor.
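As a sketch of how the crop example (yield, variety, soil nitrogen, farm) might be written as a single model, one possible form is given below. The notation is an assumption made for illustration and does not come from the slides: $y_{ijk}$ is crop yield, $\alpha_i$ is the fixed effect of variety $i$, $\beta$ is the slope for the soil-nitrogen covariate $x_{ijk}$, and $b_j$ is the effect of farm $j$, written here as a random blocking effect.

```latex
\[
  y_{ijk} = \mu + \alpha_i + \beta\,(x_{ijk} - \bar{x}) + b_j + \varepsilon_{ijk},
  \qquad
  b_j \sim N(0, \sigma_b^2),
  \quad
  \varepsilon_{ijk} \sim N(0, \sigma^2)
\]
```

If farm were instead treated as a fixed blocking factor, the $b_j$ would be ordinary constants to estimate (for example under a sum-to-zero constraint) rather than draws from a distribution; the rest of the model would stay the same.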
Fixed Factors
• You are specifically interested in the effects of these factors; these are part of your causal hypothesis (the point of the study)
• The categories/ranges are a "complete set" of what you want to draw inferences about for this study
• If you repeated the experiment, you would use the exact same factors and levels/ranges
• It doesn't matter how many levels are in each factor (at least two, or it's not a variable)

Random Factors
• You are trying to control for non-independence from a nested or hierarchical structure
• The categories are a subset of things that could be sampled; they are not exhaustive
• If you repeated the experiment, you wouldn't necessarily use the same factors/ranges
• Ideally, random factors should have several levels: you are estimating your "mean slope" for your fixed effects from these random factors, so the more you have, the better (rule of thumb: at least 5)

SUMMARY:
• Most hypothesis testing seeks to establish causal relationships between independent variables and dependent variables
• To understand the causal relationships, we need to design experiments that allow us to detect the specific contribution of each variable to the response (resolve confounding relationships)
• Blocking is a way of controlling for "background" heterogeneity (confounding variables that we are not specifically interested in modeling)
• Statistical inference is improved if we correctly designate explanatory variables as random or fixed effects
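The point that blocking removes among-block variation from the ANOVA error term can be checked with a small calculation. The sketch below uses hypothetical yields (4 blocks by 3 treatments, one observation per cell) and invented names (`yields`, `sums_of_squares`); it compares the error term when the block labels are ignored, as in a CRD-style analysis, with the error term when blocks are included, as in an RCBD analysis.

```python
# Hypothetical yields (NOT from the lecture): one observation per
# block-by-treatment cell, 4 blocks x 3 treatments.
yields = {
    "Block 1": {"T1": 11, "T2": 11, "T3": 14},
    "Block 2": {"T1": 13, "T2": 17, "T3": 18},
    "Block 3": {"T1": 18, "T2": 21, "T3": 21},
    "Block 4": {"T1": 22, "T2": 23, "T3": 27},
}

def sums_of_squares(data):
    """Return (SS_total, SS_treatment, SS_block) for a complete block layout."""
    blocks = list(data)
    treatments = list(next(iter(data.values())))
    values = [data[b][t] for b in blocks for t in treatments]
    grand = sum(values) / len(values)
    ss_total = sum((v - grand) ** 2 for v in values)
    ss_treat = len(blocks) * sum(
        (sum(data[b][t] for b in blocks) / len(blocks) - grand) ** 2
        for t in treatments
    )
    ss_block = len(treatments) * sum(
        (sum(data[b].values()) / len(treatments) - grand) ** 2 for b in blocks
    )
    return ss_total, ss_treat, ss_block

n_blocks, n_treat = 4, 3
ss_total, ss_treat, ss_block = sums_of_squares(yields)
ms_treat = ss_treat / (n_treat - 1)

# Ignoring blocks: block-to-block variation stays in the error term.
crd_error = ss_total - ss_treat
crd_df = n_blocks * n_treat - n_treat
f_crd = ms_treat / (crd_error / crd_df)

# Including blocks: among-block variation is removed from the error term.
rcbd_error = ss_total - ss_treat - ss_block
rcbd_df = (n_blocks - 1) * (n_treat - 1)
f_rcbd = ms_treat / (rcbd_error / rcbd_df)

print(f"SS total={ss_total}, treatment={ss_treat}, block={ss_block}")
print(f"error SS ignoring blocks={crd_error} (df={crd_df}), F={f_crd:.2f}")
print(f"error SS with blocks={rcbd_error} (df={rcbd_df}), F={f_rcbd:.2f}")
```

With these made-up numbers most of the total variation sits among blocks, so accounting for blocks shrinks the error sum of squares from 248 to 8 and the treatment F ratio rises from well below 1 to about 12, even though the blocked analysis has fewer error degrees of freedom. When blocks differ little from one another, that loss of degrees of freedom can outweigh the gain, which is the condition stated in the blocking summary.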
