ARRIVE Guidelines 2.0 PDF
Document Details

Uploaded by RenewedReasoning
Delaware Valley University
2020
Related
- Introduction to Neuroimaging 06-10-2023 Animal Research and Research Ethics PDF
- Guide for the Care and Use of Agricultural Animals in Research and Teaching PDF 2020
- McGill PSYCH 306 Research Methods in Psychology Chapter 4 PDF
- McGill PSYC 306 Research Methods In Psychology PDF
- Guia Brasileiro para Produção, Manutenção e Utilização de Animais em Ensino e Pesquisa Científica PDF
- Ethical Research: Why It Matters PDF
Summary
This document from PLOS Biology provides an explanation and elaboration of the ARRIVE guidelines 2.0 for reporting animal research. It aims to improve the transparency and reproducibility of biomedical research by detailing important aspects of experimental design and reporting. The document includes recommendations and best practices for improving the standards in animal studies.
Full Transcript
PLOS BIOLOGY COMMUNITY PAGE

Reporting animal research: Explanation and elaboration for the ARRIVE guidelines 2.0

Nathalie Percie du Sert1*, Amrita Ahluwalia2,3, Sabina Alam4, Marc T. Avey5, Monya Baker6, William J. Browne7, Alejandra Clark8, Innes C. Cuthill9, Ulrich Dirnagl10, Michael Emerson11, Paul Garner12, Stephen T. Holgate13, David W. Howells14, Viki Hurst1, Natasha A. Karp15, Stanley E. Lazic16, Katie Lidster1, Catriona J. MacCallum17, Malcolm Macleod18, Esther J. Pearl1, Ole H. Petersen19, Frances Rawle20, Penny Reynolds21, Kieron Rooney22, Emily S. Sena18, Shai D. Silberberg23, Thomas Steckler24, Hanno Würbel25

1 NC3Rs, London, United Kingdom; 2 The William Harvey Research Institute, London, United Kingdom; 3 Barts Cardiovascular CTU, Queen Mary University of London, London, United Kingdom; 4 Taylor & Francis Group, London, United Kingdom; 5 Health Science Practice, ICF, Durham, North Carolina, United States of America; 6 Nature, San Francisco, California, United States of America; 7 School of Education, University of Bristol, Bristol, United Kingdom; 8 PLOS ONE, Cambridge, United Kingdom; 9 School of Biological Sciences, University of Bristol, Bristol, United Kingdom; 10 QUEST Center for Transforming Biomedical Research, Berlin Institute of Health & Department of Experimental Neurology, Charité Universitätsmedizin Berlin, Berlin, Germany; 11 National Heart and Lung Institute, Imperial College London, London, United Kingdom; 12 Centre for Evidence Synthesis in Global Health, Clinical Sciences Department, Liverpool School of Tropical Medicine, Liverpool, United Kingdom; 13 Clinical and Experimental Sciences, University of Southampton, Southampton, United Kingdom; 14 Tasmanian School of Medicine, University of Tasmania, Hobart, Australia; 15 Data Sciences & Quantitative Biology, Discovery Sciences, R&D, AstraZeneca, Cambridge, United Kingdom; 16 Prioris.ai Inc, Ottawa, Canada; 17 Hindawi Ltd, London, United Kingdom; 18 Centre for Clinical Brain Sciences, University of Edinburgh, Edinburgh, United Kingdom; 19 Academia Europaea Knowledge Hub, Cardiff University, Cardiff, United Kingdom; 20 Medical Research Council, London, United Kingdom; 21 Statistics in Anesthesiology Research (STAR) Core, Department of Anesthesiology, College of Medicine, University of Florida, Gainesville, Florida, United States of America; 22 Discipline of Exercise and Sport Science, Faculty of Medicine and Health, University of Sydney, Sydney, Australia; 23 National Institute of Neurological Disorders and Stroke, Bethesda, Maryland, United States of America; 24 Janssen Pharmaceutica NV, Beerse, Belgium; 25 Veterinary Public Health Institute, Vetsuisse Faculty, University of Bern, Bern, Switzerland

* [email protected]

Citation: Percie du Sert N, Ahluwalia A, Alam S, Avey MT, Baker M, Browne WJ, et al. (2020) Reporting animal research: Explanation and elaboration for the ARRIVE guidelines 2.0. PLoS Biol 18(7): e3000411. https://doi.org/10.1371/journal.pbio.3000411

Academic Editor: Isabelle Boutron, University Paris Descartes, FRANCE

Published: July 14, 2020

Copyright: This is an open access article, free of all copyright, and may be freely reproduced, distributed, transmitted, modified, built upon, or otherwise used by anyone for any lawful purpose. The work is made available under the Creative Commons CC0 public domain dedication.

Funding: This work was supported by the National Centre for the Replacement, Refinement and Reduction of Animals in Research (NC3Rs, https://www.nc3rs.org.uk/). NPdS, KL, VH, and EJP are employees of the NC3Rs.

Competing interests: I have read the journal's policy and the authors of this manuscript have the following competing interests: AA is the editor in chief of the British Journal of Pharmacology. WJB, ICC, and ME are authors of the original ARRIVE guidelines. WJB serves on the Independent Statistical Standing Committee of the funder CHDI foundation. AC is a Senior Editor for PLOS ONE. AC, CJM, MM, and ESS were involved in the IICARus trial. ME, MM, and ESS have received funding from NC3Rs. ME sits on the MRC ERPIC panel. STH is chair of the NC3Rs board; trusteeship of the BLF, Kennedy Trust, DSRU, and CRUK; member of Governing Board, Nuffield Council on Bioethics; member Science Panel for Health (EU H2020); founder and NEB Director Synairgen; consultant Novartis, Teva, and AZ; and chair MRC/GSK EMINENT Collaboration. VH, KL, EJP, and NPdS are NC3Rs staff; role includes promoting the ARRIVE guidelines. SEL and UD are on the advisory board of the UK Reproducibility Network. CJM has shareholdings in Hindawi, is on the publishing board of the Royal Society, and on the EU Open Science policy platform. UD, MM, NPdS, CJM, ESS, TS, and HW are members of EQIPD. MM is a member of the Animals in Science Committee and on the steering group of the UK Reproducibility Network. NPdS and TS are associate editors of BMJ Open Science. OHP is vice president of Academia Europaea, editor in chief of Function, senior executive editor of the Journal of Physiology, and member of the Board of the European Commission's SAPEA (Science Advice for Policy by European Academies). FR is an NC3Rs board member and has shareholdings in GSK. FR and NAK have shareholdings in AstraZeneca. PR is a member of the University of Florida Institutional Animal Care and Use Committee and editorial board member of Shock. ESS is editor in chief of BMJ Open Science. SDS's role is to provide expertise and does not represent the opinion of the NIH. TS has shareholdings in Johnson & Johnson. SA, MTA, MB, PG, DWH, and KR declared no conflict of interest.

Abbreviations: AAALAC, American Association for Accreditation of Laboratory Animal Care; ARRIVE, Animal Research: Reporting of In Vivo Experiments; AVMA, American Veterinary Medical Association; AWERB, Animal Welfare and Ethical Review Body; DOI, digital object identifier; EBI, European Bioinformatics Institute; EDA, Experimental Design Assistant; GLP, Good Laboratory Practice; IACUC, Institutional Animal Care and Use Committee; NC3Rs, National Centre for the 3Rs; NCBI, National Center for Biotechnology Information; PHISPS, Population; Hypothesis; Intervention; Statistical Analysis Plan; Primary Outcome Measure; Sample Size Calculation; RRID, Research Resource Identifier; SAMPL, Statistical Analyses and Methods in the Published Literature; SPF, Specific Pathogen Free.

Abstract

Improving the reproducibility of biomedical research is a major challenge. Transparent and accurate reporting is vital to this process; it allows readers to assess the reliability of the findings and repeat or build upon the work of other researchers. The ARRIVE guidelines (Animal Research: Reporting In Vivo Experiments) were developed in 2010 to help authors and journals identify the minimum information necessary to report in publications describing in vivo experiments. Despite widespread endorsement by the scientific community, the impact of ARRIVE on the transparency of reporting in animal research publications has been limited. We have revised the ARRIVE guidelines to update them and facilitate their use in practice. The revised guidelines are published alongside this paper. This explanation and elaboration document was developed as part of the revision. It provides further information about each of the 21 items in ARRIVE 2.0, including the rationale and supporting evidence for their inclusion in the guidelines, elaboration of details to report, and examples of good reporting from the published literature. This document also covers advice and best practice in the design and conduct of animal studies to support researchers in improving standards from the start of the experimental design process through to publication.

See S1 Annotated byline for individual authors' positions at the time this article was submitted. See S1 Annotated References for further context on the works cited in this article.

Introduction

Transparent and accurate reporting is essential to improve the reproducibility of scientific research; it enables others to scrutinise the methodological rigour of the studies, assess how reliable the findings are, and repeat or build upon the work. However, evidence shows that the majority of publications fail to include key information and there is significant scope to improve the reporting of studies involving animal research [1–4]. To that end, the UK National Centre for the 3Rs (NC3Rs) published the ARRIVE (Animal Research: Reporting In Vivo Experiments) guidelines in 2010. The guidelines are a checklist of information to include in a manuscript to ensure that publications contain enough information to add to the knowledge base. The guidelines have received widespread endorsement from the scientific community and are currently recommended by more than a thousand journals, with further endorsement from research funders, universities, and learned societies worldwide. Studies measuring the impact of ARRIVE on the quality of reporting have produced mixed results [6–11], and there is evidence that in vivo scientists are not sufficiently aware of the importance of reporting the information covered in the guidelines and fail to appreciate the relevance to their work or their research field.

As a new international working group—the authors of this publication—we have revised the guidelines to update them and facilitate their uptake; the ARRIVE guidelines 2.0 are published alongside this paper. We have updated the recommendations in line with current best practice, reorganised the information, and classified the items into two sets. The ARRIVE Essential 10 constitute the minimum reporting requirement, and the Recommended Set provides further context to the study described. Although reporting both sets is best practice, an initial focus on the most critical issues helps authors, journal staff, editors, and reviewers use the guidelines in practice and allows a pragmatic implementation. Once the Essential 10 are consistently reported in manuscripts, items from the Recommended Set can be added to journal requirements over time until all 21 items are routinely reported in all manuscripts. Full methodology for the revision and the allocation of items into sets is described in the accompanying publication.

A key aspect of the revision was to develop this explanation and elaboration document to provide background and rationale for each of the 21 items of ARRIVE 2.0. Here, we present additional guidance for each item and subitem, explain the importance of reporting this information in manuscripts that describe animal research, elaborate on what to report, and provide the supporting evidence. The guidelines apply to all areas of bioscience research involving living animals. That includes mammalian species as well as model organisms such as Drosophila or Caenorhabditis elegans. Each item is equally relevant to manuscripts centred around a single animal study and broader-scope manuscripts describing in vivo observations along with other types of experiments.
The exact type of detail to report, however, might vary between species and experimental setup; this is acknowledged in the guidance provided for each item.

We recognise that the purpose of the research influences the design of the study. Hypothesis-testing research evaluates specific hypotheses, using rigorous methods to reduce the risk of bias and a statistical analysis plan that has been defined before the study starts. In contrast, exploratory research often investigates many questions simultaneously without adhering to strict standards of rigour; this flexibility is used to develop or test novel methods and generate theories and hypotheses that can be formally tested later. Both study types make valuable contributions to scientific progress. Transparently reporting the purpose of the research and the level of rigour used in the design, execution, and analysis of the study enables readers to decide how to use the research, whether the findings are groundbreaking and need to be confirmed before building on them, or whether they are robust enough to be applied to other research settings.

To contextualise the importance of reporting information described in the Essential 10, this document also covers experimental design concepts and best practices. This has two main purposes: First, it helps authors understand the relevance of this information for readers to assess the reliability of the reported results, thus encouraging thorough reporting. Second, it supports the implementation of best practices in the design and conduct of animal research.
Consulting this document at the start of the process when planning an in vivo experiment will enable researchers to make the best use of it, implement the advice on study design, and prepare for the information that will need to be collected during the experiment to report the study in adherence with the guidelines.

To ensure that the recommendations are as clear and useful as possible to the target audience, this document was road tested alongside the revised guidelines with researchers preparing manuscripts describing in vivo research. Each item is written as a self-contained section, enabling authors to refer to particular items independently, and a glossary (Box 1) explains common statistical terms. Each subitem is also illustrated with examples of good reporting from the published literature. Explanations and examples are also available from the ARRIVE guidelines website: https://www.arriveguidelines.org.

Box 1. Glossary

Bias: The over- or underestimation of the true effect of an intervention. Bias is caused by inadequacies in the design, conduct, or analysis of an experiment, resulting in the introduction of error.

Descriptive and inferential statistics: Descriptive statistics are used to summarise the data. They generally include a measure of central tendency (e.g., mean or median) and a measure of spread (e.g., standard deviation or range). Inferential statistics are used to make generalisations about the population from which the samples are drawn. Hypothesis tests such as ANOVA, Mann-Whitney, or t tests are examples of inferential statistics.

Effect size: Quantitative measure of differences between groups, or strength of relationships between variables.

Experimental unit: Biological entity subjected to an intervention independently of all other units, such that it is possible to assign any two experimental units to different treatment groups. Sometimes known as unit of randomisation.
External validity: Extent to which the results of a given study enable application or generalisation to other studies, study conditions, animal strains/species, or humans.

False negative: Statistically nonsignificant result obtained when the alternative hypothesis (H1) is true. In statistics, it is known as the type II error.

False positive: Statistically significant result obtained when the null hypothesis (H0) is true. In statistics, it is known as the type I error.

Independent variable: Variable that either the researcher manipulates (treatment, condition, time) or is a property of the sample (sex) or a technical feature (batch, cage, sample collection) that can potentially affect the outcome measure. Independent variables can be scientifically interesting, or nuisance variables. Also known as predictor variable.

Internal validity: Extent to which the results of a given study can be attributed to the effects of the experimental intervention, rather than some other, unknown factor(s) (e.g., inadequacies in the design, conduct, or analysis of the study introducing bias).

Nuisance variable: Variables that are not of primary interest but should be considered in the experimental design or the analysis because they may affect the outcome measure and add variability. They become confounders if, in addition, they are correlated with an independent variable of interest, as this introduces bias. Nuisance variables should be considered in the design of the experiment (to prevent them from becoming confounders) and in the analysis (to account for the variability and sometimes to reduce bias). For example, nuisance variables can be used as blocking factors or covariates.
Null and alternative hypotheses: The null hypothesis (H0) is that there is no effect, such as a difference between groups or an association between variables. The alternative hypothesis (H1) postulates that an effect exists.

Outcome measure: Any variable recorded during a study to assess the effects of a treatment or experimental intervention. Also known as dependent variable, response variable.

Power: For a predefined, biologically meaningful effect size, the probability that the statistical test will detect the effect if it exists (i.e., the null hypothesis is rejected correctly).

Sample size: Number of experimental units per group, also referred to as n.

Definitions are adapted from [14,15] and placed in the context of animal research.

ARRIVE Essential 10

The ARRIVE Essential 10 (Box 2) constitute the minimum reporting requirement to ensure that reviewers and readers can assess the reliability of the findings presented. There is no ranking within the set; items are presented in a logical order.

Box 2. ARRIVE Essential 10
1. Study design
2. Sample size
3. Inclusion and exclusion criteria
4. Randomisation
5. Blinding
6. Outcome measures
7. Statistical methods
8. Experimental animals
9. Experimental procedures
10. Results

Item 1. Study design

For each experiment, provide brief details of study design including:

1a. The groups being compared, including control groups. If no control group has been used, the rationale should be stated.

Explanation. The choice of control or comparator group is dependent on the experimental objective. Negative controls are used to determine whether a difference between groups is caused by the intervention (e.g., wild-type animals versus genetically modified animals, placebo versus active treatment, sham surgery versus surgical intervention).
Positive controls can be used to support the interpretation of negative results or determine if an expected effect is detectable. It may not be necessary to include a separate control with no active treatment if, for example, the experiment aims to compare a treatment administered by different methods (e.g., intraperitoneal administration versus oral gavage) or animals that are used as their own control in a longitudinal study. A pilot study, such as one designed to test the feasibility of a procedure, might also not require a control group.

For complex study designs, a visual representation is more easily interpreted than a text description, so a timeline diagram or flowchart is recommended. Diagrams facilitate the identification of which treatments and procedures were applied to specific animals or groups of animals and at what point in the study these were performed. They also help to communicate complex design features such as whether factors are crossed or nested (hierarchical/multilevel designs), blocking (to reduce unwanted variation, see Item 4. Randomisation), or repeated measurements over time on the same experimental unit (repeated measures designs); see [16–18] for more information on different design types. The Experimental Design Assistant (EDA) is a platform to support researchers in the design of in vivo experiments; it can be used to generate diagrams to represent any type of experimental design.

For each experiment performed, clearly report all groups used. Selectively excluding some experimental groups (for example, because the data are inconsistent or conflict with the narrative of the paper) is misleading and should be avoided. Ensure that test groups, comparators, and controls (negative or positive) can be identified easily. State clearly if the same control group was used for multiple experiments or if no control group was used.
Examples

Subitem 1a—Example 1

‘The DAV1 study is a one-way, two-period crossover trial with 16 piglets receiving amoxicillin and placebo at period 1 and only amoxicillin at period 2. Amoxicillin was administered orally with a single dose of 30 mg.kg-1. Plasma amoxicillin concentrations were collected at same sampling times at each period: 0.5, 1, 1.5, 2, 4, 6, 8, 10 and 12 h’.

Subitem 1a—Example 2

Fig 1. Reproduced from reference. https://doi.org/10.1371/journal.pbio.3000411.g001

1b. The experimental unit (e.g., a single animal, litter, or cage of animals).

Explanation. Within a design, biological and technical factors will often be organised hierarchically, such as cells within animals and mitochondria within cells, or cages within rooms and animals within cages. Such hierarchies can make determining the sample size difficult (is it the number of animals, cells, or mitochondria?). The sample size is the number of experimental units per group. The experimental unit is defined as the biological entity subjected to an intervention independently of all other units, such that it is possible to assign any two experimental units to different treatment groups. It is also sometimes called the unit of randomisation. In addition, the experimental units should not influence each other on the outcomes that are measured.

Commonly, the experimental unit is the individual animal, each independently allocated to a treatment group (e.g., a drug administered by injection). However, the experimental unit may be the cage or the litter (e.g., a diet administered to a whole cage, or a treatment administered to a dam and investigated in her pups), or it could be part of the animal (e.g., different drug treatments applied topically to distinct body regions of the same animal).
Animals may also serve as their own controls, receiving different treatments separated by washout periods; here, the experimental unit is an animal for a period of time. There may also be multiple experimental units in a single experiment, such as when a treatment is given to a pregnant dam and then the weaned pups are allocated to different diets. See [17,24,25] for further guidance on identifying experimental units.

Conflating experimental units with subsamples or repeated measurements can lead to artificial inflation of the sample size. For example, measurements from 50 individual cells from a single mouse represent n = 1 when the experimental unit is the mouse. The 50 measurements are subsamples and provide an estimate of measurement error and so should be averaged or used in a nested analysis. Reporting n = 50 in this case is an example of pseudoreplication. It underestimates the true variability in a study, which can lead to false positives and invalidate the analysis and resulting conclusions [26,27]. If, however, each cell taken from the mouse is then randomly allocated to different treatments and assessed individually, the cell might be regarded as the experimental unit.

Clearly indicate the experimental unit for each experiment so that the sample sizes and statistical analyses can be properly evaluated.

Examples

Subitem 1b—Example 1

‘The present study used the tissues collected at E15.5 from dams fed the 1X choline and 4X choline diets (n = 3 dams per group, per fetal sex; total n = 12 dams). To ensure statistical independence, only one placenta (either male or female) from each dam was used for each experiment. Each placenta, therefore, was considered to be an experimental unit’.
Subitem 1b—Example 2

‘We have used data collected from high-throughput phenotyping, which is based on a pipeline concept where a mouse is characterized by a series of standardized and validated tests underpinned by standard operating procedures (SOPs).... The individual mouse was considered the experimental unit within the studies’.

Subitem 1b—Example 3

‘Fish were divided in two groups according to weight (0.7–1.2 g and 1.3–1.7 g) and randomly stocked (at a density of 15 fish per experimental unit) in 24 plastic tanks holding 60 L of water’.

Subitem 1b—Example 4

‘In the study, n refers to number of animals, with five acquisitions from each [corticostriatal] slice, with a maximum of three slices obtained from each experimental animal used for each protocol (six animals each group)’.

Item 2. Sample size

2a. Specify the exact number of experimental units allocated to each group, and the total number in each experiment. Also indicate the total number of animals used.

Explanation. The sample size relates to the number of experimental units in each group at the start of the study and is usually represented by n (see Item 1. Study design for further guidance on identifying and reporting experimental units). This information is crucial to assess the validity of the statistical model and the robustness of the experimental results. The sample size in each group at the start of the study may be different from the n numbers in the analysis (see Item 3. Inclusion and exclusion criteria); this information helps readers identify attrition or if there have been exclusions and in which group they occurred. Reporting the total number of animals used in the study is also useful to identify whether any were reused between experiments.
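The rule that subsamples must be collapsed to the experimental unit before counting n (Item 1b) can be sketched in a few lines of Python. The mouse IDs and readings below are hypothetical and purely illustrative; averaging is the simplest correction, and a nested (mixed-effects) analysis is the alternative mentioned in the text.

```python
from statistics import mean

# Hypothetical data: readings from several cells per mouse.
# The mouse is the experimental unit; the cells are subsamples.
readings = {
    "mouse_1": [4.1, 3.8, 4.4],
    "mouse_2": [5.0, 5.2, 4.9],
    "mouse_3": [3.9, 4.0, 4.2],
}

# Collapse the subsamples to one value per experimental unit...
per_mouse_means = {mouse: mean(vals) for mouse, vals in readings.items()}

# ...so the sample size is the number of mice, not the number of cells.
n = len(per_mouse_means)
print(n)  # 3, not 9
```

Reporting the total number of cell-level measurements (9 here) as n would be the pseudoreplication described under Item 1b.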
Report the exact value of n per group and the total number in each experiment (including any independent replications). If the experimental unit is not the animal, also report the total number of animals to help readers understand the study design. For example, in a study investigating diet using cages of animals housed in pairs, the number of animals is double the number of experimental units.

Example

Subitem 2a—Example 1

Fig 2. Reproduced from reference. https://doi.org/10.1371/journal.pbio.3000411.g002

2b. Explain how the sample size was decided. Provide details of any a priori sample size calculation, if done.

Explanation. For any type of experiment, it is crucial to explain how the sample size was determined. For hypothesis-testing experiments, in which inferential statistics are used to estimate the size of the effect and to determine the weight of evidence against the null hypothesis, the sample size needs to be justified to ensure experiments are of an optimal size to test the research question [33,34] (see Item 13. Objectives). Sample sizes that are too small (i.e., underpowered studies) produce inconclusive results, whereas sample sizes that are too large (i.e., overpowered studies) raise ethical issues over unnecessary use of animals and may produce trivial findings that are statistically significant but not biologically relevant. Low power has three effects: first, within the experiment, real effects are more likely to be missed; second, when an effect is detected, this will often be an overestimation of the true effect size; and finally, when low power is combined with publication bias, there is an increase in the false positive rate in the published literature. Consequently, low-powered studies contribute to the poor internal validity of research and risk wasting animals used in inconclusive research.

Study design can influence the statistical power of an experiment, and the power calculation used needs to be appropriate for the design implemented. Statistical programmes to help perform a priori sample size calculations exist for a variety of experimental designs and statistical analyses, both freeware (web-based applets and functions in R) and commercial software [38–40]. Choosing the appropriate calculator or algorithm to use depends on the type of outcome measures and independent variables, and the number of groups. Consultation with a statistician is recommended, especially when the experimental design is complex or unusual.

When the experiment tests the effect of an intervention on the mean of a continuous outcome measure, the sample size can be calculated a priori, based on a mathematical relationship between the predefined, biologically relevant effect size, variability estimated from prior data, chosen significance level, power, and sample size (see Box 3 and [17,41] for practical advice). If you have used an a priori sample size calculation, report:
- the analysis method (e.g., two-tailed Student t test with a 0.05 significance threshold)
- the effect size of interest and a justification explaining why an effect size of that magnitude is relevant
- the estimate of variability used (e.g., standard deviation) and how it was estimated
- the power selected

Box 3. Information used in a power calculation

Sample size calculation is based on a mathematical relationship between the following parameters: effect size, variability, significance level, power, and sample size. Questions to consider are the following:

The primary objective of the experiment—What is the main outcome measure?
The primary outcome measure should be identified in the planning stage of the experiment; it is the outcome of greatest importance, which will answer the main experimental question.

The predefined effect size—What is a biologically relevant effect size? The effect size is estimated as a biologically relevant change in the primary outcome measure between the groups under study. This can be informed by similar studies and involves scientists exploring what magnitude of effect would generate interest and would be worth taking forward into further work. In preclinical studies, the clinical relevance of the effect should also be taken into consideration.

What is the estimate of variability? Estimates of variability can be obtained:
- From data collected from a preliminary experiment conducted under identical conditions to the planned experiment, e.g., a previous experiment in the same laboratory, testing the same treatment under similar conditions on animals with the same characteristics
- From the control group in a previous experiment testing a different treatment
- From a similar experiment reported in the literature

Significance threshold—What risk of a false positive is acceptable? The significance level or threshold (α) is the probability of obtaining a false positive. If it is set at 0.05, then the risk of obtaining a false positive is 1 in 20 for a single statistical test. However, the threshold or the p-values will need to be adjusted in scenarios of multiple testing (e.g., by using a Bonferroni correction).

Power—What risk of a false negative is acceptable? For a predefined, biologically meaningful effect size, the power (1 − β) is the probability that the statistical test will detect the effect if it genuinely exists (i.e., true positive result).
A target power between 80% and 95% is normally deemed acceptable, which entails a risk of false negative between 5% and 20%.

Directionality—Will you use a one- or two-sided test?
The directionality of a test depends on the distribution of the test statistics for a given analysis. For tests based on t or z distributions (such as t tests), whether the data will be analysed using a one- or two-sided test relates to whether the alternative hypothesis is directional or not. An experiment with a directional (one-sided) alternative hypothesis can be powered and analysed with a one-sided test with the goal of maximising the sensitivity to detect this directional effect. Controversy exists within the statistics community on when it is appropriate to use a one-sided test. The use of a one-sided test requires justification of why a treatment effect is only of interest when it is in a defined direction, and why a large effect in the unexpected direction would be treated no differently from a nonsignificant difference. Following the use of a one-sided test, the investigator cannot then test for the possibility of missing an effect in the untested direction. Choosing a one-tailed test for the sole purpose of attaining statistical significance is not appropriate. Two-sided tests with a nondirectional alternative hypothesis are much more common and allow researchers to detect the effect of a treatment regardless of its direction. Note that analyses such as ANOVA and chi-squared are based on asymmetrical distributions (F-distribution and chi-squared distribution) with only one tail. Therefore, these tests do not have a directionality option.

There are several types of studies in which a priori sample size calculations are not appropriate. For example, the number of animals needed for antibody or tissue production is determined by the amount required and the production ability of an individual animal.
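Returning to the a priori calculation: the parameters listed in Box 3 combine into a standard formula for a two-group comparison of means. The sketch below is illustrative only (it is not part of the guidelines, and the effect size, standard deviation, α, and power are hypothetical values); it uses the normal approximation, whereas t-based tools such as G*Power or R's power.t.test give a slightly larger n.

```python
from math import ceil
from statistics import NormalDist

def n_per_group(effect, sd, alpha=0.05, power=0.80, two_sided=True):
    """Normal-approximation sample size per group for comparing two means.

    n = 2 * (z_{1-alpha(/2)} + z_{power})^2 * (sd / effect)^2
    Illustrates how the Box 3 parameters interact: a smaller effect, larger
    variability, stricter alpha, or higher power all increase n.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2 if two_sided else 1 - alpha)
    z_beta = NormalDist().inv_cdf(power)
    return ceil(2 * (z_alpha + z_beta) ** 2 * (sd / effect) ** 2)

# Hypothetical example: detect a 5-unit change with SD = 10 (a 0.5 SD effect),
# two-sided alpha = 0.05, power = 80%.
print(n_per_group(effect=5, sd=10))  # 63 per group (t-based methods give ~64)
```

Note how the same inputs with 90% power give a larger group size, which is why the chosen power must be reported alongside the effect size and variability estimate.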
For studies in which the outcome is the successful generation of a sample or a condition (e.g., the production of transgenic animals), the number of animals is determined by the probability of success of the experimental procedure.

In early feasibility or pilot studies, the number of animals required depends on the purpose of the study. When the objective of the preliminary study is primarily logistic or operational (e.g., to improve procedures and equipment), the number of animals needed is generally small. In such cases, power calculations are not appropriate and sample sizes can be estimated based on operational capacity and constraints. Pilot studies alone are unlikely to provide adequate data on variability for a power calculation for future experiments. Systematic reviews and previous studies are more appropriate sources of information on variability.

If no power calculation was used to determine the sample size, state this explicitly and provide the reasoning that was used to decide on the sample size per group. Regardless of whether a power calculation was used or not, when explaining how the sample size was determined, take into consideration any anticipated loss of animals or data, for example, due to exclusion criteria established upfront or expected attrition (see Item 3. Inclusion and exclusion criteria).

Examples

Subitem 2b—Example 1

‘The sample size calculation was based on postoperative pain numerical rating scale (NRS) scores after administration of buprenorphine (NRS AUC mean = 2.70; noninferiority limit = 0.54; standard deviation = 0.66) as the reference treatment... and also Glasgow Composite Pain Scale (GCPS) scores... using online software (Experimental design assistant; https://eda.nc3rs.org.uk/eda/login/auth). The power of the experiment was set to 80%.
A total of 20 dogs per group were considered necessary’.

Subitem 2b—Example 2

‘We selected a small sample size because the bioglass prototype was evaluated in vivo for the first time in the present study, and therefore, the initial intention was to gather basic evidence regarding the use of this biomaterial in more complex experimental designs’.

Item 3. Inclusion and exclusion criteria

3a. Describe any criteria used for including or excluding animals (or experimental units) during the experiment, and data points during the analysis. Specify if these criteria were established a priori. If no criteria were set, state this explicitly.

Explanation. Inclusion and exclusion criteria define the eligibility or disqualification of animals and data once the study has commenced. To ensure scientific rigour, the criteria should be defined before the experiment starts and data are collected [8,33,48,49]. Inclusion criteria should not be confused with animal characteristics (see Item 8. Experimental animals) but can be related to these (e.g., body weights must be within a certain range for a particular procedure) or related to other study parameters (e.g., task performance has to exceed a given threshold). In studies in which selected data are reanalysed for a different purpose, inclusion and exclusion criteria should describe how data were selected.

Exclusion criteria may result from technical or welfare issues such as complications anticipated during surgery or circumstances in which test procedures might be compromised (e.g., development of motor impairments that could affect behavioural measurements). Criteria for excluding samples or data include failure to meet quality control standards, such as insufficient sample volumes, unacceptable levels of contaminants, poor histological quality, etc.
Similarly, how the researcher will define and handle data outliers during the analysis should also be decided before the experiment starts (see subitem 3b for guidance on responsible data cleaning).

Exclusion criteria may also reflect the ethical principles of a study in line with its humane endpoints (see Item 16. Animal care and monitoring). For example, in cancer studies, an animal might be dropped from the study and euthanised before the predetermined time point if the size of a subcutaneous tumour exceeds a specific volume. If losses are anticipated, these should be considered when determining the number of animals to include in the study (see Item 2. Sample size). Whereas exclusion criteria and humane endpoints are typically included in the ethical review application, reporting the criteria used to exclude animals or data points in the manuscript helps readers with the interpretation of the data and provides crucial information to other researchers wanting to adopt the model.

Best practice is to include all a priori inclusion and exclusion/outlier criteria in a preregistered protocol (see Item 19. Protocol registration). At the very least, these criteria should be documented in a laboratory notebook and reported in manuscripts, explicitly stating that the criteria were defined before any data were collected.

Example

Subitem 3a—Example 1

‘The animals were included in the study if they underwent successful MCA occlusion (MCAo), defined by a 60% or greater drop in cerebral blood flow seen with laser Doppler flowmetry. The animals were excluded if insertion of the thread resulted in perforation of the vessel wall (determined by the presence of sub-arachnoid blood at the time of sacrifice), if the silicon tip of the thread became dislodged during withdrawal, or if the animal died prematurely, preventing the collection of behavioral and histological data’.

3b.
For each experimental group, report any animals, experimental units, or data points not included in the analysis and explain why. If there were no exclusions, state so.

Explanation. Animals, experimental units, or data points that are unaccounted for can lead to instances in which conclusions cannot be supported by the raw data. Reporting exclusions and attritions provides valuable information to other investigators evaluating the results or who intend to repeat the experiment or test the intervention in other species. It may also provide important safety information for human trials (e.g., exclusions related to adverse effects).

There are many legitimate reasons for experimental attrition, some of which are anticipated and controlled for in advance (see subitem 3a on defining exclusion and inclusion criteria), but some data loss might not be anticipated. For example, data points may be excluded from analyses because of an animal receiving the wrong treatment, unexpected drug toxicity, infections or diseases unrelated to the experiment, sampling errors (e.g., a malfunctioning assay that produced a spurious result, inadequate calibration of equipment), or other human error (e.g., forgetting to switch on equipment for a recording).

Most statistical analysis methods are extremely sensitive to outliers and missing data. In some instances, it may be scientifically justifiable to remove outlying data points from an analysis, such as obvious errors in data entry or measurement with readings that are outside a plausible range. Inappropriate data cleaning has the potential to bias study outcomes; providing the reasoning for removing data points enables the distinction to be made between responsible data cleaning and data manipulation.
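One way to keep that distinction auditable is to write the pre-registered exclusion rule down as code that flags records rather than silently dropping them. The sketch below is a hypothetical illustration (the plausible range, variable, and record layout are invented for the example, not taken from the guidelines):

```python
# Pre-registered exclusion rule, applied transparently: every exclusion is
# recorded with its reason so attrition can be reported per group.
# The plausible range (hypothetical body temperature in degrees C) would be
# defined a priori, before any data are collected.
PLAUSIBLE_RANGE = (34.0, 40.0)

def apply_exclusions(records):
    """Split (animal_id, value) records into analysed and excluded sets.

    Returns (kept, excluded); excluded entries carry a reason string that can
    be reported alongside the results, giving absolute numbers, not just %.
    """
    lo, hi = PLAUSIBLE_RANGE
    kept, excluded = [], []
    for animal_id, value in records:
        if value is None:
            excluded.append((animal_id, "missing measurement"))
        elif not lo <= value <= hi:
            excluded.append((animal_id, f"outside plausible range {lo}-{hi}"))
        else:
            kept.append((animal_id, value))
    return kept, excluded

data = [("r1", 37.1), ("r2", 52.0), ("r3", None), ("r4", 36.4)]
kept, excluded = apply_exclusions(data)
print(f"n analysed = {len(kept)}/{len(data)}")
for animal_id, reason in excluded:
    print(animal_id, "excluded:", reason)
```

Because the rule is fixed in advance and every removal is logged with a reason, the output can feed directly into the attrition table or flowchart recommended under subitem 3b.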
Missing data, common in all areas of research, can impact the sensitivity of the study and also lead to biased estimates, distorted power, and loss of information if the missing values are not random. Analysis plans should include methods to explore why data are missing. It is also important to consider and justify analysis methods that account for missing data [55,56].

There is a movement toward greater data sharing (see Item 20. Data access), along with an increase in strategies such as code sharing to enable analysis replication. These practices, however transparent, still need to be accompanied by a disclosure of the reasoning for data cleaning and whether methods were defined before any data were collected.

Report all animal exclusions and loss of data points, along with the rationale for their exclusion. For example, this information can be summarised as a table or a flowchart describing attrition in each treatment group. Accompanying this information should be an explicit description of whether researchers were blinded to the group allocations when data or animals were excluded (see Item 5. Blinding). Explicitly state when built-in models in statistics packages have been used to remove outliers (e.g., GraphPad Prism’s outlier test).

Examples

Subitem 3b—Example 1

‘Pen was the experimental unit for all data. One entire pen (ZnAA90) was removed as an outlier from both Pre-RAC and RAC periods for poor performance caused by illness unrelated to treatment.... Outliers were determined using Cook’s D statistic and removed if Cook’s D > 0.5. One steer was determined to be an outlier for day 48 liver biopsy TM and data were removed’.
Subitem 3b—Example 2

‘Seventy-two SHRs were randomized into the study, of which 13 did not meet our inclusion and exclusion criteria because the drop in cerebral blood flow at occlusion did not reach 60% (seven animals), postoperative death (one animal: autopsy unable to identify the cause of death), haemorrhage during thread insertion (one animal), and disconnection of the silicon tip of the thread during withdrawal, making the permanence of reperfusion uncertain (four animals). A total of 59 animals were therefore included in the analysis of infarct volume in this study. In error, three animals were sacrificed before their final assessment of neurobehavioral score: one from the normothermia/water group and two from the hypothermia/pethidine group. These errors occurred blinded to treatment group allocation. A total of 56 animals were therefore included in the analysis of neurobehavioral score’.

Subitem 3b—Example 3

Fig 3. Reproduced from reference. https://doi.org/10.1371/journal.pbio.3000411.g003

3c. For each analysis, report the exact value of n in each experimental group.

Explanation. The exact number of experimental units analysed in each group (i.e., the n number) is essential information for the reader to interpret the analysis; it should be reported unambiguously. All animals and data used in the experiment should be accounted for in the data presented. Sometimes, for good reasons, animals may need to be excluded from a study (e.g., illness or mortality), or data points excluded from analyses (e.g., biologically implausible values).
Reporting losses will help the reader to understand the experimental design process, replicate methods, and provide adequate tracking of animal numbers in a study, especially when sample size numbers in the analyses do not match the original group numbers.

For each outcome measure, indicate numbers clearly within the text or on figures and provide absolute numbers (e.g., 10/20, not 50%). For studies in which animals are measured at different time points, explicitly report the full description of which animals undergo measurement and when.

Examples

Subitem 3c—Example 1

‘Group F contained 29 adult males and 58 adult females in 2010 (n = 87), and 32 adult males and 66 adult females in 2011 (n = 98). The increase in female numbers was due to maturation of juveniles to adults. Females belonged to three matrilines, and there were no major shifts in rank in the male hierarchy. Six mid to low ranking individuals died and were excluded from analyses, as were five mid-ranking males who emigrated from the group at the beginning of 2011’.

Subitem 3c—Example 2

‘The proportion of test time that animals spent interacting with the handler (sniffed the gloved hand or tunnel, made paw contact, climbed on, or entered the handling tunnel) was measured from DVD recordings. This was then averaged across the two mice in each cage as they were tested together and their behaviour was not independent.... Mice handled with the home cage tunnel spent a much greater proportion of the test interacting with the handler (mean ± s.e.m., 39.8 ± 5.2 percent time of 60 s test, n = 8 cages) than those handled by tail (6.4 ± 2.0 percent time, n = 8 cages), while those handled by cupping showed intermediate levels of voluntary interaction (27.6 ± 7.1 percent time, n = 8 cages)’.

Item 4. Randomisation

4a. State whether randomisation was used to allocate experimental units to control and treatment groups. If done, provide the method used to generate the randomisation sequence.

Explanation.
Using appropriate randomisation methods during the allocation to groups ensures that each experimental unit has an equal probability of receiving a particular treatment and provides balanced numbers in each treatment group. Selecting an animal ‘at random’ (i.e., haphazardly or arbitrarily) from a cage is not statistically random, as the process involves human judgement. It can introduce bias that influences the results, as a researcher may (consciously or subconsciously) make judgements in allocating an animal to a particular group, or because of unknown and uncontrolled differences in the experimental conditions or animals in different groups. Using a validated method of randomisation helps minimise selection bias and reduce systematic differences in the characteristics of animals allocated to different groups [62–64]. Inferential statistics based on nonrandomised group allocation are not valid [65,66]. Thus, the use of randomisation is a prerequisite for any experiment designed to test a hypothesis. Examples of appropriate randomisation methods include online random number generators (e.g., https://www.graphpad.com/quickcalcs/randomize1/) or a function like Rand() in spreadsheet software such as Excel, Google Sheets, or LibreOffice. The EDA has a dedicated feature for randomisation and allocation concealment.

Systematic reviews have shown that animal experiments that do not report randomisation or other bias-reducing measures such as blinding are more likely to report exaggerated effects that meet conventional measures of statistical significance [67–69]. It is especially important to use randomisation in situations in which it is not possible to blind all or parts of the experiment, but even with randomisation, researcher bias can pervert the allocation.
This can be avoided by using allocation concealment (see Item 5. Blinding). In studies in which sample sizes are small, simple randomisation may result in unbalanced groups; here, randomisation strategies to balance groups, such as randomising in matched pairs [70–72] and blocking, are encouraged.

Reporting the precise method used to allocate animals or experimental units to groups enables readers to assess the reliability of the results and identify potential limitations. Report the type of randomisation used (simple, stratified, randomised complete blocks, etc.; see Box 4), the method used to generate the randomisation sequence (e.g., computer-generated randomisation sequence, with details of the algorithm or programme used), and what was randomised (e.g., treatment to experimental unit, order of treatment for each animal). If this varies between experiments, report this information specifically for each experiment. If randomisation was not the method used to allocate experimental units to groups, state this explicitly and explain how the groups being compared were formed.

Box 4. Considerations for the randomisation strategy

Simple randomisation
All animals/samples are simultaneously randomised to the treatment groups without considering any other variable. This strategy is rarely appropriate, as it cannot ensure that comparison groups are balanced for other variables that might influence the result of an experiment.

Randomisation within blocks
Blocking is a method of controlling natural variation among experimental units. This splits up the experiment into smaller subexperiments (blocks), and treatments are randomised to experimental units within each block [17,66,73]. This takes into account nuisance variables that could potentially bias the results (e.g., cage location, day or week of procedure).
Stratified randomisation uses the same principle as randomisation within blocks, only the strata tend to be traits of the animal that are likely to be associated with the response (e.g., weight class or tumour size class). This can lead to differences in the practical implementation of stratified randomisation as compared with block randomisation (e.g., there may not be equal numbers of experimental units in each weight class).

Other randomisation strategies
Minimisation is an alternative strategy to allocate animals/samples to treatment groups to balance variables that might influence the result of an experiment. With minimisation, the treatment allocated to the next animal/sample depends on the characteristics of those animals/samples already assigned. The aim is that each allocation should minimise the imbalance across multiple factors. This approach works well for a continuous nuisance variable such as body weight or starting tumour volume.
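The simple and within-block strategies described in Box 4 can be sketched in a few lines of code. This is an illustrative example only (the group labels, animal identifiers, and block size are hypothetical); the article itself points to tools such as the EDA, online generators, or spreadsheet functions, and recording the seed is what makes a computer-generated sequence reportable.

```python
import random

def simple_randomisation(animal_ids, groups, seed):
    """Simple randomisation: shuffle all animals once, deal into equal groups."""
    rng = random.Random(seed)  # record the seed so the sequence is reproducible
    ids = list(animal_ids)
    rng.shuffle(ids)
    per_group = len(ids) // len(groups)
    return {g: ids[i * per_group:(i + 1) * per_group] for i, g in enumerate(groups)}

def block_randomisation(animal_ids, groups, seed):
    """Randomisation within blocks: each consecutive block of len(groups)
    animals (e.g., one cage or one procedure day) receives every treatment once."""
    rng = random.Random(seed)
    allocation = []
    for start in range(0, len(animal_ids), len(groups)):
        block_treatments = list(groups)
        rng.shuffle(block_treatments)  # fresh random order within every block
        allocation.extend(zip(animal_ids[start:start + len(groups)], block_treatments))
    return allocation

animals = [f"m{i:02d}" for i in range(1, 13)]
print(block_randomisation(animals, ["control", "treated"], seed=2020))
```

Reporting "computer-generated block randomisation (block size 2, seed recorded)" alongside such a script satisfies subitem 4a far better than stating only that animals were "randomly divided".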
Examples of nuisance variables that can be accounted for in the randomisation strategy
- Time or day of the experiment
- Litter, cage, or fish tank
- Investigator or surgeon—different levels of experience in the people administering the treatments, performing the surgeries, or assessing the results may result in varying stress levels in the animals or duration of anaesthesia
- Equipment (e.g., PCR machine, spectrophotometer)—calibration may vary
- Measurement of a study parameter (e.g., initial tumour volume)
- Animal characteristics (e.g., sex, age bracket, weight bracket)
- Location—exposure to light, ventilation, and disturbances may vary in cages located at different heights or on different racks, which may affect important physiological processes

Implication for the analysis
If blocking factors are used in the randomisation, they should also be included in the analysis. Nuisance variables increase variability in the sample, which reduces statistical power. Including a nuisance variable as a blocking factor in the analysis accounts for that variability and can increase the power, thus increasing the ability to detect a real effect with fewer experimental units. However, blocking uses up degrees of freedom and thus reduces the power if the nuisance variable does not have a substantial impact on variability.

Examples

Subitem 4a—Example 1

‘Fifty 12-week-old male Sprague-Dawley rats, weighing 320–360 g, were obtained from Guangdong Medical Laboratory Animal Center (Guangzhou, China) and randomly divided into two groups (25 rats/group): the intact group and the castration group. Random numbers were generated using the standard = RAND() function in Microsoft Excel’.

Subitem 4a—Example 2

‘Animals were randomized after surviving the initial I/R, using a computer based random order generator’.

Subitem 4a—Example 3

‘At each institute, phenotyping data from both sexes is collected at regular intervals on age-matched wildtype mice of equivalent genetic backgrounds.
Cohorts of at least seven homozygote mice of each sex per pipeline were generated.... The random allocation of mice to experimental group (wildtype versus knockout) was driven by Mendelian Inheritance’.

4b. Describe the strategy used to minimise potential confounders such as the order of treatments and measurements, or animal/cage location. If confounders were not controlled, state this explicitly.

Explanation. Ensuring there is no systematic difference between animals in different groups apart from the experimental exposure is an important principle throughout the conduct of the experiment. Identifying nuisance variables (sources of variability or conditions that could potentially bias results) and managing them in the design and analysis increases the sensitivity of the experiment. For example, rodents in cages at the top of the rack may be exposed to higher light levels, which can affect stress.

Reporting the strategies implemented to minimise potential differences that arise between treatment groups during the course of the experiment enables others to assess the internal validity. Strategies to report include standardising (keeping conditions the same, e.g., all surgeries done by the same surgeon), randomising (e.g., the sampling or measurement order), and blocking or counterbalancing (e.g., position of animal cages or tanks on the rack), to ensure groups are similarly affected by a source of variability. In some cases, practical constraints prevent some nuisance variables from being randomised, but they can still be accounted for in the analysis (see Item 7. Statistical methods). Report the methods used to minimise confounding factors alongside the methods used to allocate animals to groups.
If no measures were used to minimise confounders (e.g., treatment order, measurement order, cage or tank position on a rack), explicitly state this and explain why.

Examples

Subitem 4b—Example 1

‘Randomisation was carried out as follows. On arrival from El-Nile Company, animals were assigned a group designation and weighed. A total number of 32 animals were divided into four different weight groups (eight animals per group). Each animal was assigned a temporary random number within the weight range group. On the basis of their position on the rack, cages were given a numerical designation. For each group, a cage was selected randomly from the pool of all cages. Two animals were removed from each weight range group and given their permanent numerical designation in the cages. Then, the cages were randomized within the exposure group’.

Subitem 4b—Example 2

‘... test time was between 08.30am to 12.30pm and testing order was randomized daily, with each animal tested at a different time each test day’.

Subitem 4b—Example 3

‘Bulls were blocked by BW into four blocks of 905 animals with similar BW and then within each block, bulls were randomly assigned to one of four experimental treatments in a completely randomized block design resulting in 905 animals per treatment. Animals were allocated to 20 pens (181 animals per pen and five pens per treatment)’.

Item 5. Blinding

Describe who was aware of the group allocation at the different stages of the experiment (during the allocation, the conduct of the experiment, the outcome assessment, and the data analysis).

Explanation. Researchers often expect a particular outcome and can unintentionally influence the experiment or interpret the data in such a way as to support their preferred hypothesis.
Blinding is a strategy used to minimise these subjective biases.

Although there is primary evidence of the impact of blinding in the clinical literature that directly compares blinded versus unblinded assessment of outcomes, there is limited empirical evidence in animal research [83,84]. There are, however, compelling data from systematic reviews showing that nonblinded outcome assessment leads to the treatment effects being overestimated, and the lack of bias-reducing measures such as randomisation and blinding can contribute to as much as 30%–45% inflation of effect sizes [67,68,85].

Ideally, investigators should be unaware of the treatment(s) animals have received or will be receiving, from the start of the experiment until the data have been analysed. If this is not possible for every stage of an experiment (see Box 5), it should always be possible to conduct at least some of the stages blind. This has implications for the organisation of the experiment and may require help from additional personnel—for example, a surgeon to perform interventions, a technician to code the treatment syringes for each animal, or a colleague to code the treatment groups for the analysis. Online resources are available to facilitate allocation concealment and blinding.

Box 5. Blinding during different stages of an experiment

During allocation
Allocation concealment refers to concealing the treatment to be allocated to each individual animal from those assigning the animals to groups, until the time of assignment. Together with randomisation, allocation concealment helps minimise selection bias, which can introduce systematic differences between treatment groups.

During the conduct of the experiment
When possible, animal care staff and those who administer treatments should be unaware of allocation groups to ensure that all animals in the experiment are handled, monitored, and treated in the same way.
Treating different groups differently based on the treatment they have received could alter animal behaviour and physiology and produce confounds. Welfare or safety reasons may prevent blinding of animal care staff, but in most cases, blinding is possible. For example, if hazardous microorganisms are used, control animals can be considered as dangerous as infected animals. If a welfare issue would only be tolerated for a short time in treated but not control animals, a harm-benefit analysis is needed to decide whether blinding should be used.

During the outcome assessment
The person collecting experimental measurements or conducting assessments should not know which treatment each sample/animal received and which samples/animals are grouped together. Blinding is especially important during outcome assessment, particularly if there is a subjective element (e.g., when assessing behavioural changes or reading histological slides). Randomising the order of examination can also reduce bias.

If the person assessing the outcome cannot be blinded to the group allocation (e.g., obvious phenotypic or behavioural differences between groups), some, but not all, of the sources of bias could be mitigated by sending data for analysis to a third party who has no vested interest in the experiment and does not know whether a treatment is expected to improve or worsen the outcome.

During the data analysis
The person analysing the data should know which data are grouped together to enable group comparisons but should not be aware of which specific treatment each group received.
This type of blinding is often neglected but is important, as the analyst makes many semisubjective decisions such as applying data transformation to outcome measures, choosing methods for handling missing data, and handling outliers. How these decisions will be made should also be decided a priori. Data can be coded prior to analysis so that the treatment group cannot be identified before analysis is completed.

Specify whether blinding was used or not for each step of the experimental process (see Box 5) and indicate what particular treatment or condition the investigators were blinded to, or aware of. If blinding was not used at any of the steps outlined in Box 5, explicitly state this and provide the reason why blinding was not possible or not considered.

Examples

Item 5—Example 1

‘For each animal, four different investigators were involved as follows: a first investigator (RB) administered the treatment based on the randomization table. This investigator was the only person aware of the treatment group allocation. A second investigator (SC) was responsible for the anaesthetic procedure, whereas a third investigator (MS, PG, IT) performed the surgical procedure. Finally, a fourth investigator (MAD) (also unaware of treatment) assessed GCPS and NRS, mechanical nociceptive threshold (MNT), and sedation NRS scores’.

Item 5—Example 2

‘... due to overt behavioral seizure activity the experimenter could not be blinded to whether the animal was injected with