Biostatistics & Research Methodology Course (BRM 497) PDF
Taibah University
2024
Summary
This document is from a course on biostatistics and research methodology. It provides an introduction to epidemiology, its key components, and practical examples that exemplify the various concepts.
Taibah University
College of Medical Rehabilitation Sciences
Respiratory Therapy Department
1446 (2024-2025)
Biostatistics & Research Methodology Course (BRM 497)

Introduction

Definition of epidemiology
The word epidemiology comes from the Greek words epi, meaning on or upon, demos, meaning people, and logos, meaning the study of. In other words, the word epidemiology has its roots in the study of what befalls a population. Many definitions have been proposed, but the following definition captures the underlying principles and public health spirit of epidemiology:

Epidemiology is the study (scientific, systematic, data-driven) of the distribution (frequency, pattern) and determinants (causes, risk factors) of health-related states or events (not just diseases) in specified populations (the patient is the community; individuals are viewed collectively), and the application of this study to the control of health problems (since epidemiology is a discipline within public health).

Key terms in this definition reflect some of the important principles of epidemiology.

Exercise
Below are three key terms taken from the definition of epidemiology, followed by a list of activities that an epidemiologist might perform. Match the term to the activity that best describes it. You should match only one term per activity.
A. Distribution
B. Determinants
C. Application
1. ____ Compare food histories between persons with Staphylococcus food poisoning and those without
2. ____ Compare the frequency of brain cancer among anatomists with the frequency in the general population
3. ____ Mark on a map the residences of all children born with birth defects within 2 miles of a hazardous waste site
4. ____ Graph the number of cases of congenital syphilis by year for the country
5. ____ Recommend that close contacts of a child recently reported with meningococcal meningitis receive Rifampin
6. ____ Tabulate the frequency of clinical signs, symptoms, and laboratory findings among children with chickenpox in Cincinnati, Ohio

Answers
1. Determinants
2. Determinants
3. Distribution
4. Distribution
5. Application
6. Distribution

Research
Research refers to:
❖ A search for knowledge
❖ A scientific and systematic search for pertinent information on a specific topic
❖ An art of scientific investigation
❖ “A careful investigation or inquiry, especially through search for new facts in any branch of knowledge”
❖ A “systematized effort to gain new knowledge”

Research definition
The systematic investigation into and study of materials and sources in order to establish facts and reach new conclusions.

Purpose of Scientific Research
❖ Examine a phenomenon
❖ Provide professional credibility
❖ Guide clinical practice
❖ Guide clinical decisions
❖ Program evaluation
❖ Inform policy

Research Methods versus Methodology
Methodology is the systematic, theoretical analysis of the methods applied to a field of study. It comprises the theoretical analysis of the body of methods and principles associated with a branch of knowledge.
A methodology does not set out to provide solutions; it is, therefore, not the same thing as a method.
Research methods may be understood as all those methods/techniques that are used for conducting research. Research methods or techniques thus refer to the methods researchers use in performing research operations. They can be put into the following three groups:
❖ Methods which are concerned with the collection of data. These methods will be used where the data already available are not sufficient to arrive at the required solution
❖ Statistical techniques which are used for establishing relationships between the data and the unknowns
❖ Methods which are used to evaluate the accuracy of the results obtained

Research design
The preparation of the research design, appropriate for a particular research problem, usually involves the consideration of the following:
❖ The means of obtaining the information
❖ The availability and skills of the researcher and his staff (if any)
❖ Explanation of the way in which selected means of obtaining information will be organized and the reasoning leading to the selection
❖ The time available for research
❖ The cost factor relating to research, i.e., the finance available for the purpose

Research design
❖ What is the study about?
❖ Why is the study being made?
❖ Where will the study be carried out?
❖ What type of data is required?
❖ Where can the required data be found?
❖ What periods of time will the study include?
❖ What will be the sample design?
❖ What techniques of data collection will be used?
❖ How will the data be analysed?
❖ In what style will the report be prepared?
Research steps
1. Define research problem
2. Literature review/gap analysis
3. Formulate hypothesis or research question
4. Design your research (study type, target population, sampling technique & size, methods and techniques)
5. Execution of project & data collection
6. Analysis of data
7. Interpretation of data and testing hypothesis
8. Reporting

[Figure: research cycle with the stages: identify and prioritize problem; literature review/gap analysis; aim and objectives; planning; methodology; execution; data presentation; data analysis; discussion; reporting]

Good quality research
A good research method should lead to:
❖ Originality/novelty
❖ Contribution to knowledge
❖ Significance
❖ Technical soundness
❖ Critical assessment of existing work

Study design
A study design is a specific plan or protocol for conducting the study, which allows the investigator to translate the conceptual hypothesis into an operational one.
Research study design is a framework, or the set of methods and procedures used to collect and analyze data on variables specified in a particular research problem.

Anatomy of research report

How to read a scientific paper?
◦ Reading a scientific paper is a skill you can learn
◦ It takes time to develop and it’s a continuous process
◦ Reading a scientific paper is different from reading an article about science in a newspaper
◦ You need to take notes, and you may read the paper’s sections in a different order from the one in which they are presented
◦ Sometimes you need to read previous papers to understand the paper that you are reading
◦ e.g. the authors describe a technique briefly and refer readers to their previous publication for detailed information about the technique

Skimming
Skimming is a strategic, selective reading method in which you focus on the main ideas of a text. Skimming a text means reading it quickly in order to get the main idea or gist of the text. When skimming, deliberately skip text that provides details, stories, data, or other elaboration.
Instead of closely reading every word, focus on the introduction, chapter summaries, first and last sentences of paragraphs, bold words, and text features. Skimming is extracting the essence of the author’s main messages rather than the finer points.
❖ To get main ideas
❖ Reading before reading (previewing)
❖ Reading after you read (reviewing)
❖ Evaluating which to read (choose resources to use)
❖ Save time

Scanning
Scanning a text means looking through it quickly to find specific information. Scanning is commonly used in everyday life, for example when looking up a word in a dictionary or finding your friend's name in the contacts directory of your phone. Scanning, too, uses keywords and organizational clues.
❖ To answer questions

Detailed reading/intensive reading
Detailed reading is for extracting information accurately: you read every word and work to learn from the text. In this careful reading, you may find it helpful to skim first, to get a general idea, but then go back to read in detail. Use a dictionary to make sure you understand all the words used.

Extensive reading

Class activity
Read the article and identify the following:
Readiness of Respiratory Therapists in Saudi Arabia to Manage Patients with COVID-19: A Cross-Sectional Study (https://journals.lww.com/sccj/fulltext/2021/05020/readiness_of_respiratory_therapists_in_saudi.3.aspx)
▪ What information is presented in the abstract?
▪ What kind of information is found in the introduction?
▪ What kind of information is presented in the methods section?
▪ What information is presented in the results? And in the discussion section?
▪ What is the difference between the information found in the results and discussion sections?
Literature review

[Figure: research steps, with the literature review step highlighted]

Planning research
Literature review, study type, sample, population, statistics, validity and reliability, variable, exposure, outcome, bias

Literature review
You need to find and read published research papers to understand what has been done so far in your field of research. This is a critical preparatory step, and it is often called "literature survey", "literature search", or "literature review".

[Figure: research idea, research question, research design; literature review]

Literature review
What it is: a scientific account of what has already been published on a specific topic
What it is not: a description or summary of all published information

Literature reviews must:
1. Be organized around a specific research question/topic
2. Synthesize information into a summary of what is and is not known
3. Identify areas of uncertainty/controversy
4. Create questions that need further research

Literature review involves two processes
1. Searching for information
2. Critically appraising the literature

PICO
Population
Intervention/Exposure
Comparison
Outcome
Look for these in the title, abstract, research question, …

Sources for literature review
❖ Scholarly journals
❖ Books
❖ Reports
❖ Library
❖ Internet & websites

Databases
A database is a collection of information organized so that it can be searched easily.
It can be in a library (either paper or electronic).
It consists of a collection of data (journal articles, books, encyclopedias, newspapers, magazines, datasets, video, audio, …).
It gives more accurate and trusted results than googling.

Why search databases
❖ Get more information on a topic
❖ Study
❖ Doing research
❖ Preparing a presentation
❖ Lecturing

Search engine versus Database
A search engine, like Google, uses computer algorithms to search the Internet and find websites that match the keywords you enter.
Library databases search for published and academic resources, including articles in journals, newspapers, and magazines. Library databases can be general or discipline-specific (e.g. an engineering database).

Databases search
❖ Increase the number of relevant titles
❖ Reduce the absolute number of titles
❖ Restrict the numbers needed to read
❖ Time saving

Academic databases
❖ Paid/subscription
  ❖ Scopus
  ❖ Web of Science
  ❖ MEDLINE (through subscription platforms; it can also be searched free of charge via PubMed)
  ❖ African Journals Online (free abstracts/subscription full texts)
❖ Freely available (open)
  ❖ Google Scholar
  ❖ PubMed
  ❖ Mendeley
  ❖ SciELO
  ❖ ArXiv (preprints)
A full list of academic databases is available at https://en.wikipedia.org/wiki/List_of_academic_databases_and_search_engines
Eman Sobh

Keywords
Keywords are the words and phrases that people type into search engines to find what they're looking for. For example, if you were looking to buy a raincoat, you might type something like “raincoat” into Google. Even if the phrase consists of more than one word (e.g. “men’s raincoat”), it's still a keyword.

MeSH terms
Medical Subject Headings (MeSH) is a controlled and hierarchically organized vocabulary produced by the National Library of Medicine.
It is used for indexing, cataloging, and searching of biomedical and health-related information. MeSH includes the subject headings appearing in MEDLINE/PubMed, the NLM Catalog, and other NLM databases.
National Library of Medicine
https://www.ncbi.nlm.nih.gov/mesh/
https://meshb.nlm.nih.gov/search
Try to search for “children”:
❖ Is this a MeSH term?
❖ Write down the terms that appear

Web/database Searches
1. Select a search engine, directory, or database
2. Determine search options and protocol
3. Construct search query
4. Review and evaluate search results
  ◦ Relevancy
  ◦ Quantity
  ◦ Timeliness
5. Modify search query
  ◦ Check bibliography for new key words and other authors
  ◦ Link directly
  ◦ Adapt original search query
  ◦ Create new search query with new key words
  ◦ Search for other works by the same author(s)
  ◦ Search using a different search engine
6. Document findings
  ◦ Print or download search findings
  ◦ Download full-text source
  ◦ Print full-text source
7. Retrieve or request articles
  ◦ Search online catalog
  ◦ Interlibrary loan
  ◦ Interlibrary delivery
8. Supplement results with Web sources

Constructing a search query
Boolean Operators
❖ NOT/AND NOT: eliminates terms, e.g. award NOT trophy
❖ ADJ: orders key terms within your search, e.g. assisted ADJ living
❖ OR: for plurals, synonyms, and spelling variations, e.g. (woman OR women)
❖ AND: narrows your search, e.g. advertising AND bibliography
❖ ? or *: truncates a term, e.g. nur* for nurse, nursing; child* for children, childhood, ...
❖ “X”: phrase searching, e.g. “pulmonary tuberculosis”
Limiters
❖ Dates
❖ Source type
❖ Language
https://youtu.be/yjxx2bH6gio?si=krmEyfvoWfyqvPmT

The first step in a literature review is to search for review articles
Search for recent articles. Review articles:
❖ Summarize current and past research
❖ Save time
❖ May present gaps and limitations in current knowledge
❖ Provide references that you can search for

Questions to consider when doing literature review (5 Ws)
❖ Who (authors, e.g. Sobh et al.; target population, e.g. COPD patients; funding body, e.g. EU)
❖ What (type of study, methods, results, conclusion)
❖ When (study period)
❖ Where (region)
❖ Why (aim, objectives, need for this)

Research problem
A research problem is a statement about an area of concern, a condition to be improved, a difficulty to be eliminated, or a troubling question that exists in scholarly literature, in theory, or in practice that points to the need for meaningful understanding and deliberate investigation.
It is a difficulty that needs a solution.

[Figure: research problem: community, problem, action, outcome]

How to identify research problem
❖ State the problem
❖ Understand the nature of the problem
❖ Survey the literature
❖ Discuss with your team and stakeholders
❖ Develop ideas
❖ Rephrase the problem

Aim/objectives (SMART)
Specific
Measurable
Achievable
Reproducible
Time bound

Research Question / Research Problem
Try to arrange the options below in the right order of steps, from knowing what your area of research is to reaching the point where you have a sense of direction for doing research.
1. What is known so far in this area?
2. Which unknowns to focus on in my research area?
3. What is not known in this area?
4. What is my area of research (problem)?

Feedback
The correct order is this:
Step 1: What is my area of research?
Step 2: What is known so far in this area?
Step 3: What is not known in this area?
Step 4: Which unknowns do I want to focus on in my research?
To reach the third step, you need to do an extensive literature survey.
Here is where a recent review article can really save you time! You may find that there are a lot of unknowns in a particular area. What is unknown could be a gap in knowledge, so it could be an opportunity for you to address it! But that doesn't mean every unknown thing is important or relevant.

Formulating a research question
A research question is a clear and specific question that forms the basis of your research project.
Three examples of research questions are given below. Try to identify the question that is the most specific:
❖ Are children at risk of bronchial asthma?
❖ Are school children at risk of bronchial asthma?
❖ Are school children exposed to indoor air conditioning at risk of bronchial asthma in Saudi Arabia?

Research question (FINER)
Feasible
Interesting
Novel
Ethical
Relevant

What is a Hypothesis?
A hypothesis is a prediction regarding the possible outcome of a study.
Advantages of stating hypotheses include:
◦ Forces us to think more deeply and specifically about the possible outcomes of the study
◦ Enables us to make specific predictions based on prior evidence or theoretical argument
◦ Helps to clarify whether we are or aren’t investigating a relationship
Disadvantages of stating hypotheses include:
◦ May lead to a bias on the part of the researcher
◦ In some studies, it would be presumptuous to predict what the findings would be
◦ Focusing on the hypothesis could prevent the researcher from seeing other phenomena that might be important to the study

Hypotheses
Hypotheses shape and guide a research study in terms of:
◦ Identification of the study sample size
◦ What issues should be involved in data collection
◦ The proper analysis of the data
◦ Data interpretation

Hypotheses from a Single Research Question

Hypothesis Framing
Traditionally:
H0: “Null” hypothesis (assumed)
H1: “Alternative” hypothesis

Hypothesis statement
Hypothesis: a predictive declarative statement about the outcome of a study.
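As a hedged illustration of the H0/H1 framing above, a comparison between two groups can be written formally. The symbols \mu_A and \mu_B (the mean outcome in two hypothetical groups A and B) are assumptions introduced here for illustration and do not appear in the slides.

```latex
\begin{align*}
H_0 &: \mu_A = \mu_B    && \text{(null: no difference between the groups)} \\
H_1 &: \mu_A \neq \mu_B && \text{(alternative, stated without a direction)} \\
H_1 &: \mu_A > \mu_B    && \text{(alternative, stated with a direction)}
\end{align*}
```

A study states only one alternative; whether it carries a direction depends on what prior evidence or theory allows the researcher to predict.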
“It’s impossible to verify the truth of a general law by repeated observations, but finding a single observation is sufficient to falsify such a law” (Banerjee et al., 2009)

What color are swans? What is the evidence for that? How do we get the evidence?
“Repeated observations of white swans did not prove that all swans are white, but the observation of a single black swan sufficed to falsify that general statement” (Popper, 1976)

Hypothesis statement
Null hypothesis or H0
Stated in the negative.
Example: For patients with asthma, those receiving breathing exercises (BTE) will not differ from those receiving inspiratory muscle training (IMT) in respiratory muscle strength, asthma control, or functional capacity (BTE = IMT).

Hypothesis statement
Research/alternative hypothesis (H1)
Prediction of the outcome of the study.
Directional or nondirectional.
Nondirectional example: CV ratios for 24-month-old children with expressive language delay who participated in an 8-week intervention program will be different from the CV ratios of 24-month-old children with expressive language delay who received no intervention.

Activity
1. Make a group of 4-6
2. Choose a name for your team
3. Choose a topic for your research (remember what you have studied regarding the research problem)
4. Search the internet
5. Select the papers that you think will serve your topic
6. Write down the steps (topic, keywords, database, total number of papers, number of selected papers)
7. Ask each member of the team to choose 2 of the papers and fill in the literature review sheet found in the assignments section on Blackboard (individual activity)
8. Discuss your papers with your team and write down whether the discussion added something for you (group work)

Lecture 3
Biostatistics & Research Methodology Course (BRM 497)
Quantitative & Qualitative research approaches

Learning Outcomes
❖ To be able to differentiate between the different study designs.
❖ To be able to state the strengths and weaknesses of the different study designs.
❖ To be able to design a study suitable to answer a specific research question.
❖ To understand ethical issues behind clinical trials
❖ To know the level of evidence of each study

Study design
A study design is a specific plan or protocol for conducting the study, which allows the investigator to translate the conceptual hypothesis into an operational one.
Research study design is a framework, or the set of methods and procedures used to collect and analyze data on variables specified in a particular research problem.

Planning research
Literature review, study type, sample, population, statistics, validity and reliability, variable, exposure, outcome, bias, level of evidence

Causation vs. Correlation
In respiratory therapy research, causation means that a specific treatment or intervention directly leads to an improvement in a patient's respiratory health. For example, if giving a patient oxygen therapy improves their blood oxygen levels, you could say that the oxygen therapy caused the improvement, assuming no other factors contributed.
Correlation means that two things are connected, but one doesn’t necessarily cause the other. For instance, you might observe that patients who use a certain type of breathing device also tend to have fewer hospital visits. But this doesn't mean the device directly caused the reduction in hospital visits: other factors, like the patient’s overall health or other treatments they’re receiving, could be playing a role.
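The breathing-device example can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: the patient records, the `mean_visits` helper, and every number below are invented for illustration and are not data from the course. It shows how a crude comparison can suggest that a device reduces hospital visits while the difference disappears once patients are compared within the same level of overall health.

```python
from statistics import mean

# Hypothetical records: (uses_device, good_overall_health, hospital_visits).
# All values are invented purely for illustration.
patients = [
    (True, True, 1), (True, True, 1), (True, True, 1), (True, False, 4),
    (False, True, 1), (False, False, 4), (False, False, 4), (False, False, 4),
]


def mean_visits(records, uses_device, good_health=None):
    """Average hospital visits for one device group.

    If good_health is given, only patients in that health stratum are used.
    """
    visits = [
        v for d, h, v in records
        if d == uses_device and (good_health is None or h == good_health)
    ]
    return mean(visits)


# Crude comparison: device users appear to have fewer visits.
crude_users = mean_visits(patients, True)      # 1.75
crude_nonusers = mean_visits(patients, False)  # 3.25

# Stratified comparison: within each health stratum the groups are identical,
# so overall health (not the device) explains the crude difference here.
for good_health in (True, False):
    users = mean_visits(patients, True, good_health)
    nonusers = mean_visits(patients, False, good_health)
    print(f"good_health={good_health}: users={users}, non-users={nonusers}")
```

In this invented data set most device users happen to be in good overall health, which is why the crude averages differ even though the device makes no difference within either health group.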
Types of Research design
❖ Quantitative research: research that relies primarily on the collection of quantitative data to test theories
❖ Qualitative research: research that relies on the collection of qualitative data to develop theories, where the data are not in the form of numbers
❖ Mixed research: research that involves the mixing of quantitative and qualitative methods

Types of Research: Quantitative Research versus Qualitative Research
Purpose. Quantitative: prediction; discovering facts or proving theories. Qualitative: description; understanding human behavior; hypothesis generation.
Focus, results. Quantitative: generalize the results to the population. Qualitative: specific to a group, e.g. cancer patients.
Approach. Quantitative: measure and test. Qualitative: observe and interpret.
Sample. Quantitative: calculated. Qualitative: small, non-random.
Questions. Quantitative: closed ended. Qualitative: open ended.
Measurements of units. Quantitative: variables; deductive. Qualitative: inductive (text, images, categories, patterns, audio, video).
Data analysis. Quantitative: statistics; inductive (by statistician). Qualitative: search for patterns, themes; deductive (by researcher).
Final report. Quantitative: statistical report. Qualitative: narrative.

Qualitative Research
What is Qualitative Research
❖ It is primarily exploratory research that aims to understand and describe behavior by exploring beliefs, motivations, or reasoning. Qualitative research therefore focuses on life experiences; it is more about the “why” and “how”. Qualitative studies use categorical or descriptive data collected from interviews, observations, surveys, or other records.
❖ Qualitative research includes different types of study design. Each type of study has different methods with unique advantages and disadvantages.
❖ It is characterized by its aims, which relate to understanding some aspect of social life (people's beliefs, experiences, attitudes, behavior, and interactions) and help researchers to understand how and why such behaviors take place.
❖ It focuses on obtaining data through open-ended questions. It generates non-numerical data. This method is not only about “what” people think but also “why” they think so.
❖ It is usually used in social sciences like psychology and sociology and in market analysis.
❖ Qualitative research methods allow for in-depth and further probing and questioning of respondents based on their responses, where the interviewer/researcher also tries to understand their motivation and feelings. Understanding how your audience makes decisions can help derive conclusions in market research.

Qualitative Research Purpose
Describe, understand, explain, identify, develop, generate

When to use qualitative methods?
These methods aim to answer questions about the ‘what’, ‘how’ or ‘why’ of a phenomenon rather than ‘how many’ or ‘how much’, which are answered by quantitative methods.

Qualitative research methodology
A methodology is the system of methods used in a discipline area and the justification for using a particular method in research.
❖ Phenomenology: used to describe the lived experience of individuals. It attempts to set aside biases and preconceived assumptions about human experiences, feelings, and responses to a particular situation. Experience may involve perception, thought, memory, imagination, and emotion or feeling. E.g. How do cancer patients cope with a terminal diagnosis?
❖ Grounded theory: used for theory development. E.g. What are the barriers to healthcare access for a refugee population?
❖ Ethnography: used to describe the characteristics of a particular culture/ethnographic group. E.g. What expectations and beliefs do people within specific communities hold about their healthcare options?
❖ Historical: used for looking at the past to inform the future. E.g. What caused an outbreak of polio in the past that may contribute to the outbreaks of today?
❖ Narrative inquiry: records the experiences of an individual or small group, revealing the lived experience or particular perspective of that individual. E.g. What are the difficulties of living in a wheelchair?
❖ Action research: focuses on the release, collaboration, and empowerment of the participants. E.g. “What would it take to improve the stability of young people’s living situations?”
❖ Case studies: in-depth description of the experience of a single person, a family, a group, a community, or an organization. E.g. Choi, T. S. T., Walker, K. Z., & Palermo, C. (2018). Diabetes management in a foreign land: A case study on Chinese Australians. Health & Social Care in the Community, 26(2), e225-e232.
❖ Field research: used to understand attitudes, practices, roles, organizations, groups, or behaviors in their natural setting.

Qualitative Research methods
❖ Interviews (face to face, telephone)
❖ Focus groups
❖ Case study research
❖ Record keeping & documents
❖ Process of observation
❖ Surveys & questionnaires

Quantitative Research
What is Quantitative Research
❖ Quantitative research aims to develop objective theories by generating quantifiable numerical data. Quantitative research is therefore about “how many”, “how often”, or "how large". The focus is on measurable data and numerical analysis. Quantitative research can use different types of study designs. Each type of study design has different methods with unique advantages and disadvantages.
❖ It is research used to quantify the problem by generating numerical data or data that can be transformed into usable statistics.
❖ It is used to quantify attitudes, opinions, behaviors, and other defined variables.
❖ It uses measurable data to formulate facts and uncover patterns in research.

Quantitative data
Data sources include:
❖ Surveys where there are a large number of respondents (especially where you have used a Likert scale)
❖ Observations (counts of numbers and/or coding data into numbers)
❖ Secondary data (government data, SATs scores, etc.)
Analysis techniques include hypothesis testing, correlations, and cluster analysis.

Variables
Independent Variable
This variable is the ‘cause’, also known as the predictor variable (for example TTT, which stands for Test, Treatment, or Therapy, or a medical intervention).
Independent Variable = can be manipulated or allowed to vary

Dependent Variable
This variable is the ‘effect’. It should only vary in response to the independent variable and is also known as the criterion variable.
For example, if we want to explore whether high concentrations of vehicle exhaust impact the incidence of asthma in children, vehicle exhaust is the independent variable while asthma is the dependent variable.

Extraneous Variables: must be controlled to isolate the effect of the independent variable on the dependent variable.

Confounding Variables or confounders: extraneous variables which have co-varied with the independent variable. They are factors that affect the relationship between the independent and dependent variables.
A confounding variable in the example of car exhaust and asthma would be differential exposure to other factors that increase respiratory issues, like cigarette smoke or particulates from factories.

Epidemiologic study design
[Figure: tree of observational studies with Descriptive and Analytic branches; the study types shown are case study, case series, cross-sectional (shown twice), case control, ecologic, and cohort]

Which study type will answer my clinical question?

Epidemiologic study design
❖ Observational studies
  ❖ Descriptive studies: case reports / case series / population studies
  ❖ Analytic
    ❖ Cross-sectional
    ❖ Case-control
    ❖ Cohort
❖ Experimental / interventional studies
  ❖ As per control: RCT/NRCT
  ❖ As per blinding: open label / single / double blind
  ❖ As per design: simple / cross-over
  ❖ As per area: field / clinical / lab

Observational studies

Case-study or Case-series
A famous example of a case study is John Martyn Harlow’s case study on Phineas Gage (the railway worker who had an iron tamping rod driven through his head).
Case studies are widely used in psychology to provide insight into unusual conditions.
A case study, also known as a case report, is an in-depth or intensive study of a single individual or specific group, while a case series is a grouping of similar case studies/case reports together.
A case study/case report can be used in the following instances:
❖ where there is atypical or abnormal behaviour or development
❖ an unexplained outcome to treatment
❖ an emerging disease or condition
https://www.simplypsychology.org/phineas-gage.html

Cross-sectional study
Cross-sectional studies look at a population at a single point in time, like taking a slice or cross-section of a group, and variables are recorded for each participant. This may be a single snapshot for one point in time, or it may look at a situation at one point in time and then follow it up with another or multiple snapshots at later points; this is then termed a repeated cross-sectional data analysis.
Cross-sectional study designs are useful when:
❖ Answering questions about the incidence or prevalence of a condition, belief, or situation.
❖ Establishing what the norm is for a specific demographic at a specific time. For example: what is the most common or normal age for students completing secondary education in Victoria?
❖ Justifying further research on a topic. Cross-sectional studies can infer a relationship or correlation but are not always sufficient to determine a direct cause. As a result, these studies often pave the way for other investigations.

Case control study
In a case-control study there are two groups of people: one has a health issue (case group), and this group is “matched” to a control group without the health issue based on characteristics like age, gender, and occupation. In this study type, we can look back in the patients’ histories to look for exposure to risk factors that are common to the case group but not the control group.
Example: a case-control study that demonstrated a link between carcinoma of the lung and smoking tobacco.
These studies estimate the odds linking the exposure to the health outcome; however, they cannot prove causality.
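The odds estimated in a case-control study are usually summarized as an odds ratio computed from a 2x2 table of exposure by case/control status. The following is a minimal sketch, assuming a hypothetical helper named `odds_ratio` and invented counts; neither comes from the course material or from any published study.

```python
def odds_ratio(exposed_cases, unexposed_cases, exposed_controls, unexposed_controls):
    """Odds ratio from a 2x2 case-control table (cross-product ratio).

    The odds of exposure among cases are divided by the odds of exposure
    among controls.
    """
    if unexposed_cases == 0 or exposed_controls == 0:
        raise ValueError("odds ratio is undefined when a denominator cell is zero")
    return (exposed_cases * unexposed_controls) / (unexposed_cases * exposed_controls)


# Hypothetical counts, invented for illustration only:
# 100 cases (80 smokers, 20 non-smokers) and 100 controls (40 smokers, 60 non-smokers).
or_value = odds_ratio(
    exposed_cases=80, unexposed_cases=20,
    exposed_controls=40, unexposed_controls=60,
)
print(or_value)  # (80 * 60) / (20 * 40) = 6.0
```

An odds ratio above 1 means the exposure was more common among cases than among controls. Consistent with the text, this describes an association and does not by itself establish that the exposure caused the outcome.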
Case-Control studies might also be referred to as retrospective or case-referent studies.
Cohort study
Cohort studies are longitudinal, observational studies which investigate predictive risk factors and health outcomes. They differ from clinical trials in that no intervention, treatment, or exposure is administered to the participants. The factors of interest to researchers already exist in the study group under investigation. Study participants are observed over a period of time. The incidence of disease in the exposed group is compared with the incidence of disease in the unexposed group. Because of the observational nature of cohort studies, they can only find a correlation between a risk factor and disease rather than prove the cause.
Cohort studies are useful if:
❖There is a persuasive hypothesis linking an exposure to an outcome.
❖The time between exposure and outcome is not too long (a long interval adds to the study costs and increases the risk of participant attrition).
❖The outcome is not too rare.
Adapted from: Cohort Studies: A brief overview by Terry Shaneyfelt [video] https://www.youtube.com/watch?v=FRasHsoORj0
Experimental Research
❖Experimental research involves a direct assessment of how one variable influences another.
❖This allows the establishment of causality.
❖All extraneous variables must be held constant while a single variable is manipulated and the effect is measured.
Experimental Research Elements
Control: the extent to which different factors are accounted for.
Manipulation/intervention/exposure: done to set the stage for the occurrence of the factor whose performance is to be studied, under conditions in which all other factors are controlled.
Observation: the effect of the manipulation of the independent variable on the dependent variable is studied or observed.
Replication: conducting a number of sub-experiments within the framework of an overall experimental design.
Interventions that Can Be Evaluated
❖New drugs and new treatment of diseases
❖New medical and health care technology
❖New methods of primary prevention
❖New programs for screening
❖New ways of organizing and delivering health services
❖New community health programs
❖New behavioral intervention programs
Experimental study design
Definition: An experiment is a study in which a treatment, procedure, or program is intentionally introduced, and a result or outcome is observed.
True experiments have three/four elements: manipulation, control, and randomization (assignment, and selection).
[Diagram: Experimental study designs: True experimental; Quasi experimental]
[Diagram: True experiment: Clinical Trials
  Controlled trials: Randomized Controlled Trial (RCT); Non-randomized Controlled Trials (NRCT)
  Non-controlled Trials
  Blinded: Single-Blinded; Double-Blinded; Triple-Blinded
  Not-blinded: Open]
Reading only: A story (for learning)
Assignment
1. Make a group of 4-6.
2. Choose a name for your team.
3. Search the literature for examples of different study designs in your field and mark the sentences that define the study design type.
4. Each team should have an example of each study design type.
References
Myers, J. L., Well, A. D., & Lorch Jr, R. F. (2013). Research design and statistical analysis. Routledge.
Bowling, A. (2014). Research methods in health: investigating health and health services. McGraw-Hill Education (UK).
Indu, P. V., & Vidhukumar, K. (2019). Research designs-an Overview. Kerala Journal of Psychiatry, 32(1), 64-67.
Taibah University College of Medical Rehabilitation Sciences Respiratory Therapy Department 1446 (2024-2025) Biostatistics & Research Methodology Course (BRM 497) Lecture 4
Randomized clinical trial
❖A trial is an experiment.
❖A clinical trial is a controlled experiment having a clinical event as an outcome measure, done in a clinical setting, and involving persons having a specific disease or health condition.
❖A randomized clinical trial is a clinical trial in which participants are randomly assigned to separate groups that compare different treatments.
❖What is a placebo?
  ◦ A placebo is a substance or treatment of no intended therapeutic value.
❖"Controlled" implies pre-defined:
  ❖Eligibility criteria
  ❖Specified hypotheses
  ❖Primary and secondary endpoints (e.g., behavioral change, HIV incidence) to address hypotheses
  ❖Methods for enrollment and follow-up
  ❖Rigorous monitoring
  ❖Analysis plans and stopping rules
Methods of randomization
❖Flip a coin for each subject: if the coin lands on 'heads', the subject is assigned to the first (experimental) group; if it lands on 'tails', the subject is assigned to the control group.
❖Write the names of the subjects on slips of paper, put the slips into a bowl, and then draw lots. The first designated number of subjects drawn are placed in one group, and the rest are assigned to the other group.
❖Blindfolded, choose numbers from a table of random numbers, moving horizontally (along rows) or vertically (down columns), until the requisite number is reached for both the experimental and control groups.
❖Computer or calculator randomization
❖Random number Web sites, such as http://random.org/
Phases of clinical trials
Phase 1: trials look at doses and side effects. They enroll a small number of participants (fewer than 100) and are used when a drug is introduced for the first time, to evaluate its safety. They determine levels of toxicity, metabolism, pharmacological effect, and the safe dosage range, and identify side effects.
Phase 2: a new treatment is compared with another treatment already in use, or with a dummy drug (placebo). These trials enroll 100 or more participants.
Phase 3: trials compare new treatments with the best currently available treatment.
Phase 4: trials are done after a drug has been shown to work and has been granted a license. Phase 4 is sometimes written as Phase IV. The main reasons for running phase 4 trials are to find out:
❖More about the side effects and safety of the drug.
❖What the long-term risks and benefits are.
❖How well the drug works when it is used more widely.
Randomized Controlled Trials (RCTs)
The common important features of a randomized controlled trial are:
Eligibility: Researchers pre-determine eligibility criteria for the population considered for inclusion in the trial, resulting in a population that is reasonably similar in important ways.
Randomization: Participants are randomly assigned to an experimental or control group. This reduces the potential for bias and the impact of variables outside the researcher's control.
Exposure: Researchers manage trial participants' engagement with the study, including exposure to the intervention of interest. This is a key difference from observational studies, where researchers do not directly expose participants to interventions. The experimental group is exposed to the intervention of interest while the control group is not. Both groups are then followed so that their respective outcomes can be compared.
Blinding: Additionally, many randomized controlled trials are structured so that participants and/or researchers do not know which participants are part of the experimental group and which are part of the control group. Studies where participants do not know whether they are receiving the intervention are called 'single-blind' studies, whereas studies where both researchers and participants are unaware are called 'double-blind' studies.
Participants may be 'blinded' because of potential biases they may have towards the intervention. Similarly, researchers may have biases about the intervention, or want to provide it to certain participants and not others. Having both parties unaware of who is receiving what (the double-blind structure) is superior because it reduces the impact of these biases on the outcome of the trial.
When is it unethical to randomize?
❖Known effective treatment: a placebo cannot be used (e.g., trials to prevent mother-to-child HIV transmission); the standard of care must be provided.
❖Personal choice: very different interventions cannot be randomized. For example, trials of different types of contraceptive (e.g., pill vs. IUD) are ethically questionable because women have the right to select a method of their choice. (It is possible to randomize within a method type, e.g., pill A vs. pill B.)
❖Risks of the new treatment are likely to exceed risks of the existing treatment.
[Figure: Randomized study / randomized controlled study]
Randomized Controlled Trials (RCTs)
Intervention/Manipulation: something in the environment is purposefully changed by the researcher.
Randomization: no matter who the participant is, he/she has an equal chance of being assigned to any of the groups or treatments in an experiment.
[Figure: Non-randomized controlled pretest-posttest design / quasi-experimental study]
Bias
Bias is a systematic error in study design, subject recruitment, data collection, or analysis that results in a mistaken estimate of the true population parameter.
Selection bias occurs when the procedures used to select subjects and other factors that influence participation in the study produce a result that is different from what would have been obtained if all members of the target population were included in the study. For example, an online website that rates the quality of primary care physicians based on patients' input may produce ratings that suffer from selection bias.
This is because individuals who had a particularly bad (or good) experience with the physician may be more likely to go to the website and provide a rating.
Information bias refers to a "systematic error due to inaccurate measurement or classification of disease, exposure, or other variables."
Recall bias, a type of information bias, occurs when study participants do not remember the information they report accurately or completely.
https://www.nlm.nih.gov/nichsr/stats_tutorial/cover.html
Publication bias
Validity & Reliability
RELIABILITY
❖Reliability is the degree of consistency.
❖Reliability = repeatability
❖Intrarater reliability = intraobserver reliability = intraoperator reliability: the consistency of repeated measurements of the same observation by the same rater.
❖Interrater reliability = interobserver reliability = interoperator reliability: the consistency of repeated measurements of the same observation by different raters.
VALIDITY
❖Validity concerns the soundness of the research design and methods.
❖While carrying out the experiment, the researcher has two objectives relating to validity: to draw conclusions about the impact of an independent variable on the group under study, and to make inferences about the population as a whole.
❖The first objective concerns internal validity, whereas the second concerns external validity.
Validity & Reliability
❖Validity speaks about objective measures (e.g. instruments).
❖Reliability speaks about subjective measures (e.g. operator).
Reliability needs to be assessed in a study where ultrasonography is performed by different sonographers.
Generalizability of results
Internal validity (truth in study)
External validity (truth in real life)
Validity
EXTERNAL VALIDITY
❖External validity is the extent to which the research results can be generalized to the world at large.
❖It checks whether the causal relationship discovered in the experiment can be generalized or not.
INTERNAL VALIDITY
❖Internal validity is the extent to which the experiment is free from errors and any difference in measurement is due to the independent variable and nothing else.
❖It is a measure of the accuracy of the experiment.
Threats to internal validity (confounders)
Threats associated with participants:
❖Selection bias (sampling)
❖Maturation (time effect)
❖History (e.g. prior educational level)
❖Experimental mortality = attrition
Threats associated with measurement:
❖Regression
❖Instrumentation (error in measurement)
❖Testing effect (pretest)
❖Experimental bias (e.g. the subject being tested knows the subject of the experiment)
Precision = reliability
❖The precision of a variable is the degree to which it is reproducible, with nearly the same value each time the measurement is repeated.
❖Also called reproducibility, reliability, and consistency.
❖Repeated values differ from each other because of random error.
❖Sources of random error: 1- Observer variability 2- Instrument variability 3- Subject variability
Accuracy = validity
❖The accuracy of a variable refers to how close a measurement is to the true value.
❖Accuracy is reduced by systematic error (bias).
  ◦ The greater the error, the less accurate the variable.
❖Classes of measurement error:
  ◦ Observer bias
  ◦ Instrument bias
  ◦ Subject bias
Precision Vs. Accuracy
Example: Who is more precise when measuring the same 21.0 cm book four times?
Mohammad: 18.0 cm, 19.0 cm, 20.0 cm, 21.0 cm.
Omar: 15.5 cm, 15.0 cm, 15.2 cm, 15.3 cm.
Precision refers to how close multiple measurements are to each other, regardless of how close they are to the actual value.
Mohammad's measurements (18.0 cm, 19.0 cm, 20.0 cm, and 21.0 cm) vary considerably, with a range of 3.0 cm, meaning his measurements are not very precise.
Omar's measurements (15.5 cm, 15.0 cm, 15.2 cm, and 15.3 cm) are much closer to each other, with a range of 0.5 cm, indicating that Omar's measurements are more consistent and show better precision. Omar is more precise.
While neither is accurate (since the actual length is 21.0 cm), Omar's measurements are closer together, demonstrating greater precision.
Activity
Q1/ The volume of a liquid is 26 mL. A student measures the volume and finds it to be 26.2 mL, 26.1 mL, 25.9 mL, and 26.3 mL in the first, second, third, and fourth trial, respectively. Which of the following statements is true for his measurements?
a. They are neither precise nor accurate.
b. They have poor accuracy.
c. They have good precision.
d. They have poor precision.
Accuracy refers to how close a measured value is to the actual value. In this case, the true value is 26 mL, and the student's measurements are close to this value, though not exact, meaning accuracy could be improved.
Precision refers to how close repeated measurements are to each other, regardless of how close they are to the true value. The student's measurements are 26.2 mL, 26.1 mL, 25.9 mL, and 26.3 mL, which are all very close to each other, showing good precision. The measurements are close to the true value but not exactly on target, so the accuracy is not perfect. However, the small variation between measurements demonstrates good precision.
Activity
Q2/ The volume of a liquid is 20.5 mL. Which of the following sets of measurements represents the value with good accuracy?
a. 18.6 mL, 17.8 mL, 19.6 mL, 17.2 mL
b. 19.2 mL, 19.3 mL, 18.8 mL, 18.6 mL
c. 18.9 mL, 19.0 mL, 19.2 mL, 18.8 mL
d. 20.2 mL, 20.5 mL, 20.3 mL, 20.1 mL
Accuracy is determined by how close the measurements are to the actual value, which in this case is 20.5 mL. The values in option d are very close to 20.5 mL, indicating good accuracy. The other options (a, b, and c) contain measurements that are further away from 20.5 mL, making them less accurate. So, option d represents the set of measurements with good accuracy.
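The comparisons in the book example and in Q1 can be checked numerically. The sketch below is only one simple way to do this: it uses the range as a stand-in for precision and the distance of the mean from the true value as a stand-in for accuracy. The helper names `spread` and `mean_error` are hypothetical and are not part of the lecture.

```python
from statistics import mean


def spread(values):
    """Range of repeated measurements: a smaller range means better precision."""
    return max(values) - min(values)


def mean_error(values, true_value):
    """Distance of the average measurement from the true value:
    a smaller distance means better accuracy."""
    return abs(mean(values) - true_value)


# Measurements taken from the examples above.
mohammad = [18.0, 19.0, 20.0, 21.0]   # book, true length 21.0 cm
omar = [15.5, 15.0, 15.2, 15.3]       # book, true length 21.0 cm
student = [26.2, 26.1, 25.9, 26.3]    # liquid, true volume 26 mL

for name, values, truth in [
    ("Mohammad", mohammad, 21.0),
    ("Omar", omar, 21.0),
    ("Student (Q1)", student, 26.0),
]:
    print(
        f"{name}: range = {spread(values):.2f}, "
        f"mean error = {mean_error(values, truth):.3f}"
    )
```

Under these two measures, Omar has the smaller range (better precision) even though his average is far from 21.0 cm, while the student in Q1 has both a small range and a small mean error.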
Alpha and Beta errors
A Type I error (alpha, α) means rejecting the null hypothesis when it is actually true, while a Type II error (beta, β) means failing to reject the null hypothesis when it is actually false.
Type I error vs. Type II error
                             TRUTH: DIFFERENCE      TRUTH: NO DIFFERENCE
STUDY FINDS DIFFERENCE       (correct)              TYPE I ERROR (α)
STUDY FINDS NO DIFFERENCE    TYPE II ERROR (β)      (correct)
Narrative review
Narrative review: a descriptive study in which the authors often select articles based on their own point of view, which lowers its quality.
Systematic review: a review using a systematic method to summarize evidence on a question, with a detailed and comprehensive plan of study.
Systematic Reviews & Meta-analysis
In clinical research, systematic reviews and meta-analyses are popular study designs. A systematic review is a summary of the medical literature that uses explicit and reproducible methods for searching the literature. Meta-analysis is a mathematical synthesis of the results of these individual studies. These study designs are useful tools for summarizing the knowledge obtained from scientific papers on a certain topic. In addition, combining different studies in a meta-analysis increases statistical power, resulting in more precise effect estimates.
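One way to see why combining studies gives a more precise estimate is inverse-variance (fixed-effect) pooling, sketched below. This is a common textbook method and an assumption here, not necessarily the approach used in this course; the function name `pool_fixed_effect` and the three study results are hypothetical.

```python
import math


def pool_fixed_effect(estimates, standard_errors):
    """Inverse-variance (fixed-effect) pooling of study effect estimates.

    Each study is weighted by 1 / SE^2, so more precise studies count more.
    Returns the pooled estimate and its standard error.
    """
    weights = [1.0 / se ** 2 for se in standard_errors]
    total_weight = sum(weights)
    pooled = sum(w * e for w, e in zip(weights, estimates)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    return pooled, pooled_se


# Hypothetical effect estimates (e.g. mean differences) from three small studies,
# each with the same standard error. These numbers are made up for illustration.
study_estimates = [0.30, 0.10, 0.20]
study_ses = [0.20, 0.20, 0.20]

pooled_estimate, pooled_se = pool_fixed_effect(study_estimates, study_ses)
print(f"Pooled estimate: {pooled_estimate:.3f}")
print(f"Pooled standard error: {pooled_se:.3f}")
```

The pooled standard error comes out smaller than that of any single study, which is the sense in which a meta-analysis yields a more precise effect estimate. The sketch assumes the studies estimate the same underlying effect, which is why heterogeneity needs to be tested in a real review.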
Systematic Reviews - steps
❖Define review question: very precisely; in partnership with commissioners, clinicians, patients (as appropriate)
❖Develop & register protocol: provides transparency; defines exact inclusion criteria and methods
❖Identify relevant studies: usually a comprehensive search across multiple bibliographic databases plus reference checking
❖Assess eligibility: careful matching of studies against inclusion criteria
❖Extract relevant data: only what is required to answer the question
❖Critically appraise studies: use a published tool to compare methodological features across studies
❖Synthesise appropriately: quantitative or qualitative synthesis, depending on the type of question and study designs
❖Test for heterogeneity
❖Analyze data for meta-analysis if you have a sufficient number of studies
❖Disseminate to appropriate audience: full transparent write-up plus, as appropriate, short report to funder, journal article, patient leaflet…
Mendeley reference management
Mendeley is a free reference management tool and academic social network that helps researchers organize and manage references, generate citations, and collaborate. It allows users to store, annotate, and organize research papers, sync across devices, and automatically format citations in various styles. Mendeley integrates with word processors like Microsoft Word for easy citation insertion and bibliography creation, making it a useful tool for managing academic and research work efficiently. (https://www.youtube.com/@mendeley).
References
Myers, J. L., Well, A. D., & Lorch Jr, R. F. (2013). Research design and statistical analysis. Routledge.
Bowling, A. (2014). Research methods in health: investigating health and health services. McGraw-Hill Education (UK).
Indu, P. V., & Vidhukumar, K. (2019). Research designs-an Overview. Kerala Journal of Psychiatry, 32(1), 64-67.
Kusumaningsih, D. (2018, October). Mendeley as a reference management and citation generator for academic articles.
In International Conference on Applied Science and Engineering (ICASE 2018) (pp. 81-83). Atlantis Press.
Banerjee, A., Chitnis, U. B., Jadhav, S. L., Bhawalkar, J. S., & Chaudhury, S. (2009). Hypothesis testing, type I and type II errors. Industrial Psychiatry Journal, 18(2), 127-131.
Taibah University College of Medical Rehabilitation Sciences Respiratory Therapy Department 1446 (2024-2025) Biostatistics & Research Methodology Course (BRM 497) Lecture 5
Learning Objectives
❖To be able to define some statistical terms
❖To be able to define a "variable" in research and discuss different types of variables
❖To understand different types of statistical tests
❖To be able to choose the optimal tools for data collection based on the variables intended to be measured
Biostatistics
What is Statistics
◦ Statistics is the science of learning from data, and of measuring, controlling, and communicating uncertainty; it thereby provides the navigation essential for controlling the course of scientific and societal advances.
◦ Statistics is also an ART of conducting a study, analyzing the data, and deriving useful conclusions from numerical outcomes about real-life problems.
◦ Examples:
  ◦ The number of students in Taibah University
  ◦ The cost of health care services in Saudi Arabia
What is Bio-Statistics
◦ Bio-Statistics is a group of methods used to collect, analyze, present, and interpret biological, medical, and public health data in order to make decisions.
◦ Biostatistics is the application of statistics in medical research, e.g.:
  ◦ Clinical trials
  ◦ Epidemiology
  ◦ Pharmacology
  ◦ Medical decision making
  ◦ Comparative Effectiveness Research
  ◦ etc.
Why biostatistics? What is the difference?
◦ Because some statistical methods are more heavily used in health applications than elsewhere
◦ E.g.
survival analysis ◦ longitudinal data analysis
◦ Because examples are drawn from health sciences
  ◦ Makes the subject more appealing to those interested in health
  ◦ Illustrates how to apply the methodology to similar problems encountered in real life
Data, Sampling and Variation in Data and Sampling
◦ The measurements obtained in a research study are called the data.
◦ The goal of statistics is to help researchers organize and interpret the data.
◦ Data: measurements or records of facts made under specific conditions
◦ Most data can be put into the following categories:
  ◦ Qualitative (categorical)
  ◦ Quantitative (numerical)
Qualitative data are the result of categorizing or describing attributes of a population. Qualitative data are also often called categorical data.
◦ Examples: hair color, blood type, ethnic group, the car a person drives, and the street a person lives on
Quantitative data are always numbers. Quantitative data are the result of counting or measuring attributes of a population. Quantitative data may be either discrete or continuous.
◦ Examples: amount of money, pulse rate, weight, number of people living in your town, and number of students who take statistics
Dr Abdul Rahman Hameed
Data, Sampling and Variation in Data and Sampling
Primary Data
Primary data are data collected for the first time, through personal experience or evidence, specifically for the research at hand. They are also described as raw data or first-hand information. Collecting primary data is costly. The data are mostly collected through observations, physical testing, mailed questionnaires, surveys, personal interviews, telephone interviews, case studies, focus groups, etc.
Secondary Data
Secondary data are second-hand data that were already collected and recorded by other researchers for their own purpose, and not for the current research problem.
Secondary data are accessible from different sources such as government publications, censuses, internal records of the organisation, books, journal articles, websites, reports, etc. This method of gathering data is affordable, readily available, and saves cost and time. However, the disadvantage is that the information was assembled for some other purpose and may not meet the present research purpose or may not be accurate.
What is a Variable?
A variable is a characteristic or condition that can change or take on different values (e.g., eye color). Researchers choose certain variables to study because they suspect that a relationship exists between them.
Measuring variable & Measurement Scales
To establish relationships between variables, researchers must observe the variables and record their observations. This requires that the variables be measured. The process of measuring a variable requires a set of categories called a scale of measurement and a process that classifies each individual into one category.
1. A nominal scale is an unordered set of categories identified only by name. Nominal measurements only permit you to determine whether two individuals are the same or different (yes or no, male or female).
2. An ordinal scale is an ordered set of categories. Ordinal measurements tell you the direction of difference between two individuals.
3. An interval scale is an ordered series of equal-sized categories. Interval measurements identify the direction and magnitude of a difference. The zero point is located arbitrarily on an interval scale.
4. A ratio scale is an interval scale where a value of zero indicates none of the variable. Ratio measurements identify the direction and magnitude of differences and allow ratio comparisons of measurements.
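The difference between an interval scale and a ratio scale can be illustrated with temperature. The sketch below is an illustration chosen here, not an example from the lecture: Celsius has an arbitrary zero (interval scale), while Kelvin has a true zero (ratio scale). The helper `celsius_to_kelvin` and the two temperatures are assumptions for demonstration.

```python
def celsius_to_kelvin(celsius):
    """Convert a Celsius temperature (interval scale) to Kelvin (ratio scale)."""
    return celsius + 273.15


warm_c, cool_c = 40.0, 20.0
warm_k, cool_k = celsius_to_kelvin(warm_c), celsius_to_kelvin(cool_c)

# Differences are meaningful on both scales and agree with each other.
diff_c = warm_c - cool_c
diff_k = warm_k - cool_k

# Ratios are only meaningful on the ratio scale: 40 °C is not "twice as hot" as 20 °C.
ratio_c = warm_c / cool_c
ratio_k = warm_k / cool_k

print(f"Difference: {diff_c:.1f} °C vs {diff_k:.1f} K")
print(f"Ratio: {ratio_c:.2f} on the Celsius scale vs {ratio_k:.2f} on the Kelvin scale")
```

The difference is the same on both scales, but the ratio changes when the zero point moves, which is why ratio comparisons are reserved for scales with a true zero.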
Types of Variables
[Diagram: Variables
  Quantitative or Numeric: Continuous (Interval, Ratio); Discrete
  Categorical or Qualitative: Nominal; Ordinal]
Types of Variable
Quantitative Variable (measured numerically)
  ❖Continuous Variable: can take any numerical value within a certain interval, including decimals. Examples: age, length, height, BMI, weight, time, cholesterol level, etc.
  ❖Discrete Variable: takes only certain numerical values, with no intermediate values. Examples: number of cars, houses, accidents, family size, etc.
Qualitative Variable (cannot be measured numerically)
  ❖Nominal Variable: categorical values. Examples: gender, occupation, type of disease, type of car, blood type, etc. A binary variable has two categories: Yes or No, True or False.
  ❖Ordinal Variable: ordered categorical values. Examples: severity of disease, level of education, etc. (Low, Moderate, High)
Types of Variables
Variables are classified as either quantitative or categorical.
A quantitative variable is conceptualized and analyzed along a continuum (e.g., height).
◦ Can be subdivided into smaller units
A categorical variable does not vary in degree, amount, or quantity, but its values are qualitatively different (e.g., gender); it is analyzed in distinct categories, with no continuum implied.
◦ There is no middle ground or in-between measurement
Types of quantitative (numeric) Variables
Quantitative variables can be classified as discrete or continuous. Discrete variables (such as class size) consist of indivisible categories, and continuous variables (such as time or weight) can be divided into an unlimited number of fractional parts.
Discrete variables are countable in a finite amount of time. For example, you can count the change in your pocket. You can count the money in your bank account. You could also count the amount of money in everyone's bank accounts. It might take you a long time to count that last item, but the point is that it is still countable.
Example: the number of transactions done by a customer on a particular day. It can be 0, 1, 2, … but it cannot be 2.5 or 2.75.
Continuous variables would (literally) take forever to count.
In fact, you would get to "forever" and never finish counting them. For example, take age. You can't count "age". Why not? Because it would literally take forever. For example, you could be: 25 years, 10 months, 2 days, 5 hours, 4 seconds, 4 milliseconds, 8 nanoseconds, 99 picoseconds… and so on.
There are two types of continuous variables:
❖Interval Variable
❖Ratio Variable
Interval Variable
❖Interval variables have a numerical value, with order and equal intervals.
❖They allow us not only to rank the items that are measured but also to quantify and compare the magnitudes of the differences between them.
❖The difference between two values is meaningful: the difference between a temperature of 100 degrees and 90 degrees is the same as the difference between 90 degrees and 80 degrees.
❖The value can be zero or below zero (negative).
❖There is no true zero value (zero temperature does not mean absence of heat).
Example: the difference in temperature between 50 degrees and 60 degrees is 10 degrees; this is the same as the difference between 70 degrees and 80 degrees.
Ratio Variable
A ratio variable is similar to an interval variable with one difference: the ratio makes sense. It has a true zero value and cannot be below zero.
Example: let's say respondents were being surveyed about their stress levels on a scale of 0-10. A stress level of 10 is double that of 5. Age and height are also examples.
Classic examples of a ratio scale are any variables that possess an absolute zero, like age, weight, height, or sales figures.
Categorical Variable
A categorical variable (also called a qualitative variable) refers to a characteristic that cannot be quantified. Categorical variables can be either nominal or ordinal.
A categorical variable may be coded as 0, 1, 2, …
Nominal Variable
A nominal variable is one that describes a name, label, or category without natural order. Sex and type of dwelling are examples of nominal variables.
Examples
Colors: Red, Yellow, Pink, Blue
Countries: Singapore, Japan, USA, India, Korea
Animals: Cow, Dog, Cat, Snake
Gender (dichotomous variable): Male & Female
Ordinal Variable
An ordinal variable is a categorical variable whose different states are ordered in a meaningful sequence. Ordinal data have order, but the intervals between scale points may be uneven. Because the distances are not equal, arithmetic operations are not meaningful, but logical operations (such as comparisons of order) can be performed on ordinal data. A typical example of an ordinal variable is the socio-economic status of families. We know 'upper middle' is higher than 'middle' but we cannot say 'how much higher'.
Independent vs. Dependent Variables
The independent variable is what the researcher studies to see its relationship or effects.
◦ Presumed or possible cause (e.g. smoking)
The dependent variable is what is being influenced or affected by the independent variable (e.g. lung cancer); an outcome is a type of dependent variable.
◦ Presumed results
Independent variables may be either manipulated or selected.
◦ A manipulated variable is a changed condition the researcher creates during a study, also known as an experimental or treatment variable (e.g. drug).
◦ A selected variable is an independent variable that already exists (e.g. gene).
Independent vs. Dependent Variables
Example 1
You are interested in "How does stress affect the mental state of human beings?"
Independent variable: stress
Dependent variable: mental state of human beings
You can directly manipulate stress levels in your human subjects and measure how those stress levels change mental state.
Example 2
Promotion affects employees' motivation.
Independent variable: promotion
Dependent variable: employees' motivation
Control/Constant Variable
A control variable is a variable that is NOT allowed to change unpredictably during an experiment. As control variables are ideally expected to remain the same, they are also called constant variables.
Example: an example of a constant variable is the voltage from a power supply. If you are examining "How electricity affects experimental subjects", you should keep the voltage constant; otherwise the energy supplied will change as the voltage changes.
Types of Statistics/Analyses
DESCRIPTIVE STATISTICS
◦ Describing a phenomenon: How many? How much? Blood pressure (BP), heart rate (HR), body mass index (BMI), intelligence quotient (IQ), etc.
◦ Frequencies
◦ Basic measurements
INFERENTIAL STATISTICS
◦ Inferences about a phenomenon
◦ Proving or disproving theories
◦ Associations between phenomena
◦ Whether a sample relates to the larger population
◦ E.g., diet and health
Statistical Description of Data
Statistics describes a numeric set of data by its
◦ Center
◦ Variability
◦ Shape
Statistics describes a categorical set of data by
◦ Frequency, percentage or proportion of each category
Distribution (of a variable): tells us what values the variable takes and how often it takes these values.
Unimodal: having a single peak
Bimodal: having two distinct peaks
Symmetric: left and right halves are mirror images.
Descriptive Statistics
Descriptive statistics are methods for organizing and summarizing data. For example, tables or graphs are used to organize data, and descriptive values such as the average score are used to summarize data. A descriptive value for a population is called a parameter and a descriptive value for a sample is called a statistic.
Inferential Statistics
Inferential statistics are methods for using sample data to make general conclusions (inferences) about populations.
Because a sample is typically only a part of the whole population, sample data provide only limited information about the population. As a result, sample statistics are generally imperfect representatives of the corresponding population parameters.
Sampling Error
The discrepancy between a sample statistic and its population parameter is called sampling error. Defining and measuring sampling error is a large part of inferential statistics.
[Figure: A demonstration of sampling error. Two samples are selected from the same population. Notice that the sample statistics are different from one sample to another, and all of the sample statistics are different from the corresponding population parameters. The natural differences that exist, by chance, between a sample statistic and a population parameter are called sampling error.]
Data Presentation
There are two types of statistical presentation of data: graphical and numerical.
Graphical Presentation: We look for the overall pattern and for striking deviations from that pattern. The overall pattern is usually described by the shape, center, and spread of the data. An individual value that falls outside the overall pattern is called an outlier.
Bar diagrams and pie charts are used for categorical variables.
Histograms, stem-and-leaf plots and box plots are used for numerical variables.
Numerical Data Presentation
Tables: A table is a systematic arrangement of related statistical data in columns and rows with some predetermined aim or purpose. The purpose of a table is to simplify the presentation of related data and make comparisons easy. The reader can easily locate the desired information.
[Table: Imports and Exports of Country 'A' during 2002-05]
Graphical data presentation
Bar chart or diagrams
A bar can be defined as a thick 'line', often made thicker to draw the reader's attention. The height of this bar shows the quantity of the variable we want to present.
[Figure: Number of cars registered in three States]
It is also called a one-dimensional diagram because only the height of the bar is important, and its base or width is not taken into account. To make them more visually appealing, bars are either coloured or shaded in different ways.

Pie chart or diagram
It is also known as an angular diagram. Pie diagrams are most commonly used for presenting a percentage breakdown of data.

Time series line diagram
A line graph records the relationship between two variables. If one of the two variables is time in days, weeks, months or years, we get a time series line graph.

Histogram
A histogram is a graphical display of data using bars of different heights. In a histogram, each bar groups numbers into ranges. Taller bars show that more data fall in that range. A histogram displays the shape and spread of continuous sample data.

Box Plotting
Box plots (also called box-and-whisker plots or box-whisker plots) give a good graphical image of the concentration of the data. They also show how far the extreme values are from most of the data. A box plot is constructed from five values: the minimum value, the first quartile, the median, the third quartile, and the maximum value.
[Figure: boxplot]
A boxplot is a standardized way of displaying the distribution of data based on a five-number summary (“minimum”, first quartile (Q1), median, third quartile (Q3), and “maximum”). It can tell you about your outliers and what their values are. It can also tell you if your data are symmetrical, how tightly your data are grouped, and if and how your data are skewed.

BASIC STATISTICS RT-BRM
Overview
◦ Types of data
◦ Presenting data (descriptive statistics)
◦ Normal or not-normal?
◦ Correlation
◦ Mini quiz
◦ Confidence intervals
◦ Hypothesis/Significance testing and P-values
◦ Tests for continuous data

Quantitative Data
Two types.
◦ Discrete = countable values with nothing in between (e.g. number of cars in the car park)
◦ Continuous = values that are measured and can include fractions and decimals (e.g. age, weight)

Qualitative Data
Two types.
◦ Nominal = a categorical variable that cannot be ranked (e.g. race, religion, sex)
◦ Ordinal (ordered) = a categorical variable that can be ranked (e.g. satisfaction level, grades)

Presenting Data
You can present your data in a variety of ways…
◦ Graphs (bar charts, scatter graphs etc.)
◦ Summary Statistics (means/medians, percentages etc.)
◦ Tables

Presenting Categorical Data
Bar Chart
[Figure: bar chart of number of subjects (axis 0 to 5000) by self-reported pain: no pain, moderate pain, extreme pain]

Presenting Categorical Data
Frequency Tables
% Patients with pain score > 7

Day    New Treatment (Gelclair)    Standard Treatment (Saline)
0      96% (48/50)                 90% (47/52)
1      70% (35/50)                 77% (40/52)
3      41% (20/49)                 52% (26/50)
7      22% (10/45)                 38% (15/40)
14     0% (0/30)                   9% (2/22)

Presenting Continuous Data
Scatter Graph
[Figure: scatter graph]

Presenting Continuous Data
Box Plot
[Figure: box plot, labelled with the median and the interquartile range (from the 25th to the 75th percentile of the data)]

Presenting Continuous Data
Tables
We can represent continuous data (e.g. age and BMI) in tables using descriptive statistics like means, medians etc.

What is a mean? The mean is the sum of the continuous data of interest (e.g. age), divided by the total count (e.g. the number of people who had information for their age).
What is a median? The median is the middle number in a sequence of numbers (sorted from smallest to largest).
What is a standard deviation? The standard deviation quantifies how much the available data differ from the mean value of the group (the further the rest of the data are from the mean, the larger the standard deviation).

The way in which we decide to present our continuous data depends on whether our data is ….
NORMALLY DISTRIBUTED

Descriptive Statistics
◦ Continuous Data, Normally Distributed: Mean, Standard Deviation, Range
◦ Continuous Data, Not Normally Distributed: Median, Range, Interquartile Range
◦ Categorical Data: Percentages, Numbers, Mode

When is our data normally distributed?

Normality Assessment
You can say your data is approximately Normally distributed if….
✓ Symmetric histogram (Skew between -1 and 1).
✓ Mean ≈ Median
✓ Standard Deviation is less than the mean.
✓ Kolmogorov-Smirnov test is not significant. Approach this test with caution; the other assessments should come first.

Example data set 1: Mean = 57.23, Standard Deviation = 3.11, Median = 57.00, Skewness = 0.13, Kolmogorov-Smirnov p = 0.049
Example data set 2: Mean = 15.72, Standard Deviation = 11.22, Median = 11.50, Skewness = 1.27, Kolmogorov-Smirnov p = 0.19

Presenting Continuous Data
Tables

                                        Diabetics        Non-diabetics
Age; mean (SD)                          56.8 (9.7)       59.2 (10.6)
Duration of symptoms; median (range)    7.9 (4 to 33)    8.6 (4 to 32)

Correlation
Correlation measures the strength of the relationship between two continuous variables. (Note: it is not a measure of agreement.)

Pearson correlation:
◦ Used when data is normally distributed
◦ Measures the strength of the linear relationship between variables
Spearman correlation:
◦ Used when data is not normally distributed
◦ Measures the strength and direction of association between two ranked variables

Pearson’s correlation only measures linear (“straight line”) relationships. Spearman’s correlation allows us to examine data which is not strictly linear.
Pearson is asking: is a linear increase in x associated with a linear increase in y?
Spearman is asking: as x increases, does y increase (regardless of linearity)?
[Figure: scatter plot of y against x; Pearson = 0.42, Spearman = 1]

A correlation of…
◦ 1 indicates a perfect positive relationship.
◦ -1 indicates a perfect negative relationship.
◦ 0 indicates no relationship at all.
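The difference between the two coefficients can be checked numerically. The sketch below is illustrative only: the data (x from 1 to 6, y = exp(x)) and the helper names pearson, average_ranks and spearman are assumptions made for this example and are not taken from the slides. It computes Pearson's coefficient directly and Spearman's coefficient as the Pearson correlation of the ranks.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def average_ranks(values):
    """Rank values from 1 to n; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover every value tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # positions are 0-based, ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation applied to the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical data: y always increases with x, but not along a straight line.
x = [1, 2, 3, 4, 5, 6]
y = [math.exp(v) for v in x]

print("Pearson :", round(pearson(x, y), 2))
print("Spearman:", round(spearman(x, y), 2))
```

On these made-up data Spearman's coefficient is 1 because y always increases as x increases, while Pearson's coefficient is below 1 because the increase does not follow a straight line.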
Correlation
Ref: http://www.statisticshowto.com/probability-and-statistics/correlation-coefficient-formula/

True or False
Scenario: Urinary oestriol (UO) was measured in a sample of 1000 pregnant women and summary statistics were calculated. Which statement(s) are correct?
A The median is the UO value which occurs most frequently
B If UO was found to have a positively skewed distribution, the mean will be less than the median.
C The interquartile range is affected by outliers
D A large positive outlier will have no effect on the mean
E The median is unaffected by outliers

A The median is the UO value which occurs most frequently
FALSE. The value that occurs most frequently is the mode. The median is the middle number in a sorted sequence.

B If UO was found to have a positively skewed distribution, the mean will be less than the median.
FALSE. In a positively skewed distribution the long right-hand tail pulls the mean upwards, so the mean is typically greater than the median.
Ref: http://www.statisticshowto.com/probability-and-statistics/skewed-distribution/

C The interquartile range is affected by outliers
FALSE. The IQR is the distance between the 25th and 75th percentile of the data (i.e. the middle 50% of the data), so it is not affected by extreme values.

D A large positive outlier will have no effect on the mean
FALSE. As all values in the data are included in the calculation of a mean, an outlier will influence the value of the mean. If the outlier is large and positive, it will inflate the mean.

E The median is unaffected by outliers
TRUE. The median is far less influenced by outliers than the mean. The calculation of the mean uses the actual numerical value of the outlier, whereas the median does not. However, if an outlier is added into our data at a later time, the median will move to a new “middle value”.

Now we know what type of data we have and how to present it, we can ask questions of the data. We might be comparing different groups within our data. We do this using hypothesis tests…

Hypothesis Tests
Hypothesis testing is a way for you to test the evidence provided by your data to see if you have meaningful results. By “meaningful” we mean that they are unlikely to have happened by chance alone. Hypothesis testing does this by working out how probable results like yours would be if only chance were at work.

Research Question: Is the mean VAS score in group A different from that in group B?
Mean VAS score:
◦ Group A: 40.2
◦ Group B: 68.3
Null Hypothesis: “The mean VAS scores in group A and B are the same”
Alternative Hypothesis: “The mean VAS scores in group A and B are different”

Now we have our research question and have generated our null hypothesis, we want a way of testing our null hypothesis. To do this we use a p-value.

P-Value
What is a P-Value?
P is the probability of observing data at least as extreme as the data you observed, if the null hypothesis were true.
Basically…. We are asking “if there were no treatment effect (i.e. treatment A = treatment B), how probable would the data in our sample be?”

P-Value
If P ≤ 0.05 ….
◦ there is evidence against the null hypothesis, so we reject it
◦ the data suggest a difference between A and B
◦ the result is statistically significant
REMEMBER: it is important to consider whether your result is clinically meaningful too!

P-Value
If P > 0.05 ….
◦ the null hypothesis may be true
◦ there is not enough evidence of a difference between A and B
◦ the result is not statistically significant

Now we know what a p-value is, we can return to our hypothesis test!

Hypothesis Test
Research Question: Is the mean VAS score in group A different from that in group B?
Mean VAS score:
◦ Group A: 40.2
◦ Group B: 68.3
P = 0.02

Hypothesis Tests
In short… Hypothesis tests measure the strength of evidence provided by the data for or against some question/proposition of interest.

Now we have a p-value, we want to present our data with summary statistics. In our case, our “summary statistics” are the mean VAS in each group and the difference between the means.
Alongside the p-value and summary statistics, we need a measure of the precision of the results. ….this is a confidence interval!

Interpreting a Confidence Interval
A confidence interval is a range of computed values in which the population value of an effect size may lie. It is a measure of precision.
Now… what does it mean in non-statistical terms? It gives a range of values for the effect that is consistent with your data. You have your point estimate; the 95% CI is then the “give or take a bit”.

Confidence Interval
EXAMPLE: Question - Is the mean VAS score in group A different from that in group B?
“The estimated mean difference in VAS score was 28.1 with 95% confidence interval (8.1, 48.1)”

Confidence Interval
We can tell that our difference is statistically significant because our 95% confidence interval does not contain 0. If the interval does not contain 0 (or whatever our null hypothesis states, which in our example is that there is no difference) p