Lecture - Research Methods and Biostatistics PDF
Al-Ahliyya Amman University
Dr. Walhan ALSHAER
Summary
This lecture covers research methods and biostatistics for undergraduate students. Dr. Walhan ALSHAER of Al-Ahliyya Amman University and the University of Jordan presents measures of central tendency (arithmetic mean, geometric mean, harmonic mean, mode, and median) and measures of variability (range, variance, and standard deviation).
Full Transcript
Research Methods and Biostatistics (طرق البحث العلمي والإحصاء الحيوي)
For undergraduate students

Dr. Walhan ALSHAER
Director of the Pharmacological and Diagnostic Research Center, Al-Ahliyya Amman University (AAU), and Senior Research Scientist, Cell Therapy Center, The University of Jordan

Lecture: Introduction to Biostatistics
Basic Statistical Concepts

A. Descriptive Statistics: Measures of Central Tendency and Variability

Descriptive statistics summarize and describe the features of a dataset. They provide insight into the central tendency and variability of the data.

1. Measures of Central Tendency

a) Arithmetic Mean (AM)

The arithmetic mean is the sum of all values in a dataset divided by the number of values. It is the most common measure of central tendency.

Mean = Sum of terms / Number of terms

Example: For the data points 2, 4, 6, and 8, the mean is:

Mean = (2 + 4 + 6 + 8) / 4 = 5

b) Geometric Mean (GM)

The geometric mean is the nth root of the product of all values in the dataset, where n is the number of values:

$$GM = \sqrt[n]{x_1 x_2 \cdots x_n}$$

It is useful when the dataset contains exponentially related values (for example, compounding growth rates) or varies over a wide range.

Example: Which portfolio do you prefer, i.e. which has the better typical year?

Portfolio A: +10%, -10%, +10%, -10%
Portfolio B: +30%, -30%, +30%, -30%

They look similar, since the arithmetic mean return of each is 0%. The geometric mean of the yearly growth factors tells them apart:

Portfolio A: (1.10 × 0.90 × 1.10 × 0.90)^(1/4) ≈ (0.98)^(1/4) ≈ 0.995, a 0.5% loss per year.
Portfolio B: (1.30 × 0.70 × 1.30 × 0.70)^(1/4) ≈ (0.83)^(1/4) ≈ 0.954, a 4.6% loss per year.

c) Harmonic Mean (HM)

The harmonic mean is the reciprocal of the arithmetic mean of the reciprocals of the values. It is useful when dealing with rates or ratios, such as speed or density.

Example: A rate of 30 mph means I get some result (going 30 miles) for every input (driving 1 hour).
When averaging the impact of multiple rates (X and Y), think about outputs and inputs, not the raw numbers. If we put both X and Y on a project, each doing the same amount of work, what is the average rate?

Suppose X is 30 mph and Y is 60 mph. If each does the same task (drive one mile), the reasoning is:

X takes 1/X time (1 mile = 1/30 hour)
Y takes 1/Y time (1 mile = 1/60 hour)

Combining inputs and outputs we get:

Total output: 2 miles (X and Y each contribute "1")
Total input: 1/X + 1/Y (each takes a different amount of time; imagine a relay race)

The average rate is the total output divided by the total input: 2 / (1/30 + 1/60) = 40 mph. This is the harmonic mean of 30 and 60, and it is lower than their arithmetic mean of 45 mph.

d) Mode

The mode is the value that occurs most frequently in a dataset. It is particularly useful in two situations.

When you need the most common value:
In categorical data, where arithmetic operations are meaningless (e.g., the most common color of cars sold).
For identifying popular trends or preferences (e.g., the most purchased product size).

Example: For the dataset 3, 5, 7, 7, 10, the mode is 7.

When the distribution is multimodal:
The mode highlights all peaks in multimodal data, offering insight into distinct groupings or clusters within the data.

e) Median

The median is the middle value when the data are arranged in order. It divides the dataset into two equal halves and is particularly useful when the data have outliers or a skewed distribution. Unlike the mean, the median is not pulled toward extreme values, which makes it a better representation of the "typical" value in skewed distributions.

Example: For the data 2, 3, 4, 10, 100:

Mean = (2 + 3 + 4 + 10 + 100) / 5 = 23.8 (inflated by the outlier 100).
Median = 4, which better reflects the center of the data.

2. Measures of Variability (Dispersion)

These measures describe how spread out the data are.

a) Range

The range is the difference between the largest and smallest values in the dataset.
Example: For the dataset [2, 4, 6, 8], the range is:

Range = 8 − 2 = 6

b) Variance

The variance is a measure of how far the data points lie from the mean, based on their squared deviations from it. For a sample:

$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$$

where
$s^2$ = sample variance
$n$ = number of observations in the sample
$x_i$ = the i-th observation in the sample
$\bar{x}$ = sample mean

c) Standard Deviation (SD)

The standard deviation is a measure of how spread out the values in a dataset are around the mean. It indicates variability or dispersion, helping to show whether the data points are close to the mean or widely scattered. It is the square root of the variance:

$$s = \sqrt{s^2} = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$$

where $s^2$ is the sample variance, $n$ the number of observations in the sample, $x_i$ the i-th observation, and $\bar{x}$ the sample mean. A higher standard deviation means more variability.

Example

Here is an example with a dataset: [5, 7, 8, 9, 9, 10].

Mean:
Median:
Mode:
Range:
Standard Deviation:
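The blank entries in the example above can be worked out by hand or checked with a short script. The following is a minimal sketch in Python, assuming the standard-library statistics module and the sample (n − 1) form of the variance and standard deviation. The variable names are illustrative and not part of the lecture. The last few lines also re-check the portfolio and speed examples from the geometric and harmonic mean slides.

```python
import statistics

# Dataset from the final example slide.
data = [5, 7, 8, 9, 9, 10]

mean = statistics.mean(data)          # sum of values / number of values
median = statistics.median(data)      # average of the two middle values (8 and 9)
mode = statistics.mode(data)          # most frequent value
data_range = max(data) - min(data)    # largest value minus smallest value
variance = statistics.variance(data)  # sample variance, divides by n - 1
std_dev = statistics.stdev(data)      # square root of the sample variance

print("Mean:", mean)
print("Median:", median)
print("Mode:", mode)
print("Range:", data_range)
print("Sample variance:", round(variance, 2))
print("Sample standard deviation:", round(std_dev, 2))

# Re-check of the earlier examples (assumption: yearly returns are
# expressed as growth factors, so +10% becomes 1.10 and -10% becomes 0.90).
gm_a = statistics.geometric_mean([1.10, 0.90, 1.10, 0.90])
gm_b = statistics.geometric_mean([1.30, 0.70, 1.30, 0.70])
hm_speed = statistics.harmonic_mean([30, 60])  # average speed over equal distances

print("Portfolio A typical yearly factor:", round(gm_a, 3))
print("Portfolio B typical yearly factor:", round(gm_b, 3))
print("Average speed (mph):", round(hm_speed, 1))
```

For this dataset the script gives a mean of 8, a median of 8.5, a mode of 9, a range of 5, and a sample standard deviation of about 1.79. Using statistics.pstdev instead would divide by n rather than n − 1 and give the population standard deviation, which is slightly smaller.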