Summary

This document is a set of notes on statistical concepts, methods, formulas, and examples. It focuses on biostatistics and related calculations such as mean, median, mode, standard deviation, and variance, and it contains practical examples and practice questions.

Full Transcript

BIOSTATISTICS

Contents:
1. Mean, Median, Mode, Standard deviation, Variance (mids)
2. Probability
3. SPSS
4. Discrete Random Variables / Continuous Random Variable
5. Bernoulli’s Trial
6. Binomial, Poisson, Hypergeometric
7. Negative Binomial, Geometric Distribution
8. Binomial / Normal Distribution
9. Std. Normal Curve
10. Normal approximation

Importance of Statistics in Psychology
- Biostatistics is mainly used for the analysis and organization of data.
- Basic statistical analysis involves the mean, median and mode (the average value, the central value and the most repeated value, respectively).
- Standard deviation and variance are tools used to analyze any major occurring event, which is important for identifying the side effects and adverse effects of a drug.
- Probability indicates the duration and frequency of the drug.
- SPSS (Statistical Package for the Social Sciences) is an analysis tool which compares two or more sets of data.
- The correlation of data is an important factor for analysis.
- A hypothesis is a set of information provided on the basis of information gathered during observation.

Topic # 01: Mean, Mode and Median

Mean

Q# 01. The following is the data of a hospital (no. of patients treated per day). Identify the mean of the given data.

Q# 02. Find the mean of the first 10 odd integers.
Integers: 1, 3, 5, 7, 9, 11, 13, 15, 17, 19
$\sum x = 1+3+5+7+9+11+13+15+17+19 = 100$
Mean $= \frac{\sum x}{n} = \frac{100}{10} = 10$

Q# 03. BP → 20 patients
Range → 140 mm of Hg
$\sum x = 2800$ mm of Hg
Mean $= \frac{\sum x}{n} = \frac{2800}{20} = 140$ mm of Hg

Median

Q# 01. 32, 6, 21, 10, 8, 11, 12, 36, 17, 16, 15, 18, 40, 24, 21, 23, 24, 24, 29, 16, 32, 31, 10, 30, 35, 32, 18, 39, 12, 20.
Ascending order: 6, 8, 10, 10, 11, 12, 12, 15, 16, 16, 17, 18, 18, 20, 21, 21, 23, 24, 24, 24, 29, 30, 31, 32, 32, 32, 35, 36, 39, 40
n = 30 (even)
$x_1 = \left(\frac{n}{2}\right)^{\text{th}}$ value $= \left(\frac{30}{2}\right)^{\text{th}}$ value = 15th value → 21
$x_2 = \left(\frac{n}{2} + 1\right)^{\text{th}}$ value $= \left(\frac{30}{2} + 1\right)^{\text{th}}$ value = 16th value → 21
Median $= \frac{x_1 + x_2}{2} = \frac{21 + 21}{2} = 21$

Q# 02. Find the median of the following data.
1, 3, 5, 7, 9, 11, 13, 15
n = 8 (even)
$\frac{n}{2} = \frac{8}{2} =$ 4th value → 7, and $\frac{8}{2} + 1 =$ 5th value → 9
Median $= \frac{7 + 9}{2} = \frac{16}{2} = 8$

Skewed Distribution
In a skewed distribution, the mean, median and mode are connected by the following formulas.
mode = 3 median - 2 mean (OR) mean - mode = 3 (mean - median)

Q# 01. For a moderately skewed distribution, the mean and median are 26.8 and 27.9 respectively. What is the mode of the distribution?
mode = 3 median - 2 mean
mode = 3(27.9) - 2(26.8)
mode = 83.7 - 53.6
mode = 30.1

Variance → Std. deviation

Q# 2. 18, 22, 19, 25, 12
Mean $= \frac{18+22+19+25+12}{5} = \frac{96}{5} = 19.2$

Individual variance (x - mean)    $(x - \bar{x})$    $(x - \bar{x})^2$
18 - 19.2                         -1.2               1.44
22 - 19.2                         +2.8               7.84
19 - 19.2                         -0.2               0.04
25 - 19.2                         +5.8               33.64
12 - 19.2                         -7.2               51.84
                                                     ∑ = 94.8

Variance: $\frac{\sum (x - \bar{x})^2}{N} = \frac{94.8}{5} = 18.96$
Standard deviation: $\sqrt{\text{variance}} = \sqrt{18.96} = 4.35$

Q# 3. 7.7, 7.4, 7.3, 7.9
Mean $= \frac{7.7+7.4+7.3+7.9}{4} = \frac{30.3}{4} = 7.575$

Individual variance (x - mean)    $(x - \bar{x})$    $(x - \bar{x})^2$
7.7 - 7.575                       0.125              0.015625
7.4 - 7.575                       -0.175             0.030625
7.3 - 7.575                       -0.275             0.075625
7.9 - 7.575                       0.325              0.105625
                                                     ∑ = 0.2275

Variance: $\frac{\sum (x - \bar{x})^2}{N} = \frac{0.2275}{4} = 0.0569$
Standard deviation: $\sqrt{\text{variance}} = \sqrt{0.0569} = 0.238$

Q# 2. 112, 100, 127, 120, 134, 118, 105, 110
Mean $= \frac{112+100+127+120+134+118+105+110}{8} = \frac{926}{8} = 115.75$

Individual variance (x - mean)    $(x - \bar{x})$    $(x - \bar{x})^2$
112 - 115.75                      -3.75              14.0625
100 - 115.75                      -15.75             248.0625
127 - 115.75                      11.25              126.5625
120 - 115.75                      4.25               18.0625
134 - 115.75                      18.25              333.0625
118 - 115.75                      2.25               5.0625
105 - 115.75                      -10.75             115.5625
110 - 115.75                      -5.75              33.0625
                                                     ∑ = 893.5

Variance: $\frac{\sum (x - \bar{x})^2}{N} = \frac{893.5}{8} = 111.6875$
Standard deviation: $\sqrt{\text{variance}} = \sqrt{111.6875} = 10.568$

Calculation of Variance and Std. deviation up to three significant figures
1. All non-zero digits (1-9) are significant
2. 0.00001 → 1 significant figure
3. 2.000 → 4 significant figures
4. 5001 → 4 significant figures
5. 50000 → 5 significant figures
6.
Exponents are non-significant

Q# 10, 16, 12, 15, 9, 16, 10, 17, 12, 15
Mean $= \frac{10+16+12+15+9+16+10+17+12+15}{10} = \frac{132}{10} = 13.2$

Individual variance (x - mean)    $(x - \bar{x})$    $(x - \bar{x})^2$
10 - 13.2                         -3.2               10.24
16 - 13.2                         2.8                7.84
12 - 13.2                         -1.2               1.44
15 - 13.2                         1.8                3.24
9 - 13.2                          -4.2               17.64
16 - 13.2                         2.8                7.84
10 - 13.2                         -3.2               10.24
17 - 13.2                         3.8                14.44
12 - 13.2                         -1.2               1.44
15 - 13.2                         1.8                3.24
                                                     ∑ = 77.6

Variance: $\frac{\sum (x - \bar{x})^2}{N} = \frac{77.6}{10} = 7.76$
Standard deviation: $\sqrt{\text{variance}} = \sqrt{7.76} = 2.79$

Probability

Probability is defined as the chance of an event occurring in a specific period of time. It depends on two factors: the sample and the event. The sample is the total number of possible outcomes, while the event is the number of outcomes of interest occurring at a particular time.

In some cases, the sample size is constant. For example: a deck of cards, a rolling dice showing six digits, the number of days in a week, the number of days in a particular month, the 29 days of February in a leap year. In other cases, the sample size is mentioned. In some cases, the events are not mentioned. For example: the number of odd or even digits after rolling a dice, the number of odd or even days in a given month.

The ratio of events to samples is termed probability. The value of probability is never greater than one and it cannot be negative.

Probability in biostatistics
The following points show the importance of probability in biostatistics.
1. To identify the given data and to analyze it in different graphical representations
2. For hypothesis testing on different populations
3. For calculation of sample size and its implication
4. For risk assessment

Probability can sometimes be misleading. For example, in a hospital where up to 10,000 patients were treated, the number of deaths is two. As per probability, the ratio of event to sample is $\frac{2}{10000} = 2 \times 10^{-4}$, i.e. nearly a negligible number, while 2 individuals lost their lives.

Limitations of Probability
1. Human judgment is derived while using probability
2.
The occurrence of extreme events is difficult to handle using probability
3. Calculation of stationary data is a problem using probability.

Drawbacks of Probability
1. Probability can provide false data due to assumptions
2. For sensitive issues, which include disease and disaster management, probability is not used

Practice questions

Q: One card is drawn from a well-shuffled deck of cards. Calculate the probability that the card will
a. Be an ace
b. Not be an ace
a. $P = \frac{\text{event}}{\text{sample}} = \frac{4}{52} = 0.077$
b. $P = \frac{\text{event}}{\text{sample}} = \frac{48}{52} = 0.923$
[OR] Since $P_a + P_b = 1$, $P_b = 1 - 0.077 = 0.923$

Q: A tyre manufacturing company kept a record of the distance covered before a tyre needed to be replaced. The table shows the result of 1000 cases.

Distance (km)    Less than 4000    4000-9000    9001-14000    More than 14000
Frequency        20                210          325           445

If a tyre is bought from this company, what is the probability that
1. It has to be substituted before 4000 km is covered
2. It will last more than 9000 km
3. It has to be replaced after 4000 km and before 14000 km is covered
a. $P = \frac{20}{1000} = 0.02$
b. $P = \frac{325 + 445}{1000} = \frac{770}{1000} = 0.770$
c. $P = \frac{210 + 325}{1000} = \frac{535}{1000} = 0.535$

Q: The percentages of marks obtained by a student in the monthly tests are given below.

Test    1     2     3     4     5
%       69    71    73    68    74

Based on the above table, find the probability of the student obtaining above 70% and between 60-70%.
1. $P = \frac{3}{5} = 0.6$
2. $P = \frac{2}{5} = 0.4$

Q: From a well-shuffled deck of cards, what is the probability that
a. A king is picked
b. A king or queen is picked
c. A king or queen or a jack is picked
King → 4, queen → 4, jack → 4, deck of cards → 52
a. $P = \frac{E}{S} = \frac{4}{52} = 0.077$
b. $P = \frac{E}{S} = \frac{4+4}{52} = 0.154$
c. $P = \frac{4+4+4}{52} = 0.231$

Rules of Probability
1. Complement rule: According to the complement rule, the probability of an event is calculated by subtracting the probability of the other outcome from one. This is only applicable for two possible outcomes. P(A) = 1 - P(A’)
2.
Multiplication rule: To obtain the probability of several events occurring together, their probabilities are multiplied. This rule is applicable without replacement. The concept of with replacement is applicable for maintaining the initial and final quantities.
3. Addition rule: The probabilities of the outcomes are added if the condition is to get an event in either case, e.g., when throwing a fair dice, the probability of getting an odd number or of getting 5 or 6.
4. Total probability: For a system containing more than two possible outcomes, the total probability is calculated as the sum of all possible outcomes, P (1) = 0.4 P (2) = -0.4

Practice Questions

Permutation
- It is a type of probability which mainly focuses on the specific arrangement of the possible outcomes.
- E.g., selection of players in the cricket team needs a specific order, otherwise the quality of the game is affected.
- In medical science, the arrangement is important when it comes to the ‘dose regimen’. It also helps in setting up a pharmacy for the arrangement of medicine.
- Specifically: 1. Linear permutation 2. Circular permutation 3. Nonlinear permutation
- Formula: it can be calculated by formula.
- If order/sequence matters, then it is a permutation.

Combination
- It is a kind of probability calculation in which the sequence or order does not have an impact on the outcome.
- Combination is usually preferred when the selection is not important or any voluntary services are required.
- In medical sciences, combination is of importance during clinical trials, where the sequence or order doesn’t matter.
- The number of combinations is never greater than the number of permutations of the same objects, because the sequence is ignored.
- If order/sequence does not matter, then it is a combination.

Permutation

Find the number of permutations of six objects taken three at a time.

Q# A team of three people is to be selected from a group of seven people.
How many different ways can the team be selected if the order of selection matters?
n = 7, r = 3
$\frac{7!}{(7-3)!} = \frac{7 \cdot 6 \cdot 5 \cdot 4 \cdot 3 \cdot 2 \cdot 1}{4 \cdot 3 \cdot 2 \cdot 1} = 210$

Q# Calculate the permutations of picking an object four times out of a set of 18 objects.
n = 18, r = 4
$\frac{18!}{(18-4)!} = 73440$

Q# In how many ways can six people stand in a line such that two specific people are always next to each other?
$2! \times 5! = 2 \times 120 = 240$

Q# What is the possibility of picking a team of 7 individuals from a class of 15 students?
n = 15, r = 7
$\frac{15!}{(15-7)!} = 32432400$

Q# In how many ways can a team of 7 individuals be picked from a class of 8 boys and 9 girls?
n = 8 + 9 = 17, r = 7
$\frac{17!}{(17-7)!} = 98017920$

Q# In how many possible ways can "apple" be written if both the P’s are at the starting point?
The two P’s are identical and fixed in the first two positions, so only the remaining three letters (A, L, E) are arranged.
$3! = 6$

Combination

$^{n}C_{r} = \frac{n!}{r!(n-r)!}$

Q# In how many ways can the word "elephant" be written, starting with the letter ‘A’?
7!

Q# In how many ways can a team of 3 students be selected from a class of 5 girls and 2 boys, given that the team must contain both genders?
n = 7, r = 3
$\frac{7!}{3!\,4!}$

Q# In how many ways can a book be picked from a shelf of 2 history, 3 literature and 2 medical books?
r = 1, n = 7
$\frac{7!}{1!(7-1)!} = 7$

Q# A deck of 52 cards has 4 suits: clubs, diamonds, hearts and spades. How many combinations of three cards can be formed from the deck?
n = 52, r = 3
$\frac{52!}{3!(52-3)!} = 22100$

Q# A committee of six people is to be formed from a group of 12 men and 11 women. In how many possible ways can the committee be formed?
n = 23, r = 6
$\frac{23!}{6!(23-6)!} = 100947$

Q# A committee of 5 people is to be formed from a group of 10 men and 8 women. In how many ways could it be arranged?
n = 18, r = 5
$\frac{18!}{5!(18-5)!} = 8568$

After mids

Biostatistical Calculation
12-11-24

Statistical Testing
1. T-Test: It is used to compare the means of two different groups. For example: comparison of blood groups in two different groups.
2. ANOVA: It compares the means of three or more groups. For example: comparing the effects of different treatments on blood pressure.
3.
Chi-Square Test: It tests the independence of two categorical variables. For example: the relationship between smoking and lung cancer.
4. Paired Sample T-Test: The test is used to compare the mean of the data before and after treatment. For example: Lithium Carbonate is used in bipolar disease for the treatment of hallucination. The data before and after the treatment is analysed by using the "Paired Sample T-Test".
5. Independent Sample T-Test: The independent sample t-test is used to compare the mean data of two different groups. The two groups must be independent of each other; equal sample sizes are not required. For example: the effect of antidepressants is observed in smokers and non-smokers. The effect is analysed using an independent sample t-test.
6. Regression Analysis: It is used for comparison of two variables which are interdependent. For example: the spending pattern and the income. It could be used for more than two factors.
7. Spearman Analysis: The analysis is used for comparison of two non-linear factors. For example: the relationship between age and happiness. Non-linear factors means there is no direct connection between the two factors. For example: the side effects of drugs are linear while off-label use of drugs is non-linear.
Linear → depends on condition: schooling → age
Non-linear → independent of condition: happiness → age
8. Pearson Correlation: The correlation is used to analyse linear factors. For example: the relation between study hours and exam score. Two other linear factors are the dose frequency and the effect of the drug. Linear factors are variables that depend on each other.
9. Partial Correlation: The type of correlation which involves the analysis of a third factor that impacts the relationship of linear factors. For example: body weight is decreased after exercise, but the diet plan is the 3rd factor which directly impacts the body weight.
The use of painkillers will relieve the pain, but anaerobic respiration is the 3rd factor, which increases the concentration of lactic acid and results in muscle fatigue.
10. Z-Test: The Z-Test is the comparison between a sample mean and the population mean. For example: the comparison of the mean body mass index (BMI) of a sample with the mean BMI of the national population. The effect of a drug is compared to the national effect of the drug.

Discrete Random Variable
A type of random variable whose values are countable, unique, isolated or mutually exclusive.
The term countable means the number of possible values is finite or countably infinite.
The term unique means the next value is different from the previous one.
The term isolated means there are no intermediate values between two consecutive values.
The term mutually exclusive means one value at a time.

19-11-24

Discrete Random Variable characteristics:
The following are the characteristics of discrete random variables.
1. The values are countable or finite
2. The values are unique
3. The values are without a specific range

Examples
The examples include:
a. No. of patients
b. Possible outcomes in clinical trials (3 phases premarketing, 1 phase postmarketing)
c. Categorical values

The discrete random variables are categorised into four categories.

1. Binomial Distribution
It is a discrete random variable which mainly focuses on the number of successes in a fixed number of independent trials. The following conditions must hold for the binomial distribution:
1. Number of trials must be fixed
2. Probability of success must be constant
3. Trials must be independent
4. Only two possible outcomes

Examples:
1. Assessment of quality control parameters and quality assurance to ensure the success of a product in a particular batch.
2. The medical treatment responses against a particular disease
3. To identify risks in any financial strategy
4.
To identify the drug effect in a clinical trial with complete effect of the drug
Bromazepam → sedative drug → highly potent (0.5 mg)

2. Normal Distribution
It is also named the Gaussian distribution or bell curve distribution. It is a continuous probability distribution that describes how the data are distributed around a curve. The following are the properties of the normal distribution.
1. The data must be positive
2. The data must be of different individuals
3. The mid point of the values must be calculated for identification of the bell curve

Examples
a. The normal distribution is used for analysis of blood pressures and any data having a direct impact on health, the environment or financial values

Normal Distribution in Biostatistics
1. Analysis of data
2. Hypothesis testing
3. Confidence intervals
4. Medical research (clinical trials)
5. Pharmacology of drugs

Types of Normal Distribution
1. A normal distribution with mean zero ‘0’ and SD 1 is called the standard normal distribution
2. A normal distribution in which the mean and standard deviation are not equal to zero and one respectively is called a non-standard normal distribution

Characteristics of normal distribution
1. It must be symmetrical about the mean
2. The curve should be a bell curve
3. The mean, median and mode must be equal
4. The standard deviation must be the measure of dispersion of the data
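
The relationship between a non-standard and the standard normal distribution can be illustrated with a short Python sketch. It is only an illustration under stated assumptions: it reuses the eight values from the variance exercise in these notes as sample data, treats them as a whole population (dividing by N, as the notes do), and relies on Python's built-in statistics module. The variable names are hypothetical and not part of the original notes.

```python
from statistics import NormalDist, mean, pstdev

# Assumption: the eight values from the variance exercise serve as sample data.
values = [112, 100, 127, 120, 134, 118, 105, 110]

mu = mean(values)        # arithmetic mean
sigma = pstdev(values)   # population standard deviation (divides by N)

# Standardize each value: z = (x - mean) / SD.
# The resulting z-scores have mean 0 and SD 1.
z_scores = [(x - mu) / sigma for x in values]

# Standard normal distribution: mean 0, SD 1.
standard = NormalDist(mu=0.0, sigma=1.0)

# Symmetry about the mean: half of the area lies below z = 0,
# and the area below -1 equals the area above +1.
area_below_mean = standard.cdf(0.0)
left_tail = standard.cdf(-1.0)
right_tail = 1.0 - standard.cdf(1.0)

print(round(mu, 2), round(sigma, 3))
print([round(z, 2) for z in z_scores])
print(area_below_mean, round(left_tail, 4), round(right_tail, 4))
```

Subtracting the mean and dividing by the standard deviation is what turns any non-standard normal variable into a standard one, which is why a single table (or a single function such as cdf) for the standard normal curve is enough for all such calculations.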
