Biostatistics Notes PDF
Summary
These are lecture notes on biostatistics, covering topics such as data sources, types of variables, levels of measurement, data collection methods, experimental design, and analysis. The notes also describe different types of graphs, charts, and diagrams, focusing on their application for descriptive statistics.
Full Transcript
Biostatistics

8/26/24 Chapter 1
- biostatistics: the field of statistics related to science
- statistics: collection of data, organization, and reporting of results
- sources of data: surveys / observational studies, experiments, previously obtained records
- random variable: a variable whose value involves some chance or probability
- quantitative: a value with meaningful numeric data (ex. total number of students in a class, dollar amounts)
- qualitative: an observation that is a name, label, or category (ex. color, wearing glasses?, zip code, social security numbers, sports jersey numbers)
- continuous: the entire spectrum is possible, including decimals (ex. height, weight)
- discrete: countable values only, with gaps in the number line (ex. # of students)
- population: the entire group to be studied
- sample: the group selected for the study

8/28/24 Chapter 1
Levels of measurement / measurement scales (listed from weakest to strongest)
* Goal when collecting data: stronger levels of measurement produce more powerful results.
- nominal: qualitative data or classification; categories, labels (ex. gender, wears glasses (yes or no))
- ordinal: data can be ordered, but the differences are unknown or unclear (ex. pain scale 1-10 (Likert scale); Top 100 college rankings; grading scale A, B, C, D, F)
- interval: data can be ordered and differences are known, but no natural zero exists (ex. time (years), temperature (°C, °F), shoe size)
- ratio: data can be ordered, differences are known, and a natural zero exists (ex. age, time, height)

Data collection
Survey methods
- voluntary response: subjects can choose to respond or not (ex. email)
- convenience: the easiest data for the researcher to obtain
- random: everyone has an equal chance at selection
- simple random sample (SRS): each selection or pull is also random
- stratified: break into groups, then randomly sample within the groups
- cluster: break into groups, then sample everyone from within select groups
- systematic: the 1st person is selected at random, then a pattern or process is followed
- incentive-based study

8/30/24 Chapter 1 - Experimental Design
Guidelines
· purpose and goals of study
· obtain people w/ disease
· obtain "treatment"
· obtain medical equipment and personnel
· break subjects into groups
  ↳ placebo effect
· get baseline data
· apply treatment
  ↳ with personnel, including observations for adverse effects
· collect data
· data analysis (analyze results)
· report conclusions

Precision vs. Accuracy
- precision: how close the points are to each other
- accuracy: how close the points are to a target value

Correlation vs. Causation
- causation: A causes B or B causes A
- correlation: a relationship between A and B
* Correlation does NOT imply causation: just because two variables are correlated does not mean one is causing the other.

8/30/24 Chapter 2
Types of graphs, charts, diagrams - Descriptive Statistics
· common types of graphs/charts/tables/diagrams
- pie/circle graph
- line graph
- time series graph: a line graph with x = time (secs, years)
- bar graph: x is qualitative; bars do not touch
- histogram: x is quantitative; bars touch
- scatter plot
- box and whisker plot
- pictograph: pictures with a key
- stem and leaf plot

9/4/24 common graphs/charts, cont.
- dot plot (like a bar graph)
- frequency distribution
- tally graph
- contingency table: a table of subjects; a # must be in the table
- spreadsheets
- Venn diagrams
- tree diagrams
- flowchart
- array: list of numbers
- ordered array
- frequency polygon: a line graph
  uses midpoints for x
  includes one grouping above and one below the given groupings
  finding a midpoint: take the two numbers, add them, and divide by 2
  ex. 68, 72, 75, 80, 82, 86, 90, 91, 94, 99
  [Figure: frequency polygon of # of students by grade group, extended to the empty 50-59 and 100-109 groups; the shape should stay the same.]

* Relative: fraction or percentage; must add up to 1 (100%)
relative frequency distribution / relative bar graph

Grade    # of students   Relative frequency
60-69    1               1/10 = .1 = 10%
70-79    2               1/5  = .2 = 20%
80-89    3               3/10 = .3 = 30%
90-99    4               2/5  = .4 = 40%

Cumulative: running total

Grade    Cumulative # of students   Grade    Cumulative relative percentage
60-69    1                          60-69    10%
70-79    3                          60-79    30%
80-89    6                          60-89    60%
90-99    10                         60-99    100%

Common shapes (distributions) of histograms
- normal curve (ex. height)
- skewed distributions
  ↳ skewed left (ex. retirement ages)
  ↳ skewed right (ex. income)

9/6/24
- cyclical: won't ever go to zero; constant up and down
- bimodal: 2 peaks; has to start and finish at zero (ex. rate of change in height, rate of speed)
- trimodal: 3 peaks (must start and end at zero)
- multimodal: 3 or more peaks (still must start and end at zero)
- uniform: all bars in theory have the same height, even if sample results vary (ex. rolling a die, deck of cards)
- increasing
- decreasing
* Test: given x and y, figure out the shape
  ex. x = age of local libraries: could be bimodal, skewed right or left
  ex. x = time waiting in line at McDonald's, y = # of people: could be trimodal, bell curve

Shapes and distributions of the normal bell curve
Kurtosis: how peaked the data is (height)
- mesokurtic: middle (ex. average light bulb); mesokurtic kurtosis will always be zero
- leptokurtic: thin/narrow, ex.
ages in first grade; leptokurtic kurtosis will always be positive (+)
- platykurtic: flat (ex. speed between two stop lights/signs, average LED bulb); platykurtic kurtosis will always be negative

9/9/24 Measures of Central Tendency
- mean: add the numbers, then divide by the sample size
- median: center number of the ordered set; for skewed data, use the median
- mode: most frequently occurring number; if each number occurs only once, then there is no mode
  ex. 40, 40, 70, 80, 95, 95

Measures of spread
- range = max - min (ex. 97 - 40 = 57)

Standard Deviations + Variances
6 steps for standard deviation, 5 steps for variance
1. find the mean
2. take each number and subtract the mean (ex. four numbers gives four subtraction problems)
3. square each value from (2)
4. add each answer from (3)
5. divide (from step 4) by n - 1 if the data is from a sample or unknown; divide by N if the data is a population
6. take the square root

Find the variance and std. dev. of 84, 92, 76, 90, 88
mean = (84 + 92 + 76 + 90 + 88) / 5 = 86
deviations: 84 - 86, 92 - 86, 76 - 86, 90 - 86, 88 - 86
(-2)^2 + 6^2 + (-10)^2 + 4^2 + 2^2 = 160
variance = 160 / 4 = 40
std. dev. = sqrt(40) = 6.32

9/11/24 Coefficient of Variation (CV)
CV = std. dev. / mean, as a percent = 6.32 / 86 = 7.3%

5 Number Summary
* minimum
* Q1 (first quartile): 25th percentile, median of the lower half
* Q2: median
* Q3 (third quartile): 75th percentile, median of the upper half
* maximum
Find the 5 number summary of 71, 74, 79, 81, 85, 89, 91, 94, 94, 99
min = 71, Q1 = 79, Q2 = 87, Q3 = 94, max = 99
* ignore the median when breaking odd lists into halves

Boxplots
Each of the four sections holds 25% of the data.
purpose:
- symmetric or not
- outliers
- where the quartiles start and stop
IQR: interquartile range, the range of the middle half: IQR = Q3 - Q1
Outliers
- a low outlier is a point below Q1 - 1.5 IQR
- a high outlier is a point above Q3 + 1.5 IQR
Is there an outlier? NO
79 - 1.5(15) = 79 - 22.5 = 56.5
94 + 1.5(15) = 94 + 22.5 = 116.5
* Central tendency: mean, median, mode

9/11/24 Chapter 3 Probability
probability: likelihood or chances of an event
main ways to find probabilities:
Classical/theoretical, ex.
a deck of cards
P(success) = (# of successes) / n
* Outcomes must be equally likely

9/13/24
3 primary methods for probability
- classical/theoretical: P(success) = (# of successes) / (# of equally likely outcomes)
- relative frequency: P(success) = (# of observed successes) / n
- subjective: best guess
Bayesian statistics: given a prior distribution, update the model as new information is known, to end with a posterior distribution

Elementary Probability Properties
P(event) must be between 0 and 100 percent (0 and 1)
P(sum of all possible outcomes) = 1

Definitions
Independent: the outcome or probability of event A does NOT impact/influence/affect the outcome or probability of event B.
Dependent: NOT independent; one outcome changes the other.
Mutually exclusive: events A and B cannot occur at the same time.

Types of Sets
Overlapping: subjects may be in group A, group B, both, or neither
Disjoint: no overlap between groups A and B
Subset: everyone in A is also in B

Conditional probability: P(one event occurring given another event's outcome), written P(A | B), where | means "given"
Complement: not in the outcome set (the opposite), written P(not A); P(A) + P(not A) = 1

Multiplication Rule
If A and B are independent:
n(A and B) = n(A) · n(B)
P(A and B) = P(A) · P(B)

You have 20 t-shirts, 10 pairs of pants, and 2 pairs of shoes. How many different combinations are possible?
20 · 10 · 2 = 400

A high school Master Lock has 43 digits, but the same number cannot be used twice in a row. How many 3-digit codes are possible?
43 · 42 · 42 = 75,852

P(2 coin flips both tails) = .5 · .5 = .25

Addition Rule: used for overlapping events
ex. I had 140 students one semester and I brought in candy for Halloween. I let students take a bag of M&M's, a bag of Skittles, or both.
* 79 total bags of Skittles were taken, 70 bags of M&M's were taken, and 32 people took both.
How many students didn't take any candy?
[Venn diagram: Skittles only = 79 - 32 = 47; both = 32; M&M's only = 38]
47 + 32 + 38 = 117 students took candy
140 - 117 = 23 didn't take candy

9/16/24 Contingency table probabilities, examples - Titanic
Total = 2201
P(children) = 109/2201 = 5%
P(survived) = 711/2201 = 32%
P(adult) = 2092/2201 = 95%
P(adult and died) = 1438/2201 = 65%
P(child and survived) = 57/2201 = 2.6%
P(child or survived) = (109 + 711 - 57)/2201 = 35%
P(adult or died) = (2092 + 1490 - 1438)/2201 = 97%
P(survived | child) = 57/109 = 52%
P(not surviving | adult) = 1438/2092 = 69%
P(child | survived) = 57/711 = 8%
P(2 people both survived)

Screening Testing
COVID Rapid Testing (last non-commercially available test)
T = test +, F = test -, D = has COVID, not D = doesn't have COVID

              Has COVID (D)   Doesn't have COVID (not D)   Total
Test + (T)    70              27                           97
Test - (F)    23              658                          681
Total         93              685                          778

Formulas
sensitivity = P(T | D) = 70/93 = 75%
specificity = P(F | not D) = 658/685 = 96%
accuracy = (70 + 658)/778 = 94%

If you don't know the population prevalence (%) of a disease:
Predicted Value Positive (PVP) = P(D | T) = 70/97 = 72%
Predicted Value Negative (PVN) = P(not D | F) = 658/681 = 97%

If the population percent is known (P(T | D) and P(D) are given):
PVP = P(T | D) P(D) / [P(T | D) P(D) + P(T | not D) P(not D)]
PVN = P(F | not D) P(not D) / [P(F | not D) P(not D) + P(F | D) P(D)]

9/18/24 Screening Testing: Sens, Spec, PVP, PVN, RR, OR
If population unknown: PVP = P(D | T), PVN = P(not D | F)
If population known, with P(D) = .001 and P(not D) = .999:
PVP = (.75)(.001) / [(.75)(.001) + (.04)(.999)] = .018
PVN = (.96)(.999) / [(.96)(.999) + (.25)(.001)] = .9997

Relative Risk (RR)
RR = P(disease | risk factor present) / P(disease | risk factor absent)
RR = (70/97) / (23/681) = 21.4
You are at 21.4 times greater risk of having COVID if you test positive vs. testing negative.

Odds (against): odds are failure to success
odds are 2 to 1: probability = 1/3
odds are 5 to 1: probability = 1/6
odds are 1:1: probability = 1/2

odds ratio = (test successes) / (test failures)
OR = (70)(658) / [(27)(23)] = 74

Exam 1 Topics
- graphs and diagrams
- variance and std. dev.
- shapes of graphs, ex. skewed left or right, bimodal
- meso, platy, and why
- definitions (chap. 1)
- nominal, ordinal
- qualitative, quantitative
- continuous (number line, ex. height) vs. discrete (gaps in the number line, ex. number of ...)
- disjoint, overlapping, subset
- are things independent/dependent
- 5 number summary, box plots, outliers w/ formula
- range/mean/median/mode
- addition rule with Venn diagram
- multiplication rule
- sensitivity, spec., PVP, PVN, OR, RR, accuracy; whether it is a good test

nominal: name, label, group (names of people who took the exam)
ordinal: order, but differences unknown (letter grades)
interval: differences known; no zero starting point (pH)
ratio: differences known; zero starting point (percentages on an exam, height, weight)

9/23/24 Exam Review
Suppose I recorded the calorie count for n = 6 students: 1420, 1510, 1920, 2500, 2700, 3000

5 number summary
min = 1420
Q1 = 1510
Q2 (median) = 2210
Q3 = 2700
max = 3000

Outliers
Above Q3 + 1.5(Q3 - Q1) = 2700 + 1.5(1190) = 2700 + 1785 = 4485
Below Q1 - 1.5(1190) = 1510 - 1785 = -275
No outliers.

Also for 1420, 1510, 1920, 2500, 2700, 3000:
- relative histogram in percents
- histogram in whole numbers

First 151 days in Pittsburgh: gale forecast (T) vs. gale (D)

                 Gale: Yes   Gale: No   Total
Forecast: Yes    15          11         26
Forecast: No     2           123        125
Total            17          134        151

sensitivity = P(T | D) = 15/17
specificity = P(F | not D) = 123/134
PVP = P(D | T) = 15/26
PVN = P(not D | F) = 123/125
accuracy = (15 + 123)/151
relative risk: RR = (15/26) / (2/125) = 36; 36 times the risk of it happening
odds ratio = ad/bc = (15)(123) / [(11)(2)] = 84

P(gale) = 17/151
P(gale and gale forecasted) = 15/151
P(gale or gale predicted) = (17 + 26 - 15)/151 = 28/151
P(2 days with no wind) (multiplication rule)

[Figure: frequency polygon, x-axis groups starting at 0-999]
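
The exam review arithmetic can be double-checked with a short Python sketch. This is only an illustration, not part of the lecture: the function names are made up here, the quartile rule is the one from the notes (median of each half, ignoring the median for odd-length lists), and the 2x2 gale counts (15, 11, 2, 123) are an assumption reconstructed from the fractions 15/17 and 15/26 and the 151-day total.

```python
from statistics import median


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max).

    Quartiles follow the notes' rule: Q1 and Q3 are the medians of the
    lower and upper halves, ignoring the median itself for odd-length lists.
    """
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]
    upper = data[half + (n % 2):]  # skip the middle value when n is odd
    return data[0], median(lower), median(data), median(upper), data[-1]


def outlier_fences(q1, q3):
    """Return (low fence, high fence) = (Q1 - 1.5 IQR, Q3 + 1.5 IQR)."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


def screening_measures(tp, fp, fn, tn):
    """Screening-test measures from a 2x2 table.

    tp = test + and has condition, fp = test + without condition,
    fn = test - with condition,    tn = test - without condition.
    Assumes no row, column, or off-diagonal product is zero.
    """
    total = tp + fp + fn + tn
    pvp = tp / (tp + fp)  # P(D | T)
    pvn = tn / (tn + fn)  # P(not D | F)
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / total,
        "pvp": pvp,
        "pvn": pvn,
        "relative_risk": pvp / (fn / (fn + tn)),  # P(D | T) / P(D | F)
        "odds_ratio": (tp * tn) / (fp * fn),      # ad / bc
    }


# Calorie counts from the exam review
calories = [1420, 1510, 1920, 2500, 2700, 3000]
summary = five_number_summary(calories)
low, high = outlier_fences(summary[1], summary[3])
outliers = [c for c in calories if c < low or c > high]

# Gale forecast table (counts reconstructed, see the assumption above)
gale = screening_measures(tp=15, fp=11, fn=2, tn=123)

print(summary)              # (1420, 1510, 2210.0, 2700, 3000)
print(low, high, outliers)  # -275.0 4485.0 []
print(round(gale["relative_risk"]), round(gale["odds_ratio"]))  # 36 84
```

Calling `five_number_summary` on the ten grades 71, 74, 79, 81, 85, 89, 91, 94, 94, 99 from the boxplot example applies the same rule and gives quartiles 79, 87, and 94.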