Biostatistics: Data Collection & Sampling Methods PDF
Summary
This document provides an overview of data collection and sampling methods in biostatistics. It covers key concepts such as descriptive and inferential statistics, different types of variables (qualitative and quantitative), various sampling methods (probability and non-probability), and levels of measurement (nominal, ordinal, interval, ratio). It also introduces concepts related to demography.
Full Transcript
Biostatistics rosmo

DATA COLLECTION AND SAMPLING METHODS

Key Words
statistics – collection, organization, presentation, analysis, and interpretation of numerical data to obtain information
  – steps
    1. data collection – obtaining information
    2. organization – presenting data in tables/graphs/charts for logical and statistical conclusions
    3. analysis – extracting relevant information for formulating a numerical description
    4. interpretation – drawing conclusions from data
  – 2 major areas
    1. descriptive statistics – describing properties
    2. inferential statistics – prediction, estimation, conclusion
biostatistics – statistics specialized for the fields of biology, medicine, and health
  – role of biostatisticians
    a. specialists of data evaluation
    b. make public health decisions from findings
    c. influence change at policy-making level
    d. improve science, bridging gap between theory and practice
  – importance in clinical research and public health
    a. design and development of research framework
    b. data monitoring and management
    c. data analysis and reporting
    d. clinical trial
    e. enriching patient care
    f. data trends
    g. epidemiological studies
variable – characteristic/attribute with different values
  – qualitative/attributes
  – quantitative
    a. discrete – values in interval with gaps between them – whole numbers
    b. continuous – value at any point along interval – decimals
data – values that variables can assume
  – primary: generated by a researcher
  – methods of collecting primary data
    a. surveys – solicit information from people
      ▪ self-administered/mail survey – cover letter + postage
      ▪ personal interview – flexible but expensive
      ▪ telephone interview
    b. direct observation – watch/listen, count/measure
    c. experiments – identify cause-and-effect
  – secondary: gathered by someone else for other purpose
  – methods of collecting secondary data
    a. internal – generated by own firm/organization – e.g. annual reports
    b. external – gathered outside firm for other purpose – e.g. government agencies, published sources, commercial suppliers
questionnaire/data collection instrument – filled out by respondent or administered by interviewer
  – 3 types of questions
    a. multiple choice
    b. dichotomous
    c. open-ended
  – basic points
    a. must be short, simple, and clear
    b. no leading questions
    c. no questions that may cause hesitation to answer

Inferential Statistics
population/universe – entire set of objects of interest
sample – smaller number/subset of objects within population
census – actual measurement of all elements from population
  – can be viewed as sample that includes entire population
parameter – numerical characteristic of population
statistic – measured characteristic of sample

Sampling Methods
probability sampling – every element in population has equal chance
  – simple random: chance methods/random numbers
  – systematic: selecting every nth subject
  – stratified random: choosing representatives from each group
  – cluster: choosing an entire group as subjects
nonprobability sampling – some degree of personal subjectivity
  – convenience: cheap, quick, and chooses based on availability
    : for quick user opinion polls/pilot testing
  – quota: similar to stratified random sampling, but requires people who match characteristics of segment
  – purposive/judgmental: based on previous ideas of population
    : intentional selection based on characteristics

Levels of Measurement
Note: Measurement assigns numerical value to variable.
nominal – number only identifies membership in a category
  – mutually exclusive, exhaustive categories without rankings
  – e.g. zip code, gender
ordinal – greater/lesser than; no precise differences between ranks
  – e.g. preferences, rankings, grade
interval – greater/lesser than with unit of measurement
  – ranking and precise differences between units of measure
  – no meaningful zero
  – e.g. IQ
ratio – interval with absolute zero and multiples are meaningful
  – true ratios exist
  – e.g. height

DEMOGRAPHY

Definition of Demography
  – size, composition, and geographic distribution of populations
  – big three for stability/change: birth, death, migration
  – central component of societal contexts and change

Composition of Demography
basic demographic features – age, sex, family, household status
social and economic – language, education, ethnicity, religion, income

Distribution of Demography
multiple levels – local, regional, national, global
types of boundaries – political, economic, geographic

Tools of Demography
count – absolute number that is the basis of statistics
rates – frequency of events
  – crude rates: computed for an entire population
  – specific rates: computed for a specific subgroup
ratio – subgroup to subgroup relationship
proportion – subgroup to population relationship
constant (K) – unchanging, arbitrary number
cohort measures – events of a group with shared demographic experience
  – e.g. birth (most common; annually listed), marriage, school class
period measures – events of all or part of a population; "snapshot"

Population Composition and Its Parameters
Note: Population is a group of individuals of same species living and interbreeding within a given area.
population size – number of individuals in a geographic range
density – population size and space: $\text{density} = \dfrac{\text{no. of individuals}}{\text{unit area}}$
age structure – cohorts: age-specific categories (e.g. juvenile/subadult)
fecundity – number of offspring produced
mortality – number of deaths in a population, counterbalance to fecundity
sex ratio – number of males relative to females

Estimating and Projecting Population
population estimate – annual size of a population between census periods or for the current year
  – data involved
    1. population change (migration, fertility, mortality)
    2. census results
    3. change in population size
projection – size of population in the future

Basic Demographic Equation
$P_2 = P_1 + (B - D) + (I - O)$
  – $P_2$: population at any time
  – $P_1$: population at previous time
  – $B - D$: natural increase (birth − death)
  – $I - O$: net migration (in − out)

Vital Statistics
vital statistics – vital events (e.g. birth/natality, death/mortality, marriages/nuptiality, divorces)

DATA PRESENTATION

Presentation of Data
Note: Raw data is not manipulated beyond original collection.
textual – paragraphs that enumerate, emphasize, and identify data
tabular – tables with columns and rows
  – e.g.
    1. frequency distribution table – has classes and no. of cases
    2. categorical frequency distribution table – specific categories like nominal/ordinal level data
      – frequency percentage: $\% = \dfrac{f}{n} \times 100$
      – relative frequency: decimal/fraction equivalent of total number
    3. grouped frequency distribution – used when data is large
      – rules
        a. 5-20 classes
        b. class width is odd
        c. classes are mutually exclusive, continuous, and exhaustive
        d. classes have equal width, except when it is open-ended
      – steps
        1. range – $R = MAX - MIN$
        2. no. of classes – $k = 1 + 3.322 \log_{10}(n)$ – round off k
        3. class size – $c = \dfrac{R}{k}$ – round off c
        4. class interval – lower limit and upper limit
        5. frequency of each class
        6. true class boundaries – continuity property
        7. class mark – $CM = \dfrac{LTCB + UTCB}{2} = \dfrac{LL + UL}{2}$ – midpoint of a class
        8. cumulative frequency – total number of observations in >CF or
graphical – visual/picture with numerical info
  – histogram: contiguous vertical bars of various heights
    : no gap between bars
  – frequency polygon: lines that connect points plotted for frequencies at midpoints of classes
  – ogive/cumulative frequency graph
  – stem-and-leaf plot: table with graph
    : only for small data with positive values and >1 digit

Other Concepts and Formulas
sample size – $n = \dfrac{\dfrac{z^2 \, p \,(1-p)}{e^2}}{1 + \dfrac{z^2 \, p \,(1-p)}{e^2 N}}$
  – N = population size
  – z = z-score
  – e = margin of error (in decimal)
  – p = estimated population proportion (0.5 if unknown)
Slovin's formula – $n = \dfrac{N}{1 + N e^2}$
sampling error – expected error from observations in sample
non-sampling error – mistakes in acquisition of data or observations
  – non-response error and selection bias
magic number 30 – rule of thumb for minimum sample size
  – application of Central Limit Theorem
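The two sample-size formulas listed under Other Concepts and Formulas can be checked numerically with a short sketch. The function names and the default values (z = 1.96, p = 0.5, e = 0.05) are illustrative assumptions, and rounding the result up to a whole respondent is a common convention that the notes do not state.

```python
import math

def sample_size(N, z=1.96, p=0.5, e=0.05):
    """Sample size with finite population correction.

    N: population size, z: z-score, p: estimated proportion,
    e: margin of error as a decimal. Defaults are assumptions.
    """
    n0 = (z ** 2) * p * (1 - p) / (e ** 2)   # numerator term z^2 p (1-p) / e^2
    n = n0 / (1 + n0 / N)                    # divide by 1 + z^2 p (1-p) / (e^2 N)
    return math.ceil(n)                      # round up (assumed convention)

def slovin(N, e=0.05):
    """Slovin's formula: n = N / (1 + N e^2), rounded up (assumed convention)."""
    return math.ceil(N / (1 + N * e ** 2))

if __name__ == "__main__":
    print(sample_size(1000))  # population of 1000, 95% confidence, 5% margin
    print(slovin(1000))
```

For the same population and margin of error the two formulas give different answers, because Slovin's formula has no z-score or proportion term.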
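The steps for a grouped frequency distribution under Presentation of Data can be sketched for whole-number data as follows. Several details are assumptions rather than part of the notes: the class size is rounded up instead of rounded off, classes are added until the maximum is covered (so the table may have one class more than k), true class boundaries use a half-unit offset, and the cumulative column is a running total from the lowest class upward.

```python
import math

def grouped_frequency(data):
    """Build a grouped frequency table for integer data (illustrative sketch)."""
    n = len(data)
    lo, hi = min(data), max(data)
    r = hi - lo                                # step 1: range R = MAX - MIN
    k = round(1 + 3.322 * math.log10(n))       # step 2: number of classes
    c = max(1, math.ceil(r / k))               # step 3: class size, rounded up (assumption)
    rows, lower, cumulative = [], lo, 0
    while lower <= hi:                         # keep adding classes until MAX is covered
        upper = lower + c - 1                  # step 4: class limits
        f = sum(1 for x in data if lower <= x <= upper)   # step 5: class frequency
        cumulative += f                        # step 8: running total (assumed direction)
        rows.append({
            "limits": (lower, upper),
            "boundaries": (lower - 0.5, upper + 0.5),     # step 6: true class boundaries
            "class_mark": (lower + upper) / 2,            # step 7: midpoint of the class
            "frequency": f,
            "relative": f / n,                 # relative frequency
            "percent": f / n * 100,            # frequency percentage
            "cumulative": cumulative,
        })
        lower = upper + 1
    return rows

if __name__ == "__main__":
    scores = [10, 12, 15, 18, 21, 22, 25, 27, 30, 31, 33, 36, 38, 40, 42, 45]
    for row in grouped_frequency(scores):
        print(row)
```

Rounding the class size up is a deliberate choice here: rounding it off can leave the largest observation outside the last class.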
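The Basic Demographic Equation and the crude rate from Tools of Demography reduce to simple arithmetic. In this sketch the function names and the input figures are hypothetical, and the rate is written as events divided by population times the constant K, with K = 1000 assumed as the default.

```python
def project_population(p1, births, deaths, immigrants, emigrants):
    """P2 = P1 + (B - D) + (I - O): natural increase plus net migration."""
    natural_increase = births - deaths
    net_migration = immigrants - emigrants
    return p1 + natural_increase + net_migration

def crude_rate(events, population, k=1000):
    """Events per k people in the whole population (k = 1000 is an assumption)."""
    return events * k / population

if __name__ == "__main__":
    # Hypothetical figures, for illustration only.
    p2 = project_population(10_000, births=250, deaths=100, immigrants=60, emigrants=40)
    print(p2)                       # population at the later time
    print(crude_rate(250, 10_000))  # crude birth rate per 1000
```

A specific rate would use the same calculation with both the events and the population restricted to one subgroup.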
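The four probability sampling methods under Sampling Methods can be illustrated with Python's standard `random` module. This is a minimal sketch: the function names, the `seed` parameter, and the example population are hypothetical, and a real study would also need a defined sampling frame.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Simple random: draw n elements by chance, without replacement."""
    rng = random.Random(seed)
    return rng.sample(list(population), n)

def systematic_sample(population, step, start=0):
    """Systematic: take every step-th subject, beginning at index `start`."""
    return list(population)[start::step]

def stratified_sample(strata, n_per_stratum, seed=None):
    """Stratified random: draw representatives from each group.

    `strata` maps a group name to its members (hypothetical structure).
    """
    rng = random.Random(seed)
    return {name: rng.sample(list(members), n_per_stratum)
            for name, members in strata.items()}

def cluster_sample(clusters, n_clusters, seed=None):
    """Cluster: pick whole groups at random and keep every member of each."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [member for name in chosen for member in clusters[name]]

if __name__ == "__main__":
    ids = range(1, 21)                       # hypothetical subject IDs 1..20
    print(simple_random_sample(ids, 5, seed=1))
    print(systematic_sample(ids, 5, start=2))
    print(stratified_sample({"a": range(10), "b": range(10, 20)}, 2, seed=1))
    print(cluster_sample({"A": [1, 2], "B": [3, 4], "C": [5, 6]}, 2, seed=1))
```

Stratified sampling draws a few members from every group, while cluster sampling keeps every member of a few groups; the two are easy to confuse.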