Questions and Answers
What is the primary purpose of descriptive statistics in the first stage of data analysis?
- To apply findings to a broader population
- To organize, summarize, and present data (correct)
- To generate complex mathematical models
- To make predictions beyond the sample data
Which of the following sampling methods ensures that every individual in the population has an equal chance of being selected?
- Snowball sampling
- Discretionary sampling
- Simple random sampling (correct)
- Intentional sampling
In stratified random sampling, what is the primary reason for dividing the population into subgroups (strata)?
- To increase the cost-effectiveness of the study
- To ensure representation of key variables (correct)
- To simplify the data collection process
- To reduce the sample size
Which type of variable involves categories that cannot be logically ordered?
Which of the following is an example of a discrete variable?
What is the first step in constructing a frequency distribution table for a set of variables?
In a frequency distribution table, what does the cumulative frequency represent?
Which type of graph is most suitable for displaying qualitative data?
For quantitative continuous variables, which graphical representation is typically used?
Which of the following describes the mode in a set of data?
How is the median determined when there is an even number of observations?
What does the range measure in a dataset?
What is the coefficient of variation used for?
Which of the following is a limitation of non-probabilistic sampling?
Which measure of central tendency is most affected by extreme values (outliers)?
What formula is used to calculate the sample constant (K) in systematic sampling?
Which of the following best describes the goal of inferential statistics?
What is the primary concern when the basic condition of the sample is not representative of its respective population?
Which of the following is a key advantage of using grouped data in frequency tables when dealing with continuous variables?
What should an investigator consider when determining the number of intervals for a grouped frequency distribution?
What is the purpose of calculating the “midpoint” in grouped data within a frequency table?
In cluster sampling, when is it useful to select groups?
Which of the following can be considered a significant disadvantage specific to intentional sampling?
What determines the decision of an investigator to directly select the sample in discretionary sampling?
In constructing confidence intervals with stratified sampling, what must researchers account for to align results to a variable’s distribution?
How would one standardize variables with an equal set of sample sizes?
How could researchers analyze bias in the data based on subgroup distributions?
The data showed 53.3% women and 46.7% men, with an average age of 17.57 years. It also reported that the prevalence of cavities was about 60% of the people measured. If a student expanded the study to 1,000 people, what would they need to watch to describe the overall group accurately if they cannot examine every mouth?
A public health official wants to rapidly assess community needs after a flood. They lack resources for a full probability sample. To quickly gather insights and preliminary data after a flood event what sampling technique would be BEST?
Imagine the following data: 1, 2, 3, 4, 5, 6, 7, 8, 999. What measure of central tendency would be LEAST sensitive to the outlier (999)?
Flashcards
What is Statistics?
A branch of mathematics that collects, organises, analyses, and interprets data obtained from observations.
What is Descriptive Statistics?
The initial phase of statistical analysis focused on organizing, tabulating, and graphing collected data.
What is Analytical Statistics?
A further stage of statistical analysis that involves drawing conclusions about a population based on a sample.
What is Sampling?
Selecting a subset of individuals from a population to study.
What is a Population?
The entire group of individuals which share common characteristics and are of interest to you.
What is a Sample?
A collection of individuals chosen from the population to represent it.
What is Sampling?
The process of choosing a sample from a population.
What is a Sampling Unit?
The group of individuals in the population from which the sample participants are chosen.
What is Probability Sampling?
Each member of the population has an equal chance of being selected.
What is Non-Probability Sampling?
Selection is based on convenience or judgment, not random chance.
What is Simple Random Sampling?
Every member of the population is listed and numbered, and then randomly picked.
What is Systematic Random Sampling?
Members from the population are selected at regular intervals.
What is Stratified Random Sampling?
Population is divided into subgroups (strata) before random samples are taken.
What is Clustered Random Sampling?
The sample is drawn from randomly selected groups (clusters).
What is Intentional Sampling?
Individuals are selected based on ease of access.
What is Snowball Sampling?
Participants recruit others they know until enough subjects are enrolled.
What is Discretionary Sampling?
The researcher chooses participants based on their judgement of who will be most useful.
What are Qualitative Variables?
Variables expressed as qualities or categories rather than numbers.
What are Quantitative Variables?
Variables measured by numbers.
What are Nominal Variables?
Qualitative variables with unordered categories and no numerical meaning, e.g. sex.
What are Ordinal Qualitative Variables?
Qualitative variables whose categories can be ranked, e.g. level of studies: elementary, middle, high school.
What are Continuous Quantitative Variables?
Variables that can take any value within a numerical range, e.g. age.
What are Discrete Quantitative Variables?
Variables that can take only a limited number of values, e.g. number of children.
What is a Frequency Distribution?
A table that lists each value of a variable together with its frequencies.
What is a Rectangle Diagram?
Graphs using rectangles for qualitative variables.
What is a Sector Diagram?
A circle divided into sectors whose size corresponds to the frequency of each category of a qualitative variable.
What is a Bar Diagram?
Graphical representation for discrete variables.
What is a Histogram?
A graph for continuous quantitative variables.
What are Frequency Polygons?
Graph showing continuous data.
What are Central Tendency Measures?
The central values around which most of the data cluster: mode, arithmetic mean, and median.
Study Notes
- Descriptive statistics is covered in Unit 5.
Introduction
- A first data set records the values from a dental exam of 30 people in an epidemiological study.
- It is difficult to ascertain the condition of their oral health from 240 raw data points covering variables such as age, sex, cavities, fluorosis, and gingivitis.
- A second data set that summarizes these values makes the general state of some oral health parameters of the 30 people easy to see.
- Statistics transforms the many data points into a small number of figures.
- Statistics is a branch of mathematics that provides procedures for recording, organizing, summarizing, and analyzing data obtained from observation.
- It is a tool used in many disciplines, including epidemiology.
- Statistical analysis has two stages. The first is descriptive statistics, which aims to:
- Arrange the data from any study into tables and graphs.
- Describe the data with numerical indices and statistics.
- The second stage is analytical or inferential statistics, which does the following:
- Analyzes the data to obtain information beyond the study sample.
- Applies that information to a population and makes predictions.
Sampling
- It is usually impossible to study every individual in a population.
- Instead, researchers study a sample drawn from the population of interest.
- If the chosen sample is representative of the source population, the results can be applied (inferred) to that population.
- Common terms in epidemiological studies:
- Population/universe/collective: a group with common characteristics chosen by the researcher
- Sample: a set of individuals chosen from the population to be studied
- The sample must represent the population by sharing the characteristics that define it.
- Researchers then infer the results obtained in the sample to the whole population.
- Sample size: Number of individuals in the sample.
- Statistical protocols determine the size based on population size and frequency of the phenomenon under study.
- Sampling: Selecting a sample of study from a population.
- Sampling unit: Group of individuals in the population from which the sample participants are chosen; can be the full population or specific groups.
Types of Sampling
- Sampling is a critical first step in any study.
- Conclusions cannot be applied to the population if the study participants are not selected well.
- A sample must reflect its population by presenting the characteristics that define that population.
- The external validity of the study suffers if the sample is not representative.
- Probabilistic Sampling:
- All individuals have the same chance of being chosen for the sample.
- Reduces selection bias, which makes the sample more likely to represent the population.
- Selection can follow several procedures.
- Simple Random Sampling:
- Used when the population is not very large.
- A complete list of all population members is available beforehand
- Assign a number to each individual and choose numbers randomly.
- Systematic Random Sampling:
- Requires a list of the study population.
- Calculate a sampling constant K by dividing the total population by the sample size (K = N/n).
- Choose a random number r between 1 and K.
- Starting at position r in the population list, select every K-th individual (r, r + K, r + 2K, and so on) until the sample is complete.
- The list must not be ordered by any pre-defined criterion; a pre-determined ordering can make the results erroneous.
- Stratified Random Sampling:
- Most suitable when the sample must respect the distribution of specific variables.
- Researchers select the sample according to the distribution of these variables in the source population.
- The distribution of these variables in the population must therefore be known beforehand.
- Once the number of individuals needed from each stratum is known, they are chosen by simple random sampling.
- Random Cluster Sampling:
- The sample is drawn from pre-selected groups instead of from the entire population.
- Each of these groups is called a cluster.
- It is used for convenience, to simplify selection.
- Commonly used clusters include hospitals, schools, or universities.
- Select a group of clusters that meets the population characteristics, then choose the members of the sample from them.
- Useful when the population is very large and spread out.
- Only information on the people in the selected clusters is required.
- Non-probabilistic Sampling:
- Individuals do not all have the same probability of being selected, which makes inferences to the population more difficult.
- The selected sample may be unrepresentative.
- Used when resources are insufficient for probabilistic sampling.
- Intentional Sampling: individuals are chosen based on ease of access to a specific group.
- Snowball Sampling: start with an initial group of individuals, then ask them to recruit other people.
- Discretionary Sampling: the researcher directly selects the sample from a group they judge to be useful according to their own criteria.
- In all cases, a sample that is representative of the source population is needed.
- Probabilistic sampling supports representativeness, and with it external validity and the inference of results.
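The systematic and stratified procedures described in this section can be sketched in Python. This is only an illustrative sketch: the function names, the numbered list of 300 people, and the 533/467 split between strata are assumptions made for the example, not part of the unit. Quotas are rounded, so in general they may not add up to exactly n.

```python
import random

def systematic_sample(population, n, rng=random):
    """Pick n items at regular intervals K = N // n from a random start r."""
    N = len(population)
    k = N // n                # sampling constant K = N/n, rounded down (assumes n <= N)
    r = rng.randint(1, k)     # random start between 1 and K (1-based position)
    # Positions r, r+K, r+2K, ... converted to 0-based list indices.
    return [population[r - 1 + i * k] for i in range(n)]

def stratified_sample(strata, n, rng=random):
    """Draw from each stratum in proportion to its share of the population."""
    total = sum(len(members) for members in strata.values())
    sample = {}
    for name, members in strata.items():
        quota = round(n * len(members) / total)     # proportional allocation (rounded)
        sample[name] = rng.sample(members, quota)   # simple random sampling inside the stratum
    return sample

# Hypothetical numbered list of 300 people, sample of 30, so K = 10.
students = list(range(1, 301))
picked = systematic_sample(students, 30)

# Hypothetical population of 1,000 split into two strata.
strata = {"women": list(range(533)), "men": list(range(533, 1000))}
by_sex = stratified_sample(strata, 30)
print(picked[:3], {name: len(chosen) for name, chosen in by_sex.items()})
```

Passing a seeded `random.Random` instance as `rng` makes a run reproducible, which can help when documenting how a sample was drawn.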
Types of Variables
- Epidemiological research involves collecting information from sample individuals.
- The data collected from each person correspond to the variables studied.
- For example, researchers may record the number of cavities as well as the age, sex, and nationality of each subject.
- Variables include:
- Age.
- Cavities present.
- Number of teeth.
- Sex.
- Nationality.
- The variable type will determine which statistical analysis/graphical representation can be used.
- Qualitative variables: expressed in words that describe qualities.
- Each quality of the variable is a category.
- The variable sex can be male or female (two categories).
- Marital status can be married, separated, single, or widowed (four categories).
- Qualitative variables can be nominal or ordinal:
- The variable sex is nominal: its categories are names with no inherent order, so it does not matter whether men or women are listed first.
- Ordinal qualitative variables such as "level of studies" can be ordered logically: infant, primary, and secondary.
- Quantitative variables: Expressed with numbers
- Can include the following:
- Continuous quantitative: the variable can take any value on the numerical scale.
- Example: age.
- Discrete quantitative: the variable can take only a limited number of values.
- Example: number of children (whole numbers only).
Ordering and Tabulating Data
- Once the study variables are set, their values are determined for each subject in the sample.
- The data are recorded on a data collection sheet designed for each study.
- A descriptive study on caries would include the following data:
- Patient number.
- Presence of caries (yes/no).
- Number of decayed teeth.
- Number of absent teeth.
- Number of filled teeth.
- Total number of teeth.
- After data is collected for all variables, researchers build a database (matrix) that registers all the data from the subject sheets.
- Each database column corresponds to one variable and holds the values collected for it.
- The next step is to build the frequency distribution table.
- It should include the following:
- Each value Xi that the variable can take (first column).
- Total frequency (N): the total number of individuals analyzed.
- Absolute frequency (fi): the number of individuals with the value Xi (second column).
- Relative frequency (fr): the quotient between each fi and the total N.
- Cumulative frequency (fa): the sum of fi and the frequencies of all preceding values of Xi.
- The fa of the last Xi should equal N.
- The cumulative frequency column is useful for calculating the median.
- In the example, the fa of Xi = 0 equals its own frequency (20), because no value precedes 0.
- When the variable takes many different values, the table is built by grouping the data into intervals, which makes analysis easier.
- The following protocol can be used to construct the intervals:
- Determine the range of Xi values (the total number of different values): Range = (Xmax - Xmin) + 1.
- Decide the desired number of intervals.
- Determine the amplitude (width) of the intervals by dividing the range by the number of intervals.
- Build the new table by adding the amplitude repeatedly, starting from the lowest value, to obtain the limits of each interval.
- A value that falls on a limit shared by two intervals is always counted in the upper interval (in the example, the value 5, whose frequency is 12).
- The new table includes an Xi column with the class mark (midpoint) of each interval.
- In later calculations with the frequency table, each interval is replaced by its class mark.
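As a rough illustration of the frequency table and the interval amplitude described in this section, the following Python sketch builds the Xi, fi, fr, and fa columns and applies Range = (Xmax - Xmin) + 1. The function names and the list of decayed-teeth counts are made up for the example and do not come from the unit's data.

```python
from collections import Counter

def frequency_table(values):
    """Return one row (Xi, fi, fr, fa) per distinct value, sorted by Xi."""
    counts = Counter(values)
    N = len(values)                 # total frequency
    rows, fa = [], 0
    for xi in sorted(counts):
        fi = counts[xi]             # absolute frequency of Xi
        fa += fi                    # cumulative: fi plus all preceding frequencies
        rows.append((xi, fi, fi / N, fa))
    return rows

def interval_amplitude(values, n_intervals):
    """Amplitude = range / number of intervals, with range = (Xmax - Xmin) + 1."""
    value_range = (max(values) - min(values)) + 1
    return value_range / n_intervals

# Made-up number of decayed teeth for 10 hypothetical patients.
decayed = [0, 0, 1, 2, 0, 1, 3, 0, 2, 1]
table = frequency_table(decayed)
for row in table:
    print(row)

# The cumulative frequency of the last Xi matches N.
print(table[-1][3] == len(decayed))
print(interval_amplitude(decayed, 2))
```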
Graphical Representations
- Once the data have been collected and ordered, the values of each variable are represented graphically.
- These representations are easy to read and give an idea of how the values are distributed.
- The type of graph is chosen according to the type of variable.
- Qualitative Variables:
- Rectangle diagram
- Sector diagram
- Pictograms
- Quantitative variables:
- Bar diagram
- Histogram
- Frequency Polygon
- Qualitative variables can be represented by rectangle diagrams with the following features:
- The Y axis (ordinate) shows the absolute frequency.
- The X axis (abscissa) shows the categories.
- There is one rectangle per category of the variable.
- The rectangles have equal width and are separated by equal distances.
- The height of each rectangle corresponds to the frequency of its category.
- In a sector diagram, a circle is divided into sectors whose size corresponds to the frequency of each category.
- Quantitative discrete variables can be represented with a bar diagram.
- It is similar to the rectangle diagram.
- The abscissa is graduated with the values of the variable.
- Quantitative continuous variables can be represented by histograms:
- The Y axis expresses the frequencies and the abscissa shows the variable.
- The rectangles for the intervals of the continuous variable are drawn next to each other.
Measures of Centralization
- Include mode, arithmetic mean, and median.
- Designed to find the central values around which most of the data cluster.
- Only calculated for quantitative variables.
- Mode (Mo): the most frequent value in the data.
- Found by looking for the value Xi with the highest frequency in the frequency distribution.
- If two Xi values share the highest frequency, there are two modes.
- Arithmetic mean:
- The sum of all X values in the data, divided by the number of values.
- Median: the X value in the central position after ordering the data from lowest to highest.
- With a frequency table, it can be located as follows:
- Divide the total number of values (N) by 2.
- Look for the result in the cumulative frequency column.
- The median is the value Xi whose cumulative frequency first reaches or exceeds N/2; with an even number of observations, it is the mean of the two central values.
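The three measures can be checked quickly with Python's standard `statistics` module. The sketch below uses the data set 1, 2, 3, 4, 5, 6, 7, 8, 999 from the questions section; the two short extra lists are made up to show an even number of observations and a distribution with two modes.

```python
import statistics

# Data set from the questions section: one extreme value (999).
data = [1, 2, 3, 4, 5, 6, 7, 8, 999]

mean = statistics.mean(data)       # sum 1035 divided by 9 values
median = statistics.median(data)   # central (5th) value after ordering
print(mean, median)                # 115 5

# Even number of observations: the median is the mean of the two central values.
even_median = statistics.median([1, 2, 3, 4])
print(even_median)                 # 2.5

# Two values share the highest frequency, so there are two modes.
modes = statistics.multimode([0, 0, 1, 1, 2])
print(modes)                       # [0, 1]
```

The outlier pulls the mean up to 115 while the median stays at 5, which is why the mean is the measure most affected by extreme values.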
Measures of Dispersion
- Indicate how close to or far from the central tendency measures the individual values lie.
- Central tendency measures alone are insufficient to summarize the values of a variable.
- Two samples can share the same central values and still have different distributions.
- Each dispersion index has its own interpretation.
- Total amplitude (range): the numerical distance between the highest and lowest values.
- Interval is used to count marks.
- Formula is used.
- Variance: the mean of the squared deviations of each value from the arithmetic mean.
- Presented habitually like.
- Formula is used to calculate this.
- Standard deviation: the square root of the variance, which yields an index expressed in the original units of the variable.
- Formula is used to calculate this.
- Coefficient of variation: allows the spread of data to be compared between variables or distributions with different units or means.
- The coefficient of variation is calculated by dividing the standard deviation by the mean, and is often expressed as a percentage.
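The formulas themselves are not reproduced in these notes, so the Python sketch below relies on common textbook definitions, which are an assumption here: the range as maximum minus minimum (without the +1 used when building intervals), and the variance and standard deviation dividing by N (`statistics.pvariance` and `statistics.pstdev`). A unit that divides by n - 1 would use `statistics.variance` and `statistics.stdev` instead. The function name and the data are made up for the example.

```python
import statistics

def dispersion_summary(values):
    """Return (range, variance, standard deviation, coefficient of variation in %)."""
    mean = statistics.mean(values)
    value_range = max(values) - min(values)     # assumed definition: max minus min
    variance = statistics.pvariance(values)     # mean squared deviation, divides by N
    sd = statistics.pstdev(values)              # square root of the variance
    cv = 100 * sd / mean                        # standard deviation relative to the mean
    return value_range, variance, sd, cv

# Made-up values with mean 5 and squared deviations summing to 32.
scores = [2, 4, 4, 4, 5, 5, 7, 9]
value_range, variance, sd, cv = dispersion_summary(scores)
print(value_range, variance, sd, cv)
```

Because the coefficient of variation divides by the mean, it is only meaningful when the mean is not zero.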
Practical Case Statistics
- A descriptive statistical analysis is carried out on an epidemiological study to describe oral health using measures of centralization and dispersion.
- A descriptive epidemiological study was performed to determine the oral health of first-year students at a university in Madrid.
- Systematic random sampling was used to select the 30 participants.
- Researchers recorded several variables on dental caries and gum condition.
- The data collection sheets indicate the order of the variables.
- In the database, the columns are numbered to match the sheets.