Study Notes
Introduction to Statistics
- Statistics is a field of study dealing with the collection, analysis, interpretation, presentation, and organization of data.
- It is used to understand patterns, trends, and relationships within data, often used to make predictions or decisions.
Types of Data
- Categorical Data: Data that fits into categories or groups, like gender, color, or type.
- Quantitative Data: Data that can be measured numerically, like height, weight, or temperature.
Sampling
- Sampling is a process where a researcher selects one or more cases from a larger group (population) for study.
- Important for studying populations too large to collect data on every member.
- Crucial for generalizing findings to the entire population.
Sampling Methods
- Simple Random Sampling (SRS): Every member of the population has an equal chance of being selected. Can be done with or without replacement.
- Systematic Sampling: Every kth member of the population is selected, usually after a random starting point.
- Stratified Random Sampling: The population is divided into subgroups (strata). Then random samples are drawn from each stratum.
- Cluster Sampling: A sampling method where the population is divided into groups (clusters). Then entire clusters are randomly selected.
- Convenience or Accidental Sampling: The researcher selects the most accessible individuals or cases.
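The four probability-based methods above can be sketched with Python's standard `random` module. This is a minimal illustration, not a survey-grade implementation: the function names, the population of the numbers 1 to 100, and the strata and cluster boundaries are all assumptions made up for the example.

```python
import random

def simple_random_sample(population, n, rng):
    # Without replacement: each member can appear at most once.
    return rng.sample(population, n)

def systematic_sample(population, k, rng):
    # Pick a random start among the first k members, then take every kth.
    start = rng.randrange(k)
    return population[start::k]

def stratified_sample(strata, n_per_stratum, rng):
    # strata maps a stratum name to its members; sample within each stratum.
    return {name: rng.sample(members, n_per_stratum)
            for name, members in strata.items()}

def cluster_sample(clusters, n_clusters, rng):
    # Select whole clusters at random and keep every member of each one.
    chosen = rng.sample(clusters, n_clusters)
    return [member for cluster in chosen for member in cluster]

rng = random.Random(0)  # fixed seed so the run is repeatable
population = list(range(1, 101))  # hypothetical population of 100 IDs

srs = simple_random_sample(population, 10, rng)
sys_sample = systematic_sample(population, 10, rng)
strata = {"low": population[:50], "high": population[50:]}
strat = stratified_sample(strata, 5, rng)
clusters = [population[i:i + 20] for i in range(0, 100, 20)]
clus = cluster_sample(clusters, 2, rng)
```

Convenience sampling has no random step, so it is left out of the sketch.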
Data Collection Methods
- Questionnaires: A structured set of questions used to collect data from individuals. Can be answered in person, by mail, phone, or online.
- Recording: Documenting data gathered through direct observation.
- Qualitative Methods: Gathering non-numerical information by observing, listening, or reading.
Sample Size
- Sample size is the number of individuals selected for observation.
- The required sample size depends on:
- Precision (the acceptable amount of error)
- Population homogeneity (how much the population varies)
- Sampling fraction (the size of the sample relative to the population)
Sampling Fraction Adjustment
- n' = n / [1 + (n/N)]
- n': adjusted sample size
- n: estimated sample size without adjustment
- N: population size
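As a quick check of the adjustment formula, here is a small sketch. The function name and the example figures (an unadjusted estimate of 400 from a population of 2,000) are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def adjusted_sample_size(n, N):
    """Apply n' = n / (1 + n/N), the adjustment given in these notes."""
    if n <= 0 or N <= 0:
        raise ValueError("n and N must be positive")
    return n / (1 + n / N)

# Hypothetical figures: n = 400 before adjustment, population N = 2000.
n_prime = adjusted_sample_size(400, 2000)  # 400 / 1.2, about 333.3
```

The adjustment matters most when the sample is a large share of the population; when N is much larger than n, n' stays close to n.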
Non-Probability Sampling
- Availability sampling: Uses readily available and accessible participants.
- Snowball sampling: Participants refer other participants to take part.
- Quota sampling: Samples are selected to match characteristics of the population across multiple subgroups.
- Purposive sampling: Selects participants based on specific characteristics relevant to the study.
Spurious Relationships
- A spurious relationship exists where two variables appear to be related, but the apparent relationship is actually explained by a third variable that influences both.
- Controlling for other variables is important for understanding the true relationship.
Data Display
- Graphs: Used to present and visualize the distribution of data.
- Bar charts: Useful for displaying categorical data.
- Pie charts: Useful for displaying categorical data (parts of one whole).
- Histograms: Used for frequency distribution of quantitative data.
- Time series plots: Used to show how a variable changes over time.
- Dot plots: Show each observation as a dot along a number line; useful for small quantitative data sets.
- Stem plots: Display individual data points in a systematic way.
Variables
- Individuals: Objects or entities being observed. Can be people, animals or things.
- Variables: Characteristics of an individual. Can take various values or categories.
- Quantitative: Measured numerically. Examples: Height, weight, temperature.
- Categorical: Fits into categories. Examples: Eye color, gender.
- Categorical types: Nominal (unordered categories), ordinal (ranked categories)
How to Determine Variable Type
- Ask what is being measured of each individual.
- Is it a numerical value, or a descriptive category?
Measures of Center
- Mean: Average of all values in a data set.
- Median: Center point of a data set when ordered.
- Mode: Most frequent value in the data set.
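The three measures can be computed with Python's standard `statistics` module. The data set below is made up for illustration.

```python
import statistics

data = [2, 3, 3, 5, 7, 10]  # hypothetical observations

mean = statistics.mean(data)      # (2+3+3+5+7+10) / 6 = 5
median = statistics.median(data)  # middle of the ordered values: (3+5)/2 = 4.0
mode = statistics.mode(data)      # most frequent value: 3
```

With an even number of observations, the median is the average of the two middle values, which is why it need not be a value in the data set.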
Measures of Spread
- Range: Difference between the highest and lowest values.
- Interquartile Range (IQR): Difference between the third and first quartiles of a data set.
- Standard Deviation: Square root of the variance; roughly the typical distance between a data point and the mean.
- Variance: Sum of squares of deviations from the mean, divided by degrees of freedom.
- Semi-Interquartile Deviation: Half the difference between the third and first quartiles.
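A short sketch of these measures using the standard `statistics` module follows. The data are invented, and two choices are assumptions: the variance is the sample variance (degrees of freedom n - 1), and the quartiles use the module's default "exclusive" method. Textbooks and software differ on quartile conventions, so the IQR may not match a hand calculation exactly.

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical observations

data_range = max(data) - min(data)            # highest minus lowest value

q1, q2, q3 = statistics.quantiles(data, n=4)  # quartile cut points
iqr = q3 - q1                                 # interquartile range
semi_iqr = iqr / 2                            # semi-interquartile deviation

variance = statistics.variance(data)          # sum of squared deviations / (n - 1)
std_dev = statistics.stdev(data)              # square root of the variance
```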
Box Plots
- Box plots visually display the five-number summary (Min, Q1, Median, Q3, Max) of a set of data.
- Helpful for identifying outliers.
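The sketch below computes a five-number summary and flags candidate outliers. The flagging rule (points more than 1.5 × IQR beyond the quartiles) is a common box-plot convention and not something stated in these notes, so treat it as an assumption; the function names and data are also hypothetical.

```python
import statistics

def five_number_summary(data):
    # Quartiles use the statistics module's default "exclusive" method.
    q1, median, q3 = statistics.quantiles(data, n=4)
    return min(data), q1, median, q3, max(data)

def iqr_outliers(data, k=1.5):
    # Assumed convention: flag points more than k * IQR beyond Q1 or Q3.
    _, q1, _, q3, _ = five_number_summary(data)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [x for x in data if x < lower or x > upper]

data = [2, 4, 4, 4, 5, 5, 7, 9, 30]  # hypothetical data with one extreme value
summary = five_number_summary(data)
flagged = iqr_outliers(data)  # [30]
```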
Outliers
- Outlier: An observation that is substantially different from most of the other data points.
- Potential issue: An outlier can strongly influence the calculated mean and standard deviation.
Choosing a Summary Statistic
- Use the mean and standard deviation for symmetrical distributions without outliers.
- Use the median (with the IQR as the measure of spread) for skewed distributions and those with outliers.
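To see why the choice matters, the following sketch compares the mean and the median before and after adding a single extreme value. The numbers are invented for illustration.

```python
import statistics

clean = [10, 12, 13, 14, 15]   # hypothetical, roughly symmetric data
with_outlier = clean + [100]   # same data plus one extreme value

mean_shift = statistics.mean(with_outlier) - statistics.mean(clean)
median_shift = statistics.median(with_outlier) - statistics.median(clean)
sd_clean = statistics.stdev(clean)
sd_outlier = statistics.stdev(with_outlier)

print(f"mean moves by {mean_shift:.2f}, median moves by {median_shift:.2f}")
print(f"standard deviation: {sd_clean:.2f} -> {sd_outlier:.2f}")
```

In this example the mean moves by more than 14 units while the median moves by half a unit, which is the sense in which the median is resistant to outliers.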
Hypothesis Testing
- A table showing the possible outcomes of a hypothesis test is included.
- Includes Type I errors (rejecting a null hypothesis that is true) and Type II errors (failing to reject a null hypothesis that is false).
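The four possible outcomes can be enumerated with a small helper. This is only a sketch of the standard two-by-two layout, using the usual definitions (Type I: rejecting a true null hypothesis; Type II: failing to reject a false one); the function name and labels are hypothetical and it does not reproduce the original table.

```python
def classify_outcome(null_is_true, null_rejected):
    """Label the result of a test given the (normally unknown) truth."""
    if null_is_true and null_rejected:
        return "Type I error"
    if not null_is_true and not null_rejected:
        return "Type II error"
    return "Correct decision"

# Print all four combinations of truth and decision.
for null_is_true in (True, False):
    for null_rejected in (True, False):
        truth = "H0 true" if null_is_true else "H0 false"
        decision = "reject H0" if null_rejected else "fail to reject H0"
        print(f"{truth:9} | {decision:18} | "
              f"{classify_outcome(null_is_true, null_rejected)}")
```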
Confidence Intervals
- Specific methods for calculating 96% and 70% confidence intervals (CI) using a given standard deviation and mean are included.
- Confidence Intervals (CI) give a range within which a true population value is estimated to lie with a specified confidence level.
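One common way to build such an interval is the z-based formula mean ± z × sd / √n. The notes do not show the exact method used, so the formula below is an assumption (an interval for a population mean with known standard deviation), and the inputs (mean 100, standard deviation 15, sample size 25) are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def z_confidence_interval(mean, sd, n, level):
    """Return (lower, upper) for a z-based CI at the given confidence level."""
    # Critical value that leaves (1 - level) / 2 in each tail.
    z = NormalDist().inv_cdf(0.5 + level / 2)
    margin = z * sd / sqrt(n)
    return mean - margin, mean + margin

# Hypothetical inputs, evaluated at the two levels mentioned above.
ci_96 = z_confidence_interval(mean=100, sd=15, n=25, level=0.96)
ci_70 = z_confidence_interval(mean=100, sd=15, n=25, level=0.70)
```

A higher confidence level uses a larger critical value, so the 96% interval is wider than the 70% interval for the same data.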
Z-scores and Probabilities
- Explains the z-score transformation of normal distributions and how to interpret percentiles.
- Includes z-score ranges for different grades (A, B, C, etc.)
- Explains how to interpret z-score values from tables of percentiles.
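A z-score is computed as z = (x - mean) / sd, and the matching percentile is the area under the standard normal curve to the left of z. The sketch below uses `statistics.NormalDist` in place of a printed table; the example score, mean, and standard deviation are hypothetical, and the grade cut-offs are not reproduced because they are not given here.

```python
from statistics import NormalDist

def z_score(x, mean, sd):
    # Number of standard deviations x lies above (+) or below (-) the mean.
    return (x - mean) / sd

def percentile_from_z(z):
    # Percentage of a normal distribution falling below z.
    return NormalDist().cdf(z) * 100

z = z_score(85, mean=70, sd=10)  # 1.5
pct = percentile_from_z(z)       # about 93.3
```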
Probability
- A quantitative assessment of the likelihood of an uncertain event occurring.
- Always between 0 and 1 (inclusive), [0,1].
Sensitivity, Specificity, PPV, and NPV
- Sensitivity: Percentage of individuals who have the condition and test positive (true positives among all who have the condition).
- Specificity: Percentage of individuals who do not have the condition and test negative (true negatives among all who do not have the condition).
- Positive Predictive Value (PPV): Percentage of true positives among positive test results.
- Negative Predictive Value (NPV): Percentage of true negatives among negative test results.
- Important in evaluating the usefulness of tests (e.g. screening tests).
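All four measures come from the same two-by-two table of test results against true status. The sketch below computes them from counts of true positives (tp), false positives (fp), false negatives (fn), and true negatives (tn); the function name and the screening counts are hypothetical.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Return the four measures as proportions between 0 and 1."""
    return {
        "sensitivity": tp / (tp + fn),  # positives among those with the condition
        "specificity": tn / (tn + fp),  # negatives among those without it
        "ppv": tp / (tp + fp),          # true positives among positive results
        "npv": tn / (tn + fn),          # true negatives among negative results
    }

# Hypothetical screening of 1000 people, 100 of whom have the condition.
m = diagnostic_metrics(tp=90, fp=50, fn=10, tn=850)
# sensitivity 0.90, specificity about 0.944, ppv about 0.643, npv about 0.988
```

In this example the test is both sensitive and specific, yet the PPV is only about 64% because the condition is uncommon in the screened group.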