Flashcards
Mean
The average of a data set calculated by summing all values and dividing by the number of individuals.
Measure of Center
A single value that represents the typical or central value of a data set.
Why is the Mean a Good Summary for Heights?
The mean is a good summary for heights because the distribution of women's heights tends to be roughly symmetric with no extreme values, so the data cluster around a clear, typical center point.
Study Notes
Introduction to Statistics
- Statistics is a field of study dealing with the collection, analysis, interpretation, presentation, and organization of data.
- It is used to understand patterns, trends, and relationships within data, often used to make predictions or decisions.
Types of Data
- Categorical Data: Data that fits into categories or groups, like gender, color, or type.
- Quantitative Data: Data that can be measured numerically, like height, weight, or temperature.
Sampling
- Sampling is a process where a researcher selects one or more cases from a larger group (population) for study.
- Important for studying populations too large to collect data on every member.
- Crucial for generalizing findings to the entire population.
Sampling Methods
- Simple Random Sampling (SRS): Every member of the population has an equal chance of being selected. Can be done with or without replacement.
- Systematic Sampling: Every kth member of a population is selected.
- Stratified Random Sampling: The population is divided into subgroups (strata). Then random samples are drawn from each stratum.
- Cluster Sampling: A sampling method where the population is divided into groups (clusters). Then entire clusters are randomly selected.
- Convenience or Accidental Sampling: The researcher selects the most accessible individuals or cases.
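The probability-based methods above can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed procedure: the population of 100 numbered IDs, the two strata, the clusters of 20, and all function names are hypothetical choices made for the example.

```python
import random

def simple_random_sample(population, n, rng):
    # Without replacement: each member can be chosen at most once.
    return rng.sample(population, n)

def systematic_sample(population, k, rng):
    # Random start within the first k positions, then every k-th member.
    start = rng.randrange(k)
    return population[start::k]

def stratified_sample(strata, n_per_stratum, rng):
    # strata maps a stratum name to the list of its members;
    # a separate random sample is drawn inside each stratum.
    return {name: rng.sample(members, n_per_stratum)
            for name, members in strata.items()}

def cluster_sample(clusters, n_clusters, rng):
    # Choose whole clusters at random and keep every member of each one.
    chosen = rng.sample(clusters, n_clusters)
    return [member for cluster in chosen for member in cluster]

rng = random.Random(0)            # fixed seed so the sketch is repeatable
population = list(range(1, 101))  # hypothetical population of 100 IDs

srs = simple_random_sample(population, 10, rng)
systematic = systematic_sample(population, 10, rng)
strata = {"low": population[:50], "high": population[50:]}
stratified = stratified_sample(strata, 5, rng)
clusters = [population[i:i + 20] for i in range(0, 100, 20)]
clustered = cluster_sample(clusters, 2, rng)
```

Note the difference between the last two functions: stratified sampling takes a few members from every group, while cluster sampling takes every member from a few groups.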
Data Collection Methods
- Questionnaires: A structured set of questions used to collect data from individuals. Can be answered in person, by mail, phone, or online.
- Recording: Recording data collected through observation.
- Qualitative Methods: Methods used to find information through observation, watching, listening, or reading.
Sample Size
- Sample size is the number of individuals selected for observation.
- The required sample size depends on:
- Precision (the acceptable amount of error)
- Population homogeneity (the variability in the population)
- Sampling fraction (the size of the sample relative to the population)
Sampling Fraction Adjustment
- n' = n / [1 + (n/N)]
- n': adjusted sample size
- n: estimated sample size without adjustment
- N: population size
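As a worked example of the adjustment formula, the sketch below applies n' = n / [1 + (n/N)] directly. The function name and the input values (an unadjusted estimate of 400 and a population of 2,000) are hypothetical.

```python
def adjusted_sample_size(n, N):
    """Return n' = n / (1 + n/N) for unadjusted size n and population size N."""
    if n <= 0 or N <= 0:
        raise ValueError("n and N must be positive")
    return n / (1 + n / N)

# Hypothetical inputs: unadjusted estimate of 400, population of 2,000.
n_adjusted = adjusted_sample_size(400, 2000)  # 400 / 1.2, about 333.3
```

When N is very large relative to n, the ratio n/N approaches zero and the adjustment has almost no effect.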
Non-Probability Sampling
- Availability sampling: Uses readily available and accessible participants.
- Snowball sampling: Participants refer other participants to take part.
- Quota sampling: Samples are selected to match characteristics of the population across multiple subgroups.
- Purposive sampling: Selects participants based on their specific characteristic.
Spurious Relationships
- A spurious relationship exists where two variables appear to have a relationship, but that relationship is actually caused by a third variable.
- Controlling for other variables is important for understanding the true relationship.
Data Display
- Graphs: Used to present and visualize the distribution of data.
- Bar charts: Useful for displaying categorical data.
- Pie charts: Useful for displaying categorical data (parts of one whole).
- Histograms: Used for frequency distribution of quantitative data.
- Time series plots: Used to show how a variable changes over time.
- Dot plots: Show each observation as a dot along a number line; useful for small data sets.
- Stem plots: Display individual data points in a systematic way.
Variables
- Individuals: Objects or entities being observed. Can be people, animals or things.
- Variables: Characteristics of an individual. Can take various values or categories.
- Quantitative: Measured numerically. Examples: Height, weight, temperature.
- Categorical: Fits into categories. Examples: Eye color, gender.
- Categorical types: Nominal (unordered categories), ordinal (ranked categories)
How to Determine Variable Type
- Ask what is being measured of each individual.
- Is it a numerical value, or a descriptive category?
Measures of Center
- Mean: Average of all values in a data set.
- Median: Center point of a data set when ordered.
- Mode: Most frequent value in the data set.
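The three measures of center can be computed with Python's standard `statistics` module. The data set below is made up for illustration.

```python
import statistics

data = [2, 3, 3, 5, 7, 10]  # hypothetical data set

mean = statistics.mean(data)      # sum of values / number of values = 30 / 6
median = statistics.median(data)  # middle of the ordered data: (3 + 5) / 2
mode = statistics.mode(data)      # most frequent value
```

With an even number of observations, the median is the average of the two middle values.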
Measures of Spread
- Range: Difference between the highest and lowest values.
- Interquartile Range (IQR): Difference between the third and first quartiles of a data set (Q3 - Q1).
- Standard Deviation: Square root of the variance; roughly the typical distance between a data point and the mean.
- Variance: Sum of squared deviations from the mean, divided by the degrees of freedom (n - 1 for a sample).
- Semi-Interquartile Deviation: Half the difference between the third and first quartiles.
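The following sketch computes each measure of spread step by step for a small made-up sample. It assumes the data are a sample, so the degrees of freedom are n - 1. Quartile conventions vary between textbooks and software; this example uses the default method of `statistics.quantiles`, so hand calculations with another convention may give slightly different quartiles.

```python
import math
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical sample
n = len(data)
mean = sum(data) / n

data_range = max(data) - min(data)

# Quartiles (default "exclusive" method of statistics.quantiles).
q1, _, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1
semi_iqr = iqr / 2

# Variance: sum of squared deviations divided by degrees of freedom (n - 1).
sum_sq = sum((x - mean) ** 2 for x in data)
variance = sum_sq / (n - 1)

# Standard deviation: square root of the variance.
std_dev = math.sqrt(variance)
```

The manual results agree with `statistics.variance(data)` and `statistics.stdev(data)`, which also divide by n - 1.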
Box Plots
- Box plots visually display the five-number summary (Min, Q1, Median, Q3, Max) of a set of data.
- Helpful for identifying outliers.
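A rough sketch of the numbers behind a box plot is shown below. The notes do not say how outliers are flagged; the 1.5 x IQR rule used here is a common convention and is an assumption of this example, as are the function names and the data.

```python
import statistics

def five_number_summary(values):
    # Returns (min, Q1, median, Q3, max).
    ordered = sorted(values)
    q1, median, q3 = statistics.quantiles(ordered, n=4)
    return ordered[0], q1, median, q3, ordered[-1]

def flag_outliers(values):
    # Assumption: a point is flagged if it lies more than
    # 1.5 * IQR below Q1 or above Q3 (a common convention).
    _, q1, _, q3, _ = five_number_summary(values)
    fence = 1.5 * (q3 - q1)
    return [x for x in values if x < q1 - fence or x > q3 + fence]

data = [2, 4, 4, 4, 5, 5, 7, 9, 30]  # hypothetical data with one extreme value
summary = five_number_summary(data)
outliers = flag_outliers(data)
```

For this data set the only flagged value is 30, which would be drawn as a separate point beyond the upper whisker.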
Outliers
- Outlier: An observation that is substantially different from most of the other data points.
- Potential issue: An outlier can strongly influence the calculated mean and standard deviation.
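To illustrate how much a single outlier can move the mean compared with the median, the sketch below adds one extreme value to a small made-up set of heights (in cm). The numbers are hypothetical.

```python
import statistics

heights = [160, 162, 165, 167, 170]  # hypothetical heights in cm
with_outlier = heights + [250]       # one extreme value added

mean_before = statistics.mean(heights)         # 164.8
mean_after = statistics.mean(with_outlier)     # 179
median_before = statistics.median(heights)     # 165
median_after = statistics.median(with_outlier) # 166.0

sd_before = statistics.stdev(heights)
sd_after = statistics.stdev(with_outlier)
```

The mean shifts by more than 14 cm while the median shifts by only 1 cm, and the standard deviation grows sharply.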
Choosing a Summary Statistic
- Use the mean and standard deviation for symmetrical distributions without outliers.
- Use the median (with the IQR as the measure of spread) for skewed distributions and those with outliers.
Hypothesis Testing
- A table showing the possible outcomes of a hypothesis test is included.
- Type I error: Rejecting the null hypothesis when it is true. Type II error: Failing to reject the null hypothesis when it is false.
Confidence Intervals
- Specific methods for calculating 96% and 70% confidence intervals (CI) using a given standard deviation and mean are included.
- Confidence Intervals (CI) give a range within which a true population value is estimated to lie with a specified confidence level.
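One way such intervals might be computed is sketched below. It assumes a known population standard deviation, a sample of size n, and a normally distributed sample mean, so the interval is mean ± z * sd / sqrt(n). The function name and the example numbers (mean 100, standard deviation 15, n = 36) are hypothetical and do not come from the notes.

```python
import math
from statistics import NormalDist

def confidence_interval(mean, sd, n, level):
    # z critical value leaving (1 - level) / 2 in each tail
    # of the standard normal distribution.
    z = NormalDist().inv_cdf(0.5 + level / 2)
    margin = z * sd / math.sqrt(n)
    return mean - margin, mean + margin

# Hypothetical inputs: sample mean 100, known sd 15, sample size 36.
ci_96 = confidence_interval(100, 15, 36, 0.96)
ci_70 = confidence_interval(100, 15, 36, 0.70)
```

A higher confidence level requires a larger z value, so the 96% interval is wider than the 70% interval for the same data.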
Z-scores and Probabilities
- Explains the z-score transformation of normal distributions and how to interpret percentiles.
- Includes z-score ranges for different grades (A, B, C, etc.)
- Explains how to interpret z-score values from tables of percentiles.
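The z-score transformation and the percentile lookup can be sketched as follows, using the standard normal distribution from Python's `statistics` module in place of a printed table. The function names and the example score (85 on a scale with mean 70 and standard deviation 10) are hypothetical, and the data are assumed to be normally distributed.

```python
from statistics import NormalDist

def z_score(x, mean, sd):
    # Number of standard deviations that x lies above (+) or below (-) the mean.
    return (x - mean) / sd

def percentile_from_z(z):
    # Percentage of a standard normal distribution that falls below z.
    return NormalDist().cdf(z) * 100

# Hypothetical example: a score of 85 when the mean is 70 and the sd is 10.
z = z_score(85, 70, 10)            # 1.5
percentile = percentile_from_z(z)  # roughly the 93rd percentile
```

A z-score of 0 corresponds to the 50th percentile, since half of a normal distribution lies below its mean.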
Probability
- A quantitative assessment of the likelihood of an uncertain event occurring.
- Always between 0 and 1 (inclusive), [0,1].
Sensitivity, Specificity, PPV, and NPV
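- Sensitivity: Percentage of individuals with the condition who test positive (true positives among all actual positives).
- Specificity: Percentage of individuals without the condition who test negative (true negatives among all actual negatives).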
- Positive Predictive Value (PPV): Percentage of true positives among positive test results.
- Negative Predictive Value (NPV): Percentage of true negatives among negative test results.
- Important in evaluating the usefulness of tests (e.g. screening tests).
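All four measures follow from the counts in a 2x2 table of test result versus true condition. The sketch below uses made-up counts for a hypothetical screening test; the function name and the abbreviations tp, fp, fn, tn (true/false positives and negatives) are choices made for this example.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    # tp: has condition, tests positive    fp: no condition, tests positive
    # fn: has condition, tests negative    tn: no condition, tests negative
    return {
        "sensitivity": tp / (tp + fn),  # among those with the condition
        "specificity": tn / (tn + fp),  # among those without the condition
        "ppv": tp / (tp + fp),          # among positive test results
        "npv": tn / (tn + fn),          # among negative test results
    }

# Hypothetical counts for 1,000 people screened.
metrics = diagnostic_metrics(tp=90, fp=50, fn=10, tn=850)
```

With these counts, sensitivity is 90% and specificity is about 94%, yet the PPV is only about 64% because relatively few of the people screened have the condition.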