Statistics: Lecture 1

What is the purpose of creating a database in the context of an epidemiologic investigation?

To organize information in a structured manner for analysis and interpretation.

In the context of a database for epidemiologic investigation, what does each row represent?

An observation or record representing one person.

What is the role of the first column or variable in a database used for epidemiologic investigations?

Contains the person’s name, initials, or identification number.

In an epidemiologic investigation, what does a variable represent?

Any characteristic that differs from person to person.

What is the value of a variable in the context of an epidemiologic investigation?

The number or descriptor that applies to a particular person.

Why is it important to organize data systematically when conducting an epidemiological study?

To ensure efficient management and analysis of information.

Which measure of central location is recommended when dealing with data that are not normally distributed?

Median

What is the main reason for not using the mean as a measure of central location for data that are severely skewed or have extreme values?

It is sensitive to outliers

In epidemiological data, which measure of central location is often preferred when the data tend not to be normally distributed?

Median

Which measure of spread represents the central portion of the distribution, from the 25th percentile to the 75th percentile?

Interquartile range (IQR)

What is the method for calculating the standard deviation?

Summing the squared differences from the mean, dividing by n–1, and taking the square root

Which measure of spread divides the data in a distribution into 100 equal parts?

Percentiles

What is the value of the 1st quartile (Q1) for the given set of observations: 0,2,3,4,5,5,6,7,8,9,9,9,10,10,10,10,10,11,12,12,12,13,14,16,18,18,19,22,27?

6.5

Which measure of spread is generally used in conjunction with the median for characterizing the central location and spread of skewed distributions?

Interquartile range (IQR)

Which measure is calculated only when the data are more-or-less normally distributed?

Standard deviation (SD)

"The mode and median tend not to be affected by outliers." True or False?

True

Which measure provides the central value among the options provided?

Median

In epidemiology, a nominal-scale variable is one whose values are:

Categories without any numerical ranking

An interval-scale variable is measured on a scale of equally spaced units, but without a true zero point. An example of an interval-scale variable is:

Date of birth

Which type of variable is considered a qualitative or categorical variable in epidemiology?

Nominal-scale variable

What type of variable is measured on a scale of equally spaced units with a true zero point?

Ratio-scale variable

Which measure of central location is the single, usually central value that best represents a distribution of data?

Mean

The median is the value that divides the data into two halves, with one half of the observations being smaller than the median value and the other half being larger. This is also known as the:

50th percentile

What type of distribution has a central location to the left and a tail off to the right?

Positively skewed distribution

Which property of frequency distribution refers to the distribution out from a central value?

Spread

What does the standard deviation describe in a set of data?

Variability in a set of data

What is the primary practical use of the standard error (se) of the mean?

Calculating confidence intervals around the mean

How is a 95% confidence interval for a mean calculated?

Mean ± 1.96 × standard error (from the mean minus 1.96 standard errors to the mean plus 1.96 standard errors)

Which measure is often used to summarize a distribution of data?

Standard deviation

What is a common way to indicate a measurement’s precision?

Providing a confidence interval

Why are confidence intervals often calculated for the mean and other measures?

To make generalizations about the larger population

What does a narrow confidence interval indicate?

High precision in measurements

Which measure represents the central value among the options provided?

Median

What measure is recommended when dealing with data that are not normally distributed?

Median

What does each row represent in the context of a database for epidemiologic investigation?

A new individual or subject

Which measure divides the data in a distribution into 100 equal parts?

Percentile

What does variability we might expect in the means of repeated samples refer to?

Standard error of the mean

Study Notes

Purpose of Database in Epidemiologic Investigation

  • Creating a database in epidemiologic investigation helps to organize and analyze data to identify patterns and relationships between variables.

Database Structure

  • Each row in the database represents a single case or observation.
  • The first column or variable is used to identify each case or observation.

Variables in Epidemiologic Investigation

  • A variable represents a characteristic or attribute of interest in an epidemiologic investigation.
  • The value of a variable is the specific measurement or observation of that characteristic.
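
The row, identifier, variable, and value structure described above can be sketched in Python as a small line listing. The records, field names, and values below are hypothetical and only illustrate the layout; they do not come from the lecture.

```python
# Hypothetical line listing: each dict is one row (one person).
# Field names and values are invented for illustration only.
line_listing = [
    {"id": "P01", "age_years": 34, "sex": "F", "ill": True},
    {"id": "P02", "age_years": 51, "sex": "M", "ill": False},
    {"id": "P03", "age_years": 27, "sex": "F", "ill": True},
]

# The first field identifies the person; the remaining fields are variables.
variables = [name for name in line_listing[0] if name != "id"]

# The value of a variable is what applies to one particular person.
age_of_p02 = next(row["age_years"] for row in line_listing if row["id"] == "P02")

print(variables)    # ['age_years', 'sex', 'ill']
print(age_of_p02)   # 51
```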

Importance of Data Organization

  • Organizing data in a systematic manner is crucial for conducting an epidemiological study, as it enables researchers to identify patterns and relationships between variables.

Measures of Central Location

  • The median is recommended when the data are not normally distributed, which is often the case with epidemiologic data.
  • The mean is sensitive to outliers, so it is a poor measure of central location for data that are severely skewed or have extreme values.
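
To see why the median is preferred when extreme values are present, the short sketch below compares both measures with and without one outlier. The data are hypothetical incubation periods invented for this example.

```python
import statistics

# Hypothetical incubation periods in days; 30 is an extreme value.
days = [2, 3, 3, 4, 5, 6, 30]

mean_all = statistics.mean(days)        # pulled upward by the value 30
median_all = statistics.median(days)    # middle value of the sorted data

# Drop the extreme value to see how much each measure moves.
trimmed = [d for d in days if d != 30]
mean_trimmed = statistics.mean(trimmed)
median_trimmed = statistics.median(trimmed)

print(f"mean: {mean_all:.2f} -> {mean_trimmed:.2f}")
print(f"median: {median_all} -> {median_trimmed}")
```

Removing the single extreme value moves the mean from about 7.57 to about 3.83, while the median only moves from 4 to 3.5.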

Measures of Spread

  • The interquartile range (IQR) represents the central portion of the distribution, from the 25th percentile to the 75th percentile.
  • The standard deviation of a sample is calculated as √(Σ(xi − x̄)² / (n − 1)): sum the squared differences between each data point xi and the sample mean x̄, divide by n − 1 (where n is the sample size), and take the square root.
  • Percentiles divide the data in a distribution into 100 equal parts.
  • The IQR is generally used in conjunction with the median for characterizing the central location and spread of skewed distributions.
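
The standard deviation calculation can be followed step by step in a short Python sketch. The observations are hypothetical, and the comparison with `statistics.stdev` is included only as a cross-check of the n − 1 divisor.

```python
import math
import statistics

# Hypothetical observations, for illustration only.
x = [2, 4, 4, 4, 5, 5, 7, 9]

n = len(x)
x_bar = sum(x) / n                            # sample mean
sum_sq = sum((xi - x_bar) ** 2 for xi in x)   # sum of squared differences
sd = math.sqrt(sum_sq / (n - 1))              # divide by n - 1, then take the root

# The standard library's sample standard deviation uses the same n - 1 divisor.
assert math.isclose(sd, statistics.stdev(x))
print(round(sd, 3))   # 2.138
```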

Quartiles and Percentiles

  • The 1st quartile (Q1) is the value below which 25% of the data points fall.
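
As a worked example, the sketch below computes Q1, Q3, and the IQR for the 29 observations used in the quiz question. The notes do not say which quartile convention is intended, so the position rule used here, fraction × (n + 1) with interpolation between neighbours, is an assumption; it is one of several common conventions and it reproduces the quiz answer of 6.5. The function name is hypothetical.

```python
def quartile_position_value(sorted_data, fraction):
    """Value at position fraction * (n + 1), counting from 1, with
    linear interpolation between neighbouring observations."""
    position = fraction * (len(sorted_data) + 1)
    lower = int(position)            # whole part of the position
    weight = position - lower        # fractional part
    if lower < 1:
        return sorted_data[0]
    if lower >= len(sorted_data):
        return sorted_data[-1]
    below = sorted_data[lower - 1]   # convert 1-based position to 0-based index
    above = sorted_data[lower]
    return below + weight * (above - below)

# The observations from the quiz question, already sorted.
data = [0, 2, 3, 4, 5, 5, 6, 7, 8, 9, 9, 9, 10, 10, 10, 10, 10, 11,
        12, 12, 12, 13, 14, 16, 18, 18, 19, 22, 27]

q1 = quartile_position_value(data, 0.25)   # position 7.5: between 6 and 7
q3 = quartile_position_value(data, 0.75)   # position 22.5: between 13 and 14
iqr = q3 - q1
print(q1, q3, iqr)   # 6.5 13.5 7.0
```

Other conventions can give slightly different quartiles for the same data, so the rule should be stated whenever quartiles are reported.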

Scales of Measurement

  • A nominal-scale variable is a qualitative (categorical) variable: its values are categories without any numerical ranking.
  • An interval-scale variable is measured on a scale of equally spaced units, but without a true zero point. An example is temperature in Celsius.
  • A ratio-scale variable is measured on a scale of equally spaced units with a true zero point. An example is temperature in Kelvin.
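
The practical consequence of a true zero point can be illustrated with a little arithmetic: ratios are only meaningful on a ratio scale, while differences are meaningful on both. The two temperature readings below are hypothetical.

```python
# Two hypothetical temperature readings in degrees Celsius.
c_low, c_high = 10.0, 20.0

# Convert to Kelvin, which has a true zero point.
k_low, k_high = c_low + 273.15, c_high + 273.15

celsius_ratio = c_high / c_low    # 2.0, but this ratio has no physical meaning
kelvin_ratio = k_high / k_low     # about 1.035: the meaningful ratio

# Differences are meaningful on both scales and agree.
same_difference = abs((c_high - c_low) - (k_high - k_low)) < 1e-9
print(celsius_ratio, round(kelvin_ratio, 3), same_difference)
```

So 20 °C is not "twice as hot" as 10 °C, even though both scales agree that the readings are 10 units apart.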

Distribution Properties

  • A positively skewed distribution has its central location to the left and a tail off to the right.
  • Spread is the property of a frequency distribution that describes how far the data are distributed out from a central value.
  • The standard deviation describes the spread or dispersion of a set of data.

Confidence Intervals

  • The primary practical use of the standard error (se) of the mean is to calculate confidence intervals.
  • A 95% confidence interval for a mean is calculated as CI = x̄ ± (Z × se), where x̄ is the sample mean, Z is the Z-score for the desired confidence level (1.96 for 95%), and se is the standard error of the mean. The lower limit is x̄ − 1.96 × se and the upper limit is x̄ + 1.96 × se.
  • Confidence intervals are often calculated for the mean and other measures to estimate the range of values within which the true population parameter is likely to lie.
  • A narrow confidence interval indicates a high degree of precision in the estimate.
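
The confidence interval formula can be sketched as follows. Two details are assumptions not spelled out in the notes: the standard error is taken as the sample standard deviation divided by √n, and the multiplier 1.96 is the usual normal approximation (for small samples a larger t-based multiplier is commonly used instead). The function name and the data are hypothetical.

```python
import math
import statistics

def mean_ci_95(sample):
    """Return (lower, upper) for an approximate 95% confidence interval
    around the sample mean, using CI = mean ± 1.96 * se."""
    n = len(sample)
    x_bar = statistics.mean(sample)
    se = statistics.stdev(sample) / math.sqrt(n)   # standard error of the mean
    return x_bar - 1.96 * se, x_bar + 1.96 * se

# Hypothetical measurements, for illustration only.
sample = [2, 4, 4, 4, 5, 5, 7, 9]
lower, upper = mean_ci_95(sample)
print(f"95% CI: {lower:.2f} to {upper:.2f}")
```

For these eight values the mean is 5 and the interval runs from about 3.52 to 6.48. Because se shrinks as n grows, a larger sample with the same spread would give a narrower, more precise interval.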
