PREP 4b Intro to Applied Stats - Defining the Data PDF
Document Details
Uploaded by WorldFamousZombie1045
UCL
Summary
This document provides an introduction to applied statistics, specifically focusing on defining data and variables in a mental health research context. It outlines the distinction between populations and samples, and explains different types of variables, such as exposure, outcome, independent and dependent variables. The document also touches upon the importance of specifying the target population during research studies and how to approach sampling for accurate representation.
Full Transcript
**[Core Principles in Mental Health Research]**

**[PREP: 4b Intro to Applied Stats -- Defining the data]**

[Learning Objectives:]

- You will be able to describe the difference between populations and samples and comment on statistical inference
- You will be able to comment on the different types of data and variables that we can create using the data from our samples

[Note on Terminology & Approach:]

- We focus here on frequentist statistics
- The main alternative approach is Bayesian statistics
- Different disciplines use different names for the same two kinds of variable:

**Epidemiology & medical statistics**   **Psychology**         **Social science**
--------------------------------------- ---------------------- ----------------------------------
Exposure variable                       Independent variable   Explanatory / predictor variable
Outcome variable                        Dependent variable     Response variable

[Defining Exposure & Outcome:]

- The outcome variable is the variable that is usually the focus of our attention, whose variation or occurrence we are seeking to investigate and understand
- e.g. depression; eating disorders; psychosis; bipolar disorder
- We are often interested in identifying risk factors or exposures that may influence the occurrence or severity of the outcome
- The purpose of a statistical analysis is often to quantify the magnitude of the association between one or more exposure variables and the outcome variable

[Population & Samples:]

- In research studies, we collect data on a sample from a much larger group called the population
- The sample is of interest not in its own right but for what it tells us about the population
- Statistics allows us to use the sample to make inferences about the population from which it was drawn
- Because of chance, different samples from the same population will give different results, and this must be taken into account when using a sample to make inferences about the population
- The concept of sampling variation is at the heart of frequentist statistics and will be explained in the interpreting statistics lecture of the core module

[Specify the Target Population:]

- In any research study, it is important to carefully and precisely specify the target population
- Care should also be taken to ensure that the sample represents the target population
- For example, a researcher interested in loneliness and depression among young people may take a random sample of university students to test her hypothesis
- What are the potential problems with her approach?

[Sampling From the Target Population:]

- If students differ from other young people in any way that affects their experiences of loneliness or depression (exposure and outcome), the sample and the finding may not represent the target population
- The finding will then not be generalizable to young people as a whole and will apply only to the population of UK university students

[Types of data 1:]

- The raw data from a research study consist of observations made on individuals
- The number of individuals is called the sample size
- Any aspect of an individual that is measured (for example, their depressive symptoms, exposure to loneliness, age, gender or highest educational qualification) is called a variable
- A first step in choosing how best to display and analyse data is to classify the variables into their different types
- This is important because the choice of statistical test depends on the nature of the outcome (i.e. how the outcome variable is classified)
- The main division is between numerical (quantitative) variables, categorical (qualitative) variables and rates

[Outcomes]

[1. Numerical Variables:]

- A numerical variable is either continuous or discrete
- A continuous variable can take any value within the possible / plausible range, e.g. BMI (26.42, 28.35)
- A discrete variable can only take certain values (whole numbers), such as the number of depressive episodes in 10 years (0, 2, 3, 4, 12)
- Note: Variables that are technically discrete are often described as continuous, and "continuous" is often used loosely to mean numerical

[Categorical Variables:]

- A categorical variable assigns people to one of two or more qualitatively distinct categories (e.g. coded 1, 2, 3)
- A binary variable is a categorical variable with only two categories, e.g. clinical diagnosis (diagnosed with schizophrenia or not, coded 0 or 1)
- An ordered categorical variable assigns people to ordered categories, e.g. socioeconomic status: low / middle / high
- A nominal categorical variable assigns people to categories with no underlying order, e.g. eye colour

[2. Rates:]

- Rates of disease are measured in longitudinal studies and are the fundamental measure of the frequency of occurrence of events (such as illness or death) over time
- For example, 30-year mortality rates among adults with depression
- Or the rate of occurrence of psychosis in the Swedish population
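
To make the idea of a rate concrete, the sketch below computes one in the way it is commonly defined in epidemiology: the number of events divided by the total person-time at risk. The notes above do not state this formula, so treat it as the standard definition rather than the lecture's own wording. The `FollowUp` record, the `incidence_rate` function and the cohort values are all hypothetical and made up purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class FollowUp:
    """One participant's follow-up record (hypothetical structure)."""
    years_at_risk: float  # time observed until the event, dropout, or study end
    had_event: bool       # True if the event (e.g. death) occurred during follow-up


def incidence_rate(records, per=1000):
    """Return the number of events per `per` person-years at risk."""
    events = sum(r.had_event for r in records)
    person_years = sum(r.years_at_risk for r in records)
    if person_years <= 0:
        raise ValueError("total person-time must be positive")
    return per * events / person_years


# Made-up data for illustration only: five people followed for up to 10 years.
cohort = [
    FollowUp(10.0, False),
    FollowUp(4.0, True),
    FollowUp(6.0, True),
    FollowUp(10.0, False),
    FollowUp(10.0, False),
]

rate = incidence_rate(cohort)  # 2 events over 40 person-years
print(f"{rate:.1f} events per 1,000 person-years")
```

Unlike a simple proportion (2 of 5 people), the rate accounts for how long each person was actually observed, which is why it needs longitudinal data.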