Quantitative Research Methods and Data Analysis
Summary
This document provides an introduction to quantitative research methods and data analysis, focusing on key concepts, skills, and basic principles. It covers topics such as data literacy, variables, and sampling methods within a psychological context.
**Quantitative research methods and data analysis.**

**Introduction PowerPoint 1.**

**[Data literacy in psychology.]**

The ability to read, understand, argue with, and make decisions based on data.

**[Why it matters.]**

- In psychology, data informs our understanding of behaviours, mental processes, and treatment effectiveness.
- Being data literate empowers you to critically assess research findings and their implications.
- Even if you don't conduct research, you will frequently encounter research findings in your studies and career.

**[Key skills.]**

- ***Reading data*** -- understanding graphs, tables, and statistical results.
- ***Critical thinking*** -- evaluating the quality of research and identifying biases or limitations.
- ***Decision making*** -- using data to make informed choices in professional practice or personal life.
- ***Communicating insights*** -- explaining data-driven conclusions clearly and effectively to others.

**[Basic concepts.]**

- ***Research question/hypothesis*** -- the statement or question that explains clearly what the researcher wants to know.
- ***Literature review*** -- critical examination of existing research, concepts, and theories related to the research topic.
- ***Research design*** -- how the researcher will carry out the research: selection of a sample of participants relevant to the research question(s) and design of the research tool.
- ***Data collection*** -- gathering data from the sample so that the research question(s) can be answered.
- ***Data analysis*** -- management, analysis, and interpretation of data.
- ***Writing up*** -- reporting and disseminating the research findings.

**[Quantitative data:]**

- Relates to numbers.
- Supports evidence-based practice and policy making. ***[For example]***, how can we know whether a particular intervention (e.g., CBT) benefits clients with a particular condition (e.g., phobias), ***[or]*** how long it would take for people with this condition to improve once they receive this intervention?

**[Population and sample.]**

- ***[Population]*** refers to all the units from which the sample is to be selected.
- ***[Sample]*** -- the part of the population that is selected for the research. It is a subset of the population.
- ***[Subject]*** -- the basic unit from which the data is collected.
- The number of subjects measured makes up the sample size, denoted by *n*.

**How do we ensure that the sample represents the population?**

- Ensuring the population's percentages are mirrored in our sample.
- ***[Randomness]*** -- everyone in the population has an equal chance to participate in our study, which allows us to generalize.

**Sampling.**

***[Non-probability and probability.]***

- ***[Non-probability]*** -- judgment, convenience, quota, and snowball.
- ***[Probability]*** -- random, stratified, cluster, systematic.

**[Types of external validity.]**

The extent to which you can generalize research findings to other groups and settings.

1. ***[Population validity]*** -- the extent to which the sample represents the population of interest.
2. ***[Ecological validity]*** -- we try as much as we can to make the setting of the experiment represent the real world.

***[Variables]***

- A variable is a property of an object that can take different values, for example, different hair colours or ages.
- A specific characteristic that is observed or measured for a subject.
- Can describe:
  - An *individual* -- e.g., attitude, preferred colour.
  - A *group* -- e.g., amount of donations received.
  - A *nation* -- e.g., GDP.
- Quantitative data consists of one or more variables, observed or measured numerically for a number of subjects.

**[Types of variables.]**

1. ***[Categorical]*** -- nominal, ordinal.
2. ***[Numerical]*** -- continuous, discrete.

**[Nominal]** (a label that we assign).

- The numbers are only labels for the possible values (categories) of the variable, without magnitude (one is not lower or higher than another).
- The categories cannot be rank ordered.
- ***For example,*** gender, religion, ethnicity, marital status, university course, political party.
- We can only compare whether values are the same or different.

**[Ordinal variables.]**

- The categories can be ordered, but the distances between the categories are not equal.
- The order of the numbers is meaningful, but the magnitude is not.
- It would make sense to say that one value is higher or lower than another.
- ***[For example]***, Likert scales, educational level, attitude.

**[Interval variables.]**

- Can be treated as real numbers.
- Allow comparison of both order and magnitude.
- Numerical variables where the difference between values is meaningful and consistent, but there is no true zero point.
- The difference between values is consistent (e.g., the difference between 10 and 20 is the same as between 20 and 30).
- A score of zero does not mean none of the attribute (e.g., 0°C does not mean "no temperature"; an IQ score of zero does not exist).
- Examples include temperature in °C and IQ scores.

**[Ratio variables.]**

- Numerical variables that have equal intervals between values and a true zero point.
- These variables measure quantities where zero means none of the attribute, allowing you to say that one value is twice as much as another (a height of 0 cm means no height, and someone who weighs 60 kg weighs twice as much as someone who weighs 30 kg).
- Common in psychological research for measuring physical quantities or behaviours that can be completely absent (a reaction time of 0 seconds means no time was taken, and 4 seconds is twice as long as 2 seconds).
- Examples include age, weight, income, height, and reaction time.

**[Continuous.]**

- A variable that takes on any value within a range; the number of possible values within that range is infinite.
- Continuous values can be subdivided without limit.
- E.g., age, income, distance.

**[Discrete.]**

- A variable that takes on distinct, countable values.
- Discrete values cannot be subdivided.
- E.g., number of siblings, class year, number of trainers at the gym.

**[Operational definition of variables.]**

An operational definition is something that gives meaning to a construct or variable by setting out the activities or operations that are necessary to measure it.
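To make the four measurement levels concrete, here is a minimal Python sketch (all values are made up for illustration) of which comparisons each level supports:

```python
# Hypothetical values for each measurement level.
marital_status = ["single", "married", "single"]   # nominal
education = ["primary", "secondary", "tertiary"]   # ordinal
temperature_c = [10, 20, 30]                       # interval
reaction_time_s = [2.0, 4.0]                       # ratio

# Nominal: only same/different comparisons are meaningful.
print(marital_status[0] == marital_status[1])      # False

# Ordinal: order is meaningful, but distances between ranks are not.
rank = {"primary": 1, "secondary": 2, "tertiary": 3}
print(rank["tertiary"] > rank["primary"])          # True

# Interval: differences are meaningful and consistent (20-10 == 30-20),
# but ratios are not (20 degrees C is not "twice as hot" as 10 degrees C).
print(temperature_c[1] - temperature_c[0] ==
      temperature_c[2] - temperature_c[1])         # True

# Ratio: a true zero makes ratios meaningful (4 s is twice as long as 2 s).
print(reaction_time_s[1] / reaction_time_s[0])     # 2.0
```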
**[In quantitative research we can]**

- Describe single variables.
- Make inferences about single variables, e.g., take a representative sample and, based on the sample, make inferences about the whole population.
- Describe or make inferences about associations between variables, for example, looking at stress management and university courses (two variables) and asking whether one makes a difference to the other.
- There are different techniques for different types of variables.

**[Variables and associations.]**

- Questions about one variable at a time -- e.g., what is the average age/income of 2nd-year psychology students?
- Questions about associations -- e.g., do attitudes on drug use vary by age?
- Two variables are associated if knowing the value of one (the explanatory variable) helps to predict the value of the other (the response variable).
- E.g., knowing attitudes towards a political party (explanatory variable) predicts voting (response variable).

**[Descriptive and inferential statistics.]**

***Descriptive statistics***

- Summarise information in the observed data -- e.g., how many respondents in our survey intend to adhere to waste disposal schedules?
- Are procedures for summarising a group of scores or otherwise making them more understandable.
- Psychologists use descriptive statistics to summarise and describe a group of numbers from a research study.
- Involve a set of methods or techniques that we use to analyse data.
- Methods that help us take a large group of numbers and make sense of them.
- The main purpose is to take a big set of numbers and condense them into a smaller, more understandable form.
- Instead of looking at each score, we summarise the overall picture.
- They help us see patterns and trends, making data easier to interpret and understand.
- Visual representations help make data more understandable.
- This, therefore, helps researchers understand and communicate their findings clearly.

***Inferential statistics***

- Generalising conclusions to a population on the basis of sample observations.
- Are procedures for drawing conclusions based on the scores collected in research, in order to make inferences about a large group of individuals based on a research study in which a much smaller number of individuals took part.

**[Value.]**

- Can be a number or a category.
- A score is a particular person's value on a variable.

**[Sample distributions.]**

- They summarise the distribution of scores of a given variable: the list of the values of the variable, together with the number of occurrences of each.
- The sample of data scores is referred to as a distribution.

**[Bar charts and histograms.]**

**[Measures of central tendency]** -- these give an indication of the centre point, the middle ground.

**[Mode]** (the most common value)

- The score with the highest frequency -- the category that occurs most often (the most popular).
- Appropriate at any measurement level (the other measures are more restrictive, and neither applies to nominal variables).

***Advantages***

- Shows the most frequent, typical value.
- Unaffected by extreme scores in one direction.
- Can be obtained even when extreme values are unknown.
- Can be more informative than the mean or median, depending on the shape of the distribution.

***Disadvantages***

- Does not take into account the exact value of each item and can be less sensitive than the mean.
- Not useful for small sets of data where several values occur equally often, or where each value occurs only once.

**[Median]** (the middle value)

- For observed values that are ordered (ordinal/scale variables).

***Advantages***

- Easier to calculate than the mean.
- Unaffected by extreme values in one direction.
- Better with skewed data.
- Can be obtained even when extreme values are unknown.

***Disadvantages***

- Does not take into account the exact value of each item.
- Less sensitive than the mean.
- If values are few, it can be unrepresentative.

**[Mean]** (the average value)

- Commonly used.
- Represents the average value.
- Defined as the sum of observations divided by the sample size.

***Advantages***

- It is the basis for many powerful statistics.
- The most sensitive of the measures.
- Takes a central position on an interval or continuous scale.

***Disadvantage***

- Its sensitivity can also be a disadvantage, as it is susceptible to extreme values.

**[Mean, median, mode.]**

- The mean is the most commonly used, but it is not always appropriate.
- The mean is sensitive to outliers -- extremely high or extremely low observations.
- For skewed distributions, it is useful to provide more than one measure of central tendency.
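The three measures can be computed directly with Python's standard library; the marks below are hypothetical, chosen so that the two high scores visibly pull the mean above the median:

```python
from statistics import mean, median, mode

scores = [22, 25, 30, 30, 42, 87, 100]  # hypothetical exam marks

print(mode(scores))    # 30  (most frequent value)
print(median(scores))  # 30  (middle value once the scores are ordered)
print(mean(scores))    # 48  (sum / n; pulled upward by the outliers 87 and 100)
```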
**[The normal distribution.]**

- A bell-shaped curve that is symmetrical around the mean.
- This means that there is an equal number of subjects above and below the mean.
- The shape of the curve also indicates the proportion of subjects at each of the standard deviations above or below the mean.
- In practice, distributions are often not normal, and this has implications for the relationship between the mean, median, and mode.

**[Distribution characteristics.]**

- The distribution of incomes has a scattering of scores that are much higher than the others.
- This distribution is said to be positively skewed, or skewed to the right.
- A skew refers to a tailed distribution, marked by a number of scores that are much higher or lower than the rest.
- A distribution is said to be negatively skewed when the tail is to the left.
- A distribution that is neither positively nor negatively skewed is symmetric, i.e., a normal distribution.

**[Skewness]**

- The measures of central tendency spread apart when the normal distribution is distorted.
- The horizontal push or pull distortion of a normal distribution curve is captured by the skewness measure.
- Skewness is a measure of the asymmetry of a distribution: a distribution is asymmetrical when one tail is longer than the other.

**[Kurtosis]**

- Describes the peak of the curve and the tails of the curve.
- The vertical push/pull distortion is captured by the kurtosis.
- ***Platykurtic distribution*** -- flatter/lower than the normal distribution.
- ***Leptokurtic distribution*** -- more peaked/higher than the normal distribution.
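Both measures can be illustrated on simulated data; a minimal sketch using scipy's `skew` and (excess) `kurtosis`:

```python
import numpy as np
from scipy.stats import skew, kurtosis

rng = np.random.default_rng(0)
symmetric = rng.normal(loc=0, scale=1, size=10_000)   # roughly normal
right_skewed = rng.exponential(scale=1, size=10_000)  # long right tail

print(skew(symmetric))     # close to 0
print(skew(right_skewed))  # clearly positive (skewed to the right)

# scipy reports "excess" kurtosis: 0 for a normal curve, negative for a
# platykurtic (flatter) curve, positive for a leptokurtic (more peaked) one.
print(kurtosis(symmetric))
print(kurtosis(right_skewed))
```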
**Lecture 3.**

**[Variability.]**

- The variability of a distribution can be considered as the amount of spread of the scores around the mean.
- How close to or far from the mean are the scores in a distribution?
- Measures of central tendency can be misleading: they can tell us what the middle ground might be, but they do not tell us anything about the variability around that middle ground.

***Scenario:***

- We have collected data about monthly income from 3 groups of 20 individuals each.
- When we calculate the mean, we find that it is 6,000 for each group.
- What more information do we need to identify whether the groups are relatively homogenous (the same) or whether there is a large income disparity?
- Measures of variation are used to summarise this aspect of the distribution.

***Stress levels (scale 1-10, mean 7).***

- ***Low variability:*** if most students report stress levels between 6 and 8, their stress scores are close to the mean. The distribution of stress scores has low variability, showing that most students feel similarly stressed. Most scores are concentrated near the mean, reflecting similar experiences.
- ***High variability:*** if the stress levels range widely, from 1 to 10, some students are extremely stressed while others are barely stressed. This indicates high variability because many scores are far from the mean, with students experiencing a wide range of stress levels. Scores are spread out, with some individuals experiencing much more or much less of the measured psychological variable.

**[Measures of dispersion.]**

- Describe how widely a distribution is spread.
- This is done with measures of dispersion or variability. Such measures are not normally applied to nominal-level data, as the idea of spread makes limited sense there.
- Dispersion is a measure of how much or how little the rest of the values tend to vary around the central tendency or central value in a set of numbers.
- Some measures are available for ordinal data, and these can also be applied to interval and ratio data, while others are only suitable for the two latter types of data.

1. Categorical ordinal-level data -- interquartile range and percentiles.
2. Metric data -- range, mean deviation, standard deviation, and variance.

**[Variation -- 3 methods:]**

1. Range.
2. Interquartile range.
3. Standard deviation (mainly used for scale variables; otherwise we use frequencies).

**[Range.]**

- The difference between the minimum (lowest observed value) and maximum (highest observed value) in a distribution.
- E.g., 93 is the highest mark and 22 the lowest: range = 93 - 22 = 71 marks.
- The disadvantage of the range is that very extreme scores can affect it.

***Advantages***

- Easy to calculate.
- Includes extreme values.

***Disadvantages***

- Can be distorted by extreme values.
- Relies on the two extreme values in the set of numbers and is therefore unrepresentative of the distribution of values between the extremes.

**[Quartiles.]**

- Similar to the median, the only difference being that quartiles divide the distribution into 4 parts (the median divides it into 2).

**[Interquartile range.]**

- A more accurate description of where the majority of the scores lie.
- Describes the range of scores within the middle 50 per cent of all scores or respondents.
- The lowest 25 per cent of scores or frequencies, and the highest 25 per cent, are excluded from consideration, thereby eliminating the possible distorting effects of extreme scores at either end of the distribution.
- The interquartile range is the difference between Q3 (the score at 75%) and Q1 (the score at 25%).
- It is the middle 50% of scores.
- E.g., lowest score 62, Q1 = 64, Q2 = 71, Q3 = 77, highest 81: interquartile range = 77 - 64 = 13.
- The middle 50% of students are spread across 13 marks.

***Advantages***

- Easy to calculate.
- Representative of the central 50% of the values in the data set.
- More robust to problems caused by outliers.

***Disadvantages***

- Relies on only two values in the set of numbers.
- Does not provide a measure of dispersion based on all values in the set of data.

**[Standard deviation]**

- Measures the dispersion of a dataset relative to its mean and is calculated as the square root of the variance.

**[Deviation.]**

- The deviation of a score is the difference between the score and the mean of the set of scores from which the score is derived.
- E.g., scores: 22, 25, 30, 42, 87, 100; mean: 51.
- Calculate the deviations: 22 - 51 = -29; 25 - 51 = -26; 30 - 51 = -21; 42 - 51 = -9; 87 - 51 = +36; 100 - 51 = +49.
- **The mean is exactly central in a set of data, so the negative deviation scores cancel out the positive deviation scores.**
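These dispersion measures are easy to check numerically. The sketch below reuses the figures from the notes; the full mark set for the range example is invented, since the notes only give its maximum (93) and minimum (22):

```python
import numpy as np

# Range (hypothetical full mark set; only max and min matter here).
marks = np.array([22, 45, 58, 67, 80, 93])
print(marks.max() - marks.min())           # 71

# Interquartile range (example scores from the notes).
scores = np.array([62, 64, 71, 77, 81])
q1, q3 = np.percentile(scores, [25, 75])
print(q3 - q1)                             # 13.0

# Deviations from the mean always sum to zero.
data = np.array([22, 25, 30, 42, 87, 100])
deviations = data - data.mean()            # [-29, -26, -21, -9, 36, 49]
print(deviations.sum())                    # 0.0
```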
**[Mean deviation:]**

- To calculate the mean deviation, we get rid of the negative signs and take the absolute values of the deviation scores: mean deviation $= \frac{\sum |X - M|}{N}$.
- Rarely used, so we use the standard deviation instead.

**[Variance]**

- An alternative way to remove the negative sign from a set of deviation scores is to square the values (multiply each value by itself): variance $SD^2 = \frac{\sum (X - M)^2}{N}$.
- ***4 steps to calculate variance:***
  1. Subtract the mean from each score. This gives each score's deviation score, which is how far the score is from the mean.
  2. Square each of these deviation scores (multiply each by itself). This gives each score's squared deviation score.
  3. Add up the squared deviation scores. This total is called the sum of squared deviations.
  4. Divide the sum of squared deviations by the number of scores. This gives the average (the mean) of the squared deviations, called the variance.

**[Variance of a population:]**

- When estimating the population variance from a sample, we divide by $n - 1$ instead of $n$: $s^2 = \frac{\sum (X - M)^2}{n - 1}$, where $n - 1$ refers to the degrees of freedom.

***Disadvantages of the variance:***

- ***Variance***: describes the variability of the distribution around the mean.
- It produces large values in squared units, since it requires squaring the deviations.
- To "restore" the values to the correct units, we take the square root of the variance. This is the standard deviation (SD or s.d.): $SD = \sqrt{\frac{\sum (X - M)^2}{N}}$.
- ***Standard deviation***: the average amount of variation around the mean.

**[Standard deviation:]**

- The standard deviation makes use of deviations from the mean.
- ***Consider this set of scores:*** 1, 2, 2, 3, 3, 3, 4, 4, 5. Mean = 3.
  - 2 scores deviate from the mean by a value of 2 (1 and 5).
  - 4 scores deviate from the mean by a value of 1 (2, 2, 4, 4).
  - 3 scores do not deviate from the mean.
- ***Now consider this set of scores:*** 2, 2, 3, 3, 3, 3, 3, 4, 4. Mean = 3.
  - 2 scores deviate from the mean by +1 (4, 4).
  - 2 scores deviate by -1 (2, 2).
  - I.e., 4 scores deviate from the mean by a value of 1, compared with 6 scores deviating from the mean in the previous example.
- Variation of scores and deviation from the mean is higher in the former example than in the latter.
- ***A higher standard deviation*** -- the data are more dispersed.
- ***A lower standard deviation*** -- the data are more grouped together.

***Advantages:***

- It forms the basis of many more complicated statistics.
- It considers all the values in the data set.
- It is the most sensitive of the measures of dispersion.

***Disadvantages:***

- It is a bit complicated to calculate manually, but very easy with statistical software.

**[To summarise:]**

- The ***variance (SD^2^)*** is the average of the squared deviations of each score from the mean. It shows us how spread out the scores are (their variability), while the mean only indicates the central tendency of the distribution.
- The ***standard deviation (SD)*** is the square root of the average of the squared deviations from the mean, i.e., the square root of the variance. The standard deviation indicates approximately the average amount that scores differ from the mean.
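A short sketch of the four-step variance calculation, run on the two score sets above; the `sample=True` option shows the $n - 1$ (degrees of freedom) variant used when estimating a population variance from a sample:

```python
import math

def variance_and_sd(scores, sample=False):
    """Follow the four steps from the notes; sample=True divides by n - 1."""
    n = len(scores)
    mean = sum(scores) / n
    deviations = [x - mean for x in scores]    # step 1: deviation scores
    squared = [d ** 2 for d in deviations]     # step 2: square each deviation
    sum_of_squares = sum(squared)              # step 3: sum of squared deviations
    variance = sum_of_squares / (n - 1 if sample else n)  # step 4: average them
    return variance, math.sqrt(variance)       # SD restores the original units

print(variance_and_sd([1, 2, 2, 3, 3, 3, 4, 4, 5]))  # (~1.33, ~1.15): wider spread
print(variance_and_sd([2, 2, 3, 3, 3, 3, 3, 4, 4]))  # (~0.44, ~0.67): tighter spread
```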
**Lecture 4.**

**[What are inferential statistics?]**

- Procedures for drawing conclusions that go beyond the particular scores collected in a research study.
- They allow researchers to make inferences about a large group of individuals based on a research study in which a smaller number of individuals took part.
- Psychologists conduct research to test a theoretical principle or the effectiveness of a practical procedure.
- *For example,* a social psychologist might examine the effectiveness of a media literacy programme intended to reduce exposure to online risk. The study is carried out with a particular group of research participants, but researchers use inferential statistics to draw more general conclusions about the theoretical principle or procedure being studied.
- These conclusions go beyond the particular group of research participants studied.
- The population is the group of people that holds all the data we are interested in about a topic.
- However, we very rarely have access to the whole population we are investigating.
- The solution is to recruit a sample.
- Inferential statistics are techniques that allow us to use these samples to carry out hypothesis testing and make generalizations about the populations from which the samples were drawn.

**[Sampling.]**

- Our main interest is in learning about populations.
- The sample is a subset of the population that sufficiently represents the wider population at large.
- It is rarely possible to investigate the entire population (the only exception is a census).
- We require good samples to ensure that we are making valid inferences.

**[Parameters vs. statistics.]**

- ***Population*** -- the total set of observations that can be made.
- ***Parameter*** -- a value that describes a characteristic of an entire population.
- Parameter values exist, but they are nearly always unknowable: they cannot usually be measured directly.
- ***Statistic*** -- a characteristic of a sample.
- Inferential statistics allow you to use sample statistics to draw conclusions about a population.
- To draw valid conclusions, you must use adequate sampling techniques.
- It is a question of estimating, with an acceptable margin of error.
- We are using what we know to say something about what we do not know (e.g., using information about 377 Sliema residents to make claims about a population of 19,655 \[2021 Census\]).
- These are the methods by which we use statistics (numbers calculated for a sample) to make inferences about population parameters (computations made for a population).

**[Unbiased samples.]**

**[Randomness:]**

- For true random sampling, every member of the population should have an equal probability of being selected.
- Therefore, no specific group or subset of the population is favoured or excluded more than another.

**[Representativeness.]**

- The sample should reflect the diversity and characteristics of the population.
- It is better to have a smaller but representative sample.

**[Populations and samples.]**

- Generally, a sample is chosen from a given population, statistics are calculated, and these are used to estimate parameters.
- E.g., Sliema population of 19,655; sample size 377.
- An infinite number of samples can be drawn, and these may be totally different from each other.
- E.g., if we selected 377 residents for our first sample, none of these may be selected again if we were to draw a second sample, considering that 19,278 residents were not part of the first sample.

**[Probability sampling.]**

- Samples are chosen using probability methods to ensure that every subject has a chance of being chosen.
- Therefore, each subject should have a non-zero chance of being selected.
- Inferences should not be based on non-probability samples, as these are not reliable.
- The quality of the inferences we can make depends to a large extent on how well our sample represents the wider population.
- The simplest form of probability sampling is simple random sampling.
- Random sampling adopts the principle of equal probability of selection: every person has an n/N chance of being selected.
- Other probability sampling strategies include:
  - ***Stratified sampling*** -- researchers divide subjects into subgroups called strata, based on characteristics that they share (e.g., race), and sample within each stratum.
  - ***Cluster sampling*** -- you divide a population into clusters, such as districts or schools, and then randomly select some of these clusters as your sample.
  - ***Systematic sampling*** -- researchers select members of the population at a regular interval, for example, every 15th person on a list of the population.
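As a rough sketch (the resident IDs are simulated, and the population size follows the Sliema example above), simple random and systematic selection might look like this:

```python
import random

population = list(range(1, 19656))   # hypothetical IDs for 19,655 residents
random.seed(42)

# Simple random sampling: every resident has an equal (n/N) chance.
srs = random.sample(population, k=377)

# Systematic sampling: every k-th person on the list after a random start.
k = len(population) // 377           # sampling interval (here, every 52nd person)
start = random.randrange(k)
systematic = population[start::k][:377]

print(len(srs), len(systematic))     # 377 377
```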
**[Probability and non-probability sampling.]**

- ***Probability sampling*** -- the sample is selected using a random selection method, so each unit has a known, non-zero chance of being chosen.
- ***Sampling error*** refers to the difference between a sample and the population from which it was selected.
- ***Non-probability sampling*** -- the sample has not been selected using a random selection method, so some units have a greater chance of being selected than others.

**[Random sampling.]**

- Not easy to adopt.
- There are specific requirements (for example, knowledge of the population size), and we need a sampling frame for recruiting participants that gives everyone an equal chance of selection, such as a database (for example, all REGISTERED university students).
- We need a procedure for randomness (for example, drawing out of a hat).

**[IMPORTANT!]**

- The variables studied are stochastic, i.e., they involve a degree of chance or uncertainty; they vary in a population and will, assumingly, be normally distributed: some will have high scores, others will have low scores, and most will vary around the average (e.g., IQ).
- Probability sampling needs to be done in the right way, otherwise you might end up over-sampling extremes.
- Different samples will have different distributions and different variances, i.e., different samples will produce different statistics on which to make inferences about the population.
- Thus, some samples are more accurate for making inferences than others, i.e., they have less variance than others around the true population mean.

**[Sampling distributions:]**

**[Difference between SAMPLE distribution and SAMPLING distribution.]**

- ***SAMPLE distribution*** -- the distribution of scores in a single sample drawn from a population.
- ***SAMPLING distribution*** -- the distribution of a statistic calculated from multiple simple random samples drawn from a specific population.

**[Sampling distribution.]**

- A frequency distribution of different sample means.
- Every score in the sampling distribution is itself the mean of a sample distribution.
- The unit of analysis in the sampling distribution is each single sample: each score is a sample mean, i.e., the variable is a statistic; the height of the curve is the frequency with which the statistic would come up over an infinite number of samples.
- What we want to find out is how far our mean is from the true population mean, i.e., the standard deviation of the sampling distribution.
- The sampling distribution represents the distribution of sample means taken from the same population.
- The true mean will have the highest frequency, and most sample means will be quite close to the true mean.
- Sampling distributions help us understand how sample statistics vary from sample to sample, and they enable us to make statistical inferences about a population based on sample data.
- The normal distribution approximates real-world variables.
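The behaviour of a sampling distribution can be simulated. A minimal sketch, assuming a deliberately skewed (exponential) population, shows the mean of many sample means landing near the true population mean:

```python
import numpy as np

rng = np.random.default_rng(1)

# A clearly non-normal (right-skewed) population.
population = rng.exponential(scale=2.0, size=100_000)

# Draw many samples of size 50 and record each sample mean.
# (rng.choice samples with replacement, which approximates simple
# random sampling from this large population.)
sample_means = [rng.choice(population, size=50).mean() for _ in range(5_000)]

print(population.mean())      # true population mean (about 2.0)
print(np.mean(sample_means))  # mean of the sample means: about the same
print(np.std(sample_means))   # spread of the sampling distribution (the standard error)
```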
**[Draw multiple random samples from a population:]**

1. Calculate the mean of each sample.
2. The mean will be different each time, but most values will vary around the true population mean; others will be higher, others lower, and some will be extremely high or extremely low.
3. The distribution of sample means is a hypothetical collection of the sample means for all possible random samples of a particular size (n) that can be obtained from a population. In other words, we look at all the possible samples of the same size and make predictions based on their properties.

**[Normal distribution.]**

- When we carry out research, we often compare the actual distributions of the variables in the research study to the normal curve.
- The distributions of the variables in a study are not expected to match the normal curve perfectly; the normal curve is a theoretical distribution. However, we often check whether the variables approximately follow a normal curve.

**[Example -- memory test.]**

- We are interested in the number of different letters a particular person can remember accurately on various days (with different random letters each time).
- We decide to give them a test daily for a whole year.
- The person has the ability to recall, say, seven letters in this kind of memory task. On some testings the number of letters remembered may be high, on others low, and on most somewhere in between.
- In general, the person remembers a middle amount, an amount in which all the opposing influences cancel each other out. Very high or very low scores are much less common.
- This gives a unimodal distribution, with most of the scores near the middle and fewer at the extremes.
- It is a symmetrical distribution, as the number of letters recalled is as likely to be above as below the middle.

**[Characteristics of the normal distribution.]**

- ***Unimodal*** -- most of the scores are near the middle and there are fewer at the extremes.
- ***Roughly symmetrical*** -- the number of values below the mean is likely to be the same as the number of values above the mean (50% above and 50% below). The curve is centred at its mean (average), which is also the median and mode of the distribution. The mean, median, and mode all coincide at the highest point of the curve.
- ***Bell-shaped*** -- neither too flat nor too pointed.
- ***Mean and standard deviation*** -- the two parameters that define a normal distribution are the mean (μ) and the standard deviation (σ). The mean represents the centre of the distribution, while the standard deviation determines the spread or dispersion of the data. A larger standard deviation results in a wider and flatter curve, while a smaller standard deviation results in a narrower and taller curve.
- ***The empirical rule:***
  - Approximately 68% of the data falls within one standard deviation of the mean.
  - About 95% of the data falls within two standard deviations of the mean.
  - Nearly 99.7% of the data falls within three standard deviations of the mean.
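The empirical rule can be verified directly from the theoretical normal curve, for example with scipy:

```python
from scipy.stats import norm

# Proportion of a normal distribution within 1, 2, and 3 SDs of the mean.
for k in (1, 2, 3):
    print(k, norm.cdf(k) - norm.cdf(-k))
# 1 -> ~0.683, 2 -> ~0.954, 3 -> ~0.997
```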
**[Example -- IQ test.]**

- On many widely used intelligence tests, the mean IQ is 100, the standard deviation is 15, and the distribution of IQs is roughly a normal curve.
- Knowing about the normal curve and the percentage of scores between the mean and 1 standard deviation above the mean tells you that about 34% of people have IQs between 100, the mean IQ, and 115, the IQ score that is 1 standard deviation above the mean.
- Similarly, because the normal curve is symmetrical, about 34% of people have IQs between 100 and 85 (the score 1 standard deviation below the mean), and 68% (34% + 34%) have IQs between 85 and 115.
- There are many fewer scores between 1 and 2 standard deviations from the mean than there are between the mean and 1 standard deviation from the mean.
- About 14% of the scores in a normal curve are between 1 and 2 standard deviations above the mean. Similarly, about 14% of the scores are between 1 and 2 standard deviations below the mean.
- Thus, about 14% of people have IQs between 115 (1 standard deviation above the mean) and 130 (2 standard deviations above the mean).
- Suppose we are told that a person scored in the top 2% on a test. What can we deduce about how far their score lies from the mean?
- It is possible to work out a person's number of standard deviations from the mean from a percentage.
- Assuming that scores on the test are approximately normally distributed, the person must have a score that is at least 2 standard deviations above the mean.
- Why? 50% of the scores are above the mean, but 34% are between the mean and 1 standard deviation above the mean, and another 14% are between 1 and 2 standard deviations above the mean. That leaves 2% of scores (that is, 50% - 34% - 14% = 2%) that are 2 or more standard deviations above the mean.

**[Central limit theorem]**

- A unimodal symmetrical curve is not guaranteed to be a normal curve; it could be too flat or too pointed.
- However, it can be shown mathematically that, in the long run, if the influences are truly random and the number of different influences being combined is large, a precise normal curve will result.
- The central limit theorem states that, as the sample size increases, the sampling distribution of the mean approaches a normal distribution, regardless of the shape of the population distribution, provided that the samples are sufficiently large.

**Lecture 5.**

**[Z-scores.]**

**[What are Z-scores?]**

- They describe a particular score in terms of where it fits into the overall group of scores.
- For every variable, the population mean will have a different value.
- The distribution is standardised by converting real scores into z-scores.
- Using the mean and the standard deviation, a z-score can be created to describe a score in terms of how much it is above or below the average.
- The true population mean is converted to 0, and other values are converted into standard deviation units.
- A z-score is the number of standard deviations that a score is above (or below, if it is negative) the mean of its distribution.
- It makes use of the mean and the standard deviation to describe a particular score.
- An ordinary score is transformed into a z-score so that it better describes the score's location in a distribution.

**[Relative standing.]**

Example:

- To what extent are you a morning person on a 7-point scale? (1 = not at all, 7 = extremely).
- Jerome responds with a 5.
- How can we know if Jerome is more or less of a morning person in relation to other students?
- Without knowing anything about how the other students answered the question, it is difficult to say.
- Which information about the other students could help us?
- Now suppose we know that, for students in general, the mean rating (M) is 3.40 and the standard deviation (SD) is 1.47.
- With this information, we can see that Jerome is more of a morning person than is typical among students.
- Jerome is above the average (1.60 units more than average; that is, 5 - 3.40 = 1.60) by a bit more than the amount students typically vary from the average (1.47, the standard deviation).

**[Raw scores vs z-scores.]**

- Raw scores are ordinary scores (numbers in a distribution before they have been made into z-scores or otherwise transformed), e.g., 5, 142 cm, 10 seconds.
- With raw scores, we cannot determine whether a particular deviation from the mean should be considered large or small.
- A z-score is a location on the distribution.
- By transforming a raw score into a z-score and envisioning the z-distribution, we can identify the location of any raw score in any distribution.
- A z-score also communicates the raw score's distance from the mean.
- A z-score describes a raw score's location in terms of how far above or below the mean it is, measured in standard deviations.
- The farther a raw score is from the mean, the larger its corresponding z-score.
- On a normal distribution, the larger the z-score, whether positive or negative, the less frequently that z-score and the corresponding raw score occur.
- Low z-scores do not always imply something negative, e.g., race times, or the number of times subjects report feeling anxious over a month.

**[Why do we need z-scores?]**

- ***To describe the relative standing of scores.***
  - A z-score helps us understand where a specific score falls compared with the rest of the group.
  - It also tells us how far a score is from the average (in standard deviations).
  - It gives context to individual scores.
  - For example, if two people took the same test and one scored 70 and the other 85, the z-scores would show how far from the average each score is, allowing us to see whether one score is exceptional or just slightly better.
- ***Comparing scores from different distributions.***
  - Z-scores allow the comparison of scores from different sets of data, even if the data sets have different scales or units.
  - Imagine comparing test scores from two different subjects, Research Methods and Social Psychology.
  - A score of 90 in Research Methods might not mean the same thing as a score of 90 in Social Psychology if the averages and spreads are different.
  - When these scores are converted to z-scores, we can make a fair comparison by seeing how far each score is from its respective mean, on a standardised scale (see the sketch after this list).
- ***Comparing the relative frequency of scores in any distribution.***
  - Z-scores can also be used to determine how frequently scores occur in a distribution.
  - In a normal distribution, a z-score of 0 means a score is at the mean, and about 68% of all scores fall within one standard deviation (z-scores between -1 and +1).
  - This can help us understand how unusual or common a particular score is.
  - For instance, a z-score of +2 indicates a score that is higher than roughly 98% of the scores in the distribution, which makes it quite rare.
- ***Describing and interpreting sample means.***
  - When working with sample means (e.g., the average test score for a group), z-scores allow comparisons between the sample mean and the population mean.
  - Once a z-score for the sample mean is calculated, it can be used to determine whether the sample is typical or unusual compared with the overall population.
  - This is particularly useful in inferential statistics, where the aim is to make judgments about a population based on sample data.
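A minimal sketch of the Research Methods vs Social Psychology comparison; the means and standard deviations are invented purely for illustration:

```python
# Hypothetical class means and SDs for the two tests.
rm_mean, rm_sd = 75, 10      # Research Methods
sp_mean, sp_sd = 82, 4       # Social Psychology

score = 90
z_rm = (score - rm_mean) / rm_sd   # +1.5 SDs above the Research Methods average
z_sp = (score - sp_mean) / sp_sd   # +2.0 SDs above the Social Psychology average

# The same raw score of 90 is more exceptional in Social Psychology.
print(z_rm, z_sp)
```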
**[Examples:]**

A developmental psychologist observed 3-year-old Jacob in a laboratory situation, playing with other children of the same age. During the observation, the psychologist counted the number of times Jacob spoke to the other children. The result, over several observations, is that Jacob spoke to other children about 8 times per hour of play. Suppose another child, Ryan, speaks often enough that his score lies two standard deviations above the group's mean (the exact group mean and standard deviation appeared in a figure that is not reproduced here). Ryan would clearly be unusually talkative, with a z-score of +2: Ryan speaks not merely more than the average, but more by twice as much as children tend to vary from the average.

**[Formula to change a raw score to a z-score.]**

- A z-score is the number of standard deviations by which the raw score is above or below the mean.
- To figure a z-score, subtract the mean from the raw score, giving the deviation score; then divide the deviation score by the standard deviation.
- The formula is: $Z = \frac{X - M}{SD}$.
- To change a z-score back to a raw score, the process is reversed: multiply the z-score by the standard deviation and then add the mean.
- The formula is: $X = Z \times SD + M$.
- Suppose Brian has a z-score of 1.5 on the number of times spoken with another child during an hour. Calculate the raw score for Brian.

**[The mean and standard deviation of z-scores.]**

- The mean of any distribution of z-scores is always 0. When each raw score is changed into a z-score, the mean is subtracted out of all the raw scores, making the overall mean come out to 0. In other words, in any distribution, the sum of the positive z-scores must always equal the sum of the negative z-scores; when you add them all up, you get 0.
- The standard deviation of any distribution of z-scores is always 1, because when you change each raw score to a z-score, you divide the deviation by one standard deviation.
- It is important to note that the shape of a distribution is not changed when raw scores are converted to z-scores. So, for example, if a distribution of raw scores is positively skewed, the distribution of z-scores will also be positively skewed.

**[The normal curve table and z-scores.]**

- The normal curve is a precise mathematical curve, so it is possible to figure out the exact percentage of scores between any two points on the normal curve (not just those that happen to be right at 1 or 2 standard deviations from the mean).
- Statisticians have worked out tables for the normal curve that give the percentage of scores between the mean (a z-score of 0) and any other z-score (as well as the percentage of scores in the tail for any z-score).
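Both conversion formulas are one-liners. The sketch below reproduces Jerome's z-score and answers the Brian exercise under assumed values of M and SD (the actual values for the times-spoken variable were in a figure that is not reproduced here):

```python
def to_z(raw, mean, sd):
    return (raw - mean) / sd    # Z = (X - M) / SD

def to_raw(z, mean, sd):
    return z * sd + mean        # X = Z * SD + M

# Jerome's "morning person" rating (M = 3.40, SD = 1.47, from the notes).
print(round(to_z(5, 3.40, 1.47), 2))   # about +1.09

# Brian's z-score of 1.5; M = 12 and SD = 4 are hypothetical stand-ins.
print(to_raw(1.5, 12, 4))              # 18.0
```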
**[Z-scores and probability.]**

- Because the normal distribution has known mathematical properties, knowing a z-score also means you can work out the probability of obtaining a score below or above it in the general population.
- Use a z-score table to find p(lower) and p(higher) for a score.
- p(lower) + p(higher) = 1.
- We are essentially dividing the distribution at that point: the probability of getting a lower score lies to the left, and the probability of getting a higher score lies to the right.
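Instead of a printed z-table, the cumulative normal distribution gives the same split; a small sketch using scipy:

```python
from scipy.stats import norm

z = 1.09                       # e.g., Jerome's z-score from earlier
p_lower = norm.cdf(z)          # probability of a lower score
p_higher = 1 - norm.cdf(z)     # probability of a higher score

print(round(p_lower, 3))       # ~0.862
print(round(p_higher, 3))      # ~0.138
print(p_lower + p_higher)      # 1.0
```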
**[Use of z-scores in psychology.]**

1. ***Standardized testing and IQ scores.***
   - In psychology, standardized tests (such as IQ tests or personality assessments) are used to measure an individual's cognitive abilities or traits. These tests often use z-scores to compare an individual's score to a normative sample.
   - An IQ score of 100 is the average, with a standard deviation of 15. A z-score can help psychologists interpret how far above or below the average a particular score is. For instance, a z-score of +2 would indicate that a person scored higher than most people (about 98% of the population).
2. ***Comparing psychological traits across tests.***
   - Psychologists often want to compare traits measured by different tests, such as comparing levels of anxiety and depression in the same individuals. Since these tests may have different scales, z-scores allow for standardized comparisons.
   - By converting raw scores from anxiety and depression scales into z-scores, a psychologist can more easily see how a person scores in relation to the average for each trait and make appropriate clinical judgments.
3. ***Interpreting survey data.***
   - In psychological research, z-scores can be used to interpret responses from surveys.
   - In a survey measuring levels of stress, a psychologist might want to know how each respondent's stress level compares to the overall sample. By calculating z-scores for each individual's responses, the psychologist can identify who is experiencing significantly higher or lower stress levels compared to the rest of the population.
4. ***Clinical diagnosis.***
   - In clinical psychology, z-scores help in diagnosing mental health disorders.
   - When using a depression inventory or anxiety scale, the z-score of a patient's result compared to a normative group can indicate how severe their symptoms are. A high positive z-score might suggest that the individual is experiencing more severe symptoms than the average person, guiding the clinician in making diagnostic decisions and determining the need for intervention.
5. ***Analysing treatment effectiveness.***
   - Z-scores are often used to measure change in psychological symptoms before and after treatment.
   - In cognitive behavioural therapy, z-scores can show how far anxiety decreased relative to the population after the treatment.
   - A z-score that moves from positive (high anxiety) to near zero or negative (normal to low anxiety) could indicate significant improvement.
6. ***Analysing experimental data.***
   - In psychological experiments, researchers often use z-scores to assess whether participants' responses in different conditions are significantly different from the mean response.
   - In reaction-time experiments, z-scores can standardize the data to show whether a participant's reaction to a particular stimulus is faster or slower than the average across all participants. This is useful in experiments involving cognitive processes, such as attention or memory studies.
7. ***Identifying outliers in research.***
   - Z-scores can be used to detect outliers in psychological research data.
   - If a participant has an unusually high or low score (such as extreme aggression on a behavioural checklist), their z-score might fall far outside the typical range (e.g., a z-score of +3 or -3).
   - Identifying these outliers can help researchers decide whether the data point should be excluded or further investigated for potential errors or unique conditions.
8. ***Comparing group differences in studies.***
   - In psychological research, z-scores can be used to compare different groups.
   - In a study of cognitive performance among different age groups, z-scores can help standardise the scores from different groups (e.g., young adults vs. older adults), allowing researchers to compare how far each group's performance deviates from the mean for the entire population.

**Lecture 6 -- Standard Error.**

A good sample can model the population accurately and efficiently, but some error or bias is inevitable.

**[Sampling distributions]**

- The sample mean is a variable, varying from sample to sample.
- For random samples it varies around the population mean (sometimes lower, sometimes higher).
- The central limit theorem: the mean of the sampling distribution is the population mean, $\mu_{\bar{X}} = \mu$.
- If we keep taking samples of size n, eventually the mean of the sample means will equal the population mean μ.
- If we take enough samples, the resulting sampling distribution will become more and more normally distributed.
- This happens irrespective of the shape of the true population distribution.
- The true population could be skewed or demonstrate kurtosis, but the sampling distribution is different and separate, and with increased trials it approximates a normal distribution.
- This means that, with increased trials, we will be right 'on average'.
- The sampling distribution is a hypothetical distribution: we usually only take one sample.

**[Sampling error.]**

- We usually take only one sample, and it is unlikely that our sample mean will equal the true population mean.
- Sampling error refers to the difference between the sample mean and the true population mean.
- It is inevitable and cannot be eliminated, because we only observe a single sample.
- We can, however, estimate how large it is: it gives an indication of how much our obtained mean is likely to vary from the true population mean.
- To measure the sampling error, we rely on the sampling distribution.
- We need the standard deviation of the sampling distribution of sample means: the standard error.

**[Errors in probability sampling.]**

- Using probability sampling methods can reduce sampling error and allows us to calculate how much a sample varies, by chance, from the population.
- Randomized samples will still have some degree of sampling error, because a sample is only an approximation of the population from which it is drawn.
- Sampling errors can be reduced by increasing the sample size.
**[Calculating sampling error.]**

- Divide the standard deviation of the population (the sampling distribution) by the square root of the sample size: $SE = \frac{\sigma}{\sqrt{n}}$.
- As the sample size increases, the standard error decreases: the denominator in the formula (the square root of n) increases as the sample size increases.
- The sampling distribution is hypothetical, so we never know its standard deviation; we use the standard deviation of our sample instead and accept that our standard error is itself an estimate.
- We also use confidence intervals as a measure of confidence, rather than the standard error on its own.

**[Basics of the standard error.]**

- The standard deviation tells us something about how well the mean represents the sample data.
- We don't have access to whole populations; we use samples.
- Research in which there is a known population mean and standard deviation is quite rare in psychology.
- All samples will differ somewhat, so we usually have a distribution of sample means.
- How well does a particular sample represent the population? This is what the standard error tells us.

**[Standard error.]**

- The estimate of sampling error, determined from the standard deviation of the distribution of sample means.
- It is the standard deviation of sample means: this represents the sampling error present in our samples.
- It tells us how much we expect the sample to differ from the population.
- The standard deviation calculated for sample means tells you how much variability there is between sample means. This is called the standard error of the mean.
- We cannot collect hundreds of samples; the central limit theorem allows us to approximate the standard error (with larger samples).
- The standard error quantifies uncertainty in the estimation of the mean.
- The standard error of the mean, or simply standard error, indicates how different the population mean is likely to be from a sample mean.
- It tells you how much the sample mean would vary if you were to repeat a study using new samples from within a single population.
- Standard error and standard deviation are both measures of variability. The standard deviation reflects variability within a sample, while the standard error estimates the variability across samples of a population (the sampling distribution).
- A large standard error means that there is a lot of variability between the means of different samples: our sample may not be representative of the population.
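The formula is straightforward to compute. This sketch (with a hypothetical standard deviation of 15) also shows the standard error shrinking as n grows:

```python
import math

def standard_error(sd, n):
    return sd / math.sqrt(n)    # SE = SD / sqrt(n)

# Hypothetical sample SD of 15 (e.g., IQ-like scores):
for n in (25, 100, 400):
    print(n, standard_error(15, n))
# 25 -> 3.0, 100 -> 1.5, 400 -> 0.75: quadrupling n halves the SE
```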
**[Confidence intervals.]**

- The standard error helps in computing confidence intervals, but we do not know whether the sample mean is representative of the true population mean.
- This means that we don't know how confident we can be in our findings.
- Confidence intervals give us some confidence by computing a range of plausible means, given the data.
- Not a single estimate: a range, with different confidence levels.
- A confidence interval (CI) is the range of scores, between an upper and a lower value, that is likely to include the true population mean: the range of possible population means from which it is not highly unlikely that you could have obtained your sample mean.
- The confidence limit is the upper or lower value of a confidence interval.
- When we estimate the population mean of a variable from a sample mean, the estimate is subject to sampling error (the deviation of a sample mean from the true population mean).
- To work out confidence intervals, the standard error needs to be known; the calculation relies on the normal distribution.
- A confidence interval is a range of values so defined that there is a specified probability that the value of a parameter lies within it.
- We are calculating the boundaries within which we believe the true value of the mean will fall. Such boundaries are called confidence intervals.
- To do this, we use our sample value as a midpoint and then set an upper and lower limit around it.

**[Calculating confidence intervals.]**

- Confidence intervals must be constructed to tell us something useful, e.g., the likelihood that they contain the true value of X.
- 95% and 99% confidence intervals are the most commonly used.
- Normally, you would want to be more than 68% confident about your estimates. Thus, when figuring confidence intervals, psychologists use 95% or even 99% confidence intervals.
- 95% CI: if we collected 100 samples and calculated the mean and confidence interval for each, then for about 95 of the samples the confidence interval would contain the true value of the mean.
- The cut-off points are z-scores.

**[Confidence intervals for a mean.]**

- For a normal distribution, there is a probability of 0.95 that a randomly selected value from the distribution is within 1.96 standard deviations of the mean;
  - a probability of 0.68 that it will fall within 1 standard deviation of the mean;
  - a probability of 0.997 that it will fall within 3 standard deviations of the mean.
- If we draw a single sample at random from the population distribution of our variable and calculate its mean, the probability is 0.95 that it will be within -1.96 and +1.96 standard errors of the population mean.
- Values within this interval are more likely to occur than ones outside it; the interval is thus a sensible summary of the most likely values of the mean.

**[Confidence interval.]**

- The range of values within which you expect your estimate to fall a certain percentage of the time, if you re-run your experiment or re-sample the population in the same way.
- Confidence is another way to describe probability. For example, if you construct a confidence interval with a 95% confidence level, you are confident that 95 out of 100 times the estimate will fall between the upper and lower values specified by the interval.
- For a 95% confidence interval: $CI = M \pm 1.96 \times SE$.

**[For example,]**

- We found that, on average, students consume 2.8 units of coffee a day, and we are 95% confident that their average consumption lies between 2.7 units and 2.9 units a day.
- If we obtained many samples (of the same size) from the same population and calculated the confidence interval for each, approximately 95% of these intervals would contain the true population mean.
- In simpler terms, we are 95% confident that average consumption is between 2.7 and 2.9 units a day.
- The probability level of 0.95 of the normal distribution that we used (i.e., 1.96) is the confidence level of the interval.
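A minimal sketch of the coffee example; the sample size and standard deviation are invented so that the result lands near the interval quoted above:

```python
import math

# Hypothetical coffee-consumption sample roughly matching the notes.
n, mean, sd = 900, 2.8, 1.53

se = sd / math.sqrt(n)                   # standard error of the mean
lower = mean - 1.96 * se                 # 95% CI lower limit
upper = mean + 1.96 * se                 # 95% CI upper limit
print(round(lower, 2), round(upper, 2))  # about 2.7 and 2.9
```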
- The width of the interval reflects our uncertainty: the more certain we choose to be, the wider the interval; the less confident we choose to be, the narrower the interval.
- Also, the bigger the sample size, the smaller the standard error and the narrower the interval; increasing the sample size thus reduces uncertainty.
- The larger the standard deviation in our sample, the wider the interval (this is a characteristic of our sample that we can do very little about, as it reflects the population characteristics and the extent of variability of our object of study in a given population).

**[Using confidence intervals.]**

- Confidence intervals are useful for communicating the variation around a point estimate, for example around:
  - proportions;
  - population means;
  - differences between population means or proportions;
  - estimates of variation among groups.

**[Caution with confidence intervals.]**

- Confidence intervals only refer to the range of values you can expect to find if you re-do your sampling or run your experiment again in exactly the same way.
- They cannot be used to refer to the true value of the statistical estimate, because confidence intervals are based on a sample, not on the whole population.
- Accurate sampling plans and realistic experiments increase the chance that the confidence interval includes the true value of the estimate.

**Lecture 7 -- Hypothesis Testing & *p*-value.**

**[What is a hypothesis?]**

- Hypotheses are claims about population parameters.
- A hypothesis is a prediction intended to be tested in a research study.
- The prediction may be based on:
  1. informal observation (as in clinical or applied settings regarding a possible practical innovation);
  2. related results of previous studies;
  3. a broader theory about what is being studied.
- A theory is a set of principles that attempts to explain an important psychological process.
- A theory usually leads to various specific hypotheses that can be tested in research studies.

**[The problem:]**

- Because of sampling variation, a hypothesis may not be true in a sample even if it is true in the population.
- Statistical testing provides explicit rules for deciding when to conclude that a particular sample value is or is not evidence against a hypothesis.
- Significance testing uses sampling distributions to assess the strength of evidence against hypotheses.

**[Hypothesis testing.]**

- Hypothesis testing is a statistical method used to make inferences about a population based on a sample of data.
- It is a systematic procedure for deciding whether the results of a research study, which examines a sample, support a hypothesis that applies to a population.

**[Assumptions of the significance test]**

- Any significance test makes assumptions, of which we need to be aware if we are to make a correct choice:
  - random sampling;
  - normality of data;
  - no outliers.
- Hypotheses define the questions that can be answered; assumptions indicate which test is appropriate for which data.
- The choice of significance test depends on the nature of the data and the specific assumptions of the test.
- Parametric tests make specific assumptions about the population distribution, while nonparametric tests are more robust but may be less powerful in certain situations.

**[Terminology.]**

- ***Research hypothesis (H~1~)*** -- a statement in hypothesis testing about the predicted relations between populations (often the difference between population means); also known as the scientific or alternative hypothesis.
(AKA Scientific Hypothesis / Alternative Hypothesis).

- ***Null hypothesis (H~0~)*** -- a statement about a relation between populations that is the opposite of the research hypothesis: a statement that in the population there is no difference (or a difference opposite to that predicted) between populations; a contrived statement set up to examine whether it can be rejected as part of hypothesis testing.

**[Social Learning Theory Hypotheses. ]**

- ***Null Hypothesis (H₀):*** There is no difference in aggressive behaviour between children who observe an adult behaving aggressively toward a toy and those who do not. (In other words, we would be saying that the observation of aggressive behaviour does not influence children's own aggressive behaviour.)
- ***Alternative Hypothesis (H₁):*** Children who observe an adult behaving aggressively toward a toy will exhibit higher levels of aggressive behaviour compared to children who do not observe such behaviour.
- In hypothesis testing, researchers aim to collect data to see if they can reject the null hypothesis in favour of the alternative hypothesis, supporting the idea that observed aggressive behaviour leads to imitation in children.

**[The Null Hypothesis. ]**

- The research hypothesis and the null hypothesis are complete opposites: if one is true, the other cannot be.
- This is why the research hypothesis is often called the alternative hypothesis -- it is the alternative to the null hypothesis.
- Researchers care most about the research hypothesis.
- The steps of hypothesis testing use this roundabout method: we see whether the null hypothesis can be rejected in order to decide about its alternative (the research hypothesis).

**[Steps in Hypothesis Testing.]**

1. ***State your research question and make hypotheses about the answer***.

- State the research question and develop the null and alternative hypotheses using the literature in the chosen research area.
- Scientific/Alternative/Research Hypothesis.
- Null Hypothesis.
- These hypotheses are about the population, not the sample.

![](media/image54.png)

- ***What do our hypotheses mean?***
- H1 = In the general population, memory performance changes depending on different study techniques. (In the general population, different study techniques give different memory performance scores.)
- H0 = In the general population, memory performance does not change depending on different study techniques. (In the general population, different study techniques give the same results.)
- ***Two-tailed hypotheses*** -- both directions of an effect or relationship are considered in the alternative hypothesis of the test.
- ***One-tailed hypotheses*** -- only one direction of an effect or relationship is predicted in the alternative hypothesis of the test.
- In the scientific community the one-tailed hypothesis is often preferred, but the choice depends on the research question.
- Generally, a one-tailed (directional) hypothesis is used when previous research or theory suggests a direction; a two-tailed (non-directional) hypothesis is used when there is no previous research.

2. ***Set a decision criterion for deciding about the hypotheses.***

- In inferential statistical tests, we are seeking to find the location of the sample mean in a distribution.
- What is the chance of obtaining the data in this study if the null hypothesis is true?
- If the chance is high, then the null hypothesis cannot be rejected.
- If the chance is low, then the null hypothesis is rejected in favour of the alternative hypothesis.
- ***[Levels of Significance]***

- When setting in advance how extreme a sample's score needs to be to reject the null hypothesis, researchers use z-scores and percentages.
- In general, psychology researchers use a cutoff on the comparison distribution with a probability of 5% that a score would be at least that extreme if the null hypothesis were true.
- That is, researchers reject the null hypothesis if the probability of getting a sample score this extreme (if the null hypothesis were true) is less than 5%.
- This probability is usually written as p < .05.
- In some areas of research, or when researchers want to be especially cautious, they use a cutoff of 1% (p < .01).

- ***[The Critical Region and Alpha Level]***

- The critical region is the most extreme portion of the distribution of statistical values under the null hypothesis, determined by the decision criterion.
- If the sample score reaches or exceeds the cutoff (critical) value, the null hypothesis is rejected.
- The alpha level is the cutoff probability that researchers use to decide whether the null hypothesis can be rejected.
- In psychological research it is usually set at 0.05.

3. ***Collect your sample data.***

- Decide on the best way to collect the data for your hypothesis.

![A diagram of a method Description automatically generated](media/image56.png)

4. ***Calculate statistics.***

1. Summarise the data with descriptive statistics.
2. Choose an appropriate inferential statistics test.
3. Calculate the inferential statistic and the corresponding probability (*p*) value for that statistic.

- The *p* value is the probability associated with an inferential test that indicates the likelihood of obtaining the data in a study when the null hypothesis is true.
- It can take any value between 0 and 1.
- ***Test statistic:***
- A statistic is a quantity that is calculated from a sample.
- A test statistic is a number calculated from a sample that is used to test a null hypothesis.
- Different types of tests use different distributions (which differ from the normal distribution depending on the assumptions made about the population parameters).
- The job of statisticians is to devise these different tests; the job of social researchers is to select the right one.
- The outcome of statistical testing is the probability of obtaining the observed value if the null hypothesis were true.

5. ***Decide about the hypothesis.***

- ***How to decide?***
- Compare the statistic's *p* value with α.
- Decide to either reject or retain the null hypothesis.
- Decide whether you can accept the alternative hypothesis.
- The smaller the p-value, the stronger the evidence against the null hypothesis.
- ***The Logic of Hypothesis Testing***
- "What is the probability of getting our research results if the opposite of what we are predicting were true?"
- We usually predict an effect of some kind. However, we decide whether there is such an effect by seeing if it is unlikely that there is no such effect.
- If it is highly unlikely that we would get our research results if the opposite of what we are predicting were true, that finding is our basis for rejecting the opposite prediction.
- If we reject the opposite prediction, we are able to accept our prediction.
- However, if it is likely that we would get our research results if the opposite of what we are predicting were true, we are not able to reject the opposite prediction.
- If we are not able to reject the opposite prediction, we are not able to accept our prediction.
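To make steps 4 and 5 concrete, here is a minimal Python sketch using simulated memory scores for two study techniques (the group sizes and score values are assumptions for illustration, not the lecture's data). It uses an independent samples t-test, covered in Lecture 8, as the inferential test, and compares the resulting p-value against α = .05.

```python
import numpy as np
from scipy import stats

# Hypothetical memory scores under two study techniques (illustrative values only)
rng = np.random.default_rng(1)
technique_a = rng.normal(70, 10, size=30)
technique_b = rng.normal(76, 10, size=30)

alpha = 0.05                        # decision criterion, set in advance (step 2)
t_stat, p_value = stats.ttest_ind(technique_a, technique_b)  # two-tailed by default (step 4)

# Step 5: compare p with alpha and decide about H0
if p_value <= alpha:
    print(f"p = {p_value:.3f} <= {alpha}: reject H0 in favour of H1")
else:
    print(f"p = {p_value:.3f} > {alpha}: fail to reject H0")
```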
**[More about the p-Value. ]**

- A small p-value (typically ≤ 0.05) indicates strong evidence against the null hypothesis; therefore, we reject the null hypothesis.
- A large p-value (> 0.05) indicates weak evidence against the null hypothesis; therefore, we fail to reject the null hypothesis.
- The p-value should always be reported so that your readers can draw their own conclusions.

**[Do not confuse α and p!]**

- These numbers are easily confused because they are both numbers between 0 and 1 and are both probabilities.
- The α level is set in advance, and we measure the p-value against it.
- Every test statistic has a corresponding probability or p-value: the probability of obtaining the observed statistic (or one more extreme) if the null hypothesis were true. The way of finding p differs from test to test.
- To determine if an observed outcome is statistically significant, we compare the values of alpha and the p-value.

**[Hypothesis testing errors. ]**

![A pink and white rectangular box with black text Description automatically generated](media/image58.png)

**[Type I Error. ]**

- An error made in a hypothesis test when the researcher rejects a null hypothesis that turns out to be true (false positive).
- E.g., in our study we find that there is a difference between the "mindful mnemonic" study method and traditional methods, but this difference does not exist in the population.
- The chance of making a Type I error is determined by the alpha level. If the alpha level is set at 0.05, there is a 5% chance of a Type I error.

**[Type II Error. ]**

- An error made in a hypothesis test when the researcher does not reject the null hypothesis when it is actually false (false negative).
- E.g., in our study we find that there is NO difference between the "mindful mnemonic" study method and traditional methods, but this difference does exist in the population.
- This would be an effect that exists in the population but was not detected in the study.
- ***Reasons for Type II errors:***
- The probability of committing a Type II error is influenced by the sample size and the effect size -- the magnitude of the difference or effect.
- Increasing the sample size or the effect size generally reduces the probability of Type II errors, but it may not always be practical to do so.
- To keep the Type II error rate low, design the study to maximise the effect in question and calculate the optimal sample size.

**[Power Analysis.]**

- Power = the ability of a hypothesis test to detect an effect or relationship when one exists.
- Power analysis allows you to determine, for each type of statistical test, how many subjects you need in your sample in order to be able to demonstrate a certain effect size (e.g., small, medium, or large).
- Power of the test: the power of a statistical test is the probability of correctly rejecting a false null hypothesis. It is equal to 1 minus the probability of a Type II error (power = 1 − β, where β denotes the Type II error rate).

**[What β level is acceptable? ]**

- This is context dependent, but according to Cohen (1992), the maximum acceptable probability of a Type II error is β = 0.2 (20%).
- This means that the minimal acceptable power of a statistical test should be set at 80%.
- A power of 80% implies that the test has an 80% chance of correctly rejecting a false null hypothesis (i.e., a Type II error rate of 20%).
- Various statistical tools and software can perform power analyses, and the specific formula used depends on the statistical test being conducted. There are practical limitations to increasing sample sizes, and the goal is often to find a balance between achieving adequate power and working within resource constraints.
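As one example of such a tool, here is a minimal sketch using Python's statsmodels package. The effect size, alpha, and power values are the conventional benchmarks discussed above (Cohen's medium effect, α = .05, 80% power), not values prescribed by the lecture.

```python
from statsmodels.stats.power import TTestIndPower

# How many participants per group are needed to detect a medium effect
# (Cohen's d = 0.5) with 80% power at alpha = .05, for an independent t-test?
analysis = TTestIndPower()
n_per_group = analysis.solve_power(effect_size=0.5, alpha=0.05, power=0.80)
print(f"Required sample size per group: about {n_per_group:.0f}")  # roughly 64
```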
**[Statistical Significance. ]**

- The result of a statistical test is significant when the p-value is less than or equal to the alpha level of the inferential test, and the null hypothesis can be rejected.
- Statistical significance is a measure of whether the observed results in a study are likely to have occurred by chance or whether they reflect a genuine effect in the population.
- Practical importance refers to the real-world significance or meaningfulness of the findings. It addresses whether the observed effect, even if statistically significant, is large enough to be of practical relevance or impact.
- For social researchers, the latter is very important: a statistically significant result might not have practical importance.

**[Therapy and Anxiety. ]**

- A psychologist conducts an experiment to test whether a new therapy is effective in reducing symptoms of anxiety.
- After analysing the data, the psychologist finds a p-value of 0.02, which is below the conventional significance level of 0.05.
- In this context, the result is statistically significant, suggesting that the observed effect (reduction in anxiety) is unlikely to be due to random chance. This provides evidence against the null hypothesis.
- While the study may show statistical significance in reducing anxiety, it's crucial to assess whether the actual reduction is practically meaningful.
- For instance, if the therapy results in a statistically significant but very small reduction in anxiety scores, one that might not make a noticeable difference in individuals' lives, the practical importance of the finding could be questioned.

**[The Replicability Crisis in Psychological Science.]**

- Replication ensures that results are not mere chance findings and can be generalized to a broader population.
- Several factors have led to scepticism about the reliability of many psychological studies:
- ***Publication bias*** -- journals tend to publish studies with statistically significant results more frequently than studies with non-significant results. Researchers may be less inclined to submit non-significant findings, leading to an overestimation of true effect sizes.
- ***Small sample sizes*** -- an increased risk of obtaining results that do not generalise well to the broader population.
- ***Lack of replication studies*** -- the reliability of many psychological findings remains uncertain.
- ***Cultural and contextual factors*** -- a lack of cross-cultural validation can limit the generalizability of psychological studies.
- ***p-hacking*** -- the manipulation of statistical analyses or data collection to achieve statistically significant results.
- Questionable research practices, such as selectively reporting results that support a hypothesis, can also compromise the integrity of findings.
- The term p-hacking, discussed by Nuzzo (2014), refers to the conscious or subconscious manipulation of data in a way that produces a desired p-value.
- This can happen through decisions about when to stop collecting data, whether or not the data will be transformed, which statistical tests will be used, which cases will be excluded, etc.
- This increases the likelihood of obtaining false-positive results, making it challenging to replicate findings in subsequent studies.
- ***How to prevent p-hacking?***
- Transparent reporting.
- Decide how to treat outliers early on.
- Correct for multiple comparisons.
- Replicate your own results.
- Publish negative results.

**[Data Dredging.]**

- Splitting data in different ways to observe correlations that arise as a result of chance.
- It refers to the failure to recognise that a correlation arose by chance.

| **Exploratory Research** | **Hypothesis-Driven Research** |
| --- | --- |
| Open-ended and flexible. | Focused and structured. |
| No known hypothesis; researchers try to discover relationships or trends within the data. | Researchers test pre-defined hypotheses using statistical methods. |
| Uses various statistical techniques and visualisation methods to uncover patterns and relationships. | The analysis is guided by the specific hypotheses, and statistical tests are chosen a priori to assess their validity. |
| High risk of data dredging, as researchers are more prone to find patterns due to chance, especially when running multiple analyses. | Reduced risk of data dredging, as the analysis focuses on testing predetermined hypotheses, which reduces false positives. |

**[Ways to avoid data dredging:]**

- Clearly state your goal.
- Correct for multiple comparisons.
- Test your findings in a new, hypothesis-driven study.
- Report both exploratory and confirmatory analyses.

**[Effect Sizes.]**

- Reporting the p-value alone is no longer sufficient.
- The effect size is the magnitude of the difference between groups.
- A significant p-value tells us that an intervention works, whereas an effect size tells us how much it works.
- It is an objective and standardized measure that allows comparison between different studies with different variables and scales of measurement.
- Most common: Cohen's d; Pearson's correlation coefficient r; the odds ratio.

**[What makes a good effect size?]**

![](media/image60.png)

**[Multiple Comparisons.]**

- Multiple comparisons arise when a statistical analysis involves multiple simultaneous statistical tests, each of which has the potential to produce a "discovery".
- The more tests you perform, the higher the probability of encountering at least one statistically significant result by chance alone.
- This increased risk of obtaining a false positive (Type I error) is known as the "multiple comparisons problem".
- When you perform a large number of statistical tests, some will have p-values less than 0.05 purely by chance, even if all your null hypotheses are really true.

**[Multiple Testing Correction Methods:]**

- ***Bonferroni Correction:***
- The Bonferroni correction is a conservative method that involves dividing the desired overall significance level (usually 0.05) by the number of comparisons.
- This adjusted significance level (α) is then used to determine statistical significance for each individual test.
- If you are conducting 10 tests and want an overall significance level of 0.05, the Bonferroni-corrected significance level for each test would be 0.05 / 10 = 0.005.
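A minimal sketch of the Bonferroni correction, matching the 0.05 / 10 = 0.005 example above; the ten p-values are hypothetical values chosen for illustration.

```python
# Ten hypothetical p-values from ten simultaneous tests (illustrative values only)
p_values = [0.001, 0.004, 0.012, 0.020, 0.030,
            0.041, 0.050, 0.210, 0.470, 0.780]

alpha = 0.05
bonferroni_alpha = alpha / len(p_values)   # 0.05 / 10 = 0.005, as in the example above

for i, p in enumerate(p_values, start=1):
    decision = "reject H0" if p <= bonferroni_alpha else "fail to reject H0"
    print(f"Test {i:2d}: p = {p:.3f} -> {decision}")
```

Note how tests with p-values between 0.005 and 0.05, which would count as significant individually, no longer do after the correction.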
- ***False Discovery Rate (FDR) Correction:***
- FDR correction controls the expected proportion of false discoveries among the rejected hypotheses.
- It aims to control the rate of false positives relative to the total number of rejections.
- It ranks p-values from smallest to largest, compares each p-value to a critical value based on its rank, and rejects or retains the null hypothesis accordingly.

**Lecture 8 -- T-tests and ANOVA.**

**[What does a t-test do?]**

- It is like a truth checker: it helps us see whether a difference in test scores is big enough to believe that, say, working out practical exercises works better than reading.
- If the difference is big enough, the t-test lets us say: yes, working out practical examples really makes a difference.
- If the difference is small, we might conclude that it is probably just a coincidence.
- T-tests help us figure out if something, like a new way of learning or anything else we are testing, actually works, or if the difference we see might just be random.
- A t-test is used to compare the means of two groups.
- It is a hypothesis-testing procedure in which the population variance is unknown; it compares t scores from a sample to a comparison distribution called a t distribution.
- It determines whether a treatment, process or intervention has an effect on the population of interest, or whether two groups differ from one another.
- T-tests help you figure out if the differences between two groups are real or if they could have happened by chance.

**[Examples of the different types of t-tests.]**

- ***One Sample T-Test*** -- determines whether the sample mean is statistically different from a known or hypothesised population mean. E.g., sample mean = 4 portions vs. population mean = 3.5 portions.
- ***Paired Samples T-Test*** -- compares the means between two related groups on the same continuous, dependent variable. E.g., number of portions before and after a 5-a-day health awareness training.
- ***Independent Samples T-Test*** -- compares the means between two unrelated groups on the same continuous, dependent variable. E.g., who is more likely to have at least 5 portions of fruit or veg per day? Men or women? Older or younger? B.Psy (Hons) or B.A. students? Those who use a reminder app, or those who are told the benefits of the 5-a-day?

**[What does the t-test do?]**

- The t-test tells us whether the difference in, for example, portions of fruit and vegetables per day between the two groups is big enough to be meaningful, or whether it could have happened just by random luck.
- It is a tool that measures how "spread out" the fruit-and-veg numbers are in each group. If the numbers are all close together, the tool will show a small number; if the numbers are all over the place, the tool will show a bigger number.
- The t-test looks at these numbers from both groups and compares them. If the numbers are very different, it tells you there is a good chance the two groups really do eat different amounts of fruit and veg. But if the numbers are pretty similar, it suggests that the difference might just be due to random chance.

A pink and white table with black text Description automatically generated

![A white text with black text Description automatically generated](media/image62.png)

**[Independent Samples T-test.]**

- Also known as the unpaired samples t-test.

**[Before Running Any Statistical Test. ]**

- Check whether your data satisfy the assumptions of the particular test; the test only works correctly when its assumptions are met. The assumptions for the independent samples t-test are listed in the next section, and the sketch below shows how two of them can be checked.
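Here is a minimal Python sketch, with made-up anxiety scores for the men-vs-women example, of checking normality (Shapiro-Wilk) and homogeneity of variances (Levene's test) before running the independent samples t-test; the data, group sizes, and use of these particular checks are assumptions for illustration.

```python
import numpy as np
from scipy import stats

# Hypothetical anxiety scores for men and women (illustrative values only)
rng = np.random.default_rng(3)
men = rng.normal(48, 9, size=35)
women = rng.normal(52, 9, size=35)

# Normality of each group (Shapiro-Wilk): p > .05 suggests no departure from normality
print("Shapiro-Wilk (men):   p =", round(stats.shapiro(men).pvalue, 3))
print("Shapiro-Wilk (women): p =", round(stats.shapiro(women).pvalue, 3))

# Homogeneity of variances (Levene's test): p > .05 suggests equal variances
print("Levene: p =", round(stats.levene(men, women).pvalue, 3))

# If the assumptions hold, run the independent samples (unpaired) t-test
t_stat, p_value = stats.ttest_ind(men, women)
print(f"t = {t_stat:.2f}, p = {p_value:.3f}")
```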
**[Assumptions: the Independent Samples t-test.]**

1. ***Dependent variable***: should be measured on a continuous scale.
2. ***Independent variable***: should consist of two categorical, independent groups.
3. ***Independence of observations***: there is no relationship between the observations within groups or between the groups themselves.
4. ***Outliers***: there should be no significant outliers.
5. ***Normal distribution***: the dependent variable should be approximately normally distributed for each group of the independent variable.
6. ***Homogeneity of variances***: the population variance for each group of the independent variable is the same.

**[Parametric vs Non-parametric Tests.]**

| **Parametric Tests** | **Non-parametric Tests** |
| --- | --- |
| Rely on specific assumptions about the data distribution. | More flexible; can be used when assumptions are not met. |
| Assume that the data follow a specific distribution (usually the normal, bell-shaped distribution). Examples: t-test, ANOVA. | Make fewer assumptions about the underlying data distribution; considered distribution-free. Examples: Mann-Whitney U test, Wilcoxon signed-rank test. |
| Suitable when the data meet the assumption of normal distribution and specific parameters (mean, variance) can be estimated. | Appropriate when the data do not meet the assumption of normal distribution, or when dealing with ordinal or categorical data. |

![A white paper with black text Description automatically generated](media/image64.png)

**[Independent Samples T-test.]**

- Tests the difference between two independent samples: the samples, and the measurements within them, are independent of one another.
- Is the difference between the means of the two groups statistically significant?
- E.g., is there a difference in anxiety between men and women?
- The null hypothesis assumes that there is no difference between the two groups on this characteristic.
- The alternative hypothesis assumes there is a difference between the groups (two-tailed) or that the difference has a particular direction (one-tailed).
- The p-value is calculated as part of the SPSS procedure. With p < .05, we reject the null hypothesis and conclude that the difference is statistically significant.

**[Box Plots.]**

- A box plot shows:
- Whether the data are symmetrical.
- How tightly the data are grouped.
- How the data are skewed.
- Box plots are drawn as a box with a vertical line down the middle and horizontal lines attached to each side (known as "whiskers").
- The box represents the interquartile range (IQR) -- the 50 percent of data points lying above the first quartile and below the third quartile.
- The whiskers represent the variability of the minimum, maximum and any outlier data points in comparison to the IQR (the longer the whisker, the wider the variability of those data points relative to the IQR).
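To see these elements in practice, here is a minimal sketch that draws box plots for two hypothetical groups using matplotlib; the group names and score values are assumptions for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical scores for two groups (illustrative values only)
rng = np.random.default_rng(7)
group_a = rng.normal(50, 8, size=40)    # tightly grouped
group_b = rng.normal(44, 14, size=40)   # more spread out

fig, ax = plt.subplots()
# The box spans the IQR, the middle line marks the median,
# and the whiskers show the variability beyond the IQR
ax.boxplot([group_a, group_b], labels=["Group A", "Group B"])
ax.set_ylabel("Score")
ax.set_title("Box plots: spread, skew and outliers at a glance")
plt.show()
```

Comparing the two boxes side by side makes it easy to see which group is more tightly grouped and whether either distribution looks skewed.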