Lecture 3_1: Distribution and Probabilities PDF
Uploaded by VictoriousElf1785
Bournemouth University
Summary
This document covers the basics of probability distributions, including normal distributions, probability, and Z-transformations. It also details the concept of probability distributions and how such distributions can be used to understand various scenarios.
Full Transcript
Week 3, EMSA 2024/2025: Distribution and Probabilities

Students will know about:
Distribution and Probabilities
Running EMSA Experiment 2
Writing an introduction for a report

Distribution and Probabilities
Probability distributions
Normal distribution
Z-transformation

Probability
Probability refers to the likelihood of a particular event of interest occurring. It is often expressed as a value from 0 to 1.
If you were to flip a coin, what is the probability that you will get heads? There is a 1 in 2 probability (0.5), which means that the likelihood of obtaining heads is 50%.

Probability for certain scenarios can be obvious, but not always:
1. Winning the lottery of 30 million pounds
2. All politicians telling us the truth all the time
3. Getting lung diseases if you smoke
4. Manned flight to Mars within the next 10 years

Probability Distribution
Hence, how can we predict the probability accurately?
A probability distribution is defined as how elements are shared out among scores. It allows us to know the probabilities associated with the occurrence of every score in the distribution (i.e., sample).

Example: 250 participants reported their mood during lockdown from 1 (extremely happy) to 9 (extremely sad). 23 people were extremely sad. No participant reported being extremely happy.

But simple descriptions are not that informative. Probability distributions allow us to make guesses about the data, such as how likely a particular score is.

If I took one participant randomly from this sample, what would be the most likely score?
Answer: 6, because this is the score that was rated most frequently. Based on this, we know that participants were most likely sad during lockdown.

It is very unlikely to obtain a score of 3 or below, but not impossible! Based on this, we can guess that a small minority of participants were somewhat happy during lockdown.

Is this likely or not? In statistics, it is important to know whether our result is due to chance (likely) or statistically significant (unlikely).

However, certain criteria must be met before we can use a probability distribution to determine the likelihood of an event. One of these is that the data must be normally distributed.

Normal Distribution
Psychological variables tend to follow a normal distribution, e.g., height, intelligence, memory, etc. Carl Friedrich Gauss discovered an equation that fits this histogram. Such distributions are called normal or Gaussian distributions.

Why is the normal distribution important? Last week we talked about summarizing our data (descriptive statistics). Depending on the distribution of our data, we can know which descriptive statistics are most appropriate.
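For reference, the equation credited to Gauss above is usually written as the normal density shown here. The notation is an assumption and does not come from the slides: x is a score, μ is the mean and σ is the standard deviation of the distribution.

```latex
f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)
```

The curve is highest at x = μ and falls away symmetrically on both sides, with σ controlling how wide it is.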
Characteristics of a normal distribution
❑ The mean score is in the center of the distribution (most likely value)
❑ Mean, median and mode are identical
❑ The width of the distribution is determined by the standard deviation (SD)
❑ Symmetrical around the mean: approx. 50% of scores are below the mean and 50% above the mean
❑ The probability of a score increases as we approach the mean

Mathematically, a normal distribution is defined by its mean (μ, pronounced "mew") and its standard deviation (σ, pronounced "sigma"). Knowing the mean and the SD of a normal distribution, you can predict approximately how likely any score is.

In this intelligence distribution, the mean is 100 and the standard deviation is 15. By looking at the distribution, is an intelligence score of 110 more likely than a score of 75?
The answer is YES!
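The comparison between 110 and 75 can also be checked numerically instead of by eye. The sketch below is only an illustration, assuming Python's standard-library `statistics.NormalDist` with the mean (100) and SD (15) quoted above; the variable names are made up for this example.

```python
from statistics import NormalDist

# IQ distribution from the lecture: mean 100, SD 15.
iq = NormalDist(mu=100, sigma=15)

# Height of the bell curve (probability density) at each score.
density_110 = iq.pdf(110)
density_75 = iq.pdf(75)

print(f"density at 110: {density_110:.4f}")
print(f"density at 75:  {density_75:.4f}")
print("110 more likely than 75:", density_110 > density_75)
```

A score of 110 is less than one SD from the mean while 75 is more than one and a half SDs away, so the curve is higher at 110.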
*Remember, the higher the frequency, the more likely you are to obtain that score when choosing at random.*

What intelligence score corresponds to one standard deviation below the mean intelligence score? Is this score more or less likely than a score of 130?
One SD below the mean is 85, which has a higher frequency than 130.
Answer: this score is MORE likely than 130.

Given the properties of a normal distribution:
In short, the further a score is from the center (i.e., the more SDs away from the sample mean), the less probable it is to obtain this score, as reflected by its lower frequency.
68% of the scores are located in the range between -1 SD and +1 SD. The probability of selecting someone within this range of scores is .68.
95% are located in the range between -1.96 SD and +1.96 SD. The probability of selecting someone within this range of scores is .95. The probability of selecting someone outside of this range is .05 (or 5%).

For an intelligence dataset with a mean IQ of 100 and SD of 15 (and normally distributed):
Approx. 68% of the scores are located in the range between 85 and 115:
100 - (1*15) = 85; 100 + (1*15) = 115
Approx. 32% would be outside of this range:
~16% of scores would be higher than 115
~16% of scores would be lower than 85

Approx. 95% of the scores are located in the range between 70 and 130:
100 - (1.96*15) = 70.6 ≈ 70
100 + (1.96*15) = 129.4 ≈ 130
Approx. 5% would be outside of this range (scores higher than 130 or lower than 70).
~2.5% of the population are geniuses with an IQ over 130.

Z-Scores
RECAP: standardizing scores (z-scores)
Standardizing allows the comparison of scores coming from different tasks.
Z-scores are unitless.
Z-scores are also determined by the mean and SD: z = (score - mean) / SD.
+ and - signs indicate whether the score is above or below the mean.
Z-scores and the normal distribution are related: reading probabilities from z-scores also assumes that your data are normally distributed.

68% of the sample are located in the z-score range between -1 and +1. The probability of selecting someone with a z-score between -1 and +1 is .68.
95% are located in the range between -1.96 and +1.96. The probability of selecting someone with a z-score between -1.96 and +1.96 is .95.

Imagine height in the UK is normally distributed with a mean of 170 cm and a standard deviation of 10 cm. The majority (68%) of the UK population will be between 160 and 180 cm (±1 SD). As we move further away from the mean height (e.g., towards ±2 SD), the number of people decreases: only about 32% of people fall outside the ±1 SD range.

If height in the UK is normally distributed, we can tell what proportion of the general population you are taller than (a probability). E.g., if your height is 190 cm, that is 2 SD above the mean.
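The 68% and 95% figures and the height example above can be reproduced with a few lines of code. This is a minimal sketch, assuming Python's standard-library `statistics.NormalDist`; the helper `proportion_within` and the variable names are hypothetical and only used for illustration.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, SD 1 (the scale of z-scores).
z_dist = NormalDist(mu=0, sigma=1)

def proportion_within(z: float) -> float:
    """Proportion of scores between -z and +z on the standard normal."""
    return z_dist.cdf(z) - z_dist.cdf(-z)

within_1_sd = proportion_within(1.0)      # roughly 0.68
within_196_sd = proportion_within(1.96)   # roughly 0.95

# Height example from the lecture: mean 170 cm, SD 10 cm, height 190 cm.
height_mean, height_sd = 170, 10
z_190 = (190 - height_mean) / height_sd   # z-score of +2
share_shorter = z_dist.cdf(z_190)         # proportion shorter than 190 cm

print(f"within ±1 SD:    {within_1_sd:.3f}")
print(f"within ±1.96 SD: {within_196_sd:.3f}")
print(f"z for 190 cm: {z_190:+.1f}, proportion shorter: {share_shorter:.3f}")
```

With a z-score of exactly 2 the proportion below is about 0.977, slightly more than the 97.5% that belongs to the 1.96 cut-off, so the rounded figures in the slides are approximations.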
A height of 190 cm corresponds to a z-score of +2. Based on the probability distribution, this also means you are in approximately the top 2.5%, i.e., taller than about 97.5% of the UK population.

Normal Distribution
However, some distributions are skewed.
E.g., positively skewed: the tail of the distribution is toward the right (Median < Mean). Examples: reaction times, income.

Imagine an experiment where participants sit in a driving simulator and are instructed to press the brake pedal as quickly as possible when they see a red light appear on the screen. Most participants are quick; however, some participants may be slower due to inattention. This causes the average (mean) reaction time to be higher than the majority of reaction times (median).

Another example is the income of individuals in the United Kingdom: a few individuals have a very high salary, e.g., CEOs and directors.

E.g., negatively skewed: the tail of the distribution is toward the left (Median > Mean). Example: easy tasks.

Imagine a group of students who are required to present their thesis, with their presentation duration recorded. Most students will take up all the given time to present (e.g., 45 minutes). However, some students are very quick (e.g., 25 minutes) because they are under-prepared. This causes the mean presentation time to be lower than the majority of presentation times (median).
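The way a few extreme scores pull the mean away from the median can be seen in a small numerical sketch. The two data sets below are invented purely for illustration (they are not from the lecture or from any experiment), and the code assumes only Python's standard-library `statistics` module.

```python
from statistics import mean, median

# Hypothetical braking reaction times in ms (made up for illustration):
# most participants are quick, a few are much slower (tail to the right).
reaction_times_ms = [410, 420, 430, 440, 450, 460, 470, 480, 900, 1200]

# Hypothetical presentation durations in minutes (made up for illustration):
# most students use the full time, a few finish early (tail to the left).
presentation_min = [45, 45, 45, 44, 45, 43, 45, 25, 20, 45]

rt_mean, rt_median = mean(reaction_times_ms), median(reaction_times_ms)
pres_mean, pres_median = mean(presentation_min), median(presentation_min)

print(f"reaction times: mean {rt_mean}, median {rt_median}")
print(f"presentations:  mean {pres_mean}, median {pres_median}")
print("positive skew pattern (median < mean):", rt_median < rt_mean)
print("negative skew pattern (mean < median):", pres_mean < pres_median)
```

In both cases the median stays with the bulk of the scores, while the mean is dragged toward the tail.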
Normal distribution
But obtaining a score of exactly 1 or 2 SD away from the mean is not always realistic. E.g., your height might be 182 cm, which is 1.2 SD above the mean. What percentage of the UK population are you taller than?

What if we want to be more specific? E.g.:
How likely is it to select a participant with a score lower than 72, i.e., the proportion of scores below 72 (lower tail)? P(X < 72)
How likely is it to select a participant with a score between 72 and 120, i.e., the proportion of scores between 72 and 120 (middle)? P(72 < X < 120)

What if we want to be specific?
How likely is it that one individual gets a score below 72? (Or, in other words, what is the proportion of scores below 72?)
We need to know the mean and the SD, and have one of the following:
1. A deep understanding of calculus (integration). Asking what proportion of scores falls within a range is the same as asking what the area under the curve is over that range.
2. A standard normal table. In our intelligence dataset, a score of 72 corresponds to a z-score of z = (72 - 100)/15 = -1.866667…, i.e., about -1.87, so the proportion of scores below 72 is 0.031.
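The table lookup for a score of 72 can be cross-checked numerically. The sketch below is one possible way to do it, assuming Python's standard-library `statistics.NormalDist` and the mean (100) and SD (15) from the intelligence example; the variable names are hypothetical.

```python
from statistics import NormalDist

# IQ distribution from the lecture: mean 100, SD 15.
iq = NormalDist(mu=100, sigma=15)

# z-score of a raw score of 72: (score - mean) / SD. Note the negative sign.
z_72 = (72 - iq.mean) / iq.stdev

# Lower tail: area under the curve to the left of 72, P(X < 72).
p_below_72 = iq.cdf(72)

# Middle region: area between 72 and 120, P(72 < X < 120).
p_72_to_120 = iq.cdf(120) - iq.cdf(72)

print(f"z for 72:        {z_72:.2f}")
print(f"P(X < 72):       {p_below_72:.3f}")
print(f"P(72 < X < 120): {p_72_to_120:.3f}")
```

The cumulative distribution function (`cdf`) returns the area under the curve to the left of a score, which is why the middle region is obtained by subtracting one lower tail from another.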
How likely is it that one individual gets a score below 72? Besides calculus and a standard normal table, a third option is:
3. A piece of software that calculates this for you (e.g., JASP)

Doing it on JASP
You designed an experiment where:
1) Participants are presented with two lights on a computer screen
2) They are instructed to press a button as soon as the lights do NOT match.

You measured the average reaction time of 50 participants. The aim of your study is to determine the probability of obtaining a specific reaction time.

Click on the '+' sign in the right corner, and tick Distributions.

Select 'Distributions' on your menu bar and select 'Normal' to begin visualizing or calculating probabilities for normally distributed data.

Get your variable from the data set (e.g., reaction time or hours of sleep) and draw samples. This will provide you with the descriptive information you need to create your distribution figure.

Fill in the parameters based on your data; remember to change the parameter to SD, not variance.
You should change the range based on your minimum and maximum scores.

Let's say we want to find out how likely it is (i.e., the probability) that participants are faster than 180 milliseconds (i.e., reaction time less than 180).
It looks like there is a 6% chance that participants are faster than 180 milliseconds!

Now, let's say we want to find out how likely it is that participants are slower than 300 milliseconds during the task (> 300 milliseconds).

Apply this to our IQ dataset: how likely is it that one individual gets a score below 72?
You can use this online calculator! Click on this link: shinyapps.io
Or paste this into your browser: https://tomfaulkenberry.shinyapps.io/psystat/
Select the type of distribution; in our case, Normal.
Enter the mean value (i.e., 100) and the SD value (i.e., 15).
We want to find the area below a threshold, so we should select 'Lower tail'.
What is the 'threshold' we want? In our case, it's 72.
The calculator will automatically compute the probability once you fill in the sections.

In conclusion…?
Type of distribution?
Normal: symmetrical bell-shaped curve; Mode = Median = Mean
Positively skewed: Mode < Median < Mean
Negatively skewed: Mean < Median < Mode

In conclusion…?
How likely will I obtain a score within this range? (Probability of normal distribution)
1 standard deviation from the mean: 68% of your data will be within 1 SD of the mean
1.96 standard deviations from the mean: 95% of your data will be within 1.96 SD of the mean
More than 1.96 standard deviations from the mean: 5% of your data will fall outside of 1.96 SD from the mean

Running EMSA Experiment 2
Please remember to run the experiment on yourself (see Brightspace). This experiment will investigate how the frequency of drinking coffee affects sleep quality and duration! Will take