PSY201: Introduction to Quantitative Research in Psychology Lecture Notes PDF

Summary

These notes summarise PSY201, an introduction to quantitative research in psychology. The document outlines the course syllabus, textbook options, and course components (assignments, exams, tutorials), and introduces fundamental concepts in data analysis, including data types, sampling methods, and descriptive statistics. The emphasis is on consuming and producing information.

Full Transcript

PSY201: Introduction to Quantitative Research in Psychology 1
Lectures: Mondays 9-11
Tutorials: Tuesdays and Wednesdays (make sure to attend the one that you registered for!)
Instructor: Prof. Keisuke Fukuda (Office hour: Mondays 11-12 @ CCT4067)

Syllabus: Textbook
Print ($199.95): Introduction to Statistics and Data Analysis | 7th Edition, Roxy Peck/Chris Olsen, ISBN: 9798214000008
OR
eBook ($76.95): Introduction to Statistics and Data Analysis | 7th Edition, Roxy Peck/Chris Olsen, ISBN: 9798214000152
https://www.uoftbookstore.com/adoption-search-results?ccid=4863629&itemid=354166
Purchase through the UTM bookstore is strongly recommended. Support for purchases through alternative methods is not guaranteed.

Syllabus: Course evaluations
Term test (30%): 90 minutes long, on Oct 21st during lecture. Multiple choice. You may bring your hand-held non-programmable calculator and a test aid (1 page, double-sided, Letter size).
Final exam (40%): 120 minutes long, during the exam period. Multiple choice. You may bring your hand-held non-programmable calculator and a test aid (1 page, double-sided, Letter size).
Two written assignments (5% each):
  Assignment 1 due at 11:59pm on Oct 18th via Quercus submission
  Revised Assignment 1 (optional) due at 11:59pm on Nov 15th via Quercus submission
  Assignment 2 due at 11:59pm on Dec 3rd via Quercus submission
Tutorial participation and completion (18%): Your attendance AND completion of the worksheet (submitted after each tutorial through Quercus by Friday 5pm) are required to get a full mark for each tutorial. You may miss 1 of 9 tutorials without losing a mark.
SONA experiment participation (2% for 3 credits = 3 hours): Participate in 3 hours of psychology experiments posted in the UTM Psychology Research Sign-up system (SONA website). To do so, you need to create a participant account first and enroll in the PSY201_2024F course on the SONA website (see Quercus for a quick guide).
A failure to show up on time (no later than 15 min after the appointment time) results in a penalty of -1 credit. Those who opt out can complete the substitute assignment (1 assignment = 1 credit) using the following link (https://www.utm.utoronto.ca/psychology/faculty-research/experiment-database-overview/substitute-assignments-experimental-credit).

Why statistics?
To be an informed consumer and producer of information!
An informed consumer of information can:
  Extract information accurately from visualized data (tables, graphs, etc.).
  Evaluate numerical arguments.
  Decide whether or not to use the information to change their behavior.
An informed producer of information can:
  Collect data appropriately.
  Summarize the data in an informative manner (descriptive statistics).
  Analyze the data to draw fair conclusions (inferential statistics).
  Visualize the data to communicate it to the audience.

Consuming information wisely
Example 1: What does this information tell us?
Example 2: What does this information tell us? (source: http://www.tylervigen.com/spurious-correlations)
Example 3: Dangerous DHMO! What does this information tell us?
  It causes >30% of erosion on the earth.
  It is found in >80% of lethal malignant tumors in the human body.
  >50% of humans take it in within a week of their death.
  It is used in 100% of nuclear power plants.
  It is used in >50% of cruel animal experiments.
  It is added to >75% of processed food known to cause death.
  (source: https://ja.wikipedia.org/wiki/DHMO)
Exercise 1: What does this information tell us?
As part of its regular water quality monitoring efforts, an environmental control board selects five water specimens from a particular well each day.
The concentration of contaminants in parts per million (ppm) is measured for each of the five specimens, and then the average of the five measurements is calculated. The histogram to the left summarizes the average contamination values for 200 days. Now suppose that a chemical spill has occurred at a manufacturing plant [distance missing] mile from the well. It is not known whether a spill of this nature would contaminate groundwater in the area of the spill and, if so, whether a spill this distance from the well would affect the quality of well water. One month after the spill, five water specimens are collected from the well, and the average contamination is 14.5 ppm. Considering the variation before the spill, would you interpret this as convincing evidence that the well water was affected by the spill?

What do we want to know about and from?
Population: The entire collection of individuals or objects about which information is desired.
Sample: A subset of the population from which information is collected.
Descriptive statistics: The type of statistics used to organize and summarize data.
Inferential statistics: The type of statistics used to make inferences about the population from the sample.
credit: https://databasecamp.de/en/statistics/population-and-sample

What is data?
Data: A collection of observations on one or more variables.
Variable: A characteristic whose value may change (= can be variable) from one observation to another.
Example Data: Height of humans (source: https://www.vulture.com)

Types of data and datasets
Numerical data: Data whose observations are numerical.
(e.g., 5’8”, 6’2”, etc.)
Categorical (qualitative) data: Data whose observations are categorical (e.g., male, female, etc.)
Univariate data set: A set of data whose observations vary in only one characteristic.
Multivariate data set: A set of data whose observations vary in multiple characteristics.
Example Data: Height of humans (source: https://www.vulture.com)

Types of numerical variables
Discrete numerical variable: A numerical variable whose possible values are isolated and limited points on the number line (= limited possibility of values).
Example Data: Shark attack! (source: https://www.fieldandstream.com/survival/how-many-shark-attacks-per-year/)
Continuous numerical variable: A numerical variable whose possible values can be anywhere on the number line (= unlimited possibility of values).
Example Data: Shark attack! (https://matadornetwork.com/read/shark-attack-survey/)

Frequency distributions for categorical data
Frequency: The frequency of a category is the number of times the particular category is observed within a dataset.
Relative frequency: The relative frequency of a category is the proportion of the observations that belong to the particular category.
Example Data: Shark attack! (source: https://thedailyjaws.com/news/florida-is-the-shark-bite-capital-of-the-world)

How do we collect data?: Observational study
Observational study: A study in which characteristics of a sample selected from one or more existing populations are observed in order to draw some conclusion about the population or the difference between the populations with regard to the characteristics.
Example: “The internet usage across young (21-40) adults?”
Methodology: 1000 individuals (gender balanced) from age group 21-40 answered the following question: “Do you use the internet every day?”
[Figure: bar chart of % of “Yes” (0 to 100) by age group: Young (21-40), Middle (41-60), Old (61-80)]

Example: “The internet usage across young (21-40), middle (41-60), and old (61-80) adults?”
Methodology: 1000 individuals (gender balanced) each from age groups 21-40, 41-60, and 61-80 answered the following question: “Do you use the internet every day?”
[Figure: bar chart of % of “Yes” (0 to 100) by age group: Young (21-40), Middle (41-60), Old (61-80)]
What if you learned that the question was answered on the online portal of the NY Times by monthly subscribers?

How do we collect data sensibly?
Make sure to avoid biases!
Selection (or sampling) bias: Tendency for the sample to differ from the population as a result of systematic exclusion of some part of the population.
Measurement (or response) bias: Tendency for the sample to differ from the population due to specifics of the methodologies employed to measure the characteristics of interest.
Nonresponse bias: Tendency for the sample to differ from the population because a particular subset of the sample did not (or chose not to) contribute to the measurement of the characteristics of interest.
source: https://thedailyjaws.com/news/florida-is-the-shark-bite-capital-of-the-world
image credit: https://www.flickr.com/photos/61056899@N06/

How do we collect data sensibly?
Recommended approach = (Simple) Random sampling!
Simple random sample of size n: A sample of size n selected in a way that ensures that every different possible sample of size n has the same chance of being selected.
Sampling without replacement: Once an individual from the population is selected to be included in a sample, the individual cannot be selected again in the sampling process. This ensures that a sample of size n will include n different individuals from the population.
Sampling with replacement: After an individual from the population is selected to be included in a sample, the individual is placed back in the population and can be selected again in the sampling process. Thus, any given individual has a chance of being selected into the sample multiple times.
[Figure: two population diagrams, each with a sample (n = 4)]
When the sample size is less than 10% of the population (pretty much the case all the time), the two sampling methods are practically equivalent!

How much is enough?
How big should the random sample be to know reliably about the population?
[Figure: population distribution of MATH SAT scores of all applicants at a university (n = 5000); x-axis: MATH SAT score]
With random sampling, a sample of just 1% (50/5000) already tells us reliably about the population!

How do we collect data sensibly?
However… simple random sampling is very difficult to implement in the real world…
Cost: Difficulty in implementation (e.g., human height)
More practical sampling alternatives?

How do we collect data sensibly? Alternative sampling methodologies
Stratified random sampling: First, divide the entire population into non-overlapping subgroups (strata), and then do random sampling within each subgroup (stratum), collecting an appropriate n based on the relative size of the stratum. This approach can yield a more “accurate” estimate of the entire population than simple random sampling, especially when the sample size is relatively small.
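The sampling schemes described so far (simple random sampling with and without replacement, and stratified random sampling) can be sketched in a few lines of Python. This is only an illustrative sketch: the function names, the population of 5000 applicant IDs, and the two strata labels are assumptions made up for the example, not part of the course material.

```python
import random


def simple_random_sample(population, n, replace=False, seed=None):
    """Draw a simple random sample of size n from a list."""
    rng = random.Random(seed)
    if replace:
        # With replacement: the same individual can be drawn more than once.
        return rng.choices(population, k=n)
    # Without replacement: the sample contains n different individuals.
    return rng.sample(population, n)


def stratified_sample(strata, n, seed=None):
    """Proportional stratified sampling.

    strata maps a stratum label to the list of its members. Each stratum
    contributes a share of n based on its relative size (rounded, so the
    total can differ slightly from n when shares are not whole numbers).
    """
    rng = random.Random(seed)
    total = sum(len(members) for members in strata.values())
    sample = {}
    for label, members in strata.items():
        k = round(n * len(members) / total)
        sample[label] = rng.sample(members, k)
    return sample


# Hypothetical population: 5000 applicant IDs, sampled at 1% (n = 50).
population = list(range(5000))
without_repl = simple_random_sample(population, 50, seed=1)
with_repl = simple_random_sample(population, 50, replace=True, seed=1)

# Hypothetical strata of unequal size (3000 and 2000 members).
strata = {
    "stratum_a": list(range(0, 3000)),
    "stratum_b": list(range(3000, 5000)),
}
stratified = stratified_sample(strata, 50, seed=1)
print({label: len(members) for label, members in stratified.items()})
```

With these made-up strata, the larger stratum contributes 30 of the 50 sampled individuals and the smaller one 20, which mirrors the "relative size of the stratum" idea in the definition.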
Cluster sampling: Perform random sampling at a group (= cluster) level instead of an individual level.
Systematic sampling: Choose the first sample randomly, and then sample subsequent data in a systematic manner (e.g., 1 in k systematic sampling).
Convenience sampling: Sampling data based on convenience and availability (e.g., voluntary sampling).
For practical reasons, convenience sampling is often used. However, one needs to be extremely careful when generalizing the data to the entire population because the sample is likely not an unbiased representation of the population.

How do we collect data?: Experimental study
Experimental study: A study in which one or more explanatory variables are manipulated through experimental conditions in order to observe their effects on a response variable.
Explanatory variables: Variables whose values are controlled or manipulated by the experimenters. Also referred to as independent variables or factors.
Response variables: Variables whose values are measured but not controlled by the experimenter. Also referred to as dependent variables. Response variables are hypothesized to be related to the explanatory variables.
Experimental conditions: Any particular combination of values for the explanatory variables. Also known as treatments.
Extraneous variables: Variables that are not explanatory variables but can affect the response variables.

What is a good experiment?
A good experiment is designed so that only the explanatory variables, and no extraneous variables, can explain the effect observed in the response variables. If extraneous variables can potentially explain the observed effects, the explanatory variable and the extraneous variables are confounded.
The four best friends to design good experiments are:
Direct control: Holding the extraneous variables constant across experimental conditions.
Random assignment: Assigning samples randomly across experimental conditions to even out the contribution of the extraneous variables.
Blocking: Using the extraneous variables to create groups that are evenly assigned across experimental conditions.
Replication: Repeating the experiment multiple times to ensure that the observed effect is reliable and not due to some idiosyncrasy specific to a particular data set.
image credit: https://rhetthammersmithhorror.tumblr.com/post/91177232708

How do we collect data?: Experimental study
Example: Does leaving a “Thank you” on the check increase the tip amount?
Explanatory variable: “Thank you” on the check or not
Experimental conditions (treatments): You leave the “Thank you” note on the check or not.
Response variable: % amount of tip a customer leaves.
Participants: 200 customers that visit your restaurant during your shifts (Thursday and Friday)
How should we assign the experimental conditions?
Option A: “Thank you” on Thursday and no “Thank you” on Friday
Option B: “Thank you” on Friday and no “Thank you” on Thursday
  With Option A or B, the effect of the extraneous variable (day of the week) will be confounded with that of the experimental variable!
Option C: One half each of “Thank you” and no “Thank you” on each day
  Blocking of experimental conditions (cf. stratified sampling)
Option D: For each participant, flip a coin and decide “Thank you” or no “Thank you”.
But... aren’t the samples not randomly collected?
In reality, it is often difficult (unethical or impossible) to ensure random sampling (e.g., forcing random people to eat food at your restaurant). However, so long as the non-random samples are assigned randomly to experimental conditions, it is possible to assess the effect of experimental variables appropriately!
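Options C and D from the “Thank you” example can be sketched as follows. This is a rough illustration under stated assumptions: the even split of 100 customers per day, the customer labels, and the choice to randomize which customers within a day get the note are all assumptions made for the sketch; the slides only say that half of each day's customers receive each condition.

```python
import random

CONDITIONS = ("thank_you", "no_thank_you")


def coin_flip_assignment(participants, seed=None):
    """Option D: flip a fair coin for each participant."""
    rng = random.Random(seed)
    return {p: rng.choice(CONDITIONS) for p in participants}


def blocked_assignment(participants_by_day, seed=None):
    """Option C: within each day (block), half of the customers get each
    condition. Who ends up in which half is decided by a random shuffle
    (an assumption of this sketch)."""
    rng = random.Random(seed)
    assignment = {}
    for day, people in participants_by_day.items():
        shuffled = list(people)
        rng.shuffle(shuffled)
        half = len(shuffled) // 2
        for p in shuffled[:half]:
            assignment[p] = "thank_you"
        for p in shuffled[half:]:
            assignment[p] = "no_thank_you"
    return assignment


# Hypothetical split of the 200 customers: 100 on each day.
by_day = {
    "Thursday": [f"thu_{i}" for i in range(100)],
    "Friday": [f"fri_{i}" for i in range(100)],
}
blocked = blocked_assignment(by_day, seed=0)
coin = coin_flip_assignment(by_day["Thursday"] + by_day["Friday"], seed=0)

thursday_thanks = sum(blocked[p] == "thank_you" for p in by_day["Thursday"])
print("Thursday customers with a note (blocked):", thursday_thanks)
```

Blocking guarantees an exact 50/50 split within each day, whereas coin flipping only balances the conditions on average; both keep the day of the week from being confounded with the treatment.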
How do we collect data?: Experimental study
Oh, it sounds complex… How do I organize an experimental design?
Create a flow chart!

Why Visualize Data?
To understand and communicate the data accurately and effectively!
Example: preferred distance between home and college (from 2009 College Hopes & Worries Survey Findings, https://www.princetonreview.com/archival/ir/revu_news_2009_3_25_general.pdf)
Image credit: https://www.extraspace.com/blog/life-transitions/coping-empty-nest-syndrome-college/
What do we want to know about the data?
  What’s the ideal/worst distance? (Comparing across distances)
  Do students and parents disagree? (Comparing between students and parents)
The survey questions:
“Ideally, how far from home would you like the college you attend to be?
(12715 high school students)”
“Ideally, how far from home would you like the college your child attends to be? (3007 parents)”
[Figure: bar chart of frequency (0 to 5000) by distance in miles, for students and parents]
[Figure: bar chart of relative frequency (0 to 0.6) by distance in miles, for students and parents]
Relative frequency allows comparison across groups even when they differ in sample size!

Example: “How many typos in a resume would make you not consider a job candidate (150 senior execs)?”
What do we want to know about the data?
  How many typos would a majority of execs accept? (Proportion x # of typos)
A pie chart communicates the relative proportion of each category.
Slice size of category A = 360° x relative frequency of category A
A pie chart can be effective when the number of categories is relatively small.
Image credit: https://commadot.com/pie-charts-are-almost-always-bad-ux/
[Figure: bar chart of the same data; y-axis in % (0 to 100)]
A bar chart can also communicate the relative proportion of each category.
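As a small worked illustration of frequency, relative frequency, and the pie-slice formula (slice size = 360° x relative frequency), the sketch below tabulates a categorical data set. The response counts are invented for the example (they simply add up to 150, the number of executives surveyed); the actual survey results are not reproduced in these notes.

```python
from collections import Counter


def frequency_table(observations):
    """Frequency, relative frequency, and pie-slice angle per category."""
    counts = Counter(observations)
    n = len(observations)
    table = {}
    for category, freq in counts.items():
        rel = freq / n
        table[category] = {
            "frequency": freq,
            "relative_frequency": rel,
            "pie_angle_deg": 360 * rel,
        }
    return table


# Made-up responses for illustration only (150 respondents in total).
responses = (
    ["1 typo"] * 60
    + ["2 typos"] * 45
    + ["3 typos"] * 30
    + ["4 or more typos"] * 15
)
table = frequency_table(responses)
for category, row in table.items():
    print(
        f"{category:16s} freq={row['frequency']:3d} "
        f"rel={row['relative_frequency']:.2f} "
        f"angle={row['pie_angle_deg']:.1f} deg"
    )
```

Because relative frequencies always sum to 1, the slice angles sum to 360°, which is why the same table can feed either a pie chart or a bar chart of proportions.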
Example: “When something is run by government, it is usually inefficient and wasteful” (from “Scientists, Public Differ in Outlooks”; USA Today, July 10, 2009)

Example: “How many partners does a queen bee mate with?” (from “The Curious Promiscuity of Queen Bees”, Annals of Zoology [2001: 255-265])
What do we want to know about the data?
  What’s the distribution of the number of partners of a queen bee?
A histogram communicates the distribution of frequencies of observations.
Outlier: A datapoint that is markedly different from other datapoints.

Example: “Do all students report their GPA accurately?” (from “Self-Reports of Academic Performance”; Social Methods and Research [November 1981]: 165-185)
[Figure: distribution of Reported GPA - Actual GPA]

Why Visualize Data?
To understand and communicate the data accurately and effectively!
Example: “How often does drought occur in Albuquerque, New Mexico?” (1950-2008, from The National Climatic Data Center, https://www.ncdc.noaa.gov/oa/climate/research/cag3/city.html)
What do we want to know about the data?
  How often is the annual rainfall below 8 inches?
Cumulative relative frequency: For a given value x of variable A, the sum of all the relative frequencies for values of x and less.
Cumulative relative frequency column (lowest rainfall class first): 0.052, 0.155, 0.241, 0.344, 0.516, 0.585, 0.792, 0.895, 0.977, 1
Cumulative relative frequency for 8 inches = 0.052 + 0.103 + 0.086 + 0.103 = 0.344
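The cumulative relative frequency for the Albuquerque rainfall example is just a running sum of the relative frequencies, which can be checked with a short sketch. The first four relative frequencies are the ones quoted in the notes; the remaining six are an assumption of this sketch, recovered by differencing the cumulative column (0.516 - 0.344, and so on). The rainfall class boundaries are not reproduced here, so classes are referred to by position only.

```python
from itertools import accumulate

# Relative frequency of each rainfall class, lowest class first.
# First four values are quoted in the notes; the rest are differences of
# successive entries in the cumulative column (an assumption of this sketch).
relative = [0.052, 0.103, 0.086, 0.103, 0.172, 0.069, 0.207, 0.103, 0.082, 0.023]

# Running sum; rounding to 3 decimals removes floating-point noise.
cumulative = [round(c, 3) for c in accumulate(relative)]

# The class ending at 8 inches is the fourth one in the table.
up_to_8_inches = cumulative[3]
print("Cumulative relative frequency up to 8 inches:", up_to_8_inches)
print("Full cumulative column:", cumulative)
```

The last cumulative value must equal 1, which is a quick sanity check that the relative frequencies cover the whole data set.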
[Figure: line graph of cumulative relative frequency (0.052 to 1) against annual rainfall; the value at 8 inches is 0.344]
A line graph communicates the change of y as a function of x.

Example: “Is there a relationship between the height and artistic score of figure skaters?” (from 2006 Winter Olympics)
What do we want to know about the data?
  Bivariate relationship between height and artistic scores!
[Figure: scatterplot of artistic score against height; labelled point: Yevgeny Plushenko (178cm, 41.21pts)]
There doesn’t appear to be a clear bivariate relationship between the height and artistic points of figure skaters…
Image credit: https://www.goodfon.com/sports/wallpaper-evgeniy-plyuschenko-sportsmen.html

Example: “How have the earnings changed over time for different academic degrees?” (from U.S.
Census Bureau)
[Figure: line graph; y-axis: proportional earning compared to high school degree graduates (= 1)]
A line graph communicates the change of y as a function of x. It can depict multiple sets of data to allow convenient comparisons so long as the display is not too complicated/busy.

5 simple rules of effective communication
1. Choose a type of display wisely!
2. Make sure to include scales and labels of all axes!
3. Make sure to include labels and legends!
4. Keep your displays honest!
5. Keep your displays simple!

What’s wrong in this visualization? How can we improve it?
https://www.codeconquest.com/blog/12-bad-data-visualization-examples-explained/

Why Collect Data?
To gather information about the data!
What kind of information do you want to know about data?
Example 1: Annual income of GTA residents
  How much do people earn on average? How much does the person in the middle (50th percentile) earn?
  Central tendency: Information about the center of the distribution of the data
  What’s the highest pay? What’s the lowest pay? What’s the variability of the pay across the population?
  Variability: Information about the spread of the data in the distribution
Example 2: Likelihood of winning a lottery
  What’s the chance of winning?
  Proportion of an outcome: The number of occurrences of a particular outcome / the total number of attempts

Why Collect Data? Sample vs. Population
What we want to know is the information (central tendency, variability) about the population. But in reality, it is infeasible to collect data from the entire population. Instead, we collect data from “representative” samples in the hope that the information gathered from the sample is a good estimate of the population-level information.
Fortunately, information collected from a “representative” sample is a reliable estimate of the population!
image credit: https://objkt.com/tokens/hicetnunc/145583, https://tenor.com/view/hammaya-relaxed-relax-mr-bean-gif-gif-18455676

Central Tendency: Mean vs. Median
Mean: The average of all the observations.
Sample mean of Xs ($\bar{x}$): The mean of a sample of Xs.
Population mean of Xs ($\mu_x$): The mean of all Xs in the population.
So, if you have n samples of X, the sample mean of Xs will be:
$\bar{x} = \frac{x_1 + x_2 + \dots + x_n}{n} = \frac{\sum x}{n}$
Example: Number of raccoon babies produced by a raccoon mom in a year
Xs = [2 2 1 4 5 4 6]
$\bar{x} = \frac{2+2+1+4+5+4+6}{7} = \frac{24}{7} \approx 3.43$

Central Tendency: Mean vs. Median
Median: The middle value (50th percentile) of all the observations.
Sample median of Xs: The median of a sample of Xs.
Population median of Xs: The median of all Xs in the population.
So, to compute the median of n samples of Xs (e.g., raccoon babies), Xs = [2 2 1 4 5 4 6]:
1. First, sort all n samples of Xs in order of magnitude: sorted Xs = [1 2 2 4 4 5 6]
2. Identify the middle value.
   If n is an odd number, the sample median is the value of the middle sample (here, Median = 4).
   If n is an even number, the sample median is the average of the values of the middle two samples.

Central Tendency: Mean vs. Median
Aren’t they pretty much the same? Why do we need both?
Example: Income distribution (https://doodles.mountainmath.ca/posts/2018-10-28-understanding-income-distributions-across-geographies-and-time/)

Variability
Range: Largest value - Smallest value
[Figure: three distributions with Range: 70 - 20 = 50, Range: 70 - 20 = 50, and Range: 50 - 40 = 10]
Deviation from the mean: The difference of each observation from the mean. So, for a sample with n observations, the deviations are $x_1 - \bar{x},\ x_2 - \bar{x},\ \dots,\ x_n - \bar{x}$.
Example: Price of Big Mac across countries
But, how variable is the sample overall? Should we take the sum of the deviations from the mean? NO!!! Positive and negative deviations cancel out, so this sum is always 0.
Instead, we should first sum up the squared deviations from the mean!
Sum of sample squared deviations from the mean: $\sum (x - \bar{x})^2$
(We will talk later about why the divisor in the next step is not n!)
But, how variable is the sample overall? Sum up the squared deviations from the mean
and divide it by (n - 1) to control for the sample size.
Example: Price of Big Mac across countries
Sample variance ($S^2$): $S^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$
Sample variance of Big Mac price: $S^2 = \frac{2.2469}{7 - 1} = 0.3745$
Sample standard deviation ($S$): $S = \sqrt{S^2}$
Sample standard deviation of Big Mac price: $S = \sqrt{0.3745} = 0.6120$

Variability: How about the population?
Population variance: $\sigma^2 = \frac{\sum (x - \mu)^2}{N}$
Population standard deviation: $\sigma = \sqrt{\sigma^2}$
Why is the denominator different between sample and population variance?

Variability
Interquartile range (IQR): A metric of variability that is more robust against a small number of extreme samples (i.e., outliers)
Upper quartile (UQ): Median of the upper half of the sample
Lower quartile (LQ): Median of the lower half of the sample
Interquartile range (IQR) = UQ - LQ
Example: Height of two families
Sample 1: [160 165 165 170 170 180]; sample standard deviation = 6.83; IQR = 170 (UQ) - 165 (LQ) = 5
Sample 2: [50 165 165 170 170 180]; sample standard deviation = 49.3; IQR = 170 (UQ) - 165 (LQ) = 5

Variability
Boxplot: Visualizing variability based on quartiles
[Figure: boxplot labelled with the lowest value, LQ, and UQ]
Extreme outlier: An observation whose value is more than 3 IQR away from the nearest quartile.
Mild outlier: An observation whose value is more than 1.5 IQR but less than 3 IQR away from the nearest quartile.
Example: 2009-2010 salaries of NBA players. What can we conclude?

Interpreting center and variability
Why do we care about the center (mean) and variability (standard deviation)? Because we can know a lot about the data.
Chebyshev’s Rule: For any number k > 1, at least $100\left(1 - \frac{1}{k^2}\right)\%$ of the observations are within k standard deviations of the mean.
Example: IQ scores
Important note: Chebyshev’s rule provides a conservative estimate of the data distribution that works for any shape of distribution! In most cases, it is overly conservative.
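The center and variability measures above can be reproduced with Python's standard `statistics` module, using the raccoon and family-height examples from the notes. This is a minimal sketch: the helper names `quartiles` and `iqr` are made up here, and the quartile convention (medians of the lower and upper halves, leaving the overall median out of both halves when n is odd) is an assumption that follows the definition given in these notes rather than the interpolation used by some software.

```python
import statistics


def quartiles(values):
    """Lower and upper quartiles as medians of the lower and upper halves.

    Assumes at least two observations. When n is odd, the overall median
    is excluded from both halves (an assumed convention).
    """
    ordered = sorted(values)
    half = len(ordered) // 2
    lower_half = ordered[:half]
    upper_half = ordered[-half:]
    return statistics.median(lower_half), statistics.median(upper_half)


def iqr(values):
    """Interquartile range: UQ - LQ."""
    lq, uq = quartiles(values)
    return uq - lq


# Raccoon example: mean and median.
raccoon_babies = [2, 2, 1, 4, 5, 4, 6]
mean_babies = statistics.mean(raccoon_babies)      # sum / n
median_babies = statistics.median(raccoon_babies)  # middle of sorted values

# Family-height example: standard deviation vs. IQR.
family_1 = [160, 165, 165, 170, 170, 180]
family_2 = [50, 165, 165, 170, 170, 180]

print(f"raccoons: mean={mean_babies:.2f}, median={median_babies}")
for name, heights in (("family 1", family_1), ("family 2", family_2)):
    s2 = statistics.variance(heights)  # sample variance, divides by n - 1
    s = statistics.stdev(heights)      # sample standard deviation
    print(f"{name}: S^2={s2:.2f}, S={s:.2f}, IQR={iqr(heights)}")
```

Replacing one height with the extreme value 50 inflates the standard deviation roughly sevenfold while leaving the IQR unchanged, which is the robustness the IQR slide is pointing at.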
Because we can know a lot about the data.
The empirical rule: A large majority of distributions in the real world can be approximated by a normal distribution! For approximately normal data, about 68%, 95%, and 99.7% of the observations fall within 1, 2, and 3 standard deviations of the mean, respectively.
Example: Height of mothers from Biometrika (1903): Measured heights of 1052 mothers in

Measures of Relative Standing
Z score: how many standard deviations away a score is from the mean: $z = \frac{x - \bar{x}}{s}$
Percentile: The percentile of a data point X is the percentage of the data that is at or below X in the distribution.

Understanding Bivariate Relationship
Correlation: statistical relationship between two variables (= bivariate relationship)

Measuring the strength of a linear bivariate relationship
Use the sum of ZxZy (= ΣZxZy)!
Sample Correlation Coefficient (r): A measure of the strength and direction of a linear bivariate relationship in a sample: $r = \frac{\sum z_x z_y}{n - 1}$
5 key properties of r values
1). The value of r is between -1 and 1.
2). When |r| = 1 (r is either 1 or -1), all the data points are on a single line!
3). The value of r is a measure of a linear relationship between variables x and y.
4). The value of r does not depend on the unit of measurement for either variable.
5). The value of r does not depend on which of the two variables is used as x and which as y.

Population Correlation Coefficient (ρ): A measure of the strength and direction of a linear bivariate relationship in a population: $\rho = \frac{\sum z_x z_y}{N}$
5 key properties of ρ values
1). The value of ρ is between -1 and 1.
2). When |ρ| = 1 (ρ is either 1 or -1), all the data points are on a single line!
3).
The value of ρ is a measure of a linear relationship between variables x and y.
4). The value of ρ does not depend on the unit of measurement for either variable.
5). The value of ρ does not depend on which of the two variables is used as x and which as y.

Measuring the strength of a linear bivariate relationship
Caution 1: Not all bivariate relationships are straight lines! (r = 0.09)
Caution 2: Correlation is not causation! (r = 0.62; Matthews, 2000)

Probability: Understanding uncertainty
Chance experiment: Any activity or situation in which there is uncertainty about which of two or more possible outcomes will result.
Sample Space: The collection of all possible outcomes of a chance experiment.
Example: Rolling a die. Sample space = {1, 2, 3, 4, 5, 6}
Example: Rolling two dice and adding them up. Sample space of the sum = {2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12}
Example: A customer (male or female) purchasing a Honda Civic (hybrid or traditional). Sample space = {(male, hybrid), (male, traditional), (female, hybrid), (female, traditional)}
Event: Any collection of outcomes of a chance experiment.
Simple event: An event consisting of exactly one outcome
Examples of simple events: a male purchasing a hybrid Civic (MH); a female purchasing a traditional Civic (FT)
Examples of events: a customer purchasing a hybrid Civic ({MH, FH}); a female purchasing any kind of Civic ({FH, FT})

Probability: Understanding uncertainty
Creating and defining events
“To spice up your mornings, you decided that you will flip a coin to decide your breakfast (muffin or cereal) for the next three days”
Event A: Eating a muffin all three days
Event B: Eating cereal all three days
Event C: “Boring… eating the same thing 3 days in a row” (= A or B)
Event D: “Not Boring… some changes in my diet!” (= not C)

Venn diagram: Graphical representations of events
When A and B are mutually exclusive, you can also say A and B are disjoint.

Probability of an event E occurring (P(E)) refers to the likelihood that the event E occurs.
When a sufficient number of chance experiments are performed, the probability of an event E (P(E)) can be estimated as:
$P(E) \approx \frac{\text{number of times } E \text{ occurs}}{\text{number of trials}}$
Example: Tossing a coin

Rules of Probability 1
Let P(A) = the probability of event A occurring
0 ≤ P(A) ≤ 1

Rules of Probability 2
Let P(A) = the probability of event A occurring
Let P(not A) = the probability of event A not occurring
P(A) + P(not A) = 1

Rules of Probability 2
Let P(A) = the probability of event A occurring
Let P(B) = the probability of event B occurring
Let P(A ∩ B) = the probability of events A and B both occurring
Let P(A ∪ B) = the probability of event A or B occurring
If A and B are mutually exclusive (P(A ∩ B) = 0), P(A ∪ B) = P(A) + P(B)

Example: At an imaginary car dealership
Probability of selling a Honda (E1) = .25
Probability of selling a Nissan (E2) = .18
Probability of selling a Toyota (E3) = .14
How likely is it to sell a Japanese car (Honda, Nissan, or Toyota) at this dealership (given that every customer purchases one car)?
How likely is it to sell a non-Japanese car (i.e., not a Honda, Nissan, or Toyota) at this dealership?
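The two dealership questions follow directly from the addition rule for mutually exclusive events and the complement rule. A minimal Python sketch, assuming (as the example states) that each customer buys exactly one car so the three events are disjoint; the function names are illustrative only:

```python
def prob_union_disjoint(*probs: float) -> float:
    """Addition rule for mutually exclusive events: P(A or B or ...) = P(A) + P(B) + ..."""
    total = sum(probs)
    if total > 1.0 + 1e-9:
        raise ValueError("probabilities of disjoint events cannot sum to more than 1")
    return total


def prob_complement(p: float) -> float:
    """Complement rule: P(not A) = 1 - P(A)."""
    return 1.0 - p


# Values taken from the dealership example.
p_honda, p_nissan, p_toyota = 0.25, 0.18, 0.14

p_japanese = prob_union_disjoint(p_honda, p_nissan, p_toyota)
p_non_japanese = prob_complement(p_japanese)

print(f"P(Japanese car)     = {p_japanese:.2f}")      # .25 + .18 + .14
print(f"P(non-Japanese car) = {p_non_japanese:.2f}")  # 1 - P(Japanese car)
```

Under these assumptions the chance of selling a Japanese car is .57 and the chance of selling a non-Japanese car is .43.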
Probability: Understanding uncertainty
Rules of Probability 2
Let P(A) = the probability of event A occurring
Let P(B) = the probability of event B occurring
Let P(A ∩ B) = the probability of events A and B both occurring
Let P(A ∪ B) = the probability of event A or B occurring
If A and B are not mutually exclusive (P(A ∩ B) ≠ 0), P(A ∪ B) = P(A) + P(B) - P(A ∩ B)

Rules of Probability 2
Let P(A|B) = the probability of event A occurring provided that B occurred (= conditional probability of A given B)
$P(A|B) = \frac{P(A \cap B)}{P(B)}$

Example: At an imaginary University
You became friends with a senior. How likely is it that this friend lives on campus?
P(On-campus|Senior) = prob. a senior lives on campus
P(On-campus ∩ Senior) = prob. a student lives on campus and is a senior = 150/1600 = 0.0938
P(Senior) = prob. a student is a senior = 425/1600 = 0.2656
P(On-campus|Senior) = P(On-campus ∩ Senior) / P(Senior) = 0.0938/0.2656 = 0.3529

Rules of Probability 2
When A and B are independent (i.e., the occurrence of A does not give any information about the likelihood of the occurrence of B),
P(A|B) = P(A) and P(B|A) = P(B)

Example: At an imaginary country
How likely is a condo owner to purchase an adjustable mortgage?
P(Adj|Condo) = prob. a condo owner purchases an adjustable mortgage
P(Adj ∩ Condo) = prob. a home owner purchases an adjustable mortgage and owns a condo = .21
P(Condo) = prob. a home owner owns a condo
= .3
P(Adj|Condo) = P(Adj ∩ Condo) / P(Condo) = .21/.3 = .7

Probability: Understanding uncertainty
Rules of Probability 2
When A and B are independent (i.e., the occurrence of A does not give any information about the likelihood of the occurrence of B),
P(A ∩ B) = P(A) × P(B)
When A and B are dependent (i.e., the occurrence of A gives some information about the likelihood of the occurrence of B),
P(A ∩ B) = P(A|B) × P(B)

Rules of Probability 2
Let P(A) + P(B) = 1. For any event E, when A and B are mutually exclusive,
$P(A|E) = \frac{P(E|A) \times P(A)}{P(E|A) \times P(A) + P(E|B) \times P(B)}$

Bayes’ Rule!
Let P(A1) + … + P(An) = 1. For any event E, when A1, …, An are mutually exclusive,
$P(A_1|E) = \frac{P(E|A_1) \times P(A_1)}{P(E|A_1) \times P(A_1) + \dots + P(E|A_n) \times P(A_n)}$

Example: Detecting Lyme Disease (from American Journal of Clinical Pathology : 168-174)
If you test positive, how likely are you to have Lyme disease?
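The Lyme disease question is an application of Bayes' rule, but the figures from the cited study are not reproduced in these notes. The sketch below therefore uses HYPOTHETICAL values for prevalence, sensitivity, and false-positive rate purely to show the mechanics; the function name `bayes_posterior` is also just an illustration.

```python
def bayes_posterior(priors, likelihoods, index=0):
    """Return P(A_index | E) for mutually exclusive events A_1..A_n whose priors sum to 1.

    priors[i]      = P(A_i)
    likelihoods[i] = P(E | A_i)
    """
    if abs(sum(priors) - 1.0) > 1e-9:
        raise ValueError("priors must sum to 1")
    joint = [p * l for p, l in zip(priors, likelihoods)]  # P(E and A_i)
    evidence = sum(joint)                                  # P(E), the denominator of Bayes' rule
    return joint[index] / evidence


# HYPOTHETICAL numbers, not taken from the cited study.
prevalence = 0.002           # P(disease)
sensitivity = 0.95           # P(positive | disease)
false_positive_rate = 0.05   # P(positive | no disease)

p_disease_given_positive = bayes_posterior(
    priors=[prevalence, 1 - prevalence],
    likelihoods=[sensitivity, false_positive_rate],
)
print(f"P(disease | positive) = {p_disease_given_positive:.4f}")
```

With these made-up inputs the posterior is below 4%, which illustrates the general point that a positive result for a rare condition can still leave the disease unlikely.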
Probability Distribution: Discrete Variable
Probability Distribution of variable x: a model that describes the long-run behavior of a variable x.
Example: Does your iPod really play favorites? (American Statistician : 263-268)
Let’s assume you have 3000 songs on your iPod, of which 50 are by your favorite artist. When you make a random playlist of 20 songs by shuffling the songs, how likely is it that the playlist contains songs by your favorite artist?
P(x): probability that the 20-song playlist contains x songs by your favorite artist. The sum of all probabilities = 1.
How likely is it that your random playlist contains at least one song by your favorite artist?
P(x ≥ 1) = 1 - P(0) = 1 - .7138 = .2862
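The iPod example above can be reproduced by treating the playlist as 20 songs drawn without replacement from 3000, which makes x hypergeometric. That modelling choice is an assumption of this sketch (the notes only give the resulting probabilities), and `hypergeom_pmf` is an illustrative helper, not a library function.

```python
from math import comb


def hypergeom_pmf(x, population, favorites, draws):
    """P(exactly x favorite songs) when `draws` songs are picked without replacement."""
    return comb(favorites, x) * comb(population - favorites, draws - x) / comb(population, draws)


# Numbers from the example: 3000 songs, 50 by the favorite artist, 20-song playlist.
p_zero = hypergeom_pmf(0, population=3000, favorites=50, draws=20)
p_at_least_one = 1 - p_zero

# A probability distribution must sum to 1 over all possible values of x.
total = sum(hypergeom_pmf(x, 3000, 50, 20) for x in range(21))

print(f"P(0)      = {p_zero:.4f}")
print(f"P(x >= 1) = {p_at_least_one:.4f}")
print(f"sum P(x)  = {total:.4f}")
```

Under this model P(0) comes out very close to the .7138 quoted in the example, so P(x ≥ 1) is about .2862.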
Probability Distribution: Discrete Variable
Probability Distribution of variable x: a model that describes the long-run behavior of a variable x.
Example: In a hypothetical country, any individual is allowed to take a driving test up to 4 times, after which they are not allowed to take the test. Here, let
x = the number of attempts made by a randomly selected individual
p(x) = the probability that an individual took the test x times.
What is the expected value of the number of tests taken by a random individual?

Example: In a hypothetical TV factory, there are two flat-screen suppliers with known distributions of the number of flaws in their product, where:
x (or y) = number of flaws
p(x) (or p(y)) = probability that a flat screen has x (or y) flaws
Which supplier would you choose?
The mean is the same! The variance is different!

Probability Distribution: Continuous Variable
Probability Distribution of variable x: a model that describes the long-run behavior of a variable x.
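For the discrete examples above (driving-test attempts and the two screen suppliers), the expected value is μ = Σ x·p(x) and the variance is σ² = Σ (x - μ)²·p(x). The probability tables from the slides are not reproduced in these notes, so the distributions below are HYPOTHETICAL and chosen only so that the two suppliers share a mean but differ in variance.

```python
def expected_value(dist):
    """Mean of a discrete distribution given as {value: probability}."""
    if abs(sum(dist.values()) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    return sum(x * p for x, p in dist.items())


def variance(dist):
    """Variance of a discrete distribution: sum of (x - mean)^2 * p(x)."""
    mu = expected_value(dist)
    return sum((x - mu) ** 2 * p for x, p in dist.items())


# HYPOTHETICAL distribution of driving-test attempts (x = 1..4).
attempts = {1: 0.50, 2: 0.25, 3: 0.15, 4: 0.10}

# HYPOTHETICAL flaw distributions for two suppliers.
supplier_x = {0: 0.4, 1: 0.3, 2: 0.2, 3: 0.1}
supplier_y = {0: 0.5, 1: 0.2, 2: 0.1, 3: 0.2}

print("expected attempts:", round(expected_value(attempts), 2))
for name, dist in (("x", supplier_x), ("y", supplier_y)):
    print(name, "mean =", round(expected_value(dist), 2), "variance =", round(variance(dist), 2))
```

With equal means, the supplier with the smaller variance delivers a more predictable number of flaws, which is one reasonable basis for choosing between them.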
Normal distribution
Normal distribution: Bell-shaped and symmetric distribution that approximates the distributions of many variables in the real world.

Standard Normal Distribution
Standard normal distribution (or z curve): A normal distribution with mean (μ) = 0 and standard deviation (σ) = 1
Example: What’s the area more extreme than z = 1.42?
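Areas under the z curve are usually read from a table, but they can also be computed. A minimal sketch using Python's standard-library `statistics.NormalDist`; because "more extreme" could mean one tail or both, the sketch reports both readings (that interpretation is an assumption of this example).

```python
from statistics import NormalDist

z_curve = NormalDist(mu=0, sigma=1)  # the standard normal distribution

z = 1.42
upper_tail = 1 - z_curve.cdf(z)   # area to the right of z = 1.42
both_tails = 2 * upper_tail       # area beyond +1.42 or below -1.42 (by symmetry)

print(f"P(Z > {z})   = {upper_tail:.4f}")
print(f"P(|Z| > {z}) = {both_tails:.4f}")
```

The upper-tail area is roughly 0.078, which matches the usual z-table entry of 0.9222 for the area below 1.42.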
Standard Normal Distribution
Standard normal distribution (or z curve): A normal distribution with the mean = 0 and the standard deviation = 1
Example: What’s the probability that z exceeds 1.96?
Example: How do we find the most extreme 5% of the distribution? -z* = -1.96 and z* = 1.96!

Applying the standard normal distribution to other normal distributions
More generally, any normally distributed variable x can be standardized: $z = \frac{x - \mu}{\sigma}$
Example: IQ distribution
Example: How extreme is IQ = 130?
Example: What’s the proportion of the population with IQ between 75 and 125?

Checking normality of the distribution (a.k.a. normal QQ plot)
[Figure: normal QQ plot of observed score against normal score]
The set of normal scores depends on the sample size of the observed scores.

Sampling Distribution
Example: In an imaginary upper-year seminar course with 20 students,
Let’s consider taking 4 sets of random samples with 5 observations each.
If we were to create 50 random samples of n = 5,
If we were to do this for all possible samples of n = 5, we would get the sampling distribution!
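The idea of "all possible samples of n = 5" can be made concrete by enumerating them. The seminar data are not reproduced in these notes, so the 20 scores below are HYPOTHETICAL; they are chosen only to have a population mean of 8.25. The helper names are illustrative.

```python
from itertools import combinations
from math import sqrt

# HYPOTHETICAL scores for the 20 students (population mean = 8.25).
SCORES = [2, 3, 4, 5, 6, 6, 7, 7, 8, 8, 8, 9, 9, 10, 10, 11, 12, 12, 13, 15]


def mean(values):
    return sum(values) / len(values)


def spread(values):
    """Standard deviation of a complete list of values (divides by the count)."""
    m = mean(values)
    return sqrt(sum((v - m) ** 2 for v in values) / len(values))


def sample_means(population, n):
    """Sample mean of every possible sample of size n (drawn without replacement)."""
    return [sum(sample) / n for sample in combinations(population, n)]


means_n5 = sample_means(SCORES, 5)     # 15,504 possible samples
means_n10 = sample_means(SCORES, 10)   # 184,756 possible samples

print("population mean        :", mean(SCORES))
print("mean of sample means   :", round(mean(means_n5), 4))
print("spread of means, n = 5 :", round(spread(means_n5), 4))
print("spread of means, n = 10:", round(spread(means_n10), 4))
```

The complete sampling distribution is centered on the population mean, and its spread shrinks as the sample size grows.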
Sampling Distribution of sample mean ($\bar{x}$)
Example: In an imaginary upper-year seminar course with 20 students,
[Figure: sampling distributions of the sample mean for sample size = 5, 10, and 20; μ = 8.25 in each panel]
As sample size increases, the variability of the sample mean shrinks, and the sample means better approximate the population mean!

Example: Time to first goal in NHL game (American Statistician : 151-154)
[Figure: sampling distributions of the sample mean for sample size = 5, 10, and 30; μ = 13 in each panel]
The same holds true even when the population distribution is not normal! Also, as the sample size increases, the sampling distribution better approximates a normal distribution!

General properties of the sampling distribution of $\bar{x}$
1. $\mu_{\bar{x}} = \mu$
2. $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$
3. When the population distribution is normal, the sampling distribution of $\bar{x}$ is also normal for any sample size n.
4. When n is sufficiently large, the sampling distribution of $\bar{x}$ is well approximated by a normal curve, even when the population distribution is not normal.

When n is large (n ≥ 30) or the population distribution is normal, the standardized variable $z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}}$ has (at least approximately) a standard normal (z) distribution!

Example: Fat content of hot dogs
A hot dog company claims that the average fat content of a hot dog is 18 grams with a standard deviation (σ) of 1 gram. How do we know the company is telling the truth?
1. Purchase a bag of hot dogs (n = 36 hot dogs).
2. Compute the sample mean ($\bar{x}$ = 18.4 grams)
3. Compute the standard deviation of sample means ($\sigma_{\bar{x}} = \sigma/\sqrt{n} = 1/\sqrt{36} \approx 0.167$)
4.
Compute how likely we would be to observe the sample mean if the company’s claim is correct! $z = \frac{18.4 - 18}{1/\sqrt{36}} = 2.4$, so there is a < 1% chance of observing a sample mean this large!

Sampling Distribution of sample proportion ($\hat{p}$)
$\hat{p}$ = sample proportion of successes
Example: In an imaginary college (19000 students) with 44% female students, the distributions of the sample proportion of female students in 500 samples are:
[Figure: sampling distributions of $\hat{p}$ for sample size = 10, 25, 50, and 100; μ = 0.44 in each panel]
Example: Considering an imaginary disease with a prevalence of 7%, the distributions of the sample proportion of infected patients in 500 samples are:
[Figure: sampling distributions of $\hat{p}$ for sample size = 10, 25, 50, and 100; p = 0.07 in each panel]

General properties of the sampling distribution of $\hat{p}$
1. $\mu_{\hat{p}} = p$
2. $\sigma_{\hat{p}} = \sqrt{\frac{p(1-p)}{n}}$
3. When n is large enough that np ≥ 10 and n(1 - p) ≥ 10, the sampling distribution of $\hat{p}$ is approximately normal.

Example: For the imaginary disease with a prevalence of 7%, a group of researchers claimed that they came up with a potential cure that reduces its prevalence. How do we know if they are telling the truth?
1. Randomly sample 200 individuals who received the cure (n = 200).
2. Compute the sample proportion ($\hat{p}$ = 6/200 = 0.03)
3.
Compute the standard deviation of the sample proportion ($\sigma_{\hat{p}} = \sqrt{p(1-p)/n} = \sqrt{0.07 \times 0.93 / 200} \approx 0.018$)
4. Confirm that the distribution of the sample proportion is approximately normal.
5. Compute how likely we would have been to observe the sample proportion if the treatment had no effect. $z = \frac{0.03 - 0.07}{0.018} \approx -2.22$, so there is about a 1% chance of observing a sample proportion this low!

Estimating Population from a Sample: Point estimate
An unbiased statistic is a better statistic!
Among unbiased statistics, the one with the smaller standard deviation is the better statistic!
When estimating the population mean, both the sample median and the sample mean are unbiased estimates. HOWEVER, the sample mean has a smaller standard deviation.
When estimating the population variance, $\frac{\sum (x - \bar{x})^2}{n}$ is a biased estimate. Using n-1 as the denominator instead makes it unbiased. Therefore, the sample variance is a good estimate!
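The claim that dividing by n gives a biased estimate of the population variance, while dividing by n-1 does not, can be checked exactly on a tiny example. The sketch below uses a HYPOTHETICAL population of six values and enumerates every equally likely sample of size 3 drawn with replacement; both choices are assumptions made for illustration.

```python
from itertools import product

POPULATION = [1, 2, 3, 4, 5, 6]  # HYPOTHETICAL population
N_SAMPLE = 3


def mean(values):
    return sum(values) / len(values)


def variance(values, denominator):
    """Sum of squared deviations from the mean, divided by the given denominator."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values) / denominator


population_variance = variance(POPULATION, len(POPULATION))

# Every possible ordered sample of size 3, drawn with replacement (6^3 = 216 samples).
samples = list(product(POPULATION, repeat=N_SAMPLE))

avg_divide_by_n = mean([variance(s, N_SAMPLE) for s in samples])
avg_divide_by_n_minus_1 = mean([variance(s, N_SAMPLE - 1) for s in samples])

print("population variance        :", round(population_variance, 4))
print("average estimate, using n  :", round(avg_divide_by_n, 4))
print("average estimate, using n-1:", round(avg_divide_by_n_minus_1, 4))
```

Averaged over all samples, the n-1 version equals the population variance, whereas the n version underestimates it by a factor of (n-1)/n.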
Estimating Population from a Sample: Confidence Interval
For example, a 95% confidence interval will contain the population characteristic 95% of the time!
How do we compute a 95% confidence interval using a normally distributed and unbiased sample statistic (e.g., sample proportion)?
When your sample proportion estimate is unbiased and normally distributed, the 95% confidence interval of the population proportion can be calculated as follows:
$\hat{p} \pm 1.96 \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}$
When 95% confidence intervals were computed from 100 different random samples,
How about a 99% confidence interval? The z critical values are -2.58 and 2.58.
For a more accurate confidence level, use $\hat{p}_{mod}$ instead of $\hat{p}$.

General formula of a confidence interval: (point estimate) ± (critical value) × (standard error), where the term after ± is the bound on error.

Example: In an imaginary country, a random set of 1031 adults were asked whether a university education was essential for success. Of those surveyed, 567 adults said yes. What’s the 95% confidence interval for the proportion of the entire adult population who believe so?
1. Compute the sample proportion ($\hat{p}$ = 567/1031 = .55)
2.
Check whether we can safely use the confidence interval
3. Compute the 95% confidence interval

Estimating Population from a Sample: Confidence Interval
Deciding sample size (n) based on bound on error (B) for a 95% confidence interval
Example: Can a dog sniff out cancers? (Integrative Cancer Therapies : 1-10)
If we want to estimate the likelihood of a medical dog differentiating cancer tissue from healthy tissue with a bound on error of .05 at the 95% confidence level, how big should the sample size be?
Using the conservative value p = 0.5, $n = p(1-p)\left(\frac{1.96}{B}\right)^2 = 0.25 \times \left(\frac{1.96}{0.05}\right)^2 = 384.16$.
The answer is 385 (notice that we always round UP for sample size estimation)

When the population standard deviation (σ) is unknown… σ needs to be replaced by the sample standard deviation (s).
Notice that this also changes the statistic from z to t, because s changes from sample to sample but σ doesn’t!
As the degrees of freedom (df) get larger, the tails get flatter (i.e., less heavy) and the distribution better approximates the z distribution. As a result, the critical t value for a specific confidence level changes depending on the degrees of freedom!
[Table of t critical values, with an annotation marking the values that are identical to z critical values]

Example: Drive-through medicine? (Annals of Emergency Medicine : 268-273)
This study found that by doing medical screening and simple check-ups on flu patients via drive-through, the processing time for patients became shorter than with a conventional approach. Here is the data derived from the study.
Sample size (n) = 38
Sample mean ($\bar{x}$) = 26 mins
Sample standard deviation (s) = 1.57
What is the 95% confidence interval for the population mean of the processing time?
1. Identify degrees of freedom (df): df = n - 1 = 38 - 1 = 37
2. Identify the t critical value: by using a look-up table, the t critical value for df = 37 is approximately 2.02.
3. Compute the CI! $26 \pm 2.02 \times \frac{1.57}{\sqrt{38}} \approx 26 \pm 0.51$, i.e., from about 25.49 to 26.51 mins.

Estimating Population from a Sample: Confidence Interval
Example: Are Chimps Charitable? (Newsday, November 2, 2005)
A research study examined how often chimpanzees chose an action that also provides food to a neighbour over an action that only provides food to themselves. Here is the data derived from the study.
For the seven chimpanzees tested, the numbers of times they chose the charitable action out of 36 trials are: [data shown on slide]
What is the 99% confidence interval for the mean number of charitable actions?
1. Check normality of distribution!
2. Identify degrees of freedom (df): df = n - 1 = 7 - 1 = 6
3. Identify the t critical value: by using a look-up table, the t critical value for df = 6 at the 99% confidence level is approximately 3.71.

The full procedure:
1. Check normality of distribution!
2.
Identify degrees of freedom (df)
3. Identify the t critical value
4. Compute the CI!

Estimating Population from a Sample: Confidence Interval
Deciding sample size (n) based on bound on error (B) for a 95% confidence interval
Example: In an imaginary University, a student wants to estimate the average cost of textbooks per semester with a bound on error (B) of $20. Although the student does not know the standard deviation of the textbook costs, he is aware that the textbook cost varies widely and he estimates its range to be $50 to $450. How many students should he survey to compute the 95% confidence interval for the population mean of the cost of textbooks per semester?
1. Compute range/4 as an alternative to σ: (450 - 50)/4 = 100
2. Compute the required sample size: $n = \left(\frac{1.96 \times \sigma}{B}\right)^2 = \left(\frac{1.96 \times 100}{20}\right)^2 = 96.04$
The answer is 97 (notice that we always round UP for sample size estimation)

Estimating Population from a Sample: Confidence Interval
Things to look out for in the statistics in the real world!
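The confidence-interval and sample-size examples in this part of the notes (university-education survey, drive-through medicine, cancer-sniffing dogs, textbook costs) all reduce to a few short formulas. The sketch below collects them; the function names are illustrative, the proportion sample size assumes the conservative p = 0.5, and the t critical value is passed in by hand because it comes from a look-up table.

```python
from math import ceil, sqrt

Z_95 = 1.96  # z critical value for 95% confidence


def proportion_ci(successes, n, z=Z_95):
    """Large-sample CI for a population proportion: p_hat +/- z * sqrt(p_hat(1 - p_hat)/n)."""
    p_hat = successes / n
    bound = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - bound, p_hat + bound


def mean_ci(sample_mean, s, n, t_critical):
    """t-based CI for a population mean when sigma is unknown: x_bar +/- t * s / sqrt(n)."""
    bound = t_critical * s / sqrt(n)
    return sample_mean - bound, sample_mean + bound


def sample_size_for_proportion(bound, p=0.5, z=Z_95):
    """Smallest n with bound on error <= `bound` (p = 0.5 is the conservative assumption)."""
    return ceil(p * (1 - p) * (z / bound) ** 2)


def sample_size_for_mean(bound, sigma, z=Z_95):
    """Smallest n with bound on error <= `bound`, given a value (or guess) for sigma."""
    return ceil((z * sigma / bound) ** 2)


survey_ci = proportion_ci(567, 1031)                      # university-education survey
drive_through_ci = mean_ci(26, 1.57, 38, t_critical=2.02)  # df = 37
dogs_n = sample_size_for_proportion(0.05)                  # cancer-sniffing dogs
textbooks_n = sample_size_for_mean(20, sigma=(450 - 50) / 4)  # sigma approximated by range/4

print("survey CI       :", tuple(round(v, 3) for v in survey_ci))
print("drive-through CI:", tuple(round(v, 2) for v in drive_through_ci))
print("dogs n          :", dogs_n)
print("textbooks n     :", textbooks_n)
```

The two sample sizes reproduce the answers given in the notes (385 and 97); rounding up is done with `ceil` so the bound on error is never exceeded.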
Estimating Population from a Sample: Confidence Interval
Things to look out for in the statistics in the real world!

Hypothesis Testing

Example: In an imaginary hospital, a team of researchers came up with a new laser treatment for tumors. Consider the following scenario.
Upper-tail Test (One tail)

Example: In an imaginary hospital, a team of researchers came up with a new laser treatment for tumors. Consider the following scenario.
Lower-tail Test (One tail)

Example: In an imaginary tennis ball factory, you are a quality control manager, and your job is to test whether the new machine is properly calibrated to produce tennis balls with a population mean diameter of 3 inches.
Two-tail Test

Hypothesis Testing: Types of Errors
[Figure: two panels plotting distributions against the calculated t-value (x-axis from -4 to 10), with the Alpha and Beta regions marked.]
When Alpha increases, Beta decreases!
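The trade-off between Alpha and Beta can be checked numerically. The sketch below is an illustration under simplifying assumptions, not the setup of the figure: it uses an upper-tail z test (normal curves instead of t curves), and it assumes the true mean sits a fixed number of standard errors above the null value. The function name and the shift of 2 standard errors are hypothetical choices.

```python
from statistics import NormalDist


def type2_error_rate(alpha, shift_in_se):
    """Beta for an upper-tail z test when the true mean is shift_in_se standard errors above the null."""
    standard_normal = NormalDist()
    critical_z = standard_normal.inv_cdf(1 - alpha)   # rejection cutoff under the null
    # Beta is the chance that the statistic falls below the cutoff
    # even though the alternative is true.
    return standard_normal.cdf(critical_z - shift_in_se)


alphas = [0.01, 0.05, 0.10]
betas = [type2_error_rate(a, shift_in_se=2.0) for a in alphas]
for a, b in zip(alphas, betas):
    print(f"alpha = {a:.2f}  ->  beta = {b:.3f}")
```

Raising alpha moves the cutoff toward the null distribution, so less of the alternative distribution falls on the "fail to reject" side, and Beta shrinks.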
Hypothesis Testing: Types of Errors
Type 1
Type 2

Hypothesis Testing: Large-Sample hypothesis test for population proportion

Example: In an imaginary university, a third of all students are believed to have heard of their classmates committing an academic offense. In one Psychology course, 58 out of 171 enrolled students admitted that their classmates committed an academic offense. Assuming that this sample is representative of the students at this university, does this sample provide convincing evidence that more than a third of the students commit academic offenses at this university?

1. Describe the population characteristics of interest
   p: probability that a student commits an academic offense
2. State the null hypothesis
3. State the alternative hypothesis
4. Select the significance level alpha
   alpha = 0.05
5. Determine the test statistic to be used
6. Check the assumptions!
7. Compute the statistic
8. Determine the P-value
9. State the conclusion!
   Since p-value > α, we fail to reject the null hypothesis that a third of the students commit academic offenses.

Example: In an imaginary country, 61% of high-school graduates pursue a university degree after graduation. In one state, 55% of 1500 high-school graduates pursued a university degree. Can we reasonably conclude that this state's graduation statistics are different from the norm of the country?

1. Describe the population characteristics of interest
2. State the null hypothesis
3. State the alternative hypothesis
4. Select the significance level alpha
5. Determine the test statistic to be used
6. Check the assumptions!
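Both examples can be worked through with one short Python sketch of the large-sample z test for a proportion. Several things here are assumptions rather than material from the slides: the function name, the reading of the hypotheses (null p = 1/3 against p > 1/3 for the first example, null p = 0.61 against p ≠ 0.61 for the second), the use of alpha = 0.05 for the second example, and the assumption check n·p0 ≥ 10 and n·(1 - p0) ≥ 10, which is a common rule of thumb.

```python
import math
from statistics import NormalDist


def proportion_z_test(successes, n, p0, alternative):
    """Large-sample z test for one proportion.

    alternative is "greater" (upper tail), "less" (lower tail) or "two-sided".
    Returns (z, p_value).
    """
    # Assumption check: rule-of-thumb condition for the normal approximation.
    if n * p0 < 10 or n * (1 - p0) < 10:
        raise ValueError("sample too small for the large-sample z test")

    p_hat = successes / n
    standard_error = math.sqrt(p0 * (1 - p0) / n)   # uses the null value p0
    z = (p_hat - p0) / standard_error

    lower_area = NormalDist().cdf(z)
    if alternative == "greater":
        p_value = 1 - lower_area
    elif alternative == "less":
        p_value = lower_area
    elif alternative == "two-sided":
        p_value = 2 * min(lower_area, 1 - lower_area)
    else:
        raise ValueError("alternative must be 'greater', 'less' or 'two-sided'")
    return z, p_value


alpha = 0.05

# Academic offense example: 58 of 171, null value 1/3, upper-tail test.
z_offense, p_offense = proportion_z_test(58, 171, 1 / 3, "greater")
print(f"z = {z_offense:.2f}, p-value = {p_offense:.3f}, reject: {p_offense <= alpha}")

# University degree example: 55% of 1500 is 825, null value 0.61, two-tail test.
z_degree, p_degree = proportion_z_test(825, 1500, 0.61, "two-sided")
print(f"z = {z_degree:.2f}, p-value = {p_degree:.2g}, reject: {p_degree <= alpha}")
```

For the first example the P-value is well above 0.05, matching the conclusion that we fail to reject the null hypothesis. For the second, under the assumptions above, the P-value falls far below 0.05.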
