Chapter 1 & 2 Introduction to Statistics PDF
Uploaded by SeamlessMoonstone8761
UC San Diego
Summary
This document provides an introduction to statistics, covering basic concepts like descriptive and inferential statistics. It touches upon the importance of statistics and introduces various types of data analysis techniques.
Full Transcript
Chapter 1
Introduction to Statistics

Outline
The Definition of Statistics
Descriptive vs. Inferential Statistics
The Basics of Experiments
Scales of Measurement
Classification of Variables
Order of Operations

The Definition of Statistics

Statistics:
1. A branch of mathematics that deals with gathering, organizing, analyzing, interpreting, and presenting data
2. Procedures for analyzing data
3. Numbers that represent sample data

Statistics are important because they are part of what makes psychology a science
Science: a body of knowledge, a field, or an approach to studying phenomena that produces verifiable results

Statistics are important because they influence the conclusions we draw from data
Data: scores, measurements, or observations that are typically numeric (“data” is plural; “datum” is singular)

Descriptive vs. Inferential Statistics

There are two general types of procedures in statistics:

1. Descriptive statistics: procedures for organizing and summarizing information
Descriptive statistics (covered in Ch. 2 to 4) typically include:
Percentages, e.g., 64% of people voted in favor of the policy
Measures of central tendency, e.g., humans’ mean height is 5’4”
Measures of dispersion, e.g., test scores ranged from 15 to 50

2. Inferential statistics: procedures to draw conclusions about a population based on sample data from that population
Inferential statistics (covered in Ch. 8 to 17) include t tests, F tests, correlation, regression, chi-square tests, etc.
Examples of questions that are answered with inferential statistics:
Are personality traits related to physical exercise?
Do verbal skills differ between boys and girls?
Is there a heritable component to schizophrenia?

Thus, scientists are interested in populations
Population: a set of all individuals, items, or data of interest
E.g., all people with a gambling addiction, all automobiles in California, all scores on the Scholastic Aptitude Test (SAT), etc.
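The three kinds of descriptive statistics listed earlier (percentages, central tendency, dispersion) can be sketched in a few lines of Python. The data below are invented for illustration and are not from the course. The variable names are likewise hypothetical.

```python
from statistics import mean

# Hypothetical data, invented for illustration only.
votes = ["yes", "yes", "no", "yes", "no"]   # votes on a policy
test_scores = [15, 22, 37, 41, 50]          # scores on a test

# Percentage: the share of "yes" votes.
percent_yes = 100 * votes.count("yes") / len(votes)

# Measure of central tendency: the mean score.
mean_score = mean(test_scores)

# Measure of dispersion: the range, from lowest to highest score.
lowest, highest = min(test_scores), max(test_scores)

print(f"{percent_yes:.0f}% voted yes")
print(f"mean score = {mean_score}")
print(f"scores ranged from {lowest} to {highest}")
```

Each value only summarizes the scores in hand. Generalizing beyond them is the job of inferential statistics.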
It’s often not possible to study everyone in a population, so we study a subset of the population, i.e., a sample, and generalize findings to the population

Sample: a set of individuals, items, or data selected from a population of interest
E.g., 30 people with a gambling addiction, 100 automobiles in California, 300 scores on the SAT, etc.
A group can be a sample in one context and a population in another, e.g., all cars in CA can be a population, or a sample of cars in the U.S.

Parameter: a characteristic (usually numeric) that describes a population
E.g., if all SAT scores are the population, then the average of all SAT scores (which is approximately 1050) is a parameter
Statistic: a characteristic (usually numeric) that describes a sample
E.g., if the SAT scores of 30 students are the sample, then the average SAT score of this sample is a statistic

To accurately draw a conclusion about a population from sample data (i.e., to generalize findings from a sample to a population), the sample must reflect or resemble the population, i.e., not be biased
E.g., to assess the general population’s view on increasing taxes to address homelessness, you wouldn’t want a biased sample that doesn’t represent the population’s concern for homelessness
Random sampling is the best way to obtain a representative sample
Random sample: a sample in which each member of the population has an equal chance of inclusion

How is a truly random sample obtained?
Assign everyone in the population a number, then use a random number generator or table of random numbers to select a sample of numbers. Those with the selected numbers become the sample.
But this is rarely feasible in the behavioral sciences. Thus, other sampling procedures are used, which are acceptable if they create representative samples

Use records, locations, incentives, etc. that provide representative samples. As examples:
State records (health, education, police records, etc.)
are more representative than county records
DMVs are locations with more representative samples of people than luxury car dealerships
Incentivizing research participation with money produces more representative samples than offering jazz concert tickets

The Basics of Experiments

Scientists often conduct experiments on samples of subjects (Ss) or participants to study phenomena, i.e., to research questions
Ss refers to subjects and participants interchangeably in this class
E.g., consider the research question: Does sugar cause hyperactivity in children?

You could answer the question with the following steps:
Obtain a random sample of children
Randomly assign them to eat either sugar or artificial sweetener
After eating, measure hyperactivity by using a pedometer
Compare the hyperactivity between groups
Control other variables, e.g., food eaten before the experiment
If there is a difference in hyperactivity between groups, you may conclude sugar causes hyperactivity.

Experiment: a test for inferring causation that includes:
Random assignment of Ss to experimental and control conditions
Manipulation of an independent variable (IV)
Measurement of a dependent variable (DV)
Control of extraneous variables
Comparison of conditions’ scores

Variables: any factor that varies
E.g., height, weight, eye color, aggression, apathy, attraction, etc.
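The first two steps of the sugar study above (obtain a random sample, then randomly assign it to conditions) might be sketched as follows. This is a minimal illustration under stated assumptions: the population size, sample size, and variable names are hypothetical, and shuffle-and-split is just one common way to give every subject an equal chance of either condition.

```python
import random

random.seed(0)  # fixed seed only so the sketch is repeatable

# Hypothetical population: 1,000 children, each assigned a number.
population = list(range(1, 1001))

# Random sampling: every member has an equal chance of inclusion.
sample = random.sample(population, k=40)

# Random assignment: shuffle the sample, then split it in half so each
# child has an equal chance of landing in either condition.
shuffled = sample[:]
random.shuffle(shuffled)
half = len(shuffled) // 2
sugar_group = shuffled[:half]        # experimental condition
sweetener_group = shuffled[half:]    # control condition

print(len(sugar_group), len(sweetener_group))
```

Note that sampling decides who is in the study, while assignment decides which condition each sampled child receives. The two steps are independent.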
Independent variable (IV): the variable that is manipulated in a study
E.g., if you test if sugar causes hyperactivity, sugar is the IV
Dependent variable (DV): the variable that is measured in a study
E.g., hyperactivity in the previous example is the DV

Experimental condition: the group that receives the IV (i.e., the treatment condition)
E.g., in the sugar study, those who get sugar are in the experimental condition
Control condition: the group that does not receive the IV (i.e., the non-treatment condition)
E.g., in the sugar study, those in the control condition get artificial sweetener

Extraneous variable: any variable other than the IV and DV
E.g., if you test the effect of sugar on hyperactivity, food consumption before the study is an extraneous variable
Controlling extraneous variables is important because if there is a difference between the groups in the DV, you’ll know it’s because of the IV

Random assignment: the allocation of Ss to conditions by a random process so that Ss have an equal chance of being assigned to any condition
How is random assignment performed?
E.g., use a random number generator. If the number is even, the subject goes in the experimental condition. If it’s odd, the subject goes in the control condition.

Random assignment is important because it helps ensure that any difference between the groups in the DV is due to the IV
E.g., some Ss are naturally more hyperactive. Random assignment helps ensure that naturally hyperactive Ss are distributed relatively equally between groups. Thus, any remaining difference between groups in hyperactivity can be attributed to sugar

Random assignment is also important because it helps ensure that if there is no difference in the DV between groups, it’s not because Ss self-selected into one of the conditions
E.g., examine if Ss being separated from their cell phones for 8 hours worsens their mood (IV = cell phone separation; DV = mood).
Experimental condition = Ss have no cell phone for 8 hours
Control condition = Ss have cell phones with them as usual

If you let Ss choose their condition, i.e., don’t perform random assignment, those who are addicted to their phones may choose the control condition
If so, you may not see a difference between conditions in mood, and wrongly conclude that such separation doesn’t impact mood

Scales of Measurement

Measurement scales are used to quantify or categorize variables
The four scales include:
Nominal scales
Ordinal scales
Interval scales
Ratio scales

Nominal scale: a scale that categorizes items or individuals; numbers reflect categories or labels, not order or quantity
E.g., language, ethnicity, gender, eye color, etc. are measured with nominal scales
Nominal data can be coded, i.e., converted to numbers, e.g., language can be coded as English = 1, Spanish = 2, Mandarin = 3, etc.
Some nominal values are already numbers, e.g., zip codes, social security numbers, license plate numbers, etc.

Ordinal scale: a scale in which numbers convey order* or rank, i.e., that some value is greater or less than another value
E.g., 1st, 2nd, 3rd place in a race
E.g., A, B, C, D, F letter grades in a course
Ordinal scales make no assumptions about the magnitude of differences between points on the scale
*Time-based orders, such as seasons, are measured on nominal scales. One season is not greater than another season.

Interval scale: a scale in which numbers are (or are assumed to be) equidistant and do not have a true zero
E.g., temperature (in °F or °C)
E.g., a satisfaction scale ranging from 1 (completely unsatisfied) to 7 (completely satisfied)
A true zero reflects an absence of the variable. E.g., temperature is measured on an interval scale; 0 degrees ≠ no temperature.

Ratio scale: a scale with a true zero point and equally distributed units
E.g., hormone level, reaction time, working memory capacity, etc.
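A short sketch may help show why the scale matters for what can be computed. It assumes an invented list of participants' languages and a second, hypothetical coding scheme. The Celsius-to-Fahrenheit conversion is the standard formula.

```python
from collections import Counter

# Nominal: codes are labels, so any relabeling is equally valid.
coding_a = {"English": 1, "Spanish": 2, "Mandarin": 3}
coding_b = {"English": 3, "Spanish": 1, "Mandarin": 2}  # hypothetical alternative
languages = ["English", "Spanish", "English", "Mandarin", "English"]

codes_a = [coding_a[lang] for lang in languages]
codes_b = [coding_b[lang] for lang in languages]

# Category counts are meaningful and do not depend on the coding...
counts = Counter(languages)
# ...but the "mean code" changes with the coding, so it carries no meaning.
mean_a = sum(codes_a) / len(codes_a)
mean_b = sum(codes_b) / len(codes_b)

# Interval: with no true zero, ratios depend on the unit of measurement.
def c_to_f(celsius):
    """Convert degrees Celsius to degrees Fahrenheit."""
    return celsius * 9 / 5 + 32

ratio_c = 20 / 10                    # ratio of the two readings in Celsius
ratio_f = c_to_f(20) / c_to_f(10)    # ratio of the same readings in Fahrenheit

print(counts)
print(mean_a, mean_b)
print(ratio_c, ratio_f)
```

Because the two ratios disagree, "20 degrees is twice as hot as 10 degrees" is not a meaningful statement on an interval scale. Ratio-scale variables such as reaction time do not have this problem, since zero means none of the variable.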
Classification of Variables

Variables can be continuous or discrete
Continuous variable: a variable that theoretically can be quantified to any number of decimal points
E.g., height, adrenaline, conscientiousness, etc.
Discrete variable: a variable measured in whole units or categories that are not distributed along a continuum
E.g., sex chromosomes, marital status, number of TVs/home, etc.

Variables can also be quantitative or qualitative
Quantitative variable: a variable that varies by amount
Measurements reflect amounts or quantities
E.g., height, adrenaline, number of TVs/home, etc.
Qualitative variable: a variable that varies by category
Measurements reflect categories or qualities
E.g., sex chromosomes, marital status, gender, etc.

Quantitative variables are continuous or discrete
E.g., food consumed in calories is continuous
E.g., food consumed in pieces such as slices is discrete
Qualitative variables are necessarily discrete
E.g., the type of anxiety disorder (obsessive compulsive disorder, generalized anxiety disorder, phobias, etc.) is discrete

Order of Operations

Statistics involve calculations that follow an order:
1. Parentheses
2. Square
3. Sigma (Σ), which means “sum”

E.g.,
x   y
1   4
2   5
3   6

Σx = 1 + 2 + 3 = 6
Σx² = 1² + 2² + 3² = 1 + 4 + 9 = 14
(Σx)² = (1 + 2 + 3)² = (6)² = 36

Chapter 2
Frequency Distributions

Outline
The Definition of Frequency Distributions
Examples of Frequency Distributions
o Pictogram
o Histogram
o Frequency polygon
o Ogive
o Bar graph
o Pie chart

The Definition of Frequency Distributions

Skim the data set on the right
Details like the highest and lowest scores and most common scores are not readily apparent
So, create a frequency distribution

Frequency distribution: a summary display that presents how often a category, score, or range of scores occurs
It’s useful because it efficiently communicates important information, e.g., the most frequently occurring scores

Data on the left are presented in a frequency distribution on the right

Examples of Frequency Distributions

Frequency distributions are typically in the form of a table or graph
Pictogram/pictograph: a summary display that uses symbols or illustrations to reflect the data being reported

A pictogram of the % of U.S.
adults who are overweight and obese

Histogram: a graph displaying the frequency of continuous data with rectangles that are adjacent at the boundaries of each interval

Frequency polygon: a graph that displays the frequency of continuous data with a line that connects dots plotted at the midpoints of intervals

Ogive: a graph that displays cumulative percentages or frequencies with a line that connects dots plotted at the upper boundaries of intervals

Stem-and-leaf display: a display of all scores where common digits shared by scores are listed to the left (in the stem) and remaining digits are listed to the right (in the leaf)

Stem-and-leaf display example for the scores:
12, 14, 10, 47, 33, 23, 16, 52, 24, 32, 26, 44, 42, 46, 29, 19, 11, 50, 30, 15

Organize the scores:
10, 11, 12, 14, 15, 16, 19
23, 24, 26, 29
30, 32, 33
42, 44, 46, 47
50, 52

Create the stem-and-leaf display:
1 | 0 1 2 4 5 6 9
2 | 3 4 6 9
3 | 0 2 3
4 | 2 4 6 7
5 | 0 2

Given the following hypothetical values (unrelated to the previous slide):
1.20, 1.23, 1.24, 1.28, 2.22, 2.25
Stem-and-leaf displays can have multiple digits in the stem:
1.2 | 0 3 4 8
2.2 | 2 5
or multiple digits in the leaf:
1 | 20 23 24 28
2 | 22 25

Bar chart: a graph that displays the frequency of discrete and categorical data with rectangles
Bar chart criteria:
1. Each rectangle represents a whole unit or class
2. Rectangle heights represent frequencies of units/classes

Bar chart example

Pie chart: a circle with sectors that represent the relative percentage of discrete and categorical data

How data are presented affects interpretation, e.g., consider the following graph of U.S. unemployment rates in 2019:

The previous slide’s data with the y-axis beginning at 0, as shown below, are less dramatic:
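As a rough sketch, the 20 scores from the stem-and-leaf example above can be turned into a grouped frequency distribution and a stem-and-leaf display with the Python standard library. The interval width of 10 and the variable names are illustrative choices, not part of the slides.

```python
from collections import Counter, defaultdict

scores = [12, 14, 10, 47, 33, 23, 16, 52, 24, 32,
          26, 44, 42, 46, 29, 19, 11, 50, 30, 15]

# Grouped frequency distribution: count the scores in each interval.
width = 10  # illustrative interval width
freq = Counter((s // width) * width for s in scores)
for lower in sorted(freq):
    print(f"{lower}-{lower + width - 1}: {freq[lower]}")

# Stem-and-leaf display: tens digit is the stem, ones digit is the leaf.
stems = defaultdict(list)
for s in sorted(scores):
    stems[s // 10].append(s % 10)
for stem in sorted(stems):
    leaves = " ".join(str(leaf) for leaf in stems[stem])
    print(f"{stem} | {leaves}")
```

The frequency table shows only how many scores fall in each interval, whereas the stem-and-leaf display keeps every individual score visible.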