PRII Basic Statistics PDF
Karen Collins Ramos
Summary
This presentation covers basic statistics for practical research. It discusses the role of statistics in research, different types of data, and the process of research. The presentation is focused on a basic understanding and not a detailed research study.
Full Transcript
Practical Research
Basic Statistics
Karen Collins Ramos

“In Wrong Shui life is seen as a cosmic journey, a struggle to overcome unseen and unexpected obstacles at the end of which the traveler will find illumination and enlightenment. Replicate this quest in your home by moving light switches away from doors and over to the far side of each room.”

RESEARCH INVOLVES GATHERING OF INFORMATION

To make sense of data

Statistics serve two general purposes:
01 Used to organize and summarize the information so that the researcher can see what happened in the study and communicate the result to others
02 Help the researcher answer the general questions that initiated the research by determining exactly what conclusions are justified based on the results that were obtained

STATISTICS DEFINITION
Refers to a set of mathematical procedures for organizing, summarizing, and interpreting information

STATISTICS
Standardized Techniques: Recognized & understood throughout the scientific community
Transfer of knowledge: Like currency, statistical methods used by one researcher will be familiar to other researchers

01 Types & Levels of Data
PROCESS
POSE QUESTION: About a population
POPULATION: Usually large in number
TARGET POPULATION: Narrowed down
SAMPLE: Actual participants or units in the study

Interest in
Specific characteristics of individuals in the population
Outside factors that may influence the individuals

What is something that changes or can have different values?

VARIABLE: A characteristic or condition that changes or has different values for different individuals

NOTE! To demonstrate changes in variables, it is necessary to make MEASUREMENTS of the variables being examined

MEASUREMENTS
FORMAL DEFINITION
Datum (singular) is a single measurement or observation, commonly called a score or raw score
Data (plural) is a collection of measurements or observations
EXAMPLE
Measurement of each individual = datum, score, or raw score
Complete set of scores = data, data set

DATUM MEAN, DATA MEAN

IN STATISTICS
1 In statistics, there are various kinds of data: discrete & continuous, levels of measurement
2 The nature of data plays a vital role in the field of statistics
3 Almost every sampled data value is either categorical or numerical

Is it necessary to distinguish whether the data come from a population or sample?

PARAMETERS & STATISTICS
PARAMETER: A value that describes a population. It is usually derived from measurements of the individuals in the population.
STATISTIC: A value that describes a sample. A statistic is usually derived from measurements of the individuals in the sample.

PARAMETERS & STATISTICS
PARAMETER: Average IQ of the population
STATISTIC: Average IQ of the sample

In what way can data types guide research endeavors?

OPERATIONAL DEFINITION
✘ A universally accepted meaning that is clear to all associated with an analysis. Without operational definitions, confusion can occur.
✘ How a researcher decides to measure the variable in the study

DATA TYPES
DATA
  Qualitative (descriptive/categorical): Nominal, Ordinal
  Quantitative (numerical): Discrete, Continuous

QUALITATIVE DATA
✘ Non-numerical in nature (but can be coded)
  ✗ Ex. Low = 1, medium = 2, high = 3
✘ Could be considered a label in some cases
  ✗ Ex. Political affiliation (dem, rep, ind)
  ✗ ID number
  ✗ Education level (HS, 2-yr, 4-yr, MS, PhD)
  ✗ Numbers on a basketball uniform: #90 isn’t larger than #45 in the mathematical sense

QUALITATIVE DATA
✘ Can’t be used meaningfully in a computation
  ✗ Can you take the average of observed political affiliations?
✘ If a variable is represented by numbers, as with IDs, ask yourself if an average makes sense. If not, then it’s qualitative.
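
The "does an average make sense?" check can be tried out in a few lines. This is only a sketch: the jersey numbers and affiliations are made up for illustration and the variable names are hypothetical. It shows that a language will happily average numeric labels even though the result means nothing, while frequency counts remain a sensible summary for qualitative data.

```python
from collections import Counter
import statistics

# Hypothetical jersey numbers: numeric labels, not quantities.
jersey_numbers = [90, 45, 7, 23, 45]

# Hypothetical political affiliations (categories with no numeric meaning).
affiliations = ["dem", "rep", "ind", "dem", "rep", "dem"]

# This runs without error, but an "average jersey number" describes nothing.
meaningless_average = statistics.mean(jersey_numbers)

# For qualitative data, count how often each category occurs instead.
affiliation_counts = Counter(affiliations)
most_common_affiliation, count = affiliation_counts.most_common(1)[0]

print(f"Average jersey number (not meaningful): {meaningless_average}")
print(f"Affiliation counts: {dict(affiliation_counts)}")
print(f"Most common affiliation: {most_common_affiliation} ({count} of {len(affiliations)})")
```

The fact that the average can be computed is not evidence that the variable is quantitative; the judgment about whether it makes sense still has to come from the researcher.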
EXAMPLE
QUANTITATIVE
✘ Number of medals won by the US in a given year
QUALITATIVE
✘ Medal type:
  ✗ Gold/silver/bronze

TYPES OF DATA
1 DISCRETE DATA
2 CONTINUOUS DATA

QUANTITATIVE DATA
DISCRETE DATA
✘ Can only take on certain values (whole numbers)
✘ Countable
✘ Contains finite values with nothing in between
CONTINUOUS DATA
✘ Can take any value within a range or in some interval
✘ Measurable

DISCRETE DATA
A fun rule of thumb is that, in many cases, discrete data can be preceded by “the number of.”
Example:
✘ Number of customers who bought different items
✘ Number of students who buy food from the canteen
✘ Number of items you buy at the grocery store each week

CONTINUOUS DATA
Infinite possibilities
Example:
✘ Weight of newborn babies
✘ Daily wind speed
✘ Temperature of a freezer
✘ Height of students in class
✘ Distance travelled between the MPH and the G12 classrooms

CHALLENGE
IDENTIFY IF VARIABLE IS DISCRETE OR CONTINUOUS
1. Pace of a triathlete in the swim leg of a triathlon competition.
2. The time it takes a student selected at random to register for the UPCAT.
3. Bouncing checks received by BDO on a day selected at random.
4. The cumulative weight of all the animals in Manila Zoo.
5. The amount of gasoline needed to drive your car 200 miles.
6. Traffic injuries per hour in an orthopedic hospital.
7. Distance of the Ninoy Aquino Airport to the Tao Yuan Airport to the last kilometer.
8. The distance a golf ball travels after being hit with a golf club.
9. Coldness or warmness in a 25 sq. meter room filled with 50 people.
10. Your weight before breakfast each morning.

LEVELS OF MEASUREMENT
✘ Stevens (1946) – assignment of numerals to things so as to represent facts and conventions about them
✘ The estimation or discovery of the ratio of some magnitude of a quantitative attribute to a unit of the same attribute

LEVELS OF MEASUREMENT
To understand the nature of data
1. Levels guide interpretations of the differences between the values
✘ When comparing the values, the level of measurement characterizes the numerical differences between the values
✘ Indicates whether the difference is arbitrary, relative, or equidistant

LEVELS OF MEASUREMENT
To understand the nature of data
2. Determines the appropriate statistical analyses one may perform on the data
✘ Helps researchers interpret their data correctly and avoid meaningless analyses

LEVELS OF MEASUREMENT
1 NOMINAL
2 ORDINAL
3 INTERVAL
4 RATIO

LEVELS OF MEASUREMENT
NOMINAL
✘ Categorization
✘ Assign values to variables to organize them into groups within data sets
✘ Differences between values are arbitrary
Ex. Political party affiliation

Conditions for Nominal Data
✘ EXHAUSTIVE
  ✗ Categorize all cases appropriately
  Ex. The case of “none” or “other”
✘ MUTUALLY EXCLUSIVE
  ✗ Means a researcher can classify a case only into a single category
  Ex. Urban, suburban, rural and farm

LEVELS OF MEASUREMENT
ORDINAL
✘ Indicate rank order of cases
✘ Know relative difference between values and order them appropriately
✘ Intervals between values are inconsistent
Ex. Likert scale

Conditions for Ordinal Data
✘ Know relative difference among the values and order them appropriately
✘ Allows a researcher to make accurate judgments about a value assigned to a variable compared to another value assigned to the same variable

LEVELS OF MEASUREMENT
INTERVAL
✘ Measurements that have consistent distances between values
✘ Uses standard units of measurement or a metric (vs. ordinal)
✘ No absolute zero; zero is an arbitrary point on the scale
Ex. Temperature

Interval Level
✘ IQ tests
  ✗ Don’t have meaning for a 0 IQ
  ✗ A 120 IQ is not twice as intelligent as a 60 IQ
✘ Calendar years
  ✗ An interval of one calendar year (2005-2006, 2014-2015) always has the same meaning
  ✗ Ratios do not make sense because year 0 does not mean the beginning of time

Interval Level
✘ To be an interval measurement, each sequential difference should represent the same quantitative change
5 Strongly Agree | 4 Agree | 3 Neither | 2 Disagree | 1 Strongly Disagree

LEVELS OF MEASUREMENT
RATIO
✘ Has absolute zero
✘ Differences between values and ratios are meaningful
Ex. Monetary quantity, mass, length, electrical current

Ratio Level
✘ Makes multiplication and division possible
✘ Other examples: income, years married, reaction time

SUMMARY
RATIO: Absolute zero
INTERVAL: Distance is meaningful
ORDINAL: Attributes can be ordered
NOMINAL: Attributes are only named

PRACTICE
A. Determine the level of measurement for the data/variable provided.

Determine the level of measurement for the following:
1. Money spent during a hangout at Starbucks
2. City of birth
3. Direction measured in degrees from true north
4. Rating of fiction books: excellent, good, fair, poor
5. Favorite color
6. Years of work experience
7. Heat measured in degrees centigrade
8. SAT score
9. Writing proficiency classification: proficient, basic, beginner
10. GPA needed to enter Princeton U.
11. Score on a 10-point quiz measuring knowledge of algebra
12. Years that a senator served in office
13. A movie critic lists top 50 greatest movies about aliens
14. Cumulative weight of everyone in the MPH right now
15. Level of education of social media influencers
16. Number of slides in this presentation
17. Final letter grade for students in a Math class
18. Amount of calories in a Starbucks frappe
19. Dog breeds
20. Milligrams of tar in a cigarette

How do you determine the correct sample size?

02

CLASSIFICATION
1 DESCRIPTIVE
2 INFERENTIAL

DESCRIPTIVE STATISTICS
Statistical procedures used to summarize, organize, and simplify data.

DESCRIPTIVE STATISTICS
Techniques that take raw scores and organize or summarize them in a form that is more manageable
Often, the scores are organized in a table or a graph so that it is possible to see the entire set of scores
Another common technique to summarize a set of scores is by computing an average (note that even if the data set has hundreds of scores, the average provides a single descriptive value for the entire set)

INFERENTIAL STATISTICS
Consist of techniques that allow us to study samples and then make generalizations about the population from which they were selected

INFERENTIAL STATISTICS
Because populations are typically very large, a sample is selected to represent the population
By analyzing the results from the sample, we hope to make general statements about the population
Researchers often use SAMPLE STATISTICS as the basis for drawing conclusions about the POPULATION PARAMETERS
○ One problem: a sample provides only limited information about the population
○ Although samples are generally representative of their populations, a sample is not expected to give a perfectly accurate picture of the whole population

What do you call the discrepancy between a sample statistic and the corresponding population parameter?
SAMPLING ERROR
Discrepancy or amount of error that exists between a sample statistic and the corresponding population parameter

DATA GATHERING 1: GRADE 12
ACTUAL SCENARIO
01 Population: MIIS Grade 12 Batch 2025 (n=49)
02 Respondents: Total people who made an attempt to answer between 1-5 surveys (n=49)
03 Deleted Data: Respondents who did not complete the battery of tests (n=6)
04 Study Sample: Respondents included in the study (n=43)

DATA GATHERING 1: GRADE 12
ACTUAL SCENARIO
PERMA PROFILER
01 POSITIVE EMOTION: M = 6.11, M = 5.93
02 RELATIONSHIPS: M = 6.75, M = 5.74
03 ENGAGEMENT: M = 7.66, M = 7.54
04 MEANING: M = 6.19, M = 5.19
05 ACCOMPLISHMENT: M = 5.99, M = 5.43

DATA GATHERING 1: GRADE 12
HYPOTHETICAL SCENARIO
01 Population: MIIS Grade 12 Batch 2025 (n=49)
02 Respondents: Convenience sampling, 15 respondents each for 12A and 12B; survey given to the first 15 students to exit the 1:10pm Monday class
03 Deleted Data: Respondents who did not complete the battery of tests (n=0)
04 Study Sample: Respondents included in the study (n=30)

Would there be a possible difference in the means of the 2 parts of the sample taken?

DATA GATHERING 1: GRADE 12
ACTUAL SCENARIO
PERMA PROFILER
01 POSITIVE EMOTION: 12A m = ?, 12B m = ?
02 RELATIONSHIPS: 12A m = ?, 12B m = ?
03 ENGAGEMENT: 12A m = ?, 12B m = ?
04 MEANING: 12A m = ?, 12B m = ?
05 ACCOMPLISHMENT: 12A m = ?, 12B m = ?

SCENARIO
01 Divide class in two: Draw a line from front to back, through the middle of the room
02 Compute average for a variable: PERMA Subscales (Positive emotion, engagement, meaning, relationships, achievement)
03 Compare means: Will the two groups have exactly the same average?
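
The divide-the-class scenario can be tried out numerically. The following is a minimal sketch, not the deck's own analysis: the 30 scores are randomly generated stand-ins (not the MIIS Grade 12 data), and the names `split_half_means` and `scores` are hypothetical. It shuffles one group of scores, splits it into two halves several times, and prints the mean of each half next to the mean of the whole group.

```python
import random
import statistics

def split_half_means(values, seed):
    """Randomly split values into two equal halves and return each half's mean."""
    rng = random.Random(seed)
    shuffled = values[:]          # copy so the original order is untouched
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return statistics.mean(shuffled[:half]), statistics.mean(shuffled[half:])

# Hypothetical 0-10 subscale scores for 30 respondents (invented for illustration).
score_rng = random.Random(2025)
scores = [round(score_rng.uniform(0, 10), 2) for _ in range(30)]
overall_mean = statistics.mean(scores)

# Repeat the "line through the middle of the room" with five different splits.
for seed in range(5):
    mean_a, mean_b = split_half_means(scores, seed)
    print(f"split {seed}: half A = {mean_a:.2f}, half B = {mean_b:.2f}, "
          f"difference = {mean_a - mean_b:+.2f}")

print(f"mean of all 30 scores = {overall_mean:.2f}")
```

Because both halves are drawn from the same group, any gap between their means comes from chance in who landed in which half. Each half's distance from the overall mean is a small-scale picture of sampling error.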
Comparison
PERMA PROFILER      MIIS G12 (n=45)    Validation Study (n=31,965)
Positive Emotion    5.93               6.69
Engagement          7.54               7.25
Relationship        5.74               6.90
Meaning             5.43               7.06
Accomplishment      5.19               7.21

NOTE! The DIFFERENCE in the means you obtain does not necessarily mean that there is a SYSTEMATIC DIFFERENCE between the two groups (which we proved when we analyzed differences between mean scores of males & females in the Data Gathering Simulation)

03

Concerns of Descriptive Statistics
1 Where is the approximate middle, or center of the graph?
2 How spread out are the data values on the graph?
3 What is the overall shape of the graph?
4 Does it have any interesting patterns?

OUTLIER
A data point that is significantly greater or smaller than other data points in a data set
Affects the calculation of descriptive statistics
Can occur in any given data set and in any distribution
They may occur by chance
○ An outlier must be excluded if it is due to measurement error or human error

MEASURES TO DESCRIBE A DATA SET
1 Measures of Position
2 Measures of Spread
3 Measures of Shape

1. MEASURES OF POSITION
Measures the data's central tendency
Refers to where the data is centered
Calculate an average of some kind
○ Mean
○ Median
○ Mode

MEASURES OF POSITION
01 Mean: Arithmetic average of scores, $\bar{x} = \frac{\sum x}{n}$
02 Median: Middle value of a distribution, located at position $(n + 1)/2$ in the ordered data
03 Mode: Most frequent value in a data set

1.A. MEAN
The total of all the values divided by the size of the data set
Most commonly used statistic of position
Works well when the distribution is symmetric and there are no outliers
Mean of a sample = $\bar{x}$
Mean of a population = $\mu$

What is the mean of the data set below?
25 20 17 15 21 24
$\bar{x} = \frac{\sum x}{n}$
$\bar{x} = (25 + 20 + 17 + 15 + 21 + 24) / 6 = 122 / 6 \approx 20.33$

1.B. MEDIAN
The middle value where exactly half of the data values are above it and half are below it
Less widely used
Can reduce effects of outliers
Often used when the data is nonsymmetrical

MEDIAN
ODD: Arrange the values from smallest to largest and get the middlemost value
EVEN: Median location = $(n + 1)/2$
  If whole number: keep as is (the median is the value at that position)
  If not a whole number: get the first whole number less than the location value and the first whole number greater than the location value; the median is the average of the values at those two positions

WHAT IS THE MEDIAN OF THE DATA SET BELOW?
25 20 17 15 21 24
1. Arrange the values from least to greatest: 15, 17, 20, 21, 24, 25
2. Solve $(n + 1)/2$, where n is 6
3. $(6 + 1)/2 = 3.5$ (not a whole number)
4. Get the 3rd and 4th values (20, 21) and average them: median = $(20 + 21)/2 = 20.5$

What is the MEDIAN of the data set below?
100 120 93 102 114 107 116
93, 100, 102, 107, 114, 116, 120
$(n + 1)/2 = (7 + 1)/2 = 4$
4th value = 107

1.C. MODE
The value that occurs the most often in a data set
Rarely used as a central tendency measure
More useful to distinguish between unimodal and multimodal distributions
○ When the data has more than one peak

What is the MODE of the data set below?
100 120 93 100 114 107 116
Mode = 100

MEASURES OF POSITION
G12 DATA: PERMA (Meaning)
01 Mean: 6.19
02 Median: 6.67
03 Mode: 6.66
Outliers: 1, 1.67, 2.67

2. MEASURES OF SPREAD
The spread refers to how the data deviates from the position measure
Gives an indication of the amount of variation in the process
Important indicator of the quality of the data
Statistics used to describe the spread of the data:
Range
Standard deviation

MEASURES OF SPREAD
1 RANGE
2 STANDARD DEVIATION

2.A. RANGE
The difference between the highest and lowest values in a data set
The simplest measure of variability
Denoted by “R”
Does not make full use of the data
○ Can be misleading when the data is skewed or in the presence of outliers
○ Just one outlier can increase the range dramatically

2.A. RANGE
GRADE 12 DATA SET
Range for PERMA - LONELINESS
Possible score: 0-10
Count: 0 = 1 occurrence, 10 = 3 occurrences
R = 10 - 0 = 10

2.B. STANDARD DEVIATION
The average distance of the data points from their own mean
○ A low standard deviation indicates that the data points are clustered around the mean
○ A large standard deviation indicates that the data points are widely scattered around the mean
Standard deviation of a sample = $s$
Standard deviation of the population = $\sigma$

2.B. STANDARD DEVIATION
GRADE 12 DATA SET
Standard Deviation for PERMA - LONELINESS
m = 5.80
s = 2.65

2.B. Standard Deviation
$s$ = standard deviation
$\bar{x}$ = mean of the data set
$x$ = values of the data set
$x - \bar{x}$ = each data point minus the mean
$n$ = size of the data set

3. MEASURES OF SHAPE
Data can be plotted into a histogram to have a general idea of its shape or distribution
The shape can reveal a lot of information about the data
Data will ALWAYS follow some known distribution
Symmetrical distribution: the two sides of a distribution are a mirror image of each other

3. MEASURES OF SHAPE
The shape helps identify which descriptive statistic is more appropriate to use in a given situation
○ Symmetrical: mean or median
○ Skewed: median as central tendency
Two common statistics that measure the shape of the data:
Skewness
Kurtosis

MEASURES OF SHAPE
1 SKEWNESS
2 KURTOSIS

3.A. SKEWNESS
Describes whether the data is distributed symmetrically around the mean
○ Skewness value of zero = perfect symmetry (as in a normal distribution)
○ Negative value = left-skewed data
○ Positive value = right-skewed data

3.A. SKEWNESS
Skewness is usually described as a measure of a dataset’s symmetry, or lack of symmetry.
If the skewness is between -0.5 and 0.5, the data are fairly symmetrical
If the skewness is between -1 and -0.5 or between 0.5 and 1, the data are moderately skewed
If the skewness is less than -1 or greater than 1, the data are highly skewed

3.B. KURTOSIS
Kurtosis is a measure of the combined weight of a distribution's tails relative to the center of the distribution.
○ A measure that describes the shape of a distribution’s tails in relation to its overall shape
○ Not merely a measure of “peakedness”

3.B. KURTOSIS
Types:
○ Mesokurtic (kurtosis = 3.0): distributions moderate in breadth, with curves of medium peaked height
○ Leptokurtic (kurtosis > 3.0): more values in the tails and more values close to the mean
○ Platykurtic (kurtosis < 3.0): fewer values in the tails and fewer values close to the mean (i.e., the curve has a flat peak and more dispersed scores with lighter tails)
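
The measures of position, spread, and shape can be checked on the small data sets used in the worked examples. The sketch below is an illustration under stated assumptions rather than the deck's own procedure: it takes the sample standard deviation to be $s = \sqrt{\sum (x - \bar{x})^2 / (n - 1)}$ (the $n - 1$ denominator is an assumption), it uses simple moment-based formulas for skewness and kurtosis in which a normal distribution has kurtosis 3, and the helper name `shape_measures` is hypothetical.

```python
import statistics

# Data sets taken from the worked examples.
scores = [25, 20, 17, 15, 21, 24]                   # mean and median example
mode_scores = [100, 120, 93, 100, 114, 107, 116]    # mode example

# Measures of position
mean = statistics.mean(scores)          # 122 / 6, about 20.33
median = statistics.median(scores)      # average of 20 and 21 -> 20.5
mode = statistics.mode(mode_scores)     # 100 occurs twice

# Measures of spread
data_range = max(scores) - min(scores)  # 25 - 15 = 10
sample_sd = statistics.stdev(scores)    # assumes the n - 1 (sample) denominator

def shape_measures(values):
    """Return moment-based (skewness, kurtosis); a normal curve has kurtosis 3."""
    n = len(values)
    center = sum(values) / n
    m2 = sum((x - center) ** 2 for x in values) / n   # second central moment
    m3 = sum((x - center) ** 3 for x in values) / n   # third central moment
    m4 = sum((x - center) ** 4 for x in values) / n   # fourth central moment
    return m3 / m2 ** 1.5, m4 / m2 ** 2

skewness, kurtosis = shape_measures(scores)

print(f"mean = {mean:.2f}, median = {median}, mode = {mode}")
print(f"range = {data_range}, sample SD = {sample_sd:.2f}")
print(f"skewness = {skewness:.2f}, kurtosis = {kurtosis:.2f}")
```

For these six scores the skewness falls between -0.5 and 0.5 (fairly symmetrical) and the kurtosis is below 3 (platykurtic). Statistical packages differ in the exact skewness and kurtosis formulas they apply (for example, some report kurtosis minus 3), so values from other software may not match these exactly.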