Chapter II: Statistics Refresher (PDF)
University of Batangas
Summary
This document provides a refresher course on statistics and its application in psychological testing. It covers descriptive and inferential statistics, various measurement scales, and the properties of different scales. The goal is to give readers useful knowledge to make sense of data.
Full Transcript
Chapter II. Introduction

Statistics gives us ways to transform data into information, and information into practice. As citizens of an information-based society, we owe it to ourselves to understand statistics, at least at a basic level. Without such understanding, we are under the dictatorship of those who would manipulate data to their own advantage or, worse yet, ignore all data and go with whatever sounds good at the moment.

Our world runs on information. Information comes from data, which is processed into meaning through the mathematical discipline known as statistics. Understanding statistics, even at a basic level, is as vital as literacy in today’s world. For an ordinary person, basic statistics helps make sense of the data in everyday life. For example, how much is one’s utility bill, on average? How does it vary?

This matters even more for a psychologist, who draws conclusions about human behavior. To draw any conclusion about human behavior, we must first pose a question, state our presumed answer to it, and then collect data from a sample of the population, preferably a random sample, to minimize bias as much as possible. Then, to reach a rigorous conclusion from the data collected, we must apply statistical methods. These can range from something as basic as finding the mean and standard deviation to constructing a full-fledged predictive model. Without statistics, the whole field would break down, as there would be no solid backing for judging hypotheses true or untrue.

This chapter is designed to provide students with a refresher course on statistics and its application in psychological testing and assessment.

Lesson Proper:

Why do we need Statistics?
1. Statistics are used for purposes of description. Numbers provide convenient summaries and allow us to evaluate some observations relative to others.
2. We can use statistics to make inferences, which are logical deductions about events that cannot be observed directly.

Descriptive statistics: methods used to provide a concise description of a collection of quantitative information.
Inferential statistics: methods used to make inferences from observations of a small group of people (sample) to a larger group of individuals (population).

Measurement: the act of assigning numbers or symbols to characteristics of things according to rules

Scales
- A set of numbers whose properties model empirical properties of the objects to which the numbers are assigned
- Continuous Scale – measures a continuous variable
- Discrete Scale – measures a discrete variable, whose values fall into separate categories with no meaningful values in between

Properties of Scale
1. Magnitude
- The property of “moreness.”
- A particular instance of the attribute represents more, less, or equal amounts of the given quantity than does another instance.
- Ex: Height: taller, shorter
2. Equal Intervals
- The difference between two points at any place on the scale has the same meaning as the difference between two other points that differ by the same number of scale units.
- Ex: Ruler: inches
- A psychological test rarely has the property of equal intervals. (Ex: IQ levels and their meanings per level)
3. Absolute Zero
- Obtained when nothing of the property being measured exists.
- Ex: the heart rate of a dead person
- For many psychological qualities, it is extremely difficult, if not impossible, to define an absolute 0 point.
- Ex: Measuring and defining “0” shyness on a scale of 0 to 10

Types of Measurement Scale
A. Nominal scale:
- not really a scale at all; its only purpose is to name objects
- used when the information is qualitative rather than quantitative
- Social science researchers commonly label groups in sample surveys with numbers (such as 1 = African American, 2 = white, and 3 = Mexican American)
B. Ordinal scale:
- has the property of magnitude but not equal intervals or an absolute 0
- ranks individuals or objects but says nothing about the meaning of the differences between the ranks
- Ex: ranking people from tallest to shortest; IQ
C. Interval scale:
- has the properties of magnitude and equal intervals but not an absolute 0
- Ex: the measurement of temperature in degrees Fahrenheit
D. Ratio scale:
- has all three properties (magnitude, equal intervals, and an absolute 0)
- Ex: speed of travel, where 0 miles per hour (mph) means no movement at all

Type of Scale   Magnitude   Equal Intervals   Absolute Zero
Nominal         No          No                No
Ordinal         Yes         No                No
Interval        Yes         Yes               No
Ratio           Yes         Yes               Yes

Nominal: Classification or categorization based on one or more distinguishing characteristics
Ordinal: Classification + Ranking or Ordering
Interval: Classification + Ranking + Equal Intervals
Ratio: All math operations can be meaningfully performed; has absolute zero

Describing Data

Distribution
- A set of test scores arrayed for recording or study
Raw Score
- A straightforward, unmodified accounting of performance that is usually numerical
Frequency distribution
- displays scores on a variable or a measure to reflect how frequently each value was obtained; defines all the possible scores and determines how many people obtained each of those scores.

Graphic Form (describing data)
1. Histogram – a graph with vertical lines drawn at the true limits of each test score, forming a series of contiguous rectangles.
2. Bar Graph – numbers indicative of frequency appear on the Y-axis; categories appear on the X-axis.
3. Frequency Polygon – expressed by a continuous line connecting the points where test scores or class intervals meet frequencies.

Percentile Ranks
- Answers the question, “What percent of the scores fall below a particular score (Xi)?”
- $P_r = \frac{B}{N} \times 100$, where B is the number of scores below Xi and N is the total number of scores
Percentiles
- the specific scores or points within a distribution
- divide the total frequency for a set of observations into hundredths

Measures of Central Tendency
- A statistic that indicates the average or midmost score between the extreme scores in a distribution
Mean
- Most commonly used
- Most appropriate measure of central tendency for INTERVAL AND RATIO DATA
- The arithmetic average score in a distribution
- Total the scores and divide the sum by the number of cases
Median
- Middle score in a distribution
- Most appropriate for ORDINAL, INTERVAL AND RATIO DATA
- Useful when relatively few scores fall at the high end or relatively few scores fall at the low end
Mode
- Most frequently occurring score
- Appropriate for NOMINAL data
- NOT COMMONLY USED
- Useful in analyses of a qualitative or verbal nature

Measures of Variability
Variability
- An indication of how scores in a distribution are scattered or dispersed
Range (Highest Score – Lowest Score)
Interquartile range (Q3 – Q1)
Standard Deviation (s): a measure of variability equal to the square root of the average squared deviations about the mean; that is, the square root of the variance.
Variance (s²)
- equal to the arithmetic mean of the squares of the differences between the scores in a distribution and their mean
Skewness
- Symmetry is absent
- The presence or absence of symmetry in a distribution is simply one characteristic by which a distribution can be described
Kurtosis
- The steepness of a distribution in its center
- Platykurtic – relatively flat
- Leptokurtic – relatively peaked
- Mesokurtic – somewhere in the middle

Norms, Normal Curve and Normal Distribution
Norms
- Refers to the performances by defined groups on particular tests.
- The norms for a test are based on the distribution of scores obtained by some defined sample of individuals.
- The mean is a norm, and the 50th percentile is a norm.
- Norms are used to give information about performance relative to what has been observed in a standardization sample.
- Norm-referenced test: compares each person with a norm
- Criterion-referenced test: describes the specific types of skills, tasks, or knowledge that the test taker can demonstrate, such as mathematical skills

Normal Curve
- Scientists referred to it as the Laplace-Gaussian curve
- Karl Pearson is credited with being the first to refer to the curve as the NORMAL CURVE
- Also called the Gaussian curve
- A bell-shaped, smooth, mathematically defined curve that is highest at its center
- The mean, the median, and the mode all have the same value
- A normal curve HAS TWO TAILS
- Tails – the area on the normal curve between 2 and 3 standard deviations above the mean, and the area between 2 and 3 standard deviations below the mean

Normal Distribution
Standard Score
- A raw score that has been converted from one scale to another scale, where the latter scale has some arbitrarily set mean and standard deviation
- More easily INTERPRETABLE than raw scores
1. Z-Scores: Mean = 0; SD = 1
- Equal to the difference between a particular raw score and the mean, divided by the standard deviation
2. T-Scores / McCall’s T: Mean = 50; SD = 10
- Devised by W.A. McCall
- Named a T-score in honor of his professor E.L. Thorndike
- None of the scores is negative
3. Stanine: Mean = 5; SD = 2
4. STEN: Mean = 5.5; SD = 2

Correlation and Inference
Inferences (deduced conclusions)
- How some things (such as traits, abilities, or interests) are related to other things (such as behavior)
Coefficient of correlation (or correlation coefficient)
- A number that provides us with an index of the strength of the relationship between two things.
- A mathematical index that describes the direction and magnitude of a relationship.
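To make the standard scores and the correlation coefficient described above more concrete, the following is a minimal Python sketch, not a definitive implementation. The data (`hours`, `scores`) and the function names are hypothetical. Two assumptions are made: the standard deviation uses the population form (dividing by N), and the correlation coefficient is computed as the mean cross-product of paired z-scores, which is one standard way to obtain Pearson's r.

```python
from math import sqrt


def z_scores(values):
    """Convert raw scores to z-scores (mean 0, SD 1).

    Assumption: the population SD (divide by N) is used.
    """
    n = len(values)
    mean = sum(values) / n
    sd = sqrt(sum((v - mean) ** 2 for v in values) / n)
    return [(v - mean) / sd for v in values]


def t_scores(values):
    """Convert raw scores to T-scores (mean 50, SD 10)."""
    return [50 + 10 * z for z in z_scores(values)]


def pearson_r(xs, ys):
    """Correlation coefficient as the mean cross-product of paired z-scores."""
    zx, zy = z_scores(xs), z_scores(ys)
    return sum(a * b for a, b in zip(zx, zy)) / len(xs)


# Hypothetical data: hours studied and test scores for five people.
hours = [1, 2, 3, 4, 5]
scores = [52, 55, 61, 70, 72]

print([round(z, 2) for z in z_scores(scores)])
print([round(t, 1) for t in t_scores(scores)])
print(round(pearson_r(hours, scores), 3))
```

A result near +1 or -1 indicates a strong relationship, and the sign gives its direction. Because both variables are first converted to z-scores, the result does not depend on the original units of measurement.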
Three hypothetical relationships:
- positive correlation: high scores on X go with high scores on Y
- negative correlation: high scores on X go with low scores on Y
- no correlation

Coefficient of determination: the correlation coefficient squared; this value tells us the proportion of the total variation in scores on Y that we know as a function of information about X.
Coefficient of alienation: a measure of non-association between two variables.

The Concept of Correlation
Correlation
- An expression of the degree and direction of correspondence between two things
- Degree (Weak – Strong)
- Direction (Positive, Negative, No Correlation)
- LINEAR relationship
- ONLY TWO (2) VARIABLES
- Numerical in nature
- NO CAUSATION but CAN PREDICT

Tests of Correlation
1. Pearson r
- Most widely used
- Also known as the Pearson correlation coefficient and the Pearson product-moment coefficient of correlation
- Used when the variables are continuous and the relationship between them is linear
- Pearson r and the z-score are related because both are concerned with the location of an individual in a distribution
- The smaller the P-VALUE, the more statistically significant the relationship
- A larger CORRELATION (in absolute value) means the variables are more strongly related to each other
2. Spearman Rho
- One commonly used ALTERNATIVE statistic
- Aka rank-order correlation coefficient / rank-difference correlation coefficient
- Used when the sample size is small (fewer than 30 pairs) and the data are ordinal
3. Point-Biserial Correlation
- Used when one of the variables is dichotomous and the other is continuous
4. Phi-Coefficient
- Used when BOTH variables are dichotomous

Dichotomous variables: have only two levels. Examples are yes–no, correct–incorrect, and male–female.
True dichotomous variables: naturally form two categories. Ex: gender
Artificially dichotomous variables: reflect an underlying continuous scale forced into a dichotomy.
Ex: Pass or fail in a test

Graphic Representation of Correlation
Scatterplot
- a simple graphing of the coordinate points for values of the X-variable (placed along the graph’s horizontal axis) and the Y-variable (placed along the graph’s vertical axis)
- provides a quick indication of the direction and magnitude of the relationship, if any, between the two variables
- also makes the spotting of outliers relatively easy
Outliers
- an extremely atypical point located at a relatively long distance (an outlying distance) from the rest of the coordinate points in a scatterplot
WHY DO OUTLIERS EXIST?
- Sometimes simply the result of administering a test to a very small sample of test-takers
- Sometimes help identify a test-taker who did not understand the instructions, was not able to follow instructions, or was simply oppositional and did not follow instructions
- Sometimes provide a hint of a deficiency in testing or scoring

The Concept of Regression
Regression
- used to make predictions about scores on one variable from knowledge of scores on another variable
- the analysis of relationships among variables for the purpose of understanding how one variable may predict another
- (X) IV – Predictor Variable
- (Y) DV – Outcome Variable
Multiple Regression
- The use of more than one score to predict Y
- More predictors are NOT necessarily better
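
As an illustration of predicting Y from X, here is a minimal Python sketch of simple (one-predictor) linear regression. It assumes the ordinary least-squares method, which the chapter does not name explicitly, and it uses hypothetical data and hypothetical function names (`fit_line`, `predict`).

```python
def fit_line(xs, ys):
    """Fit y = intercept + slope * x by ordinary least squares (assumed method)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products of deviations, and sum of squared deviations of X.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(x, slope, intercept):
    """Predict the outcome (Y) for a given predictor value (X)."""
    return intercept + slope * x


# Hypothetical data: hours studied (X, predictor) and test score (Y, outcome).
hours = [1, 2, 3, 4, 5]
scores = [52, 55, 61, 70, 72]

slope, intercept = fit_line(hours, scores)
print(f"predicted score = {intercept:.1f} + {slope:.1f} * hours")
print(predict(6, slope, intercept))  # 78.5
```

The prediction for 6 hours lies outside the observed range of X, so it should be read with caution. The fitted line summarizes the five observed pairs and does not show that studying causes higher scores.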