Psych Assessment - Statistics PDF
Summary
This document outlines key statistical concepts central to psychology, focusing on descriptive statistics, measures of central tendency (mean, median, mode), and variability (range, standard deviation). It's likely part of an introductory psychology course or textbook.
psych assessment
STATISTICS

Primary Goals of Statistics in Psychology
- To describe data by means of descriptive statistics
- To make predictions based on the data

Descriptive Statistics
- the field of statistics that is used to describe our data

Raw Score
- the unmodified accounting of the testtaker's performance on a test
  ○ usually numerical, but not always
  ○ typically obtained by tallying the scores, e.g., by counting the correct answers

Frequency Distribution
- a table of scores and how many times each occurred; it can be presented in several ways

Grouped Frequency Distribution
- often preferred over a simple frequency distribution
- shows the class intervals
- more convenient

Measures of Central Tendency
- a statistic that indicates the average or midmost score between the extreme scores in a distribution

Mean
- referred to in everyday language as the “average”
- takes into account the actual numerical value of every score
- usually used with interval data
- usually denoted by x-bar (x̄)

Median
- defined as the middle score in a distribution
- another commonly used measure of central tendency
- usually used with ordinal data

Mode
- the most frequently occurring score in a distribution of scores
- usually used with nominal data

Measures of Variability

Variability
- an indication of how scores in a distribution are scattered or dispersed
- how close to or far from one another the scores are in a given data set

Range
- equal to the difference between the highest and the lowest scores
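The measures described so far (mean, median, mode, and range) can be sketched for a small set of scores. This is only an illustration using Python's standard `statistics` module; the scores are made-up data and the variable names are not from the notes.

```python
from statistics import mean, median, multimode

# Hypothetical raw scores for ten test takers (illustrative data only).
scores = [3, 5, 5, 6, 7, 7, 7, 8, 9, 13]

average = mean(scores)                    # uses the numerical value of every score
middle = median(scores)                   # middle score of the ordered distribution
modes = multimode(scores)                 # most frequently occurring score(s), as a list
score_range = max(scores) - min(scores)   # highest score minus lowest score

print("mean:", average)
print("median:", middle)
print("mode(s):", modes)
print("range:", score_range)
```

`multimode` is used rather than `mode` so that a distribution with two equally frequent scores reports both of them.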
Grouped Frequency Distribution (continued)
- each class interval has an upper limit and a lower limit

Histogram and Graph
- a graph is a diagram composed of lines, points, bars, or other symbols
- graphs describe and illustrate data

The Normal Curve
- in theory, the normal curve ranges from negative infinity to positive infinity
- it never ends; the tails simply keep extending toward negative or positive values
- the curve is perfectly symmetrical, with no skewness
- the mean, median, and mode coincide
- as a general rule (with ample exceptions), the larger the sample size and the wider the range of abilities measured by a particular test, the more closely the graph of the test scores will approximate the normal curve

Range (continued)
- a quick but gross description of the spread of scores
- not the most accurate of the measures of variability, because it depends entirely on the extreme scores (the highest and the lowest); a single extreme score can change its value
- mainly used when you want to build your own frequency distribution table; for more complex statistical analysis the range is not used because better measures are available

Interquartile and Semi-interquartile Ranges
- 25% of the scores fall in each quarter of the distribution
- understanding these leads to the most widely used measure of variability, the standard deviation

Standard Deviation
- equal to the square root of the average squared deviation about the mean
- a very useful measure of variation because each individual score's distance from the mean of the distribution is factored into its computation
- you will come across this measure of variation frequently in the study and practice of measurement in psychology
- tells how far the scores lie from the mean of the distribution
- to see the variability of a set of scores, you need to know the SD
- in assessment, the SD is the measure that receives the most attention
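The definition of the standard deviation above (the square root of the average squared deviation about the mean) can be followed step by step. The sketch below assumes the "average" divides by the number of scores N (the population formula); some texts divide by N - 1 for samples, which the notes do not discuss. The function name and data are hypothetical.

```python
from math import sqrt
from statistics import pstdev

# Hypothetical raw scores (illustrative data only).
scores = [3, 5, 5, 6, 7, 7, 7, 8, 9, 13]

def standard_deviation(values):
    """Square root of the average squared deviation about the mean (divides by N)."""
    m = sum(values) / len(values)
    squared_deviations = [(v - m) ** 2 for v in values]
    average_squared_deviation = sum(squared_deviations) / len(values)
    return sqrt(average_squared_deviation)

sd = standard_deviation(scores)
print("SD:", round(sd, 3))
print("matches statistics.pstdev:", abs(sd - pstdev(scores)) < 1e-9)

# The range reacts strongly to one extreme score; compare the two ranges below.
with_outlier = scores[:-1] + [30]
print("range:", max(scores) - min(scores))
print("range with one extreme score:", max(with_outlier) - min(with_outlier))
```

Every score contributes to the SD, whereas the range is set by the two extreme scores alone, which is why one unusual score changes it so much.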
Variance
- equal to the arithmetic mean of the squares of the differences between the scores in a distribution and their mean
- the square of the SD
- not emphasized much in assessment, where the main focus is the SD, but very important in other kinds of psychological research (such as experimental psychology)

Skewness
- the nature and extent to which symmetry is absent in a distribution of scores

Positively Skewed
- few higher scores; most scores are at the lower end
- the tail is on the right side
- ex. if the scores of a section are positively skewed, only a few scores are high and the majority are at the lower end

Negatively Skewed
- few lower scores; most scores are at the higher end
- the tail is on the left side
- ex. if your scores are negatively skewed, only a few scores are low and the majority are at the higher end

The Normal Curve (continued)
- the larger the sample that is taken, the more the distribution shows the normality of the curve
- the more your data look like a normal curve, the more reliable your inferences will be
- a bell-shaped, smooth, mathematically defined curve that is highest at its center

Standard Scores
- raw scores that have been converted from one scale to another scale, where the latter scale has an arbitrarily set mean and standard deviation
- Why? Converting raw scores to standard scores makes the raw scores easier to interpret and makes it easier to compare a person's test scores with those of other test takers on a particular test

Z-scores
- result from the conversion of a raw score into a number indicating how many standard deviation units the raw score is below or above the mean of the distribution (x̄ = 0, s = 1)

T-scores
- the same idea as z-scores, with a different mean and SD (x̄ = 50, s = 10)
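The conversions described above can be sketched as two small functions: one turns a raw score into a z-score, the other moves a z-score onto a scale with an arbitrarily set mean and SD (here the T-score scale, mean 50 and SD 10). The function names, the test mean of 25, and the test SD of 4 are hypothetical values chosen for illustration.

```python
def z_score(raw, mean, sd):
    """Number of SD units the raw score lies above (+) or below (-) the mean."""
    return (raw - mean) / sd

def to_scale(z, new_mean, new_sd):
    """Re-express a z-score on a scale with the given mean and SD."""
    return new_mean + new_sd * z

# Hypothetical test with mean 25 and SD 4.
z_high = z_score(31, mean=25, sd=4)
z_low = z_score(19, mean=25, sd=4)

t_high = to_scale(z_high, new_mean=50, new_sd=10)
t_low = to_scale(z_low, new_mean=50, new_sd=10)

print(z_high, t_high)
print(z_low, t_low)
```

The score below the mean has a negative z-score, yet its T-score is still a positive number, which is one reason rescaled standard scores can be easier to report.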
T-scores (continued)
- unlike z-scores, T-scores do not come out negative in practice

Stanine
- (x̄ = 5, s = 2)
- usually used in achievement tests or attitude tests

IQ Score
- (x̄ = 100, s = 15)

[Figure: Standard Scores in the Normal Curve]

Kurtosis
- the steepness of the distribution at its center

Kinds of Kurtosis
- Platykurtic: flat; scores are distributed almost equally across the higher and lower ends
- Leptokurtic: peaked; scores are mostly in the center, with fewer scores at the higher and lower ends
- Mesokurtic: in between; the ideal kurtosis

Inferential Statistics
- used for making conclusions

Correlation
- one of the most basic ways to make an inference from a given set of data
- an expression of the degree of correspondence between two things
- it looks only at how strong the correspondence between the two things is

Correlation Coefficient
- a number that provides us with an index of the strength of the relationship between two things
- ranges from -1 to +1
- if one variable goes up as the other goes up, the two variables are directly correlated with one another; what matters is that they move in the same direction
- if one variable goes down while the other goes up, the two variables have a negative, or inverse, relationship
- a coefficient of exactly 0 (no relationship) is possible but very rare, just as a perfect correlation is
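A correlation coefficient with the properties listed above can be computed in several ways. The sketch below uses the Pearson product-moment formula, which is an assumption made for illustration since the notes give no formula at this point. The function name and the data are hypothetical.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products of deviations from each mean.
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Square roots of the summed squared deviations of each variable.
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cross / (spread_x * spread_y)

hours_studied = [1, 2, 3, 4, 5]
scores_rising = [2, 4, 6, 8, 10]    # goes up as hours go up
scores_falling = [10, 8, 6, 4, 2]   # goes down as hours go up

r_direct = pearson_r(hours_studied, scores_rising)
r_inverse = pearson_r(hours_studied, scores_falling)

print(round(r_direct, 6), round(r_inverse, 6))
```

The two made-up data sets are perfectly linear, so they land at the two ends of the -1 to +1 range; real test data almost never do.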
Pearson r
- for linear relationships
- the statistical tool of choice when the relationship between the variables is linear and the two variables being correlated are continuous

Spearman Rho
- for ordinal variables
- used when the sample is small (fewer than 30 pairs of measurements)

Scatterplot
- a graph of the coordinate points for values of the X-variable and the Y-variable

Meta-analysis
- a family of techniques used to statistically combine information across studies to produce single estimates of the data under study
- the estimates are usually expressed as an effect size
- in most studies, it is used when the researcher wants to gather different existing studies and work with the data they already report
- not a very common technique at the undergraduate level, because it is somewhat more complex

Chapter 3: A Statistics Refresher

Test scores are frequently expressed as numbers, and statistical tools are used to describe, make inferences from, and draw conclusions about numbers.

Measurement
- the act of assigning numbers or symbols to characteristics of things (people, events, whatever) according to rules
- for example: assign the number 12 to all lengths that are exactly the same length as a 12-inch ruler

Scale
- a set of numbers (or other symbols) whose properties model empirical properties of the objects to which the numbers are assigned

Continuous Scale
- exists when it is theoretically possible to divide any of the values of the scale
- by contrast, a categorization scale is characterized as discrete when it would not be accurate or meaningful to categorize any of the subjects in the study as anything other than one of two discrete groups (e.g., “previously hospitalized” or “not previously hospitalized”)

Error
- the collective influence of all of the factors on a test score or measurement beyond those specifically measured by the test or measurement
- very much an element of all measurement, and an element for which any theory of measurement must surely account

Levels or Scales of Measurement
- mnemonic: NOIR (the French word for “black”)
- N - nominal, O - ordinal, I - interval, R - ratio

Nominal Scales
- the simplest form of measurement
- involve classification or categorization based on one or more distinguishing characteristics, where all things measured must be placed into mutually exclusive and exhaustive categories
- used exclusively for classification purposes; the numbers cannot be meaningfully added, subtracted, ranked, or averaged
- examples of nominal scaling include yes or no responses

Ordinal Scales
- permit classification
- rank ordering on some characteristic is also permissible with ordinal scales
- have no absolute zero point
- the ways in which data from such scales can be analyzed statistically are limited

Interval Scales
- contain equal intervals between numbers
- each unit on the scale is exactly equal to any other unit on the scale; but like ordinal scales, interval scales contain no absolute zero point
- a presumption inherent in their use is that no testtaker possesses none of the ability or trait (or whatever) being measured

Ratio Scales
- have a true zero point
- all mathematical operations can meaningfully be performed because there exist equal intervals between the numbers on the scale as well as a true or absolute zero point
- in psychology, ratio-level measurement is employed in some types of tests and test items, perhaps most notably those involving assessment of neurological functioning

Dynamometer
- an instrument used to measure strength of hand grip; the examinee is instructed to squeeze the grips as hard as possible

Measurement Scales in Psychology
- the ordinal level of measurement is most frequently used in psychology
- “These tests indicate with more or less accuracy not the amount of intelligence, aptitude, and personality traits of individuals, but rather the rank-order positions of the individuals.”
- the attraction of interval measurement for users of psychological tests is the flexibility with which such data can be manipulated statistically

Describing Data

Distribution
- may be defined as a set of test scores arrayed for recording or study

3 Kinds of Graph

Histogram
- a graph with vertical lines drawn at the true limits of each test score (or class interval), forming a series of contiguous rectangles

Bar Graph
- numbers indicative of frequency appear on the Y-axis, and reference to some categorization (e.g., yes/no/maybe, male/female) appears on the X-axis; here the rectangular bars typically are not contiguous

Frequency Polygon
- expressed by a continuous line connecting the points where test scores or class intervals (as indicated on the X-axis) meet frequencies (as indicated on the Y-axis)

Measures of Variability

Quartiles
- the dividing points between the four quarters in the distribution
- there are three of them, labeled Q1, Q2, and Q3
- a quartile refers to a specific point, whereas a quarter refers to an interval (one of the four parts)

Interquartile Range
- a measure of variability equal to the difference between Q3 and Q1
- like the median, it is an ordinal statistic

Semi-interquartile Range
- equal to the interquartile range divided by 2

Average Deviation
- another tool that could be used to describe the amount of variability in a distribution; AD for short
- rarely used, perhaps because the deletion of algebraic signs renders it a useless measure for purposes of any further operations

Standard Deviation
- a measure of variability equal to the square root of the average squared deviations about the mean
- equal to the square root of the variance

The Normal Curve
- through the early nineteenth century, scientists referred to it as the “Laplace-Gaussian curve” (Karl Friedrich Gauss was among those who helped develop it)
- Karl Pearson is credited with being the first to refer to the curve as the normal curve, perhaps in an effort to be diplomatic to all of the people who helped develop it

Linear Transformation
- a transformation that retains a direct numerical relationship to the original raw score
- the magnitude of differences between such standard scores exactly parallels the differences between corresponding raw scores

Nonlinear Transformation
- may be required when the data under consideration are not normally distributed yet comparisons with normal distributions need to be made
- in a nonlinear transformation, the resulting standard score does not necessarily have a direct numerical relationship to the original raw score

Correlation and Inference

Coefficient of Correlation (or correlation coefficient)
- a number that provides us with an index of the strength of the relationship between two things
- a numerical index that expresses this relationship: it tells us the extent to which X and Y are “co-related”

Correlation
- an expression of the degree and direction of correspondence between two things

Positive Correlation
- exists when two variables simultaneously increase or simultaneously decrease

Negative (or inverse) Correlation
- occurs when one variable increases while the other variable decreases

Zero Correlation
- if the correlation is zero, then absolutely no relationship exists between the two variables; some might describe this as “perfectly no correlation”

CHAPTER 4: OF TESTS AND TESTING

SOME ASSUMPTIONS ABOUT PSYCHOLOGICAL TESTING AND ASSESSMENT

ASSUMPTION 1: Psychological Traits and States Exist

TRAIT
- has been defined as “any distinguishable, relatively enduring way in which one individual varies from another” (Guilford, 1959, p. 6)
- the trait term that an observer applies, as well as the strength or magnitude of the trait presumed to be present, is based on observing a sample of behavior
- exactly how a particular trait manifests itself is, at least to some extent, situation-dependent
- (Traits are stable and relatively enduring; they can still change, but they are not permanent. Example of a trait theory, OCEAN: O = Openness, C = Conscientiousness, E = Extraversion, A = Agreeableness, N = Neuroticism. You may get the same or almost the same result after 10 years.)

STATES
- also distinguish one person from another but are relatively less enduring (Chaplin et al., 1988)

PSYCHOLOGICAL TRAIT OF SENSATION SEEKING
- defined as “the need for varied, novel, and complex sensations and experiences and the willingness to take physical and social risks for the sake of such experiences” (Zuckerman, 1979, p. 10)
- the 22-item Sensation-Seeking Scale (SSS) seeks to identify people who are high or low on this trait
- the reference group with which comparisons are made can greatly influence one's conclusions or judgments

ASSUMPTION 2: Psychological Traits and States Can Be Quantified and Measured
- once it's acknowledged that psychological traits and states do exist, the specific traits and states to be measured and quantified need to be carefully defined
- the first step in understanding the meaning of a test score is understanding how the trait term was defined by the test developer
- ideally, the test developer has provided test users with a clear operational definition of the construct under study
- once the trait, state, or other construct to be measured has been defined, a test developer considers the types of item content that would provide insight into it
- a test developer has a world of possible items that can be written to gauge the strength of that trait in test takers
- one question that arises is whether two items should be given equal weight, that is, whether an answer keyed “correct” on one item should earn more points than an answer keyed “correct” on the other
- weighting the comparative value of a test's items comes about as the result of a complex interplay among many factors, including technical considerations, the way a construct has been defined for the purposes of the test, and the value society (and the test developer) attaches to the behaviors evaluated
- measuring traits and states by means of a test entails developing not only appropriate test items but also appropriate ways to score the test and interpret the results

STATES (continued)
- (States change quickly, e.g., happy in the morning, sad or angry in the afternoon.)
- (A trait or state becomes a construct once it is made measurable through the scales of a test.)
- samples of behavior may be obtained in a number of ways, ranging from direct observation to the analysis of self-report statements or pencil-and-paper test answers
- the definitions of trait and state used here also refer to a way in which one individual varies from another

PSYCHOLOGICAL TRAIT
- covers a wide range of possible characteristics
- thousands of psychological trait terms can be found in the English language (Allport & Odbert, 1936)
- among them are psychological traits that relate to intelligence, specific intellectual abilities, cognitive style, adjustment, interests, attitudes, sexual orientation and preferences, psychopathology, personality in general, and specific personality traits
- a psychological trait exists only as a construct: an informed, scientific concept developed or constructed to describe or explain behavior
- we can't see, hear, or touch constructs, but we can infer their existence from overt behavior

OVERT BEHAVIOR
- an observable action or the product of an observable action, including test- or assessment-related responses

RELATIVELY ENDURING
- this phrase in our definition of trait is a reminder that a trait is not expected to be manifested in behavior 100% of the time

ASSUMPTION 2 (continued)
- the test score is presumed to represent the strength of the targeted ability or trait or state and is frequently based on cumulative scoring
- (Because what we study cannot be seen directly, proving that something exists or drawing a conclusion requires scientific methodology, and measurement should be at the interval level.)

ASSUMPTION 3: Test-Related Behavior Predicts Non-Test-Related Behavior
- the objective of the test is to provide some indication of other aspects of the examinee's behavior
- the tasks in some tests mimic the actual behaviors that the test user is attempting to understand; by their nature, however, such tests yield only a sample of the behavior that can be expected to be emitted under non-test conditions
- the obtained sample of behavior is typically used to make predictions about future behavior, such as the work performance of a job applicant
- it is beyond the capability of any known testing or assessment procedure to reconstruct someone's state of mind; still, behavior samples may shed light, under certain circumstances, on someone's state of mind in the past
- [E.g., when conducting an IQ test, do not look at IQ alone; also consider motivation, self-esteem (the person may be intelligent in some other way), mental health (analyze the situation at the time of testing), and other factors that contribute to the test result.]

ASSUMPTION 4: Tests and Other Measurement Techniques Have Strengths and Weaknesses
- competent test users understand a great deal about the tests they use
- they understand, among other things, how a test was developed, the circumstances under which it is appropriate to administer the test, how the test should be administered and to whom, and how the test results should be interpreted
- competent test users understand and appreciate the limitations of the tests they use as well as how those limitations might be compensated for by data from other sources
- (Tests are made by humans, so they are not perfect and have flaws. Do not rely on a single test; that is why test batteries exist, and decisions are based on the battery of tests.)

ASSUMPTION 5: Various Sources of Error Are Part of the Assessment

ERROR
- in everyday usage, error refers to mistakes, miscalculations, and the like
- in the context of assessment, error need not refer to a deviation, an oversight, or something that otherwise violates expectations
- error traditionally refers to something that is more than expected; it is actually a component of the measurement process
- it refers to a long-standing assumption that factors other than what a test attempts to measure will influence performance on the test
- test scores are always subject to questions about the degree to which the measurement process includes error

ERROR VARIANCE
- the component of a test score attributable to sources other than the trait or ability measured
- potential sources of error variance: assessees themselves, assessors, measuring instruments
- error is random, or, for lack of a better term, just a matter of chance

CLASSICAL TEST THEORY (CTT)
- variously referred to as true score theory
- the assumption is made that each test taker has a true score on a test that would be obtained but for the action of measurement error
- alternatives to CTT exist, such as a model of measurement based on item response theory (IRT)

ASSUMPTION 6: Testing and Assessment Can Be Conducted in a Fair and Unbiased Manner
- more controversial than the other assumptions
- tools can be used properly or improperly
- today all major test publishers strive to develop instruments that are fair when used in strict accordance with guidelines in the test manual
- one source of fairness-related problems is the test user who attempts to use a particular test with people whose background and experience are different from the background and experience of people for whom the test was intended
- a guiding question: “What do we as a society wish to accomplish by the use of this test or assessment procedure?”

ASSUMPTION 7: Testing and Assessment Benefit Society
- without tests and assessment, people could present themselves as surgeons, bridge builders, or airline pilots regardless of their background, ability, or professional credentials
- without tests or other assessment procedures, personnel might be hired on the basis of nepotism rather than documented merit
- considering the many critical decisions that are based on testing and assessment procedures, we can readily appreciate the need for tests, especially good tests
- (If there were no tests and assessment, there would be no basis or standard, and anyone could claim to be anything.)

WHAT IS A GOOD TEST?
- includes clear instructions for administration, scoring, and interpretation
- it is a plus if the test offers economy in the time and money it takes to administer, score, and interpret it
- measures what it purports to measure
- has reliability and validity
- is a useful test, one that yields actionable results that will ultimately benefit individual test takers or society at large

RELIABILITY
- the criterion of reliability involves the consistency of the measuring tool: the precision with which the test measures and the extent to which error is present in measurements
- psychological tests, like other tests and instruments, are reliable to varying degrees
- reliability is a necessary but not sufficient element of a good test
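The idea of error being present in measurements, and the classical test theory assumption of a true score obscured by measurement error, is often written as observed score = true score + error. The sketch below simulates that relationship. It assumes, purely for illustration, that error is random and normally distributed around zero; the function name, the true score of 100, and the error SD of 5 are hypothetical.

```python
import random

def simulate_observed_scores(true_score, error_sd, n_administrations, seed=0):
    """Observed score = true score + random error (error model assumed normal)."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    return [true_score + rng.gauss(0, error_sd) for _ in range(n_administrations)]

observed = simulate_observed_scores(true_score=100, error_sd=5, n_administrations=10_000)
mean_observed = sum(observed) / len(observed)

print("lowest observed score:", round(min(observed), 1))
print("highest observed score:", round(max(observed), 1))
print("mean of observed scores:", round(mean_observed, 2))
```

Individual observed scores scatter around the true score, but because the simulated error is random, it tends to cancel out across many administrations and the average lands close to the true score.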
VALIDITY
- in addition to being reliable, tests must be reasonably accurate
- a valid test measures what it purports to measure
- questions regarding a test's validity may focus on the items that collectively make up the test
- the validity of a test may also be questioned on grounds related to the interpretation of resulting test scores

NORMS

NORM-REFERENCED TESTING AND ASSESSMENT
- a method of evaluation and a way of deriving meaning from test scores by evaluating an individual test taker's score and comparing it to the scores of a group of test takers
- an individual test score is understood relative to other scores on the same test
- a common goal of norm-referenced tests is to yield information on a test taker's standing or ranking relative to some comparison group of test takers

NORM - in the singular, used in the scholarly literature to refer to behavior that is usual, average, normal, standard, expected, or typical.

NORMS - the plural form of norm, as in the term gender norms.

SAMPLING TO DEVELOP NORMS

STANDARDIZATION OR TEST STANDARDIZATION
- the process of administering a test to a representative sample of test takers for the purpose of establishing norms
- a test is said to be standardized when it has clearly specified procedures for administration and scoring, typically including normative data

SAMPLING
- in the process of developing a test, a test developer has targeted some defined group as the population for which the test is designed; this population is the complete universe or set of individuals with at least one common, observable characteristic
- sampling is the process of selecting the portion of the universe deemed to be representative of the whole population
- subgroups within a defined population may differ with respect to some characteristics, and it is sometimes essential to have these differences proportionately represented in the sample

SAMPLE - the portion of the universe of people deemed to be representative of the whole population.
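One way to keep subgroups proportionately represented in a sample, as described above, is to draw separately from each subgroup in proportion to its size, an approach commonly called stratified random sampling. The sketch below is a minimal illustration; the function name, the urban/rural strata, and the 10% sampling fraction are hypothetical.

```python
import random
from collections import defaultdict

def stratified_random_sample(population, stratum_of, fraction, seed=0):
    """Randomly draw the same fraction of members from every subgroup (stratum)."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    strata = defaultdict(list)
    for person in population:
        strata[stratum_of(person)].append(person)
    sample = []
    for members in strata.values():
        k = round(len(members) * fraction)   # subgroup's share of the sample
        sample.extend(rng.sample(members, k))
    return sample

# Hypothetical population: 60 urban and 40 rural test takers.
population = [("urban", i) for i in range(60)] + [("rural", i) for i in range(40)]
sample = stratified_random_sample(population, stratum_of=lambda p: p[0], fraction=0.1)

urban_count = sum(1 for p in sample if p[0] == "urban")
rural_count = sum(1 for p in sample if p[0] == "rural")
print(urban_count, rural_count)
```

The sample keeps the 60:40 split of the population, whereas a simple random draw of ten people could over- or under-represent either subgroup by chance.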
STRATIFIED SAMPLING - helps prevent sampling bias and ultimately aids in the interpretation of the findings.

STRATIFIED-RANDOM SAMPLING - what stratified sampling is called when the sampling is random.

PURPOSIVE SAMPLE - a sample that is arbitrarily selected because we believe it to be representative of the population.

INCIDENTAL SAMPLE OR CONVENIENCE SAMPLE - one that is convenient or available for use.

DEVELOPING NORMS FOR A STANDARDIZED TEST
- the test developer administers the test according to the standard set of instructions that will be used with the test
- the test developer also describes the recommended setting for giving the test
- establishing a standard set of instructions and conditions under which the test is given makes the test scores of the normative sample more comparable with the scores of future test takers
- the test developer will summarize the data using descriptive statistics, including measures of central tendency and variability

In a PSYCHOMETRIC CONTEXT, NORMS are the test performance data of a particular group of test takers that are designed for use as a reference when evaluating or interpreting individual test scores.

NORMATIVE SAMPLE - the group of people whose performance on a particular test is analyzed for reference in evaluating the performance of individual test takers.
- members of the normative sample will all be typical with respect to some characteristic(s) of the people for whom the particular test was designed
- these data constitute the norms
- the data may be in the form of raw scores or converted scores

NORMING - refers to the process of deriving norms.
- the term may be modified to describe a particular type of norm derivation
- for example, race norming (the controversial practice of norming on the basis of race or ethnic background)

NORMING A TEST - can be a very expensive proposition.
- for this reason, some test manuals provide what are variously known as USER NORMS OR PROGRAM NORMS, which “consist of descriptive statistics based on a group of test takers in a given period of time rather than norms obtained by formal sampling methods”

DEVELOPING NORMS FOR A STANDARDIZED TEST (continued)
- in order to best assist future users of the test, test developers are encouraged to “provide information to support recommended interpretations of the results, including the nature of the content, norms or comparison groups, and other technical evidence”
- in practice, descriptions of normative samples vary widely in detail; test authors wish to present their tests in the most favorable light possible, so shortcomings in the standardization procedure or elsewhere in the process of the test's development may be given short shrift or totally overlooked in a test's manual
- sometimes, although the sample is scrupulously defined, the generalizability of the norms to a particular group or individual is questionable
- when the people in the normative sample are the same people on whom the test was standardized, the phrases normative sample and standardization sample are often used interchangeably
- when new norms are later developed for a test, the test remains standardized based on data from the original standardization sample; it's just that new normative data are developed based on an administration of the test to a new normative sample
- included in this new normative sample may be groups of people who were underrepresented in the original standardization sample data; in such a scenario, the normative sample for the new norms clearly would not be identical to the standardization sample, so it would be inaccurate to use the terms standardization sample and normative sample interchangeably

TYPES OF NORMS - age norms, grade norms, national norms, national anchor norms, local norms, norms from a fixed reference group, subgroup norms, and percentile norms.

PERCENTILE - an expression of the percentage of people whose score on a test or measure falls below a particular raw score.

PERCENTAGE CORRECT - refers to the distribution of raw scores; more specifically, to the number of items that were answered correctly multiplied by 100 and divided by the total number of items.

Types of Norms

AGE NORMS
- also known as age-equivalent scores, age norms indicate the average performance of different samples of test takers who were at various ages at the time the test was administered
- ever since the introduction of the Stanford-Binet to this country in the early twentieth century, the idea of identifying the “mental age” of a test taker has had great intuitive appeal
- the child of any chronological age whose performance on a valid test of intellectual ability indicated that he or she had intellectual ability similar to that of the average child of some other age was said to have the mental age of the norm group in which his or her test score fell

GRADE NORMS
- designed to indicate the average test performance of test takers in a given school grade
- developed by administering the test to representative samples of children over a range of consecutive grade levels
- the primary use of grade norms is as a convenient, readily understandable gauge of how one student's performance compares with that of fellow students in the same grade
- one drawback of grade norms is that they are useful only with respect to years and months of schooling completed
- they have little or no applicability to children who are not yet in school or to children who are out of school; further, they are not typically designed for use with adults who have returned to school

DEVELOPMENTAL NORMS
- both grade norms and age norms are referred to more generally as developmental norms, a term applied broadly to norms developed on the basis of any trait, ability, skill, or other characteristic that is presumed to develop, deteriorate, or otherwise be affected by chronological age, school grade, or stage of life

NATIONAL NORMS
- derived from a normative sample that was nationally representative of the population at the time the norming study was conducted
- for example, national norms may be obtained by testing large numbers of people representative of different variables of interest such as age, gender, racial/ethnic background, socioeconomic strata, and geographical location
- the precise nature of the questions raised when developing national norms will depend on whom the test is designed for and what the test is designed to do

NATIONAL ANCHOR NORMS
- could provide the tool for comparing scores on different tests
- provide some stability to test scores by anchoring them to other test scores
- establishing them typically begins with the computation of percentile norms for each of the tests to be compared; using the equipercentile method, the equivalency of scores on different tests is calculated with reference to corresponding percentile scores
- although national anchor norms provide an indication of the equivalency of scores on various tests, technical considerations entail that it would be a mistake to treat these equivalencies as precise equalities

SUBGROUP NORMS
- a normative sample can be segmented by any of the criteria initially used in selecting subjects for the sample; what results from such segmentation are more narrowly defined subgroup norms

LOCAL NORMS
- typically developed by test users themselves
- provide normative information with respect to the local population's performance on some test
- some test users substitute one subtest for another within a larger test, thus creating the need for new norms; there are many different scenarios that would lead the prudent test user to develop local norms
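The distinction drawn earlier between a percentile and percentage correct can be sketched with two small functions that follow the definitions in these notes. The function names, the ten norm-group scores, and the 40-item test are made up for illustration.

```python
def percentage_correct(items_correct, total_items):
    """Items answered correctly, multiplied by 100 and divided by the total items."""
    return items_correct * 100 / total_items

def percentile(raw_score, norm_group_scores):
    """Percentage of people in the norm group whose score falls below raw_score."""
    below = sum(1 for s in norm_group_scores if s < raw_score)
    return below * 100 / len(norm_group_scores)

# Hypothetical raw scores of a ten-person norm group on a 40-item test.
norm_group = [12, 15, 18, 20, 22, 25, 27, 30, 33, 35]
raw = 30

pct_correct = percentage_correct(raw, total_items=40)
pct_rank = percentile(raw, norm_group)

print(pct_correct, pct_rank)
```

The same raw score yields two different numbers because percentage correct describes the test taker's own answers, while the percentile describes standing relative to other people.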
PERCENTILE NORMS
- The raw data from a test’s standardization sample converted to percentile form.

FIXED REFERENCE GROUP SCORING SYSTEMS
- The distribution of scores obtained on the test from one group of test takers, referred to as the fixed reference group, is used as the basis for the calculation of test scores for future administrations of the test.
- Perhaps the test most familiar to college students that has historically exemplified the use of a fixed reference group scoring system is the SAT. This test was first administered in 1926.
- Its norms were then based on the mean and standard deviation of the people who took the test at the time.

NORM-REFERENCED VERSUS CRITERION-REFERENCED EVALUATION
- One way to derive meaning from a test score is to evaluate the test score in relation to other scores on the same test. As we have pointed out, this approach to evaluation is referred to as norm-referenced.
- Another way to derive meaning from a test score is to evaluate it on the basis of whether or not some criterion has been met.

CRITERION - a standard on which a judgment or decision may be based.
- The criterion in criterion-referenced assessments typically derives from the values or standards of an individual or organization.

CRITERION-REFERENCED TESTING AND ASSESSMENT
- The approach has also been referred to as domain-referenced or content-referenced testing and assessment.
- Because these tests are frequently used to gauge achievement or mastery, they are sometimes referred to as mastery tests.
- A method of evaluation and a way of deriving meaning from test scores by evaluating an individual’s score with reference to a set standard.
- Example: to be licensed as a psychologist, the applicant must achieve a score that meets or exceeds the score mandated by the state on the licensing test.
- Has gained widespread acceptance in the field of computer-assisted education programs.

The difference between norm-referenced and criterion-referenced approaches to assessment has to do with the area of focus regarding test results.
- In norm-referenced interpretations of test data, a usual area of focus is how an individual performed relative to other people who took the test.
- In criterion-referenced interpretations of test data, a usual area of focus is the test taker’s performance: what the test taker can or cannot do; what the test taker has or has not learned; whether the test taker does or does not meet specified criteria for inclusion in some group, access to certain privileges, and so forth.

Norm-referenced and criterion-referenced are two of many ways that test data may be viewed and interpreted. However, these terms are not mutually exclusive, and the use of one approach with a set of test data does not necessarily preclude the use of the other approach for another application.

CULTURE AND INFERENCE
- It is incumbent upon responsible test users not to lose sight of culture as a factor in test administration, scoring, and interpretation.
- In interpreting data from psychological tests, it is frequently helpful to know about the culture of the test taker, including something about the era or “times” that the test taker experienced.
- Margaret Mead (1978, p. 71), recalling her youth, wrote: “We grew up under skies which no satellite had flashed.” In interpreting assessment data from assessees of different generations, it would seem useful to keep in mind whether “satellites had or had not flashed in the sky.” In other words, historical context should not be lost sight of in evaluation (Rogler, 2002).
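To make the norm-referenced versus criterion-referenced distinction in these notes more concrete, the Python sketch below interprets the same raw score both ways. Everything in it is an assumption for illustration: the reference-group scores, the cut score of 75, the function names, and the choice of the population standard deviation (`pstdev`) are hypothetical and are not taken from any actual test.

```python
from statistics import mean, pstdev

# Hypothetical raw scores from a fixed reference group.
reference_group = [60, 65, 70, 75, 80]

# Hypothetical cut score; in practice the criterion derives from the
# values or standards of an individual or organization.
CUT_SCORE = 75


def norm_referenced(raw_score, reference_scores):
    """z score: distance from the reference-group mean in SD units."""
    group_mean = mean(reference_scores)
    group_sd = pstdev(reference_scores)  # assumption: population SD
    return (raw_score - group_mean) / group_sd


def criterion_referenced(raw_score, cut_score):
    """True if the score meets or exceeds the set standard."""
    return raw_score >= cut_score


for score in (70, 80):
    z = norm_referenced(score, reference_group)
    passed = criterion_referenced(score, CUT_SCORE)
    print(f"raw={score}  z={z:+.2f}  meets criterion={passed}")
```

In this sketch a raw score of 70 is exactly average relative to the reference group yet falls short of the cut score, so the two interpretations answer different questions about the same test data and can be used side by side.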