Psychological Measurement
Uploaded by ThrilledChiasmus4427
Summary
This document details psychological testing, covering the history of testing, from early examples to modern applications, and examining the underlying theory behind psychological tests, their characteristics, and the different types of tests used. It also covers the history of measurement and the development of various key psychological models and instruments.
Chapter 1
Psychological Testing:
- Psychological testing personally affects many individuals
- Good tests facilitate high-quality decisions, and bad tests facilitate low-quality decisions

What is a Psychological Test?
- A measurement tool or technique that requires a person to perform one or more behaviors in order to make inferences about human attributes, traits, or characteristics or to predict future outcomes
- Psychometrics: Field of psychology concerned with the quantitative and technical aspects of testing

Similarities Among Psychological Tests:
- Behavior: An observable and measurable action
- Psychological construct: An underlying, unobservable personal attribute, trait, or characteristic of an individual that is thought to be important in describing or understanding human behavior
  - Ex. Intelligence, motivation, emotional states (e.g., anxiety, fear), personality traits (e.g., extroversion), abilities (e.g., athleticism)
- Inference: Using evidence to reach a conclusion

Differences Among Psychological Tests:
- Behavior performed
- Construct measured and outcome predicted
- Content
- Administration and format
- Scoring and interpretation
- Psychometric quality

History of Test Development:
- Circa 2200-1000 BC: The Chinese introduced written exams to help fill civil service positions (civil service, military affairs, agriculture)
- 1791: France & Britain adopted similar examination systems to select trainees for civil service
- 1860s: The United States began civil service examinations

Sir Francis Galton (1822-1911):
- Charles Darwin’s cousin
- Interested in individual differences and their distribution (published Hereditary Genius)
- 1884-1890: Tested 17,000 individuals on height, weight, and motor & sensory abilities (hand strength, visual acuity, reaction time)
- Demonstrated that objective tests could provide meaningful scores
- First to introduce the term “mental tests”

1879:
- Wilhelm Wundt established the first psychological laboratory in Germany; it tested the functioning of the brain and the nervous system

1890:
- Sir Francis Galton & James Cattell developed a mental test to assess college students. The test included measures of strength, resistance to pain, and reaction time

Alfred Binet:
- Founded the first French psychology journal
- Developed measures of intelligence: reasoning, judgment, and problem-solving abilities
- 1905: Binet-Simon Scale of mental development used to classify children with intellectual disabilities in France
  - Adapted for use in many countries
- Developed the concept of “mental age”

1916:
- Lewis Terman produced the Stanford-Binet Intelligence Scales with the intelligence quotient (IQ) index
- A revised version remains one of the most widely used IQ tests today

1917:
- During World War I, the US commissioned Robert Woodworth to develop a test to classify incoming recruits
  - Personnel Data Sheet (PDS)
  - Woodworth Psychoneurotic Inventory
- Fueled the development of personality tests

1920-1940:
- Development of projective tests (Rorschach/Thematic Apperception Test), personality inventories (MMPI) & SATs (college admission tests)

David Wechsler 1955:
- Developed the Wechsler Adult Intelligence Scale (WAIS)
- Intelligence is a set of verbal and non-verbal skills
- Most popular individually administered intelligence tests in North America
- Versions for preschoolers and older children
- Current version is WAIS-5

1941-1960:
- Development of vocational tests, partially spurred by the Great Depression
- US Employment Service developed the General Aptitude Test Battery (GATB): used for vocational counseling and occupational selection
- APA 1953: Ethical Standards of Psychologists

1961-1980:
- Emergence of neuropsychological testing

1980-present:
- Widespread adoption of computerized testing

21st Century Assessments:
- Psychological testing is a big, multibillion-dollar business
- Computerized Adaptive Testing (CAT)
- Technological applications (e.g., neuroimaging)
- Authentic assessments (e.g., virtual reality software, naturalistic)
  - Assessment methods using virtual reality (VR) are a valid and promising option
  - Computerized virtual reality assessment programs with real-world characteristics
  - Example of naturalistic assessment: Baycrest Multiple Errands Test (BMET)

Three Characteristics of Psychological Tests:
1. All good tests representatively sample the behaviors thought to measure an attribute or thought to predict an outcome
2. All good tests include behavior samples that are obtained under standardized conditions
3. All good tests have rules for scoring

Six Assumptions of Psychological Tests:
1. Psychological tests measure what they purport to measure or predict what they are intended to predict (test validity)
2. An individual’s behavior, and therefore test scores, will typically remain stable over time (test-retest reliability)
3. Individuals understand test items the same way
4. Individuals will report accurately about themselves
5. Individuals will report their thoughts and feelings honestly
6. The test score an individual receives is equal to his or her true score plus some error, and this error may be attributable to the test itself, the examiner, the examinee, or the environment

Test Classification Methods: Maximal Performance, Behavior Observation, or Self-Report
- Tests of maximal performance: require test takers to perform a particular, well-defined task; scores are determined by the person’s performance in completing the task
  - Ex. language proficiency test, driving test, test for this course, # of km you can cycle in 1 hour
- Self-report tests: require test takers to report or describe their feelings, beliefs, opinions, or mental states (often in true/false format)
  - Ex. personality inventories, opinions on a product
- Behavior observation tests: involve observing people’s behavior and how people typically respond in a particular context
  - Ex. job or clinical interview; an employer may assess an employee’s competence in dealing with customers
- Standardized tests: Designed to measure a specific construct; after development, they are administered to a large group of individuals who are similar to the group for whom the test has been designed (e.g., same age, same sex, same education level)
  - Always contain specific directions for administration and scoring
  - Standardization sample: people who are tested to obtain data to establish a frame of reference for interpreting individual test scores
  - Norms: indicate the average performance of a group and the distribution of scores above and below this average
- Nonstandardized tests: do not have standardization samples and are more common than standardized tests
  - Usually constructed by a teacher or trainer in a less formal manner for a single administration
  - Ex. the tests you will write in this class
- Objective tests: tests on which test takers choose or provide a response and there are predetermined correct answers, requiring little subjective judgment from the person scoring the test
  - Ex. Ottawa is the capital of Canada (T/F)
  - Ex. Memory test (e.g., recalling a number sequence such as 8-5-3-9-2-7-1-4-9-6)
- Projective tests: tests on which test takers are asked to respond to unstructured or ambiguous stimuli such as images or incomplete sentences
  - Ex. Rorschach, Thematic Apperception Test (TAT), word association tests, sentence completion tests, role-playing tests

Dimension Measured:
- Achievement tests: measure a person’s previous learning in a specific academic area
  - Tests of knowledge
  - Primarily used in educational settings
- Aptitude tests: assess a test taker’s potential for learning or ability to perform in a new job or situation
  - Ex. LSAT, GRE
- Intelligence tests: Similar to aptitude tests; assess a test taker’s ability to cope with the environment, but at a broader level
  - Often used to screen individuals for specific programs
  - Used in clinical and educational settings
- Interest inventories: assess a person’s interests in educational programs or job settings and provide information for making career decisions
  - Not intended to predict success, but to provide a framework for career possibilities
- Personality tests: measure human character or disposition
  - Used in clinical settings, industrial/organizational settings, and educational settings

18 major subject categories:
- Achievement
- Behavior assessment
- Developmental
- Education
- English and language
- Fine arts
- Foreign languages
- Intelligence and general aptitude
- Mathematics
- Neuropsychological
- Personality
- Reading
- Science
- Sensorimotor
- Social studies
- Speech and hearing
- Vocations

Psychological Tests and Surveys:
- Surveys: like psychological tests (and psychological assessments), are used to collect important information from individuals
  - Focus on group outcomes
  - Results are often reported at the question level by providing the percentage of respondents who selected each answer alternative

Locating Information About Tests:
- To choose an appropriate test for a particular circumstance, you must know the types of tests that are available and their merits and limitations
- The Mental Measurements Yearbook (MMY) and Tests in Print (TIP) are two of the most popular references for learning more about available tests

Chapter 2
Types of Decisions Made Using Psychological Test Results:
Individual Decisions:
- Test takers use their test scores to make decisions about themselves
Institutional Decisions:
- Decisions made by another entity about an individual based on their test results
Comparative Decisions:
- Made by comparing the test scores of a number of people to see who has the best score
Absolute Decisions:
- Made by determining who has the minimum score (cut score) needed to qualify

Which Professionals Use Psychological Tests and for What Reasons?
Educational Settings:
- Administrators, teachers, school psychologists, and career counselors in schools all use psychological tests
Clinical Settings:
- Various clinicians and consultants administer psychological tests in clinical and counseling settings
Organizational Settings:
- Human resources professionals and industrial and organizational psychology practitioners use psychological tests in organizations
- Most of these tests have been developed and “normed” in a Western/European context

Psychological Testing Controversies:
The controversy often stems from how psychological test results are used:
- Intelligence tests (mental/cognitive competence)
- Aptitude tests (used for employment purposes, career counseling, school entry)
- Integrity tests (used to determine people’s ethics, preferences, interests)

Intelligence Testing:
- Debate over discrimination: historically, studies on intelligence testing showed that members of disadvantaged minority groups performed below average on intelligence tests
- Individuals in less affluent schools and lower socioeconomic classes perform worse on standardized tests
- This results in minorities being over-represented among low scorers

Intelligence Testing in Education:
- Early 20th century: schools used IQ test results to place those with higher scores in academic programs and those with lower scores in more vocationally related programs
- Nature vs nurture controversy: the heredity or experience question
- 1960s: during the civil rights movement, activist groups demanded that schools abandon the use of IQ tests
- More and more evidence began to suggest a role for environmental factors

Intelligence Testing in the Army:
Robert Yerkes
- Promoted the use of mental testing in WWI
- Developed the Army Alpha and Beta tests, the first mental tests for group settings
  - Alpha was used for literate groups; Beta was used for groups who could not read English
- Both tests were believed to measure native intellectual abilities, unaffected by culture or educational opportunities

Walter Lippmann (and other scholars)
- Criticized the Alpha and Beta tests, questioning whether heredity or experience determines intelligence
- Data collected from army recruits suggested that African American men scored lower than white men on average; discrepancies also existed for recruits from certain countries (Poland, Turkey, Greece, Russia, and Italy)
- Results led anti-immigration supporters to propose that immigrants were harmful to the nation

Gould 1982
- Discouraged mass intelligence testing: intelligence tests were culturally biased
- Alpha and Beta tests were discontinued after WWI
- Nature vs nurture debate continued

Continued Debate:
- Herrnstein & Murray (1994): The Bell Curve: Intelligence and Class Structure in American Life
- American Psychological Association (APA): Intelligence: Knowns and Unknowns (Neisser et al., 1996)
  - No support for attributing IQ differences to genetics
  - Role of culture must be considered when interpreting test results

Sources of Test Score Differences:
- Age
- Sex
- Socioeconomic status (SES)
- Level and quality of education
- Culture

Role of Culture:
- While basic cognitive processes and characteristics are common to all human beings, these processes and functions can develop in culturally distinctive ways and may differ in how they are expressed across cultures (Fernandez & Abe, 2017)
- Cultural influences have been identified for many cognitive and behavioural processes, such as:
  - Attention
  - Perception
  - Face processing
  - Memory
  - Emotion perception
  - Executive abilities
  - Processing speed

Sources of Bias in Psychological Testing:
Construct bias:
- Occurs when the construct measured is not equivalent across cultural groups (e.g., intelligence)
- Example: Kenya
  - Four distinct terms for intelligence: rieko (knowledge and skills), luoro (respect), winjo (comprehension of how to handle real-life problems), and paro (initiative)
Content (or item) bias:
- Specific items within a test do not perform the same way across cultures
- Example: items on a naming test (e.g., Boston Naming Test), verbal items in word lists, story recall, etc.
Instrument bias:
- Bias relating to the methods used to collect test data
- Examples:
  - Lack of familiarity with formal tests and/or stimulus material (e.g., use of a pencil; time limits; multiple-choice format; Latin alphabet)
  - Different response styles (e.g., social desirability, extreme scoring)
Administration bias:
- The test administration process can create bias
- Example: stereotype threat effect
  - Participants under threat allocate cognitive resources to regulating their emotional state, so fewer resources are available for test performance (Thames et al., 2013)

Enhancing Cross-Cultural Validity of Psychological Tests:
- Cultural decentering: Adapting available instruments so that they attain a similar meaning in a different cultural context
- Cultural centering: Developing new tests for a particular culture that have meaning and validity with respect to the construct being tested

Culture-Reduced Tests:
- Developed so that they can be applied in different cultural settings without major adaptations
- Examples: performance tests, pictorial tests, oral responses, nonverbal content tests, and novel problem-solving tests
- Raven’s Progressive Matrices: the most widely used culture-reduced test for ages 5+
Criticisms:
- Little empirical support that culture-reduced tests work well to reduce differences in scores
- Less informative
- Research suggests minorities perform similarly on these tests as they do on standard tests
- Some argue that it is impossible to create a test that is equally fair to all cultures because there are no culturally neutral symbols
- These tests may be more damaging to minority individuals than conventional tests by creating a facade of fairness

Conclusions:
We must strive to understand how culture can interact with every aspect of the testing process:
- Strict guidelines for translation and norming
- Cultural appropriateness of test stimuli (differences in meaning/familiarity of items)
- Provide a culturally sensitive environment
- Interpretation of test results within cultural context
- Education is key

Chapter 3
- 1954: Charlotte Elmore story
- 1991: Soroka v. Dayton Hudson Corporation (owns Target)
- 1995: Gateway School District
- 2015: Atlanta educators were convicted of conspiracy to boost student marks
- 2018: A medical school in Tokyo was found to have altered entrance exam scores to limit the number of female students admitted

What Are Ethics?
- Given the widespread use of tests, there is considerable potential for abuse
- Ethical standards: standards of practice set up by governing bodies/associations (not law)
- Violation of ethical standards carries penalties (e.g., possible expulsion of a member from the organization or negative consequences to individuals or organizations)

Ethical Standards:
- 1953: APA published the first ethical standards for psychologists
- Current version: Ethical Principles of Psychologists and Code of Conduct (2010)

10 Ethical Standards:
- Resolving ethical issues
- Competence
- Human relations
- Privacy and confidentiality
- Advertising and other public statements
- Record keeping and fees
- Education and training
- Research and publication
- Assessment
- Therapy

Professional Practice Standards
The Code of Fair Testing Practices in Education:
- Published by APA in 1988 (revised in 2005); includes standards for professionals who develop and use tests in educational settings to make admission, educational assessment, educational diagnosis, and student placement decisions
- Ensures that tests used in educational settings are fair to all test takers regardless of age, gender, disability, race, ethnicity, national origin, religion, sexual orientation, linguistic background, or other personal characteristics (Joint Committee on Testing Practices 2004)

Standards for Educational and Psychological Testing (1966; 2014):
- Published by the American Educational Research Association (AERA) in collaboration with APA
- Provide psychologists and others who develop and use standardized psychological tests and assessments with criteria for evaluating tests and testing practices
- Validity, reliability, fairness in testing, test design, administration, scoring, use of norms, responsibility for test takers & users, etc.
- Most explicit and comprehensive guidelines

Certification and Licensure:
Certification:
- A professional credential individuals earn by demonstrating that they have met predetermined qualifications
- Voluntary process
Licensure:
- A mandatory credential individuals must obtain to practice within their professions
- Purpose is to protect the health & safety of the public

General Responsibilities of Test Publishers, Test Users, and Test Takers:
Test Publisher Responsibilities:
- All test publishers have a responsibility to demonstrate the highest level of professionalism and ethics when selling and marketing psychological tests

The Sale of Psychological Tests:
- Sometimes professionals need to purchase psychological tests for clinical work or research projects
- Must include statements of user qualifications: the background, training, and/or certifications the test purchaser must meet

Qualification Levels:
- Level A: There are no special qualifications to purchase these products
- Level B: A master’s degree in psychology, education, or a field closely related to the intended use of the assessment, or certification by a professional organization that requires training and experience in a relevant area of assessment
- Level C: A doctoral degree in psychology, education, or a closely related field with formal training in the ethical administration, scoring, and interpretation of clinical assessments related to the intended use of the assessment

The Marketing of Psychological Tests:
- Market tests truthfully and provide comprehensive manuals that include psychometric information (validity & reliability)
- Test security: ensure that the content of psychological tests does not become public
  - Includes not publishing psychological tests in newspapers, magazines, and popular books

Test User Responsibilities:
- Test user: a person who participates in purchasing, administering, interpreting, or using the results of a psychological test
- Test taker: a person who responds to test questions or whose behavior is measured
- APA’s 2018 Rights and Responsibilities of Test Takers and the Standards for Educational and Psychological Testing (AERA et al., 2014) include a comprehensive discussion of the personal and legal responsibilities of test takers

Test Taker Rights:
- Issue 1: Right to Privacy
  - Confidentiality and anonymity
- Issue 2: Right to Informed Consent
  - Explaining the purpose of the test, why it is being administered, how third parties may be involved, any fees, and limits to confidentiality
  - Not always required; may be implied
  - Increasingly important as the consequences of testing for the test taker increase
- Issue 3: Right to Know and Understand Results
- Issue 4: Right to Protection From Stigma

Testing Special Populations: Test Takers With Physical or Mental Challenges:
- Sensory impairments: include deafness and blindness
- Motor impairments: include disabilities such as paralysis and missing limbs
- Cognitive impairments: mental challenges that include intellectual disabilities and traumatic brain injuries

Testing Special Populations: Test Takers With Learning Disabilities:
- Learning disability: has no visible signs and is a difficulty in any aspect of learning
- Key is to develop learning and test-taking strategies
- Appropriate resources and support (not apparent if the test taker does not self-declare)

Test Takers From Multicultural Backgrounds:
- Multicultural backgrounds: those who belong to various minority groups based on race, cultural or ethnic origin, sexual orientation, family unit, primary language, and so on
- The quality and quantity of research studies still do not match the current need

Chapter 4
Measurement:
- Defined as the process of assessing the size, the amount, or the degree of an attribute using specific rules for transforming the attribute into numbers
- Involves following rules to transform attributes into numbers

Why assign numbers?
- Quantify something (give it meaning, make sense of information)
- Examine individual differences

Levels of Measurement:
- Refers to the relationship among the numbers we have assigned to the information
- We use numbers at the item level, scale level, and test result level
- For psychological tests, individual item scores are typically combined to produce a score on a scale and/or combined to produce a test result
- Types of measurement scales: nominal, ordinal, interval, and ratio

Properties of Measurement:
Magnitude:
- Property of “moreness”: do higher scores indicate more of something?
Equal intervals:
- Does the difference between any two adjacent numbers refer to the same amount of difference on the attribute?
Absolute zero:
- Does the scale have a zero point that refers to having none of that attribute?

Level of Measurement:
Nominal scales: The most basic level of measurement
- We assign numbers to represent groups or categories of information; numbers in the nominal scale serve as labels only
- Often used for demographic data
Examples:
- Country of origin: 1 = US, 2 = Mexico, 3 = Canada, 4 = Other
- Household pet: 0 = None, 1 = Dog, 2 = Cat, 3 = Bird, 4 = Hamster, 5 = Rabbit
Nominal scales:
- Yield only categorical data: data grouped according to a common property
- Nominal data are often reported in terms of the number of occurrences in each category (frequencies)
- Example: major depression diagnosis (5 patients), generalized anxiety diagnosis (3 patients), phobia diagnosis (2 patients)
  - Mode (most frequent) = major depression

Ordinal scales: The second level of measurement
- The numbers are assigned to order or rank objects on the attribute being measured
- Example: A teacher assigns numbers to 20 students based on height, from shortest (1) to tallest (20)
Example: Rank the following sports in terms of which sport you enjoy most (1) to least (5). Please use each number (1-5) once.
___ hockey ___ basketball ___ soccer ___ tennis ___ baseball

Two important points regarding ordinal scales:
1. A number/rank has meaning only within the group being compared
2. Ordinal scales do not assume that intervals between numbers are equal
★ Most psychological tests produce ordinal data

Interval scales: The third level of measurement
- These scales have all the qualities of the previous scales, but each number represents a point that is an equal distance from the points adjacent to it
- Example: Likert-type scale (1 = strongly disagree, 2 = disagree, 3 = somewhat agree, 4 = agree, and 5 = strongly agree)
- Allow us to perform more statistical calculations (means, standard deviations) to compare performance between individuals and groups
- Can also use these statistics to calculate test norms and standard scores
- Disadvantage: the zero point on an interval scale is arbitrary and does not indicate an absolute absence of the attribute being measured

Ratio scales: The fourth level of measurement
- Ratio scales have all of the qualities of the previous scales, and have an absolute zero point
- Example: heart beats per minute; weight; measures of time or distance
- Allow for ratio comparisons (e.g., a person who weighs 160 lbs is twice as heavy as a person who weighs 80 lbs)
- Most measures of psychological constructs do not meet the requirements of a ratio scale

Why is it important to understand the differences in the levels of measurement?
- Each of these scales has different properties and allows for different interpretations
- Tells you about the statistical operations you can perform and what you can and cannot say about test scores

[Figure: Summary of levels of measurement]

More Examples:
1. What type of measurement is most appropriate to describe evaluations of service received at a restaurant (very poor, poor, good, very good)?
2. What type of measurement is most appropriate to describe the different categories of movies (drama, comedy, adventure, documentary)?
3. A professor is interested in the relationship between the number of times students are absent from class and the grade that students receive on the final exam. In this example, what is the measurement scale for number of absences?
4. A professor is interested in studying the effect of classroom temperature in degrees Celsius on student test performance. In this example, what is the measurement scale of classroom temperature?

Procedures for Interpreting Test Scores:
Raw scores:
- The most basic scores calculated from a psychological test
- Not useful without additional interpretive information
Frequency distributions:
- An orderly arrangement of a group of numbers (or test scores)
- Show the actual number (or percentage) of observations that fall into a range or category
- Provide a summary and picture of group data
[Figures: ungrouped frequency distribution; grouped frequency distribution; frequency graphs; normal curve]

Normal probability distributions:
- Theoretical distributions that exist in our imagination as perfect and symmetrical, and actually consist of a family of distributions that have the same general bell shape
- Many human variables fall on a normal or close-to-normal curve (e.g., IQ, height, weight, lifespan, shoe size)
[Figure: normal curve characteristics]

Skewness: The nature and extent to which symmetry is absent
Positive skewness:
- When relatively few of the scores fall at the high end of the distribution
- Example: positively skewed test results may indicate that a test was too difficult
Negative skewness:
- When relatively few of the scores fall at the low end of the distribution
- Example: negatively skewed test results may indicate that a test was too easy
Bimodal distributions:
- Have two high points and result when many people score low, many people score high, and few people score in the middle

Procedures for Interpreting Test Scores: Descriptive Statistics
- Another way we make sense of raw test scores is by calculating descriptive statistics (a summary of our data)
Measures of central tendency:
- Values that help us understand the middle of a distribution or set of scores
- Mean: the average score in a distribution or sample
- Median: the middle score in a group of scores
- Mode: the most common score in a distribution
Example: Scores on an IQ test: 87, 92, 101, 101, 113, 120, 121
- Mean = 105 (735 / 7)
  - Best when distributions of scores are relatively symmetric
- Median = 101
  - More informative than the mean when dealing with skewed distributions
- Mode = 101
  - When there is more than one mode, the distribution is bimodal or multimodal
  - More informative than the mean when dealing with skewed distributions
Outliers:
- A few values that are significantly higher or lower than most other values
- Example: 2, 2, 2, 5, 100 (the single outlier pulls the mean up to 22.2, while the median stays at 2)
- If a distribution of scores is symmetric and approaches the normal curve, the mean, mode, and median will be the same
- Positive skewness = the mean will be higher than the median
- Negative skewness = the mean will be lower than the median
Measures of variability:
- Describe a set of scores in numerical form
- How spread out is a group of scores?
- Measure the extent to which scores differ; provide more information about individual differences
- Psychological tests depend on variability across individuals
Three commonly used measures of variability:
Range: Tells us the distance between values in our data
- Highest value minus lowest value
- Outliers may misrepresent the true range of the distribution
Variance: Tells us whether individual scores tend to be similar to or substantially different from the mean
- Large variance = individual scores differ substantially from the mean
- Small variance = individual scores are very similar to the mean
- Dependent on the range of test scores
  - Ex. Range of test scores is 10 (large variance = 7; small variance = 1)
- Be cautious of outliers
Standard deviation: The most commonly used measure of variability in a distribution of test scores
- A measure of how far scores typically deviate (spread away) from the mean
- Square root of the variance
- Expressed in the same units as the mean (easier to interpret)
Example: A professor administers a 100-item multiple-choice test to two Intro to Psychology classes (Class A and B). To compare the classes: students in both classes receive a mean of 75%. However, the standard deviation for Class A is 21, while the standard deviation for Class B is 8. What does this tell us?
Measures of relationship: Help us describe distributions of test scores
- How are two or more variables related to each other?
- Correlation coefficient: A statistic that we typically use to describe the relationship between two or more distributions of scores
- Positive correlation: variables change in the same direction
- Negative correlation: variables change in opposite directions
Correlation Coefficients:
- Value can range from -1.00 to +1.00
- An r = 0.00 indicates the absence of a linear relationship
- An r = +1.00 or an r = -1.00 indicates a perfect relationship between the variables
- A positive correlation means that high scores on one variable tend to go with high scores on the other variable, and that low scores on one variable tend to go with low scores on the other variable
- A negative correlation means that high scores on one variable tend to go with low scores on the other variable
- The further the value of r is from 0 and the closer it is to +1 or -1, the stronger the relationship between the variables
Scatterplots: Correlation Coefficients
Pearson product-moment correlation coefficient:
- Used when variables are measured on interval or ratio scales
Spearman rank correlation coefficient:
- Used when variables are measured on an ordinal scale
Standard Scores:
- Universally understood units in testing that allow test users to evaluate, or make inferences about, a person’s performance
- Involves converting raw scores into more meaningful units
- Allows comparison between obtained scores and scores of other individuals (e.g. the normative sample) as well as comparison among various scales and instruments
Linear transformations:
- Change the unit of measurement, but do not change the characteristics of the raw data in any way
- Percentages, standard deviation units, Z scores, T scores
Area transformations:
- Change not only the unit of measurement but also the unit of reference
- Percentile rank, stanines
Percentages:
- A number or ratio expressed as a fraction of 100
Standard deviation (SD) units:
- Refer to how many standard deviations an individual score falls away from the mean
Z score:
- Similar to a standard deviation unit except that it is represented as a whole number with a decimal point
- Helps us understand how many SDs an individual test score is above or below the distribution mean
- Used to compare results to the “normal population”
- The mean of a distribution of test scores will always have a z score of 0
- Based on your knowledge of the population’s standard deviation and mean (typically used with sample size > 30)
- Example: z score of 1 = 1 SD above the mean; z score of -1 = 1 SD below the mean
Example:
- The basic formula to calculate a z score: z = (x - μ) / σ
- Let’s say you have a test score of 190. The test has a mean (μ) of 150 and an SD (σ) of 25. Assuming a normal distribution, your score would be:
- z = (190 - 150) / 25 = 40 / 25 = 1.6
T scores:
- Help us understand how many SDs an individual test score is above or below the distribution mean
- Conversion is made without knowledge of the population’s mean and SD (can be used with small sample size < 30)
- Scale has a mean set at 50 and an SD of 10 (always positive)
- Formula: T score = (z × 10) + 50
- Example: If an individual’s raw test score was converted to a T score of 70, what would that tell us?
Area Transformation:
Percentile:
- The percentage of scores in a distribution that fall at or below a given raw score
- Example: You scored at the 70th percentile on the GRE. What does this mean?
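The correlation and standard-score formulas above can be sketched in Python as follows. All function names and the paired x/y data are hypothetical illustrations.

- Pearson's r is computed directly from deviations from the mean.
- Spearman's coefficient uses one common approach: apply Pearson's formula to the ranks, giving tied scores their average rank.
- The z and T lines reuse the worked example (score 190, mean 150, SD 25) and the T score of 70 question.

```python
import math
import statistics


def pearson_r(x, y):
    """Pearson product-moment correlation for paired interval/ratio scores."""
    mean_x, mean_y = statistics.fmean(x), statistics.fmean(y)
    dev_x = [a - mean_x for a in x]
    dev_y = [b - mean_y for b in y]
    covariation = sum(dx * dy for dx, dy in zip(dev_x, dev_y))
    spread = math.sqrt(sum(dx * dx for dx in dev_x) * sum(dy * dy for dy in dev_y))
    return covariation / spread


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next value ties with the current one
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1  # ranks are 1-based
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def spearman_rho(x, y):
    """Spearman rank correlation: Pearson's r computed on the ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))


def z_score(x, mean, sd):
    """Number of SDs a raw score lies above (+) or below (-) the mean."""
    return (x - mean) / sd


def t_score(z):
    """Rescale a z score to the T scale (mean 50, SD 10)."""
    return z * 10 + 50


def z_from_t(t):
    """Invert the T transformation to recover the z score."""
    return (t - 50) / 10


def percentile_rank(score, scores):
    """Percentage of scores in the distribution at or below the given score."""
    at_or_below = sum(1 for s in scores if s <= score)
    return 100 * at_or_below / len(scores)


# Hypothetical paired scores: y rises with x, but not along a straight line
x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]
print(round(pearson_r(x, y), 3), round(spearman_rho(x, y), 3))

z = z_score(190, 150, 25)
print(z, t_score(z))   # 1.6 66.0
print(z_from_t(70))    # 2.0, i.e. 2 SDs above the mean
```

Here y increases with x but along a curve. As a result, Pearson's r comes out just below 1, while Spearman's coefficient is exactly 1: the rank order agrees perfectly even though the relationship is not linear. Inverting the T formula shows that a T score of 70 corresponds to z = 2, i.e. 2 SDs above the mean.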
Stanines:
- A standard score scale with nine points that allows us to describe a distribution in words instead of numbers
The Role of Norms:
Norms: Test scores achieved by some identified group of individuals
- Norm-based interpretation: Helps us answer the question, “Where does a test taker stand in comparison with a group that defines the standards?”
- Reported in charts/tables that describe the performance of a large group of individuals who were administered the test in the past (referred to as the norm group)
- Created during test construction by administering a test to a large number of individuals who are carefully selected to be representative of the population that the test is intended to serve
Types of Norms:
Age norms and grade norms:
- Allow us to determine at what age level or grade level an individual is performing
Percentile rank:
- Provides us with a way to rank individuals on a scale from 1% to 100%, making it relatively easy to interpret
Size of the norm group matters:
- If the norm group is small, there is a greater chance that it is not representative of the target population
Norms should be up-to-date:
- When populations change, tests should be re-normed
- Decisions based on age and grade norms should be made carefully
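The stanine scale described in this section can be illustrated by converting z scores to stanines. This sketch rests on two assumptions:

- The common convention that stanines have a mean of 5 and an SD of 2 (stanine ≈ 2z + 5, rounded and clipped to 1-9).
- A hypothetical three-band verbal labeling: 1-3 below average, 4-6 average, 7-9 above average.

Published tests may define the bands and labels differently.

```python
import math


def stanine_from_z(z):
    """Convert a z score to a stanine.

    Assumed convention: stanine mean 5, SD 2 (stanine = 2z + 5).
    """
    raw = 2 * z + 5
    # Round half up, then clip to the 1-9 range of the scale
    return max(1, min(9, math.floor(raw + 0.5)))


def verbal_label(stanine):
    """Hypothetical three-band wording for a stanine (an assumption, not a standard)."""
    if stanine <= 3:
        return "below average"
    if stanine <= 6:
        return "average"
    return "above average"


for z in (-2.5, -0.6, 0.0, 1.6):
    s = stanine_from_z(z)
    print(z, s, verbal_label(s))
```

Clipping at 1 and 9 reflects the fact that the scale has only nine points. Every very extreme score lands in the lowest or highest stanine.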