Psychological Assessment by Cohen (Summary)
This document appears to be a summary of a psychological assessment book written by Cohen. It provides a definition of psychological assessment and covers processes of measuring psychology-related variables, scoring, interpretation, and the role and skill of evaluators in the process.
CHAPTER 1: PSYCHOLOGICAL TESTING AND ASSESSMENT

TESTING AND ASSESSMENT
Roots can be found in early twentieth-century France.
1905: Alfred Binet published a test designed to help place Paris school children.
WW1: the military used the test to screen large numbers of recruits quickly for intellectual and emotional problems.
WW2: the military depended even more on tests to screen recruits for service.

PSYCHOLOGICAL TESTING vs. PSYCHOLOGICAL ASSESSMENT
DEFINITION
  Testing: Process of measuring psychology-related variables by means of devices/procedures designed to obtain a sample of behavior.
  Assessment: Gathering and integration of psychology-related data for the purpose of making a psychological evaluation, accomplished with the use of tools.
OBJECTIVE
  Testing: To obtain some gauge, usually numerical in nature.
  Assessment: To answer a referral question, solve a problem, or arrive at a decision through the use of tools of evaluation.
PROCESS
  Testing: May be individualized or group.
  Assessment: Typically individualized.
ROLE OF EVALUATOR
  Testing: Tester is not key to the process; may be substituted.
  Assessment: Key in the process of selecting tests as well as in drawing conclusions.
SKILL OF EVALUATOR
  Testing: Requires technician-like skills.
  Assessment: Typically requires an educated selection of tools and skill in evaluation.
OUTCOME
  Testing: Typically yields a test score.
  Assessment: Entails a logical problem-solving approach to answer the referral question.

3 FORMS OF ASSESSMENT:
1. COLLABORATIVE PSYCHOLOGICAL ASSESSMENT – assessor and assessee work as partners from initial contact through final feedback
2. THERAPEUTIC PSYCHOLOGICAL ASSESSMENT – self-discovery and new understandings are encouraged throughout the assessment process
3. DYNAMIC PSYCHOLOGICAL ASSESSMENT – follows a model of (a) evaluation, (b) intervention, (c) evaluation. Provides a means for evaluating how the assessee processes or benefits from some type of intervention during the course of evaluation.

Tools of Psychological Assessment
A. The Test (a measuring device or procedure)
  1. psychological test: a device or procedure designed to measure variables related to psychology (intelligence, personality, aptitude, interests, attitudes, or values)
  2. format: refers to the form, plan, structure, arrangement, and layout of test items as well as to related considerations such as time limits.
    a) also refers to the form in which a test is administered (pen and paper, computer, etc.). Computers can generate scenarios.
    b) term is also used to denote the form or structure of other evaluative tools and processes, such as the guidelines for creating a portfolio work sample
  3. Ways that tests differ from one another:
    a) administrative procedures
      (1) some tests require an active, knowledgeable test administrator
        (a) some test administration involves demonstration of tasks
        (b) usually one-on-one
        (c) trained observation of assessee's performance
      (2) some test administrators don't even have to be present
        (a) usually administered to larger groups
        (b) test takers complete tasks independently
    b) scoring and interpretation procedures
      (1) score: a code or summary statement, usually (but not necessarily) numerical in nature, that reflects an evaluation of performance on a test, task, interview, or some other sample of behavior
      (2) scoring: process of assigning such evaluative codes/statements to performance on tests, tasks, interviews, or other behavior samples.
      (3) different types of score:
        (a) cut score: reference point, usually numerical, derived by judgment and used to divide a set of data into two or more classifications.
          (i) sometimes reached without any formal method, e.g., teachers who "eyeball" what is passing and what is failing.
      (4) who scores it
        (a) self-scored by test taker
        (b) computer
        (c) trained examiner
    c) psychometric soundness/technical quality
      (1) psychometrics: the science of psychological measurement.
        (a) psychometric soundness refers to how consistently and how accurately a psychological test measures what it purports to measure.
      (2) utility: refers to the usefulness or practical value that a test or other tool of assessment has for a particular purpose.
B. The Interview: method of gathering information through direct communication involving reciprocal exchange
  1. interviewer in a face-to-face interview takes note of
    a) verbal language
    b) nonverbal language
      (1) body language and movements
      (2) facial expressions in response to the interviewer
      (3) the extent of eye contact
      (4) apparent willingness to cooperate
    c) how the interviewee is dressed
      (1) neat vs. sloppy vs. inappropriate
  2. interviewer over the phone takes note of
    a) changes in the interviewee's voice pitch
    b) long pauses
    c) signs of emotion in response
  3. ways that interviews differ:
    a) length, purpose, and nature
    b) used to help make diagnostic, treatment, selection, and other decisions
  4. panel interview
    a) an interview of one interviewee conducted by more than one interviewer
C. The Portfolio
  1. files of work products: paper, canvas, film, video, audio, etc.
  2. samples of one's abilities and accomplishments
D. Case History Data: records, transcripts, and other accounts in written, pictorial, or other form that preserve archival information, official and informal accounts, and other data and items relevant to the assessee
  1. sheds light on an individual's past and current adjustment as well as on events and circumstances that may have contributed to any changes in adjustment
  2. provides information about neuropsychological functioning prior to the occurrence of a trauma or other event that results in a deficit.
  3. gives insight into current academic and behavioral standing
  4. useful in making judgments for future class placements
  5. case study (case history): a report or illustrative account concerning a person or an event that was compiled on the basis of case history data
    a) might shed light on how one individual's personality and particular set of environmental conditions combined to produce a successful world leader.
    b) work on groupthink, a social psychological phenomenon, contains rich case history material on collective decision making that did not always result in the best decisions.
E. Behavioral Observation: monitoring the actions of others or oneself by visual or electronic means while recording quantitative and/or qualitative information regarding those actions.
  1. often used as a diagnostic aid in various settings: inpatient facilities, behavioral research laboratories, classrooms.
  2. naturalistic observation: behavioral observation that takes place in a naturally occurring setting (as opposed to a research laboratory) for the purpose of evaluation and information-gathering.
  3. in practice tends to be used most frequently by researchers in settings such as classrooms, clinics, prisons, etc.
F. Role-Play Tests
  1. role play: acting an improvised or partially improvised part in a simulated situation.
  2. role-play test: tool of assessment wherein assessees are directed to act as if they were in a particular situation. Assessees are then evaluated with regard to their expressed thoughts, behaviors, abilities, etc.
G. Computers as Tools
  1. local processing (LP): on-site computerized scoring, interpretation, or other conversion of raw test data; contrast with central processing and teleprocessing.
  2. central processing (CP): computerized scoring, interpretation, or other conversion of raw data that is physically transported from the same or other test sites; contrast with local processing and teleprocessing.
  3. teleprocessing: computerized scoring, interpretation, or other conversion of raw test data sent over telephone lines by modem from a test site to a central location for computer processing; contrast with central processing and local processing.
  4. simple score report: a type of scoring report that provides only a listing of scores
  5. extended scoring report: a type of scoring report that provides a listing of scores AND statistical data.
  6. interpretive report: a formal or official computer-generated account of test performance presented in both numeric and narrative form and including an explanation of the findings
    a) the three varieties of interpretive report are
      (1) descriptive
      (2) screening
      (3) consultative
    b) some contain relatively little interpretation and simply call attention to certain high, low, or unusual scores that need to be focused on.
    c) consultative report: a type of interpretive report designed to provide expert and detailed analysis of test data that mimics the work of an expert consultant.
    d) integrative report: a form of interpretive report of psychological assessment, usually computer-generated, in which data from behavioral, medical, administrative, and/or other sources are integrated
  7. CAPA: computer-assisted psychological assessment (the assistance is to the test user, not the test taker)
    a) enables test developers to create psychometrically sound tests using complex mathematical procedures and calculations.
    b) enables test users to construct tailor-made tests with built-in scoring and interpretive capabilities.
    c) Pros:
      (1) test administrators have greater access to potential test users because of the global reach of the internet.
      (2) scoring and interpretation of test data tend to be quicker than for paper-and-pencil tests
      (3) costs associated with internet testing tend to be lower than costs associated with paper-and-pencil tests
      (4) the internet facilitates the testing of otherwise isolated populations, as well as people with disabilities for whom getting to a test center might prove a hardship.
      (5) greener: conserves paper, shipping materials, etc.
    d) Cons:
      (1) test-client integrity
        (a) refers to the verification of the identity of the test taker when a test is administered online
        (b) also refers to the sometimes varying interests of the test taker vs. those of the test administrator. The test taker might have access to notes, aids, internet resources, etc.
        (c) internet testing is only testing, not assessment
  8. CAT: computerized adaptive testing: an interactive, computer-administered test-taking process wherein items presented to the test taker are based in part on the test taker's performance on previous items
    a) EX: on a computerized test of academic abilities, the computer might be programmed to switch from testing math skills to English skills after three consecutive failures on math items.
H. Other Tools
  1. DVD/video: how would you respond to the events that take place in the video?
    a) sexual harassment in the workplace
    b) responding to various types of emergencies
    c) diagnosis/treatment plan for clients on videotape
  2. thermometers, biofeedback, etc.

TEST DEVELOPER
They are the ones who create tests.
They conceive, prepare, and develop tests. They also find a way to disseminate their tests, by publishing them either commercially or through professional publications such as books or periodicals.
TEST USER
They select or decide to take a specific test off the shelf and use it for some purpose. They may also participate in other roles, e.g., as examiners or scorers.
TEST TAKER
Anyone who is the subject of an assessment.
Test takers may vary on a continuum with respect to numerous variables, including:
 o The amount of anxiety they experience and the degree to which test anxiety might affect the results
 o The extent to which they understand and agree with the rationale of the assessment
 o Their capacity and willingness to cooperate
 o Amount of physical pain/emotional distress they are experiencing
 o Amount of physical discomfort
 o Extent to which they are alert and wide awake
 o Extent to which they are predisposed to agreeing or disagreeing when presented with a stimulus
 o The extent to which they have received prior coaching
 o The extent to which they may try to portray themselves in a good light
Psychological autopsy – reconstruction of a deceased individual's psychological profile on the basis of archival records, artifacts, and interviews previously conducted with the deceased assessee

TYPES OF SETTINGS
EDUCATIONAL SETTING
 o achievement test: evaluation of accomplishments or the degree of learning that has taken place, usually with regard to an academic area.
 o diagnosis: a description or conclusion reached on the basis of evidence and opinion through a process of distinguishing the nature of something and ruling out alternative conclusions.
 o diagnostic test: a tool used to make a diagnosis, usually to identify areas of deficit to be targeted for intervention
 o informal evaluation: a typically nonsystematic, relatively brief, and "off the record" assessment leading to the formation of an opinion or attitude, conducted by any person in any way for any reason, in an unofficial context and not subject to the same ethics or standards as evaluation by a professional
CLINICAL SETTING
 o these tools are used to help screen for or diagnose behavior problems
 o group testing is used primarily for screening: identifying those individuals who require further diagnostic evaluation.
COUNSELING SETTING
 o schools, prisons, and governmental or privately owned institutions
 o ultimate objective: the improvement of the assessee in terms of adjustment, productivity, or some related variable.
GERIATRIC SETTING
 o quality of life: in psychological assessment, an evaluation of variables such as perceived stress, loneliness, sources of satisfaction, personal values, quality of living conditions, and quality of friendships and other social support.
BUSINESS AND MILITARY SETTINGS
GOVERNMENTAL AND ORGANIZATIONAL CREDENTIALING

How are Assessments Conducted?
protocol: the form or sheet or booklet on which a test taker's responses are entered.
 o term might also be used to refer to a description of a set of test- or assessment-related procedures, as in the sentence, "the examiner dutifully followed the complete protocol for the stress interview"
rapport: working relationship between the examiner and the examinee

ASSESSMENT OF PEOPLE WITH DISABILITIES
Define who requires alternate assessment, how such assessments are to be conducted, and how meaningful inferences are to be drawn from the data derived from such assessments.
Accommodation – adaptation of a test, procedure, or situation, or the substitution of one test for another, to make the assessment more suitable for an assessee with exceptional needs.
 Ex.) translate the test into Braille and administer it in that form.
Alternate assessment – evaluative or diagnostic procedure or process that varies from the usual, customary, or standardized way a measurement is derived, either by virtue of some special accommodation made to the assessee or by means of alternative methods
Consider these four variables when deciding which of many different types of accommodation should be employed:
 o The capabilities of the assessee
 o The purpose of the assessment
 o The meaning attached to test scores
 o The capabilities of the assessor

REFERENCE SOURCES
TEST CATALOGUES – contain brief descriptions of the tests
TEST MANUALS – detailed information
REFERENCE VOLUMES – one-stop shopping; provide detailed information for each test listed, including test publisher, author, purpose, intended test population, and test administration time
JOURNAL ARTICLES – contain reviews of the test
ONLINE DATABASES – most widely used bibliographic databases

TYPES OF TESTS
INDIVIDUAL TEST – given to only one person at a time
GROUP TEST – administered to more than one person at a time by a single examiner
ABILITY TESTS:
 o ACHIEVEMENT TESTS – refer to previous learning (ex. spelling)
 o APTITUDE/PROGNOSTIC TESTS – refer to the potential for learning or acquiring a specific skill
 o INTELLIGENCE TESTS – refer to a person's general potential to solve problems
PERSONALITY TESTS: refer to overt and covert dispositions
 o OBJECTIVE/STRUCTURED TESTS – usually self-report; require the subject to choose between two or more alternative responses
 o PROJECTIVE/UNSTRUCTURED TESTS – the stimulus (inkblot, photo, etc.) is ambiguous, and the test taker is assumed to project his or her own needs, fears, hopes, and motivations onto it
 o INTEREST TESTS –

CHAPTER 2: HISTORICAL, CULTURAL AND LEGAL/ETHICAL CONSIDERATIONS

A HISTORICAL PERSPECTIVE
19TH CENTURY
Tests and testing programs first came into being in China.
Testing was instituted as a means of selecting who, of many applicants, would obtain government jobs (civil service).
The job applicants were tested on proficiency in endeavors such as music, archery, knowledge and skill, etc.
GRECO-ROMAN WRITINGS
(Middle Ages) World of evilness
Deficiency in some bodily fluid as a factor believed to influence personality
Hippocrates and Galen
RENAISSANCE
Christian von Wolff – anticipated psychology as a science and psychological measurement as a specialty within that science
CHARLES DARWIN AND INDIVIDUAL DIFFERENCES
Tests designed to measure individual differences in ability and personality among people
"Origin of Species": chance variation in species would be selected or rejected by nature according to adaptivity and survival value.
"survival of the fittest"
FRANCIS GALTON
Explored and quantified individual differences between people.
Classified people "according to their natural gifts"
Displayed the first anthropometric laboratory
KARL PEARSON
Developed the product-moment correlation technique.
 o His work can be traced directly from Galton
WILHELM MAX WUNDT
First experimental psychology laboratory, at the University of Leipzig
Focused more on how people were similar, not different from each other.
JAMES MCKEEN CATTELL
Individual differences in reaction time
Coined the term mental test
CHARLES SPEARMAN
Originated the concept of test reliability and built the mathematical framework for the statistical technique of factor analysis
VICTOR HENRI
Frenchman who collaborated with Binet on papers suggesting how mental tests could be used to measure higher mental processes
EMIL KRAEPELIN
Early experimenter with the word association technique as a formal test
LIGHTNER WITMER
"Little-known founder of clinical psychology"
Founded the first psychological clinic in the U.S.
PSYCHE CATTELL
Daughter of James Cattell
Cattell Infant Intelligence Scale (CIIS) & Measurement of Intelligence in Infants and Young Children
RAYMOND CATTELL
Believed in the lexical approach to defining personality, which examines human languages for descriptors of personality dimensions

20th CENTURY
- Birth of the first formal tests of intelligence
- Testing shifted to be of more understandable relevance/meaning
A. THE MEASUREMENT OF INTELLIGENCE
 o Binet created the first intelligence test to identify school children with intellectual disability in Paris (individual test)
 o The Binet-Simon Test has been revised repeatedly
 o Group intelligence tests emerged with the need to screen the intellect of WWI recruits
 o David Wechsler – designed a test to measure adult intelligence
   For him, intelligence is a global capacity of the individual to act purposefully, to think rationally, and to deal effectively with his environment.
   Wechsler-Bellevue Intelligence Scale
   Wechsler Adult Intelligence Scale – was revised several times and extended the age range of test takers from young children through senior adulthood.
B. THE MEASUREMENT OF PERSONALITY
 o Field of psychology was being too test oriented
 o Clinical psychology was synonymous with mental testing
 o ROBERT WOODWORTH – developed a measure of adjustment and emotional stability that could be administered quickly and efficiently to groups of recruits
   To disguise the true purpose of the test, the questionnaire was labeled the Personal Data Sheet
   He called it the Woodworth Psychoneurotic Inventory – first widely used self-report test of personality
 o Self-report test:
   Advantages:
     Respondents are best qualified to report on themselves
   Disadvantages:
     Poor insight into self
     One might honestly believe something about self that isn't true
     Unwillingness to report seemingly negative qualities
 o Projective test: individual is assumed to project onto some ambiguous stimulus (inkblot, photo, etc.) his or her own unique needs, fears, hopes, and motivations
   Ex.) Rorschach inkblot
C. THE ACADEMIC AND APPLIED TRADITIONS

Culture and Assessment
Culture: 'the socially transmitted behavior patterns, beliefs, and products of work of a particular population, community, or group of people'

Evolving Interest in Culture-Related Issues
Goddard tested immigrants and found most to be feebleminded
 - invalid; overestimated mental deficiency, even in native English speakers
Led to the nature-nurture debate about what intelligence tests actually measure
Needed to "isolate" the cultural variable
Culture-specific tests: tests designed for use with people from one culture, but not from another
 - minorities still scored abnormally low
 ex.) loaf of bread vs. tortillas
Today tests undergo many steps to ensure they are suitable for the nation in which they are used
 - take test takers' reactions into account

Some Issues Regarding Culture and Assessment
Verbal Communication
 o Examiner and examinee must speak the same language
 o Especially tricky when infrequently used vocabulary or unusual idioms are employed
 o Translator may lose nuances in translation or give unintentional hints toward the more desirable answer
 o Also requires understanding of culture
Nonverbal Communication and Behavior
 o Differs between cultures
 o Ex.) meaning of not making eye contact
 o Body movement could even have a physical cause
 o Psychoanalysis: Freud's theory of personality and psychological treatment, which stated that symbolic significance is assigned to many nonverbal acts.
 o Timing tests in cultures not obsessed with speed
 o Lack of speaking could be reverence for elders
Standards of Evaluation
 o Acceptable roles for women differ across cultures
 o "judgments as to who might be the best employee, manager, or leader may differ as a function of culture, as might judgments regarding intelligence, wisdom, courage, and other psychological variables"
 o must ask 'how appropriate are the norms or other standards that will be used to make this evaluation'

Tests and Group Membership
ex.) must be 5'4" to be a police officer: excludes cultures with short stature
ex.) Jewish lifestyle judged not well suited for corporate America
affirmative action: voluntary and mandatory efforts to combat discrimination and promote equal opportunity in education and employment for all
Psychology, tests, and public policy

Legal and Ethical Considerations
Code of professional ethics: defines the standard of care expected of members of a given profession.

The Concerns of the Public
Beginning in World War I, fear that tests were only testing the ability to take tests
Legislation
 o Minimum competency testing programs: formal testing programs designed to be used in decisions regarding various aspects of students' educations
 o Truth-in-testing legislation: state laws to provide test takers with a means of learning the criteria by which they are being judged
Litigation
 o The Daubert ruling made federal judges the gatekeepers in determining what expert testimony is admitted
 o This overrode the Frye policy, which admitted only scientific testimony that had won general acceptance in the scientific community.

The Concerns of the Profession
Test-user qualifications
 o Who should be allowed to use psychological tests
 o Level A: tests or aids that can adequately be administered, scored, and interpreted with the aid of the manual and a general orientation to the kind of institution or organization in which one is working
 o Level B: tests or aids that require some technical knowledge of test construction and use and of supporting psychological and educational fields
 o Level C: tests and aids requiring substantial understanding of testing and supporting psychological fields, together with experience
Testing people with disabilities
 o Difficulty in transforming the test into a form that can be taken by the test taker
 o Transferring responses into a scorable form
 o Meaningfully interpreting the test data
Computerized test administration, scoring, and interpretation
 o simple, convenient
 o easily copied, duplicated
 o insufficient research to compare it to pencil-and-paper versions
 o value of computer interpretation is questionable
 o unprofessional, unregulated "psychological testing" online

The Rights of Testtakers
The right of informed consent
 o right to know why they are being evaluated, how test data will be used, and what information will be released to whom
 o may be obtained from a parent or legal representative
 o must be in written form:
   general purpose of the testing
   the specific reason it is being undertaken
   general type of instruments to be administered
 o revealing this information before the test can contaminate the results
 o deception used only if absolutely necessary
 o don't use deception if it will cause emotional distress
 o fully debrief participants
The right to be informed of test findings
 o Formerly, test administrators were told to give participants only positive information; no realistic information was required
 o The practice was to tell test takers as little as possible about the nature of their performance on a particular test, so that the examinee would leave the test session feeling pleased and satisfied.
 o Test takers also have the right to know what recommendations are being made as a consequence of the test data
The right to privacy and confidentiality
 o Privacy right: "recognizes the freedom of the individual to pick and choose for himself the time, circumstances, and particularly the extent to which he wishes to share or withhold from others his attitudes, beliefs, behaviors, and opinions"
 o Privileged information: information protected by law from being disclosed in a legal proceeding. Protects clients from disclosure in judicial proceedings. Privilege belongs to the client, not the psychologist.
 o Confidentiality: concerns matters of communication outside the courtroom
   Safekeeping of test data: it is not a good policy to maintain all records in perpetuity
The right to the least stigmatizing label
 o The standards advise that the least stigmatizing labels should always be assigned when reporting test results.

CHAPTER 3: A STATISTICS REFRESHER

Why We Need Statistics
- Statistics are important for purposes of education
 o Numbers provide convenient summaries and allow us to evaluate some observations relative to others
- We use statistics to make inferences, which are logical deductions about events that cannot be observed directly
 o Detective work of gathering and displaying clues – exploratory data analysis
 o Then confirmatory data analysis
- Descriptive statistics are methods used to provide a concise description of a collection of quantitative information
- Inferential statistics are methods used to make inferences from observations of a small group of people, known as a sample, to a larger group of individuals, known as a population

SCALES OF MEASUREMENT
MEASUREMENT – act of assigning numbers or symbols to characteristics of things according to rules. The rules serve as a guideline for representing the magnitude. It always involves error.
SCALE – set of numbers whose properties model empirical properties of the objects to which the numbers are assigned.
CONTINUOUS SCALE – interval/ratio. A scale used to measure a continuous variable. Always involves error.
DISCRETE SCALE – nominal/ordinal. A scale used to measure a discrete variable (ex. female or male)
ERROR – collective influence of all of the factors on a test score beyond those specifically measured by the test.

PROPERTIES OF SCALES
- Magnitude, equal intervals, and an absolute 0
Magnitude
- The property of "moreness"
- A scale has the property of magnitude if we can say that a particular instance of the attribute represents more, less, or equal amounts of the given quantity than does another instance
Equal Intervals
- A scale has the property of equal intervals if the difference between two points at any place on the scale has the same meaning as the difference between two other points that differ by the same number of scale units
- A psychological test rarely has the property of equal intervals
- When a scale has the property of equal intervals, the relationship between the measured units and some outcome can be described by a straight line or a linear equation in the form Y = a + bX
 o Shows that an increase in equal units on a given scale reflects equal increases in the meaningful correlates of units
Absolute 0
- An absolute 0 is obtained when nothing of the property being measured exists
- This is extremely difficult/impossible for many psychological qualities

NOMINAL SCALE
Simplest form of measurement
Classification or categorization
Arithmetic operations cannot meaningfully be performed with nominal data; categories can only be counted
Ex.) Male or female
Also includes test items
 o Ex.) yes/no responses
ORDINAL SCALE
Classifies in some kind of ranking order
Individuals are compared to others and assigned a rank
Implies nothing about how much greater one ranking is than another
Numbers/ranks do not indicate units of measure
No absolute zero point
Binet: believed that data derived from an intelligence test are ordinal in nature
INTERVAL SCALE
In addition to the features of nominal and ordinal scales, contains equal intervals between numbers
No absolute zero point
Can take average
RATIO SCALE
In addition to all the properties of nominal, ordinal, and interval measurement, a ratio scale has a true zero point
Equal intervals between numbers
Ex.) measuring the amount of pressure a hand can exert
True zero doesn't mean someone will receive a score of 0, but means that 0 has meaning

NOTE: Permissible Operations
- Level of measurement is important because it defines which mathematical operations we can apply to numerical data
- For nominal data, each observation can be placed in only one mutually exclusive category
- Ordinal measurements can be manipulated using arithmetic, but the results are often difficult to interpret
- With interval data, one can apply any arithmetic operation to the differences between scores
 o Cannot be used to make statements about ratios

DESCRIBING DATA
Distribution: set of scores arrayed for recording or study
Raw Score: straightforward, unmodified accounting of performance, usually numerical

Frequency Distributions
Frequency Distribution: all scores listed alongside the number of times each score occurred
Grouped Frequency Distribution: test-score intervals (class intervals) replace the actual test scores
 o Highest and lowest class intervals = upper and lower limits of distribution
Histogram: graph with vertical lines drawn at the true limits of each test score (or class interval), forming TOUCHING rectangles; midpoint in center of bar
Bar Graph: rectangles DON'T touch
Frequency Polygon: data illustrated with a continuous line connecting the points where test scores or class intervals meet frequencies
A single test score means more if one relates it to other test scores
A distribution of scores summarizes the scores for a group of individuals
Frequency distribution: displays scores on a variable or a measure to reflect how frequently each value was obtained
 o One defines all the possible scores and determines how many people obtained each of those scores
Income is an example of a variable that has a positive skew
Whenever you draw a frequency distribution or a frequency polygon, you must decide on the width of the class interval
Class interval: the unit on the horizontal axis (ex. inches of rainfall)

Measures of Central Tendency
Measure of central tendency: statistic that indicates the average or midmost score between the extreme scores in a distribution.
The Arithmetic Mean
 o "X bar"
 o sum of observations divided by number of observations
 o X bar = ΣX / n
 o Used for interval or ratio data when distributions are relatively normal
The Median
 o The middle score
 o Used for ordinal, interval, and ratio data
 o Especially useful when few scores fall at the extremes
The Mode
 o Most frequently occurring score
 o Bimodal distribution: 2 scores both have the highest frequency
 o Commonly used with nominal data

Measures of Variability
Variability: indication of how scores in a distribution are scattered or dispersed
The Range
 o Difference between the highest and lowest scores
 o Quick but gross description of the spread of scores
The interquartile and semi-interquartile range
 o Distribution is split up by 3 quartiles, thus making 4 quarters, each representing 25% of the scores
 o Q2 = median
 o Interquartile range: measure of variability equal to the difference between Q3 and Q1
 o Semi-interquartile range: interquartile range divided by 2
Quartiles and Deciles
 o Quartiles are points that divide the frequency distribution into equal fourths
 o First quartile is the 25th percentile; second quartile is the median, or 50th percentile; third quartile is the 75th percentile
 o The interquartile range is bounded by the range of scores that represents the middle 50% of the distribution
 o Deciles are similar but use points that mark 10% rather than 25% intervals
 o Stanine system: converts any set of scores into a transformed scale, which ranges from 1 to 9
The average deviation
 o Deviation score: x = X - mean
 o Average deviation = (sum of the absolute values of all deviation scores) / total number of scores
 o Tells us on average how far scores are from the mean
The Standard Deviation
 o Similar to average deviation
 o But in order to overcome the (+/-) problem, each deviation is squared
 o Standard deviation: a measure of variability equal to the square root of the average squared deviations about the mean
 o Is the square root of the variance
 o Variance: the mean of the squares of the differences between the scores in a distribution and their mean
   Found by squaring and summing all the deviation scores and then dividing by the total number of scores
 o s = sample standard deviation
 o sigma = population standard deviation

Skewness
skewness: nature and extent to which symmetry is absent
POSITIVE SKEW
 ex.) test was too hard
NEGATIVE SKEW
 ex.) test was too easy
can be gauged by examining the relative distances of the quartiles from the median

Kurtosis
steepness of a distribution
platykurtic: relatively flat
leptokurtic: relatively peaked
mesokurtic: somewhere in the middle

The Normal Curve
Normal curve: bell-shaped, smooth, mathematically defined curve, highest at center; both sides taper as it approaches the x-axis asymptotically
 - symmetrical, and thus the mean, median, and mode are the same
Area under the Normal Curve
Tails and body

Standard Scores
Standard Score: raw score that has been converted from one scale to another scale, where the latter has an arbitrarily set mean and standard deviation
 - used for comparison
Z-score
Conversion of a raw score into a number indicating how many standard deviation units the raw score is below or above the mean of the distribution.
The difference between a particular raw score and the mean, divided by the standard deviation: z = (X - mean) / s
Used to compare test scores from different scales
T-score
Standard score system composed of a scale that ranges from 5 standard deviations below the mean to 5 standard deviations above the mean (mean set at 50, standard deviation at 10)
No negatives
Other Standard Scores
SAT
GRE
Linear transformation: when a standard score retains a direct numerical relationship to the original raw score
Nonlinear transformation: required when data are not normally distributed, yet comparisons with normal distributions need to be made
 o Normalized Standard Scores
   When scores don't fall on a normal distribution
   "normalizing a distribution involves 'stretching' the skewed curve into the shape of a normal curve and creating a corresponding scale of standard scores, a scale called a normalized standard score scale"

CHAPTER 4: OF TESTS AND TESTING

Some Assumptions About Psychological Testing and Assessment
- Assumption 1: Psychological Traits and States Exist
 o Trait: any distinguishable, relatively enduring way in which one individual varies from another
 o States: distinguish one person from another but are relatively less enduring
   Tasks on some tests mimic the actual behaviors that the test user is attempting to understand
 o Obtained behavior is usually used to predict future behavior
 o Could also be used to postdict behavior to aid in the understanding of behavior that has already taken place
 o Tools of assessment, such as a diary, or case history data, might be of great value in such an
evaluation Trait term that an observer applies, as well as - Assumption 4: Tests and Other Measurement Techniques Have Strengths strength or magnitude of the trait presumed present and Weaknesses based on observing a sample of behavior o Competent test users understand a lot about the tests they use o Trait and state definitions also refer to individual variation How it was developed make comparisons with respect to the hypothetical average Circumstances under which it is appropriate to person administer the test o Samples of behavior: How test should be administered and to whom Direct observation How results should be interpreted Analysis of self-report statements o Understand and appreciation limitations for tests they use Paper-and-pencil test answers - Assumption 5: Various Sources of Error Are Part of the Assessment Process o Psychological trait covers wide range of possible o Everyday error= misstates and miscalculations characteristics; ex: o Assessment error= a long-standing assumption that factors Intelligence other than what a test attempts to measure will influence Specific intellectual abilities performance on a test Cognitive style o Error variance: component of a test score attributable to Psychopathology sources other than the trait or ability measured o Controversy regarding how psychological tests exist Assessees themselves are sources of error variance Psychological tests exist only as constructs: an o Classical test theory (CTT)/ True score theory: assumption is informed, scientific concept developed or made that each testtaker has a true score on a test that would constructed to describe or explain a behavior be obtained but for the action of measurement error Cant see, hear or touch infer existence - Assumption 6: Testing and Assessment Can Be Conducted in a Fair and from overt behavior: refers to an Unbiased Manner observable action or the product of an o Court challenged to various tests and testing programs have observable action, including 
test- or sensitized test developers and users to the societal demand for assessment-related responses fair tests used in a fair manner o Traits not expected to be manifested in behavior 100% of the Publishers strive to develop instruments that are fair time when used in strict accordance with guidelines in the Seems to be rank-order stability in personality test manual traits relatively high correlations between trait o Fairness related problems/questions: scores at different time points Culture is different from people whom the test was o Whether and to what degree a trait manifests itself is intended for dependent on the strength and nature of the situation Politics - Assumption 2: Psychological Traits and States Can Be Quantified and - Assumption 7: Testing and Assessment Benefit Society Measured o Many critical decisions are based on testing and assessment o After acknowledged that psychological traits and states do exist, procedures the specific traits and states to be measured need to be defined What types of behaviors are assumed to be WHAT’S A “GOOD TEST”? indicative of trait? - Criteria Test developer has to provide test users with a clear o Clear instruction for administration, scoring, and interpretation operational definition of the construct under study - Reliability o After being defined, test developer considers types of item o A “good test”/measuring tool reliable content that would provide insight into it Involves consistency: the prevision with which the Ex: behaviors that are indicative of a particular trait test measures and the extent to which error is o Should all questions be weighted the same? 
present in measurements Weighting the comparative value of a test’s items Unreliable measurement needs to be avoided comes about as the result of a complex interplay - Validity among many factors: o Test is considered valid if it doesn’t indeed measure what it Technical considerations purports to measure The way a construct has been defined (for o If there is controversy over the definition of a construct then the particular test) validity is sure to be criticized as well Value society (and test developer) attach o Questions regarding validity focus on the items that collectively to behaviors evaluated make up the test o Need to find appropriate ways to score the test and interpret Adequately sample range of areas to measure results construct Cumulative scoring: test score is presumed to Individual items contribute to or take away from represent the strength of the targeted ability or trait test’s validity or state o Validity may also be questioned on grounds related to the The more the testtaker responds in a interpretation of test results particular direction (as keyed by test - Other Considerations manual) the higher the testtaker is o “Good test” one that trained examiners can administer, score presumed to possess the targeted trait or and interpret with minimum difficulty ability Useful - Assumption 3: Test-Related Behavior Predicts Non-Test-Related Behavior Yields actionable results that will ultimately benefit o Objective of test is to provide some indication of some aspects individual testtakers or society at large of the examinee’s behavior CHAPTER 4: OF TESTS AND TESTING o Purpose of test compare performance of testtaker with o STANDARD ERROR OF THE DIFFERENCE – estimate how performance of other testtakers (contains adequate norms: large a difference between two scores should be before normative data) the difference is considered statistically significant Normative data provides standard with which results - Developing norms for a standardized test 
measured can be compared o Establish a standard set of instructions and conditions NORMS under which the test is given makes scores of normative - Norm-referenced testing and assessment: method of evaluation and sample more comparable with scores of future testtakers a way of deriving meaning from test scored by evaluating an o All data collected and analyzed, test developer will individual testtaker’s score and comparing it to scores of a group of summarize data using descriptive statistics (measures of testtakers central tendency and variability) - Meaning of individual score is relative to other scores on the same Test developer needs to provide precise test description of standardization sample itself - Norms (scholarly context): usual, average, normal, standard, expected Descriptions of normative samples vary widely or typical in detail - Norms (psychometric context): the test performance data of a Tracking particular group of testtakers that are designed for use as a reference - Comparisons are usually with people of the same age when evaluating or interpreting individual test scores - Children at the same age level tend to go through different growth - Normative sample: group of people whose performance on a patterns particular test is analyzed for reference in evaluation the performance - Pediatricians must know the child’s percentile within a given age of individual testtakers group o Yields a distribution of scores - This tendency to stay at about the same level relative to one’s peers is - Norming: refers to the process of deriving norms; particular type of known as tracking (ie height and weight) norm derivation - Diets may alter this “track” o Race norming: controversial practice of norming on the - Faults: some believe there is an analogy between the rates of physical basis of race or ethnic background growth and the rates of intellectual growth - Norming a test can be very expensive user norms/program norms: o Some say that children learn at 
different rates consist of descriptive statistics based on a group of testtakers in a o This system discriminates against some children given period of time rather than norms obtained by form sampling methods TYPES OF NORMS - Sampling to Develop Norms o Classification of norms ex: age, grade, national, local, - Standardization: process of administering a test to a representative percentile, etc. sample of testtakers for the purpose of establishing norms o PERCENTILES o Standardized when has clear, specified procedures Median= 2nd quartile: the point at or below which - Sampling 50% of the scores fell and above which the remaining o Developer targets defined group as population test 50% fell designed for Might wish to divide distribution of scores into All have at least one common, observable deciles (instead of quartiles): 10 equal parts characteristic The Xth percentile is equal to the score at or below o To obtain distribution of scores: which X% of scores fall Test administered to everyone in targeted Percentile: an expression of the percentage of population people whose score on a test or measure falls below Administer test to a sample of the population a particular raw score Sample: portion of universe of Percentage correct: refers to the people deemed to be representative distribution of raw scores (number of of whole population items that were answered correctly) Sampling: process of selecting the multiplied by 100 and divided by the total portion of universe deemed to be number of items *not same as percentile representative of whole Percentile is a converted score that refers o Subgroups within a defined population may differ with to a percentage of testtakers respect to some characteristics and it is sometimes Percentiles are easily calculated popular way of essential to have these differences proportionately organizing test related data represented in sample Using percentiles with normal distribution real Stratified sampling: sample reflects statistics 
of differences between raw scores may be minimized whole population; helps prevent sampling bias near the ends of the distribution and exaggerated in and ultimately aid in interpretation of findings the middle (worsens with highly skewed data) Purposive sampling: arbitrarily select sample o AGE NORMS we believe to be representative of population Age-equivalent scores/age norms: indicate the Incidental/convenience sampling: sample that average performance of different samples of is convenient or available for use testtakers who were at various ages at the time the Very exclusive (contain exclusionary test was administered criteria) Age norm tables for physical - TYPES OF STANDARD ERROR: characteristics o STANDARD ERROR OF MEASUREMENT – estimate the “Mental” age vs. physical age (need to extent to which an observed score deviates from a true identify mental age) score o GRADE NORMS o STANDARD ERROR OF ESTIMATE – In regression, an Grade norms: designed to indicate the average test estimate of the degree of error involved in predicting the performance of testtakers in a given school grade value of one variable from another Developed by administering the test to o STANDARD ERROR OF THE MEAN – a measure of sampling representative samples of children over a error range of consecutive grades Mean or median score for children at each grade level is calculated CHAPTER 4: OF TESTS AND TESTING Great intuitive appeal CORRELATION Do not provide info as to the content or Degree and direction of correspondence between two things. 
type of items that a student could or Correlation coefficient (r) – expresses a linear relationship between could not answer correctly two continuous variables Developmental norms: (ex: grade norms and age o Numerical index that tells us the extent to which X and Y norms) term applied broadly to norms developed on are “co-related” the basis of any trait, ability, skill, or other Positive correlation: high scores on Y are associated with high scores characteristic that is presumed to develop, on X, and low scores on Y correspond to low scores on X deteriorate, or otherwise be affected by Negative correlation: higher scores on Y are associated with lower chronological age, school grade, or stage of life scores on X, and vise versa o NATIONAL NORMS No correlation: the variables are not related National norms: derived from a normative sample -1 to 1 that was nationally representative of the population Correlation does not imply causation. at the time the norming study was conducted o Ie weight, height, intelligence o NATIONAL ANCHOR NORMS Many different tests purporting to measure the same PEARSON r human characteristics or abilities Pearson Product Moment Correlation Coefficient National anchor norms: equivalency tables for scores Devised by Karl Pearson on tests that purpose to measure the same thing Relationship of two variables are linear and continuous Could provide the tool for comparisons Coefficient of Determination (r2) – indication of how much variance is Provides stability to test scores by shared by the X and the Y variables anchoring them to other test scores SPEARMAN RHO Begins with the computation of percentile Rank order correlation coefficient norms for each test to be compared Developed by Charles Spearman Equipercentile method: equivalency of Used when the sample size is small and when both sets of scores on different tests is calculated with measurements are in ordinal form (ranking form) reference to corresponding percentile BISERIAL CORRELATION 
scores expresses the relationship between a continuous variable and an o SUBGROUP NORMS artificial dichotomous variable Normative sample can be segmented by an criteria o If the dichotomous variable had been true then we would initially used in selecting subjects for sample use the point biserial correlation Subgroup norms: result of segmentation; more o When both variables are dichotomous and at least one of narrowly defined the dichotomies is true, then the association between o LOCAL NORMS them can be estimated using the phi coefficient Local norms: provide normative info with respect to o If both dichotomous variables are artificial, we might use a the local population’s performance on some test special correlation coefficient – tetrachoric correlation Typically developed by test users themselves REGRESSION - Fixed Reference Group Scoring Systems analysis of relationships among variables for the purpose of o Norms provide context for interpreting meaning of a test score understanding how one variable may predict another o Fixed reference group scoring system: distribution of scored SIMPLE REGRESSION: one IV (X) and one DV (Y) obtained on the test from one group of testtakers (fixed - Regression line: defined as the best-fitting straight line through a set reference group) is used as the basis for the calculation of test of points in a scatter diagram scores for future administrators on the test o Found by using the principle of least squares, which Ex: SAT test (developed in 1962) minimizes the squared deviation around the regression NORM-REFERENCED VERSUS CRITERION-REFERENCED EVALUATION line - Way to derive meaning from test score is to evaluate test score in Primary use: To predict one score or variable from another relation to other scores on same test (Norm-referenced) Standard error of estimate: the higher the correlation between X and - Criterion-referenced: derive meaning from a test score by evaluating Y, the greater the accuracy of the prediction and 
the smaller the SEE. it on the basis of whether or not some criterion has been met MULTIPLE REGRESSION: The use of more than one score to predict Y. o Criterion: a standard on which a judgment or decision may be based Regression coefficient: (b) slope of the regression line - Criterion-referenced testing and assessment: method of evaluation o Sum of squares for the covariance to the sum of squares and way of deriving meaning from test scores by evaluating an for X individual’s score with reference to a set standard (ex: to drive must o Sum of squares is defined as the sum of the squared past driving test) deviations around the mean o Derives from values and standards of an individual or o Covariance is used to express how much two measures organization covary, or vary together o Also called Domain/content-referenced testing and Slope describes how much change is expected in Y each time X assessment increases by one unit o Critique: if followed strictly, important info about Intercept (a) is the value of Y when X is 0 individual’s performance relative to others can be o The point at which the regression line crosses the Y axis potentially lost THE BEST-FITTING LINE Culture and Inference The difference between the observed and predicted score (Y-Y’) is - Culture is a factor in test administration, scoring and interpretation called the residual - Test user should do research in advance on test’s available norms to The best-fitting line is most appropriately found by squaring each check how appropriate it is for targeted testtaker population residual o Helpful to know about the culture of the testtaker Best-fitting line is obtained by keeping these squared residuals as small as possible CORRELATION AND INFERENCE o Principle of least squares: Correlation is a special case of regression in which the scores for both variables are in standardized, or Z, units CHAPTER 4: OF TESTS AND TESTING In correlation, the intercept is always 0 - Third variable, ie poor social 
adjustment, causes TV viewing and Pearson product moment correlation coefficient is a ratio used to aggression determine the degree of variation in one variable that can be - External influence is the third variable estimated from knowledge about variation in the other variable Restricted Range Testing the Statistical Significance of a Correlation Coefficient - Correlation and regression use variability on one variable to explain - Begin with the null hypothesis that there is no relationship between variability on a second variable variables - Restricted range problem: correlation requires variability; if the - Null hypothesis rejected is there is evidence that the association variability is restricted, then significant correlations are difficult to between two variables is significantly different from 0 find - t distribution is not a single distribution, but a family of distributions, Mulvariate Analysis each with its own degrees of freedom - Multivariate analysis considers the relationship among combinations - Degrees of freedom are defined as the sample size minus 2, or N-2 of three of more variables - Two-tailed test General Approach - Linear combination of variables is a weighted composite of the How to Interpret a Regression Plot original variables - Regression plots are pictures that show the relationship between - Y’ = a+b1X1 +