Psychological Assessment Summary PDF

Summary

This document provides a summary of principles and techniques in psychological assessment. It discusses different types of tests, scales of measurement, and assessment tools.

Full Transcript


PRINCIPLES/INTRO

PSYCHOLOGICAL TEST - a set of items designed to measure characteristics of human beings that pertain to behavior.

PSYCHOLOGICAL ASSESSMENT - gathering of data using tools: tests, interviews, case studies, and observations.
- Collaborative - the assessor and assessee may work as partners
- Therapeutic
- Dynamic

PROJECTIVE TESTS - use an ambiguous, unclear test stimulus.
- Tap wishes, desires, intrapsychic conflict, and unconscious motives
- Subjectivity in test interpretation / clinical judgement
- Self-administered / individual tests; unlimited responses
- Results are integrated into a single score interpretation

APTITUDE TESTS - predict the capacity for acquiring skills or competencies. Ex. Differential Aptitude Test

CRITERION-REFERENCED TEST (CRT) - relates scores to the content of the test; there is a certain criterion (cut score) to be met.

SCALES - relate raw scores to some theoretical or empirical distribution.
SCORING - results are integrated relative to a *cut score*.

SCALES OF MEASUREMENT (IRON)
- NOMINAL (no ranking): no magnitude, no equal intervals, no absolute 0
- ORDINAL (ranking): magnitude only
- INTERVAL (temperature, time, IQ): magnitude and equal intervals, no absolute 0
- RATIO (weight, height): magnitude, equal intervals, and absolute 0

PARAMETRIC - normal distribution of scores (Pearson r)
NONPARAMETRIC - abnormal (non-normal) distribution of scores (Spearman rho; chi-square for nominal data)

ASSESSMENT TECHNIQUES (D I T O)
- DOCUMENTS: records, protocols, collateral reports
- INTERVIEWS: interview responses (structured or unstructured); verification
- TESTS: initial assessment > screening; written, verbal, or visual
- OBSERVATION: behavioral observation, observation checklists

DISTRIBUTION - how frequently each value was obtained.
- Normal distribution - scores fall on the central tendency (mean, median, and mode coincide)
- Abnormal distribution - skewed

FREQUENCY DISTRIBUTION
- Mean - the average score
- SD - an approximation of the average deviation around the mean; the square root of the variance
- Z score - the difference between a score and the mean, divided by the SD

POSITIVE SKEW - the tail falls at the high end of the distribution; most scores are low. *Means the test is too difficult.
NEGATIVE SKEW - the tail falls at the low end of the distribution; most scores are high. *Means the test is too easy.

PERCENTILE RANK - the percentage of people whose scores on a test fall below a particular raw score.
PERCENTILE - a specific score within a distribution.

BIAS SOURCES (RATING ERRORS)
- RESPONSE SET - a rater marks the same place on the rating scale regardless of the examinee's performance
- LENIENCY ERROR - high, positive ratings despite differences among examinees' performance
- SEVERITY ERROR - low, negative ratings despite differences among examinees' performance
- CENTRAL TENDENCY ERROR - middle-range ratings (e.g., on a Likert scale)
- PROXIMITY ERROR - differing skills are rated similarly when sequentially ordered, as in a process
- HALO ERROR - the performance rating is influenced by unrelated impressions
- LOGICAL ERROR - a poorly worded skill specification is rated in an unintended manner
- LACK OF INTEREST ERROR - the rater is not really interested in the process
- IDIOSYNCRATIC ERROR - unexpected and unpredictable ratings given for a number of reasons

PSYCHOLOGICAL TESTS: ABILITY TESTS and PERSONALITY TESTS
- INTELLIGENCE TESTS - general potential to solve problems
- PERSONALITY TESTS - traits/domains/factors
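The mean, SD, z-score, and percentile-rank definitions above can be sketched in a few lines of Python; the score list here is invented purely for illustration:

```python
from statistics import mean, pstdev

scores = [12, 15, 15, 18, 20, 22, 25, 25, 27, 30]  # hypothetical raw scores

m = mean(scores)      # mean: the average score
sd = pstdev(scores)   # SD: square root of the variance

def z_score(x):
    # z = (score - mean) / SD
    return (x - m) / sd

def percentile_rank(x):
    # percentage of scores in the distribution that fall below x
    below = sum(1 for s in scores if s < x)
    return 100 * below / len(scores)
```

On this made-up distribution a raw score of 25 has a percentile rank of 60, and a score equal to the mean always has z = 0.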
WHICH TYPE OF RELIABILITY IS APPROPRIATE?
- Test has two forms -> Parallel-Forms Reliability
- Test designed to be administered to an individual more than once -> Test-Retest Reliability
- Test with factorial purity -> Cronbach's Alpha
- Test with items carefully ordered according to difficulty -> Split-Half Reliability
- Test involves some degree of subjective scoring -> Inter-Rater Reliability
- Test involves dichotomous items -> KR-20
- Dynamic characteristics (ever-changing characteristics that vary through time or situation) -> Internal Consistency
- Static characteristics (characteristics that do not vary across time) -> Test-Retest and Parallel-Form Reliability

INTELLIGENCE TESTS
- Verbal intelligence; non-verbal intelligence
- Ex. WAIS, Stanford-Binet Intelligence Scale, Culture Fair Intelligence Test

PERSONALITY TESTS - usually no right or wrong answers. Ex. MBTI

OBJECTIVE TESTS - structured; "Yes or No" or "True or False"
- Standardized: test administration, scoring, and interpretation of scores
- Limited number of responses
- Group tests

ACHIEVEMENT TESTS - previous learnings; measure the extent of one's knowledge in various academic subjects. Ex. Stanford Achievement Test in reading

NORMS - where we base the scores.
- Norm-referenced test (NRT) - whether test takers perform better or worse than a reference group (ex. age norms)

VALIDITY - a test measures what it purports to measure.

CONTENT VALIDITY
- the essence of what you're measuring; consists of topics and processes
- often established by expert judgement
- GENERALIZABILITY - the examiner will generalize from the sample of items to the degree of content mastery possessed by the individual examinee
- EDUCATIONAL CONTENT-VALID TEST - follows a TOS (table of specifications)
- EMPLOYMENT CONTENT-VALID TEST - covers appropriate job-related skills; reflects the job specification
- CLINICAL CONTENT-VALID TEST - symptoms of disorders are covered; reflects the diagnostic criteria
- CONTENT VALIDITY RATIO (CVR) - Lawshe proposed a structured and systematic way of establishing the content validity of a test

CONSTRUCT VALIDITY
- a construct is an informed scientific idea developed or hypothesized to describe or explain behavior; something built by mental synthesis; unobservable, presupposed traits and processes
- required when no criterion or universe of content is accepted as entirely adequate to define the quality being measured
- a test has good construct validity if an existing psychological theory can support what the test items are measuring
- uses both logical analysis and empirical data; more general than specific, providing a frame of reference
- EVIDENCES:
  1. The test is homogeneous, measuring a single construct.
  2. Test scores increase or decrease as a function of age, the passage of time, or experimental manipulation.
  3. Pretest-posttest differences.
  4. Test scores differ between distinct groups.
  5. Test scores correlate with scores on other tests in accordance with what is predicted.
- UNIDIMENSIONAL - one construct; MULTIDIMENSIONAL - several constructs
- CONSTRUCT UNDERREPRESENTATION - failure to capture important components of a construct
- CONSTRUCT-IRRELEVANT VARIANCE - test scores are influenced by factors irrelevant to the construct
- CONVERGENT - the test correlates well with measures of the same construct; the two measures are intended to measure the same construct but are NOT administered in the same fashion. Ex. a depression test and a Negative Affect Scale
- DIVERGENT (also called discriminant) - a validity coefficient showing little or no relationship between the newly created test and an existing test that measures something different. Ex. a social desirability test and a marital satisfaction test

CRITERION-RELATED VALIDITY - how well a test corresponds with a particular criterion
- criterion - the standard; characteristics: relevant, valid and reliable, uncontaminated
- criterion contamination - the criterion is based on predictor measures
- the criterion should be both valid and reliable; performance on the first measure should be highly correlated with performance on the second
- PREDICTIVE - correlates with what occurs in the future; test scores are obtained at one time and the criterion measure is obtained later, after an intervening event; performance is predicted from one or more known measured variables. Ex. MAT, GRE, GMAT
- CONCURRENT - correlates with what is occurring now; both the test scores and the criterion measures are obtained at present; administered to the same subjects as the measure being validated; the criterion is valid, reliable, and considered a standard; often confused with a construct validity strategy

RELIABILITY - the consistency of a test; indicates how stable a test score is. A test should produce similar results consistently if it measures the same thing. A TEST CAN BE RELIABLE WITHOUT BEING VALID.

TEST-RETEST RELIABILITY
- Stability: "Will the scores be stable over time?" (Pearson r)
- the same test is given to the same group of test takers at 2 different times
- carryover effect: the first testing session influences the results of the second (interval "too short"), which can affect the test-retest reliability of a psychological measure
- used only for traits/characteristics that do not change over time

PARALLEL-FORM RELIABILITY
- Equivalence: "Are the two forms of the test equivalent?" (r)
- different forms of the same test are administered to the same group at different times; a high reliability coefficient is expected
- the forms should contain the same number of items, expressed in the same form and covering the same type of content; the range and level of difficulty of the items should be equal; instructions, time limits, illustrative examples, format, and all other aspects must likewise be checked for equivalence
- PROBLEM: the difficulty of developing another form

INTERNAL CONSISTENCY
- "How well does each item measure the content/construct under consideration?"
- used when tests are administered once; there is consistency among items within the test
- if all items on a test measure the same construct, the test has good internal consistency

INTER-RATER RELIABILITY
- Kappa statistics: "Are the raters consistent in their ratings?"
- different raters, using a common rating form, measure the object of interest consistently
- Cohen's Kappa - used to gauge the agreement between 2 raters
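Cohen's kappa corrects the raw percent agreement between two raters for the agreement expected by chance alone. A minimal sketch (the category labels and counts below are invented):

```python
from collections import Counter

def cohens_kappa(rater1, rater2):
    """Chance-corrected agreement between two raters over the same cases."""
    n = len(rater1)
    # observed agreement: proportion of cases where the raters match
    observed = sum(a == b for a, b in zip(rater1, rater2)) / n
    # chance agreement: sum over categories of the product of marginal proportions
    c1, c2 = Counter(rater1), Counter(rater2)
    expected = sum(c1[k] * c2[k] for k in c1) / (n * n)
    return (observed - expected) / (1 - expected)
```

For example, if two raters classify 50 cases and jointly say "yes" 20 times, jointly say "no" 15 times, and disagree on the remaining 15, observed agreement is .70, chance agreement works out to .50, and kappa = .40.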
- Fleiss' Kappa - used to gauge the agreement among 3 or more raters
- practice effect - a type of carryover effect wherein scores on the second administration are higher than on the first
- error variance - corresponds to random fluctuations of performance from one test session to the other

*SPLIT-HALF RELIABILITY - Spearman-Brown prophecy formula; splitting the items on a questionnaire or test in half (odd/even), computing a separate score for each half, then calculating the degree of consistency between the two scores for a group of participants
*CRONBACH'S ALPHA - used when the two halves of the test have unequal variances; provides the lowest estimate of reliability; the average of all possible split halves. Ex. Likert-scale items
*KR-20 - for binary/dichotomous items; tests with a right-or-wrong format

CHOOSING A STATISTICAL TOOL
I. DESCRIPTION OF THE GROUP
  A. Central tendency
  B. Variability
  C. Standard scores
  D. Frequencies
II. CORRELATE VARIABLES
  A. Pair of interval/continuous variables - Pearson r
  B. Pair of ordinal variables - Spearman rho
  C. Pair of dichotomous variables - KR-20
  D. One continuous and one dichotomous variable
    a. True dichotomy - point-biserial
    b. Artificial dichotomy - biserial
  E. 3 or more raters - agreement
    a. Kendall's coefficient of concordance
III. COMPARISON OF GROUPS
  A. Random sampling (parametric)
    a. 2 separate groups w/ individual means - independent-measures t-test
    b. 1 group, 2 scores - dependent t-test
    c. 3 or more groups - one-way ANOVA
    d. 1 group, 3 or more scores - repeated-measures ANOVA
    e. 2 or more groups, each measured repeatedly - split-plot or mixed-design ANOVA
    f. 2 IVs, 1 DV - two-way ANOVA
      i. 4 groups - 2x2 design
  B. Non-random sampling (nonparametric)
    a. 2 separate groups - Mann-Whitney U
    b. 1 group, 2 ordinal scores - Wilcoxon signed-rank test
    c. 3 or more groups - Kruskal-Wallis H test
    d. 3 or more ranks - Friedman test
    e. 1 group sorted into categories/frequencies - chi-square
IV. PREDICTING VARIABLES
  A. One predictor to one outcome - linear regression
  B. More than one predictor to one outcome (X1 + X2 + X3 = Y) - multiple regression
  C. Sets of predictors, significant or not - hierarchical regression
    M1: X1 = Y; M2: X1 + X2 = Y; M3: X1 + X2 + X3 = Y
  D. Sets of predictors, all significant - stepwise regression
    M1: X1* = Y; M2: X1* + X2* = Y; M3: X1* + X2* + X3* = Y
  E. Outcome is nominal - logistic regression
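The split-half/Spearman-Brown and coefficient-alpha computations described above can be sketched as follows; the item data in the tests are invented, and KR-20 is simply the special case of alpha for 0/1-scored items:

```python
from statistics import pvariance

def spearman_brown(half_correlation):
    """Step a split-half correlation up to an estimate for the full-length test."""
    return 2 * half_correlation / (1 + half_correlation)

def cronbach_alpha(item_scores):
    """item_scores: one list per item, each holding every examinee's score on it.
    With dichotomous (0/1) items this formula reduces to KR-20."""
    k = len(item_scores)
    # each examinee's total score across all items
    totals = [sum(person) for person in zip(*item_scores)]
    sum_item_var = sum(pvariance(item) for item in item_scores)
    return k / (k - 1) * (1 - sum_item_var / pvariance(totals))
```

A half-test correlation of .60, for instance, steps up to a full-test estimate of .75; a set of perfectly parallel items yields alpha = 1.0.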
ASSESSMENT vs. TESTING
- ASSESSMENT: a broad array of evaluative processes. Objective: answers questions, solves problems, decides. Process: individualized. Role of evaluator: key in the choice of tests. Skills of evaluator: educated selection of tools; skilled. Outcome: a logical problem-solving approach.
- TESTING: instruments that yield scores based on collected data (a subset of assessment). Objective: obtain some measure, numerical in nature, with regard to an ability/attribute. Process: individualized or grouped. Role of evaluator: may be substituted. Skills of evaluator: technician-like. Outcome: yields a test score or series of test scores.

TECHNICAL QUALITY - refers to a test's psychometric soundness.
TEST - suggests a sample of an individual's behavior.
SCALE - the process by which a response can be scored.
TEST ITEM
1. Content - the subject matter of the test
2. Format - the form, plan, structure, arrangement, and layout of test items
3. Administration procedures - administered on a one-to-one basis or by group
4. Scoring and interpretation
  a. Score - a code or summary statement that reflects an evaluation of performance on a test
  b. Scoring - the process of assigning such evaluative codes or statements to performance on tests

3 FORMS OF ASSESSMENT (T C D)
1. THERAPEUTIC PSYCHOLOGICAL ASSESSMENT - the patient gains insight about the disorder and later develops psychological wellness
2. COLLABORATIVE PSYCHOLOGICAL ASSESSMENT - the patient helps the clinician uncover the disorder
3. DYNAMIC PSYCHOLOGICAL ASSESSMENT - follows a process (ABA design): a. Evaluation, b. Therapy/intervention, c. Evaluation

TYPES OF PSYCHOLOGICAL TESTS
1. NUMBER OF TEST TAKERS: a. Individual, b. Group
2. VARIABLE BEING MEASURED
  a. ABILITY: i. Achievement, ii. Aptitude/Prognostic, iii. Intelligence
  b. PERSONALITY: i. Objective/Structured, ii. Projective/Unstructured, iii. Interests
ASSESSMENT TOOLS (O P I)
1. OBSERVATION - monitoring the actions of others or oneself by visual or electronic means while recording quantitative and/or qualitative information regarding those actions
  a. Naturalistic observation - observing behavior in the setting in which it would typically be expected to occur
  b. Role-play test - a tool of assessment wherein examinees are directed to act as if they were in a particular situation
2. PSYCHOLOGICAL TESTING - a set of items used for testing/measuring/determining individual differences; the process of measuring psychology-related variables by means of a device
3. INTERVIEW - gathering information through direct communication; interviews differ in their purpose, length, and nature
  a. Panel interview - multiple interviewers
    i. Advantage: minimizes the idiosyncratic biases of a lone interviewer
    ii. Disadvantage: costly; the use of multiple interviewers may not even be justified
- PORTFOLIO - a sample of one's ability and accomplishments
- CASE HISTORY DATA - refers to records, transcripts, and other accounts in written or pictorial form
- CASE STUDY - a report or illustrative account concerning a person or an event, compiled on the basis of case history data

CHARACTERISTICS OF PSYCHOLOGICAL TESTING
1. Objective - free from subjective perception
2. Standardized - uniformity exists
3. Reliable - there is consistency in test results
4. Valid - the test measures what it purports to measure
5. Good predictive validity - test results suggest future behavior

MAXIMUM PERFORMANCE TESTS
- SPEED TEST - the test is homogeneous (easy items); short time limit
- POWER TEST - few items, but more complex

REFERENCE SOURCES - sources of authoritative information about published tests
- Test catalogues - brief descriptions of tests
- Test manuals - detailed information about a test
- Reference volumes - "one-stop shopping"
- Journal articles
- Online databases

ETHICS IN PSYCHOLOGICAL TESTING

ETHICAL CODE - professional guidelines for appropriate behavior
o American Counseling Association (2005)
o American Psychological Association (2003)
o Psychological Association of the Philippines (2009)

TEST SECURITY - the codes remind professionals that it is their responsibility to make reasonable efforts to ensure the integrity of test content and the security of the test itself. Professionals should not duplicate tests or change test materials without the permission of the publisher.

CHOOSING APPROPRIATE ASSESSMENT INSTRUMENTS - the ethical codes stress the importance of choosing assessment instruments that show test worthiness: the reliability, validity, cross-cultural fairness, and practicality of a test.

TEST SCORING & INTERPRETATION - when scoring tests and interpreting their results, professionals should reflect on how test worthiness (reliability, validity, cross-cultural fairness, and practicality) might affect the results, and must take appropriate action when issues of test worthiness arise so that the results of the assessment are not misconstrued.

COMPETENCE IN USING TESTS - requires adequate knowledge of and training in administering an instrument; professionals should have adequate knowledge about testing and familiarity with any test they may use.

MORAL ISSUES
- Human rights
- Labeling
- Invasion of privacy
- Divided loyalties
- Responsibilities of test users, test publishers, and test constructors

DIVIDED LOYALTIES - psychologists are torn over whether their client is the institution or the person. Institutions should be informed only of what they need, or given an answer to the referral question only.

HUMAN RIGHTS
- Right to informed consent
- Right to know one's test results and the basis of any decisions that affect one's life
- Right to know who will have access to test data, and right to confidentiality of test results

INFORMED CONSENT - permission given by the client after the assessment process is explained. It involves the right of clients to obtain information about the nature and purpose of all aspects of the assessment process and to give their permission to be assessed.

NON-REQUIREMENT OF INFORMED CONSENT
- Testing mandated by law
- Testing as a routine educational, institutional, or organizational activity
- Evaluation of decisional capacity

THREE-TIER SYSTEM (test user qualifications)
- LEVEL A - tests that can be administered, scored, and interpreted by responsible nonpsychologists who have carefully read the manual and are familiar with the overall purpose of testing. Ex. educational achievement tests, specialized aptitude tests
- LEVEL B - requires technical knowledge of test construction and use, plus appropriate advanced coursework in psychology and related courses (statistics, individual differences, counseling). Ex. group intelligence tests, personality tests
- LEVEL C - requires an advanced degree in psychology or licensure as a psychologist, plus advanced training/supervised experience with the particular test. Ex. projective tests, individual intelligence tests, diagnostic tests

CROSS-CULTURAL SENSITIVITY - an ethical guideline to protect clients from discrimination and bias in testing. The codes stress the importance of professionals being aware of and attending to the effects of age, color, cultural identity, disability, ethnicity, gender, religion, sexual orientation, and socioeconomic status on test administration and interpretation.

LABELING - effects of labeling:
o Results in stigmatization
o Affects one's access to help
o Makes a person passive

INVASION OF PRIVACY - the codes generally acknowledge that, to some degree, all tests invade one's privacy, and they highlight the importance of clients understanding how their privacy might be infringed upon.

CONFIDENTIALITY - an ethical guideline to protect client information. Whether conducting a broad assessment of a client or giving one test, keeping information confidential is a critical part of the assessment process and follows guidelines similar to those for confidentiality in a therapeutic relationship.

PROPER DIAGNOSIS - choose appropriate assessment techniques for accurate diagnosis. The codes emphasize the important role professionals play when deciding which assessment techniques to use in forming a diagnosis of a mental disorder, and the ramifications of making such a diagnosis.

RELEASE OF TEST DATA - test data are protected; a client release is required. Data should be released only if the client has given consent, and generally only to individuals who can adequately interpret the test data and who will not misuse the information.

WHEN CONFIDENTIAL INFORMATION MAY BE REVEALED
1. If a client is in danger of harming himself or herself or someone else;
2. If a child is a minor and the law states that parents have a right to information about their child;
3. If a client asks you to break confidentiality (for example, your testimony is needed in court);
4. If you are bound by law to break confidentiality (for example, you are hired by the courts to assess an individual's capacity to stand trial);
5. To reveal information about your client to your supervisor in order to benefit the client;
6. When you have a written agreement from your client to reveal information to specified sources (for example, the court has asked you to send them a test report).

TEST ADMINISTRATION - the codes reinforce the notion that tests should be administered in a manner that accords with the way they were established and standardized. Alterations to this process should be noted, and interpretations of test data adjusted if the testing conditions were not ideal.

MORAL MODEL OF DECISION MAKING
- AUTONOMY - respecting the client's right of self-determination and freedom of choice
- NON-MALEFICENCE - ensuring that professionals do no harm
- BENEFICENCE - promoting the well-being of others and of society
- JUSTICE - equal and fair treatment of all people; being nondiscriminatory
- FIDELITY - being loyal and faithful to your commitments in the helping relationship
- VERACITY - dealing honestly with the client

RESPONSIBILITIES OF TEST USERS, PUBLISHERS, AND CONSTRUCTORS
- Use assessment instruments with samples similar to the standardization group (reliability, validity, established norms)
- Test users must possess knowledge of test construction and of the supporting research for any test they administer
- Test developers should provide the psychometric properties of the test, specified scoring and administration procedures, and a clear description of the normative sample

NORMS AND STATISTICS

USE OF TWO TYPES OF STATISTICS
1. DESCRIPTIVE - used for making interpretations of test results; provide a concise description of quantitative information
2. INFERENTIAL - provide conclusions regarding a population based on observations of a sample

MEASURES OF CENTRAL TENDENCY - statistics that indicate the average or midmost score between the extreme scores in a distribution
- MEAN - the average; the most appropriate measure of central tendency for interval and ratio data when the distribution is normal
- MEDIAN - the middle score of the distribution
- MODE - the most frequently occurring score in a distribution
SCALES OF MEASUREMENT
1. NOMINAL - naming/labeling; one category does not suggest that another is higher or lower. Ex. gender, religion
2. ORDINAL - observations can be ranked in order, but the degree of difference is unobtainable. Ex. position in the company
3. INTERVAL - there is magnitude and equal intervals, but no true zero
4. RATIO - there is magnitude, equal intervals, and a true zero
*magnitude - "moreness"; one value is more than another
*equal intervals - the difference between two points at any place on the scale has the same meaning as the difference between two other points elsewhere on the scale
*absolute zero - zero suggests the absence of the variable being measured
*most psychological data are ordinal by nature but are treated as interval
*IQ was initially intended for classification, not measurement (per Binet)

MEASURES OF VARIABILITY - indicate how scattered the scores in a distribution are; how far one score is from another; measure the dispersion of the scores
- Range - the difference between the highest score and the lowest score
- Quartiles - points that divide the distribution into 4 equal parts
- Interquartile range - the difference between Q3 and Q1; represents the middle 50% of the distribution
- Semi-interquartile range - (Q3 - Q1) / 2

FREQUENCY DISTRIBUTION - displays scores on a variable or measure to reflect how frequently each value was obtained.
STANDARD DEVIATION - an approximation of the average deviation around the mean; details how far above or below the mean a score lies.
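The variability measures above can be sketched with Python's statistics module; the score list is invented, and `quantiles` with its default method places the quartiles at the usual (n+1)-based positions:

```python
from statistics import quantiles

scores = [2, 4, 4, 5, 7, 8, 9, 10, 12, 15, 18, 20]  # hypothetical scores

value_range = max(scores) - min(scores)   # range = HS - LS
q1, q2, q3 = quantiles(scores, n=4)       # points dividing the distribution into 4
iqr = q3 - q1                             # interquartile range: middle 50%
semi_iqr = iqr / 2                        # semi-interquartile range: (Q3 - Q1) / 2
```

Here the range is 18 and Q2 (the median) is 8.5, midway between the sixth and seventh ordered scores.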
*GRAPH - a diagram or chart illustrating data
- Histogram - a graph with vertical bars at the true limits of each test score; connected bars; used for continuous data
- Bar graph - used for describing frequencies; disconnected bars
- Frequency polygon - points are plotted at the class mark of each interval; continuous lines

DISTRIBUTION SHAPES
- NORMAL DISTRIBUTION - the majority of test takers are bulked at the middle of the distribution; very few test takers are at the extremes
- POSITIVELY SKEWED - more test takers got a low score. Mean > median > mode
- NEGATIVELY SKEWED - more test takers got a high score. Mode > median > mean

KURTOSIS - the steepness of a distribution
- PLATYKURTIC - flat; the number of test takers with high or low scores is not far from the number who scored near the mean
- LEPTOKURTIC - peaked; the number of test takers with high or low scores is far from the number who scored near the mean
- MESOKURTIC - in the middle; the distribution is deemed normal

STANDARD SCORES - raw scores converted from one scale to another; provide a context for comparing scores on different tests by converting scores from the two tests into z scores
- Z SCORE - mean 0, SD 1 ("zero plus or minus one" scale); once determined, can be used to translate one scale into another
- T SCORE - mean 50, SD 10; created by McCall in honor of his professor Thorndike
- STANINE - mean 5, SD 2; used by the US Air Force; takes whole numbers 1-9, no decimals
- DEVIATION IQ - mean 100, SD 15; used for interpreting IQ
- STEN (standard ten) - mean 5.5, SD 2
- GRE/SAT - mean 500, SD 100; used for graduate school and college admissions

DECILE - points that divide the distribution into 10 equal parts (D1-D9)

LINEAR TRANSFORMATION - a formula derived from the z score, used to transform a score on one scale into a score on another:
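Since each standard-score scale is just its mean plus its SD times z, the conversions above can be sketched in one function (stanines are additionally rounded and clipped to 1-9):

```python
def standard_scores(z):
    """Convert a z score to the common standard-score scales."""
    return {
        "z": z,                                       # mean 0, SD 1
        "T": 50 + 10 * z,                             # mean 50, SD 10
        "stanine": max(1, min(9, round(5 + 2 * z))),  # mean 5, SD 2; whole numbers 1-9
        "sten": 5.5 + 2 * z,                          # mean 5.5, SD 2
        "deviation_iq": 100 + 15 * z,                 # mean 100, SD 15
        "gre_sat": 500 + 100 * z,                     # mean 500, SD 100
    }
```

A z of +1.0 thus maps to T = 60, deviation IQ = 115, and GRE/SAT = 600, all one SD above the mean on their respective scales.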
NS = SD(Z) + M, where NS is the new standard score and SD and M are the new scale's standard deviation and mean.

NORMS - the performance of defined groups on a particular test; the transformation of raw scores for making meaningful interpretations of scores on a test
- NORMING - the process of creating norms
- NORMATIVE SAMPLE - the group of people whose performance on a particular test is analyzed and referred to
- RACE NORMING - norming based on race/culture
- USER NORMS - norms provided in test manuals
- NORMAN - the person who constructs a norm

PERCENTILE RANK
- tells the relative position of a test taker in a group of 100
- suggests how many in the sample fall below a specified score
- For example, a score at the 50th percentile suggests that 50 percent of the test takers fall below that specific score

CRITERION-REFERENCED - interpretation of the test is based on a certain standard
NORM-REFERENCED - a score is interpreted based on the performance of a standardization group

TYPES OF NORMS
1. DEVELOPMENTAL NORMS - indicate how far along the normal developmental path an individual has progressed (age norms, grade norms, ordinal scales)
2. WITHIN-GROUP NORMS - the individual's performance is evaluated in terms of the performance of the most nearly comparable standardization group
  a. Percentile
  b. Standard score
  c. Deviation IQ

CORRELATION - statistical tools for testing the relationship between variables
- COVARIANCE - how much two scores vary together
- CORRELATION COEFFICIENT - a mathematical index that describes the direction and magnitude of a relationship
  o Ranges from -1.00 to +1.00
  o The nearer to 1, the stronger the relationship; the nearer to 0, the weaker the relationship
  o The sign suggests the type of relationship (negative = indirect; positive = direct)

CORRELATIONAL STATISTICS
o PEARSON PRODUCT-MOMENT CORRELATION - 2 variables on an interval/ratio scale
o SPEARMAN RHO - correlates 2 variables on an ordinal scale; also called rank-order correlation
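Pearson r (covariance scaled by the two SDs) and Spearman rho (Pearson r applied to ranks) can be sketched from their definitions; the rank helper below averages the ranks of tied values:

```python
from statistics import mean, pstdev

def pearson_r(x, y):
    """Covariance of x and y divided by the product of their SDs."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)
    return cov / (pstdev(x) * pstdev(y))

def ranks(values):
    """1-based ranks, averaging the ranks of tied values."""
    ordered = sorted(values)
    return [ordered.index(v) + (ordered.count(v) + 1) / 2 for v in values]

def spearman_rho(x, y):
    """Rank-order correlation: Pearson r computed on the ranks."""
    return pearson_r(ranks(x), ranks(y))
```

A perfectly linear pair of variables gives r = 1.0; a pair that is perfectly monotonic but non-linear still gives rho = 1.0, which is why rho suits ordinal data.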
3. NATIONAL NORMS - norms based on large-scale samples
  a. SUBGROUP NORMS
  b. LOCAL NORMS

o BISERIAL CORRELATION - 1 continuous and 1 artificial dichotomous variable (a dichotomy in which other possibilities exist within a category)
o POINT-BISERIAL CORRELATION - 1 continuous and 1 true dichotomous variable (a dichotomy with only two possible categories)
o PHI COEFFICIENT - 2 dichotomous variables, at least 1 a true dichotomy
o TETRACHORIC COEFFICIENT - 2 dichotomous variables, both artificial dichotomies
o COEFFICIENT OF ALIENATION - a measure of non-association between two variables
o COEFFICIENT OF DETERMINATION (r^2) - suggests the percentage of variance shared by two variables; the effect of one variable on another. Ex. r = 0.75, r^2 = 0.56

REGRESSION (Y-hat = a + bX)
- Intercept (a) - the point at which the regression line crosses the Y axis
- Regression coefficient (b) - the slope of the regression line
- Regression line - the best-fitting straight line through a set of points in a scatter plot
- Standard error of estimate - measures the accuracy of prediction

MULTIPLE REGRESSION - a statistical technique for predicting one variable from a series of predictors; used to find linear combinations of three or more variables; applicable only when the data are all continuous (cf. FACTOR ANALYSIS)
STANDARDIZED REGRESSION COEFFICIENTS - also called beta weights; tell how much each variable in a given list of variables predicts a single variable
FACTOR ANALYSIS - used to study the interrelationships among a set of variables
- Factors - variables; also called principal components
- Factor loading - the correlation between the original variables and the factors; depicted through beta weights
META-ANALYSIS - a family of techniques used to statistically combine information across studies to produce single estimates of the data under study
- Effect size - the estimate of the strength of a relationship or the size of a difference; often evaluated through a correlation coefficient
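The least-squares quantities for Y-hat = a + bX can be sketched directly from their definitions (the data in the usage note are invented):

```python
from math import sqrt
from statistics import mean

def fit_line(x, y):
    """Least-squares intercept a and slope b for Y-hat = a + bX."""
    mx, my = mean(x), mean(y)
    b = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
         / sum((xi - mx) ** 2 for xi in x))
    a = my - b * mx          # the line passes through (mean x, mean y)
    return a, b

def standard_error_of_estimate(x, y):
    """Accuracy of prediction: RMS of the residuals around the fitted line."""
    a, b = fit_line(x, y)
    residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    return sqrt(sum(r * r for r in residuals) / len(x))
```

For y lying exactly on 2x + 1, `fit_line` recovers a = 1 and b = 2, and the standard error of estimate is 0, since every point sits on the line.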
ITEM ANALYSIS AND ITEM CONSTRUCTION

ITEM ANALYSIS - a general term for a set of methods used to evaluate test items; one of the most important aspects of test construction.

I. ITEM DIFFICULTY - for achievement/ability tests, defined by the proportion of people who get the item correct; indicates the easiness of the item. Should range from 0.30 to 0.70. A four-option multiple-choice item carries a 0.25 chance of guessing the correct response.
  a. Optimum item difficulty - the best difficulty for an item, given the number of response options
    i. OID = (chance performance + 1) / 2
    ii. Chance performance - performance based on guessing; equal to 1 divided by the number of response options
  b. Item difficulty index - the value that describes item difficulty for an ability test
  c. Item endorsement index - the value that describes the percentage of individuals who endorsed an item on a personality test
  d. Omnibus spiral format - items in an ability test are arranged in increasing difficulty
    i. Giveaway items - presented near the beginning of the test to spur motivation and lessen test anxiety

ITEM WRITING GUIDELINES
- Define clearly what you want to measure
- Generate an item pool
- Avoid long items
- Keep the level of reading difficulty appropriate for those who will complete the test
- Avoid double-barreled items (more than one idea in one item)
- Consider mixing positively and negatively worded items

ITEM FORMAT - the form, plan, structure, arrangement, and layout of individual test items.
I. SELECTED-RESPONSE FORMAT - the examinee selects a response from a set of alternatives.
  a. DICHOTOMOUS FORMAT - offers 2 alternatives per item. ADVANTAGES: simplicity, easy administration, quick scoring, no neutral response. DISADVANTAGES: needs more items; 50% chance of guessing correctly; examinees can memorize responses
  b. POLYCHOTOMOUS FORMAT - more than 2 alternatives. Ex. multiple choice
    i. Question - the stem
    ii. Correct choice - the keyed response
    iii. Distractors - the incorrect choices
    iv. Cute distractors - less likely to be chosen; may affect the reliability of the test
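The item-difficulty index and the optimum-difficulty formula above can be sketched as follows (the responses are invented 1/0 correct/incorrect codes):

```python
def item_difficulty(responses):
    """p value: proportion of examinees answering the item correctly (1 = correct)."""
    return sum(responses) / len(responses)

def optimum_item_difficulty(n_options):
    """OID = (chance performance + 1) / 2, where chance = 1 / number of options."""
    chance = 1 / n_options
    return (chance + 1) / 2
```

For a four-option multiple-choice item, chance performance is .25, so the optimum difficulty is (.25 + 1) / 2 = .625; for a true-false item it is .75.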
  ii. Correct choice – the keyed response
  iii. Distractors – the incorrect choices
  iv. Cute distractors – less likely to be chosen; may affect the reliability of the test
 c. LIKERT FORMAT – requires the respondent to indicate the degree of agreement with an attitudinal statement. Considered a superior item format; uses factor analysis. Can be a 5-point format, or a 4/6-point format *without a neutral point*. Negative items are reverse-scored, then all scores are summed.
 d. CATEGORY FORMAT – the respondent rates a construct from 1 to 10 (1 = lowest, 10 = highest).
 e. CHECKLIST – the subject receives a long list of adjectives and indicates whether each one is characteristic of himself or herself.
 f. Q-SORT – requires respondents to sort a group of statements into 9 piles.
 g. GUTTMAN SCALE – items are arranged from weaker to stronger expressions of the attitude, belief, or feeling being measured.

II. ITEM RELIABILITY – indicates the internal consistency of a test; the higher the index, the higher the internal consistency.
 a. Item reliability = (SD of the item) × (item-total correlation)
 b. Factor analysis can also be used to determine which items load more heavily on the whole test.
III. ITEM VALIDITY – indicates the degree to which a test measures what it purports to measure; the higher the item-validity index, the higher the criterion-related validity of the test.
 a. Item validity = (item standard deviation) × (correlation between item and criterion)
IV. ITEM DISCRIMINABILITY – how well an item performs in relation to some criterion; how adequately an item separates high scorers from low scorers on the entire test. The cutoff is a 0.30 discrimination index; the higher the d, the more high scorers answer the item correctly.
 a. Extreme group method – compares people who have done well with those who have done poorly on the test.
 b. Point-biserial method – correlates dichotomous item data with continuous total scores; tests whether those who got an item correct also tend to have high total scores.
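The extreme group method above can be sketched as follows, with hypothetical item responses; the 0.30 cutoff is the discrimination limit noted in the text:

```python
# Extreme group method: d = proportion correct among top scorers
# minus proportion correct among bottom scorers on the same item.
high_group = [1, 1, 1, 0, 1]   # hypothetical item responses of the top scorers
low_group = [0, 1, 0, 0, 1]    # hypothetical item responses of the bottom scorers

p_high = sum(high_group) / len(high_group)   # 0.80
p_low = sum(low_group) / len(low_group)      # 0.40

d = p_high - p_low             # discrimination index: 0.80 - 0.40 = 0.40
keep_item = d >= 0.30          # the item clears the 0.30 cutoff
```

A d near zero (or negative) would mean the item fails to separate high from low scorers and is a candidate for revision or removal.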
II. COMPLETION ITEMS – the test taker supplies a response to complete an item.
 a. ESSAY ITEMS – the examinee responds to a question by writing a composition; used to determine the depth of the respondent's knowledge.

V. ITEM CHARACTERISTIC CURVE – graphic representation of item difficulty and discrimination. Usually plots the scores on the x-axis and p and d on the y-axis.
VI. ITEMS FOR A CRITERION-REFERENCED TEST – a frequency polygon is created after the test is given to two groups: one exposed to the learning unit and one not exposed.
 a. Antimode – the score with the lowest frequency.
 b. Used to determine the cut score (passing score) for a criterion-referenced test.
VII. DISTRACTOR ANALYSIS – examination of how well each incorrect alternative (distractor) performs.
VIII. ISSUES AMONG TEST ITEMS
 a. ITEM FAIRNESS – the degree to which an item is biased.
  i. Biased test items – items that favor one particular group of examinees; can be tested using inferential statistics across groups.
 b. QUALITATIVE ITEM ANALYSIS – involves exploring issues through verbal means such as interviews and group discussions conducted with test takers and other relevant parties.

EQUAL-APPEARING INTERVALS
 Described by Thurstone
 A scale in which + and − items are present
 Adds all responses in order to transform them into an interval scale
 Uses direct estimation scaling
o Direct estimation scaling – transformation of the scale to other scales is possible because the mean is computable.
o Indirect estimation scaling – cannot be transformed to other scales because the mean is not present.

COMPUTER ADAPTIVE TESTING – also called computer-assisted testing; an interactive, computer-administered test-taking process in which the items presented are based in part on the test taker's performance on previous items.
 ITEM BANK – a relatively large and easily accessible collection of test questions
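The adaptive idea behind computer adaptive testing can be sketched minimally. This uses a hypothetical three-level bank; real CAT systems select items with IRT-based models rather than a simple ladder:

```python
# Minimal sketch of adaptive item selection: the next item's
# difficulty level depends on whether the previous response was correct.
levels = ["easy", "medium", "hard"]   # ordered difficulty levels (hypothetical)

def next_level(current, answered_correctly):
    """Branch upward after a correct response, downward after an error."""
    i = levels.index(current)
    if answered_correctly:
        i = min(i + 1, len(levels) - 1)   # step up, capped at the hardest level
    else:
        i = max(i - 1, 0)                 # step down, floored at the easiest level
    return levels[i]
```

Usage: a correct answer on a medium item branches to a hard item; a miss on an easy item stays at the easiest level.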
 c. THINK-ALOUD ADMINISTRATION – allows test takers (during standardization) to speak their minds while taking the test; used to shed light on the test taker's thought process during administration.
 d. EXPERT PANELS – guide researchers/test developers in conducting a sensitivity review (especially on cultural issues).
  i. Sensitivity review – a study of test items, typically to examine test bias and the presence of offensive language and stereotypes.

 ITEM BRANCHING – the ability of the computer to tailor the content and order of presentation of test items on the basis of responses to previous items

SCORING ITEMS
I. CUMULATIVE MODEL – the higher the score on the test, the higher the test taker stands on the ability, trait, or other category measured.
II. CLASS SCORING/CATEGORY SCORING – test taker responses earn credit toward placement in a particular class or category with other test takers whose pattern of responses is similar in some way. Most useful in diagnostic tests.
III. IPSATIVE SCORING – compares a test taker's score on one scale within a test to another scale within that same test.

TEST DEVELOPMENT – umbrella term for the whole process of creating a test.
I. TEST CONCEPTUALIZATION – the stage at which the idea for a particular test is conceived. The following are determined: construct, goal, user, taker, administration, format, response, benefits, costs, and interpretation, along with whether the test will be norm-referenced or criterion-referenced.
 a. Pilot work – may take the form of interviews to determine appropriate items for the test.
II. TEST CONSTRUCTION – writing test items, formatting items, setting scoring rules, and designing and building the test.
 a. Scaling – the process of setting rules for assigning numbers in measurement; manifested through the item format (dichotomous, polychotomous, Likert, category).
 b. Item pool – usually 2 times the intended number of items in the final form; 3 times is more advisable.
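The cumulative and ipsative scoring models above can be contrasted in a short sketch (scale names and responses are hypothetical):

```python
# Cumulative vs. ipsative scoring on two scales of the same test.
responses = {
    "scale_A": [3, 4, 5, 4],   # hypothetical item scores on scale A
    "scale_B": [2, 2, 3, 1],   # hypothetical item scores on scale B
}

# Cumulative model: sum the credit earned across items; a higher total
# means the test taker stands higher on that trait or ability.
totals = {scale: sum(items) for scale, items in responses.items()}  # A=16, B=8

# Ipsative scoring: compare one scale against another *within* the same
# test taker (not against a norm group) to find the relatively stronger trait.
stronger_scale = max(totals, key=totals.get)   # "scale_A"
```

The design point: cumulative totals can be compared across people, while ipsative comparisons are only meaningful within a single test taker's own profile.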
III. TEST TRYOUT – administration of the test to a representative sample of test takers. Issues:
 a. Determination of the target population
 b. Determination of the number of participants for the tryout (number of items × 10)
 c. The tryout should be executed under conditions as identical as possible to those under which the standardized test will be administered.
IV. ITEM ANALYSIS – entails procedures, usually statistical, designed to explore how individual test items work compared to other items and in the context of the whole test (validity, reliability, item difficulty, and discrimination).
V. TEST REVISION – balancing the weaknesses and strengths of the test and its items.
 a. Norming – done after the test has been revised to acceptable levels of reliability, validity, and item indices.

TEST ADMINISTRATION AND TEST UTILITY

ISSUES IN TEST ADMINISTRATION: the examiner and the subject, training of the test administrator, mode of administration, subject variables, and behavior assessment issues.

USES OF TESTS
 Classification – assigning a person to one category rather than another
 Screening – quick and simple tests or procedures to identify persons who might have special characteristics or needs
 Placement – sorting persons into different programs appropriate to their needs or skills
 Selection – a process whereby each person evaluated for a position is either accepted or rejected for that position
 Diagnosis and Treatment Planning – determination of abnormal behavior; classification using diagnostic criteria; a precursor to recommending treatment for personal distress

EXAMINER AND THE SUBJECT – the relationship between the examiner and the test taker.
 On the Wechsler Intelligence Scale for Children (WISC), enhanced rapport increased scores.
 Faulty response styles:
o Acquiescent response style – tendency toward increased agreement when responding to a test or interview; most responses are
positive regardless of item content.
o Socially desirable response style – presenting oneself in a favorable or socially desirable way.
 Language of the test taker – test takers proficient in two or more languages should be tested in the language they are most comfortable with.
 Race of the test taker – there are significant effects of the examiner's race on the sample's responses.

TRAINING OF THE TEST ADMINISTRATOR
 Different assessment procedures require different levels of training.
 According to research, at least 10 practice sessions are needed to gain competency in scoring the WAIS-R.

MODE OF ADMINISTRATION
 Self-administered measures show lower results than psychologist-administered measures.
 Telephone interviews show better reported health than self-administered interviews.

USES OF TESTS (continued)
 Self-Knowledge – understanding of an individual's intelligence and personality characteristics
 Program Evaluation – systematic assessment and evaluation of educational and social programs
 Research – measuring variables that suggest correlational and causal relationships

UTILITY – the usefulness or practical value of testing efficiency.
 PSYCHOMETRIC SOUNDNESS – a test should be reliable and valid to be used. Reliability sets the limit for validity: the upper boundary of validity is reliability.
 COST – disadvantages, losses, or expenses, in both economic and non-economic terms, associated with testing or not testing.
o ECONOMIC COST – monetary expenses (personnel, test protocols, testing venues, etc.)
o NON-ECONOMIC COST – intangible losses (e.g., loss of trust from patrons due to unqualified personnel)
 BENEFIT – profits, gains, or advantages of testing or not testing.
o ECONOMIC BENEFIT – monetary benefits (a highly qualified, extroverted salesperson reaching quotas translates to financial gains)
o NON-ECONOMIC BENEFIT – increases in the quality and quantity of workers' performance

SUBJECT VARIABLES
I. TEST ANXIETY – anxiety based on test performance (worry, emotionality, lack of self-confidence).
II. ILLNESS – diseases influence test-taking behavior and performance (malingerers).
III. HORMONES – hormonal imbalances affect mood cycles and thus performance on a test.
IV. MOTIVATION – those required to take a test as an occupational requirement tend to produce unreliable results.

ERRORS OF BEHAVIORAL ASSESSMENT
I. REACTIVITY – being evaluated increases performance; also called the Hawthorne effect.
II. DRIFT – moving away from what one has learned toward idiosyncratic definitions of behavior; observers should therefore be retrained at some point.
 a. CONTRAST EFFECT – tendency to rate the same behavior differently when observations are repeated in the same context.
III. EXPECTANCIES – tendency for results to be influenced by what test administrators expect to find.
 a. Rosenthal effect – the test administrator's expected results influence the result of the test.
 b. Golem effect – negative expectations from the test administrator decrease the examinee's performance.
IV. RATING ERRORS – judgment errors resulting from intentional and unintentional misuse of a rating scale.
 a. Halo effect – tendency to ascribe positive attributes independently of the observed behavior; suggested by Thorndike.

UTILITY ANALYSIS – a family of techniques entailing a cost-benefit analysis designed to yield information relevant to a decision about the usefulness and/or practical value of a tool of assessment. Used for:
 Test comparison
 Assessment tool comparison
 Deciding whether to add a test or assessment tool
 Determining the value of not testing

APPROACHES TO UTILITY ANALYSIS
I. EXPECTANCY TABLES – show the percentage of people within specified test-score intervals who were subsequently placed in various categories of the criterion.
 a. TAYLOR-RUSSELL TABLES – statistical tables once extensively used to provide test users with an estimate of the extent to which including a particular test in the selection system would improve selection decisions.
  i. SELECTION RATIO – ratio of the number of people to be hired to the number of applicants.
  ii. BASE RATE – the percentage of people hired under the existing system who are expected to be successful in their jobs.
 b. NAYLOR-SHINE TABLES – indicate the difference between the mean of the newly selected group and the mean of the standard (unselected) group.
II. BROGDEN-CRONBACH-GLESER (BCG) FORMULA – calculates the dollar amount of the utility gain resulting from the use of a particular selection instrument under specified conditions.
 a. UTILITY GAIN – an estimate of the benefit of using a particular test.
 b. PRODUCTIVITY GAIN – the estimated increase in work output.

RATING ERRORS (continued)
 b. Leniency/generosity error – the rater's tendency to be too forgiving and insufficiently critical.
 c. Severity error – evaluations that are overly critical.
 d. Central tendency error – the rater is reluctant to give ratings at either the positive or negative extreme; ratings tend to cluster in the middle of the continuum.
 e. General standoutishness – people tend to judge on the basis of one outstanding characteristic.

INTERVIEW – a method of gathering information through talk, discussion, or direct questions.

TYPES OF INTERVIEWS
I. DIRECTIVE INTERVIEW – the interviewer directs, guides, and controls the course of the interview.
II. NONDIRECTIVE INTERVIEW – the interviewee guides the interview process.
III. SELECTION INTERVIEW – designed to elicit information about an applicant's qualifications and capabilities for particular employment duties.
IV. SOCIAL FACILITATION INTERVIEW – the interviewer serves as a model for the interviewee.

Other interview types:
1. INTAKE INTERVIEW – entails detailed questioning about the presenting complaints.
2. DIAGNOSTIC INTERVIEW – assignment of a DSM diagnosis.
3. STRUCTURED INTERVIEW – a predetermined, planned sequence of questions that the interviewer asks the client.
4. UNSTRUCTURED INTERVIEW – no predetermined plan of questions.

SOURCES OF ERROR IN INTERVIEWS
I. INTERVIEW VALIDITY
 a. HALO EFFECT
 b. GENERAL STANDOUTISHNESS
 c. CULTURAL DIFFERENCES
 d. INTERVIEWER BIAS

PRINCIPLES OF EFFECTIVE INTERVIEWING
I. PROPER ATTITUDE
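The notes name the BCG formula but do not spell it out. One commonly cited form (an assumption here, not stated in the source) is: utility gain = N × T × r_xy × SD_y × Z̄ minus the cost of testing. A sketch with entirely hypothetical figures:

```python
# Hedged sketch of one common form of the Brogden-Cronbach-Gleser
# utility gain, in dollars (all figures hypothetical).
n_hired = 10         # number of people selected with the test
tenure = 2.0         # average years the selectees stay (T)
validity = 0.40      # criterion-related validity of the test (r_xy)
sd_y = 10_000.0      # dollar value of one SD of job performance (SD_y)
z_mean = 1.0         # mean standardized test score of those hired
n_tested = 50        # applicants who took the test
cost_per_test = 25.0 # economic cost of testing one applicant

utility_gain = (n_hired * tenure * validity * sd_y * z_mean
                - n_tested * cost_per_test)
# 10 * 2.0 * 0.40 * 10000 * 1.0 = 80000, minus 50 * 25 = 1250 -> 78750.0
```

The structure mirrors the cost/benefit framing above: the first term is the economic benefit of better selection, and the subtracted term is the economic cost of administering the test.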
 a. INTERPERSONAL INFLUENCE – the degree to which one person can influence another.
 b. INTERPERSONAL ATTRACTION – the degree to which people share a feeling of understanding, mutual respect, similarity, and the like.
II. RESPONSES TO AVOID
 a. JUDGMENTAL STATEMENTS – evaluating the thoughts, feelings, or actions of another.
 b. PROBING STATEMENTS – demanding more information than the interviewee wishes to provide voluntarily.
 c. HOSTILE STATEMENTS
 d. FALSE ASSURANCE
III. EFFECTIVE RESPONSES
 a. OPEN-ENDED QUESTIONS
 b. SUMMARIZING
 c. TRANSITIONAL PHRASES
 d. CLARIFICATION RESPONSES
 e. PARAPHRASING AND RESTATEMENT
 f. EMPATHY AND UNDERSTANDING

TYPES OF INTERVIEWS (continued)
5. SEMI-STRUCTURED INTERVIEW – usually starts unstructured, followed by structured questioning targeting a diagnostic classification.
6. MENTAL STATUS EXAMINATION (MSE) – a quick assessment of how the client/patient is functioning at the time of evaluation.
7. CRISIS INTERVIEW – usually for suicide or abuse cases.
8. CASE HISTORY INTERVIEW – discusses the developmental stages of the patient.

SOURCES OF ERROR IN INTERVIEWS (continued)
II. INTERVIEW RELIABILITY
 a. MEMORY AND HONESTY OF THE INTERVIEWEE
 b. CLERICAL CAPABILITIES OF THE INTERVIEWER

MEASURING UNDERSTANDING
 LEVEL 1 – little or no relationship to the interviewee's response
 LEVEL 2 – communicates superficial awareness of the meaning of a statement
 LEVEL 3 – interchangeable with the interviewee's statements
 LEVEL 4 – communicates empathy and adds minimal information/ideas
 LEVEL 5 – communicates empathy and adds major information/ideas
