Reviewer for Psych Assessment PDF
Summary
This document provides an overview of psychological testing and assessment, including different types of tests, their administration, and scoring. It also discusses the various roles of participants in the testing process. The document appears to be study notes or supplementary material covering core concepts in psychological testing and assessment.
Full Transcript
PSYCH ASSESSMENT

TOPIC 1: OVERVIEW OF PSYCHOLOGICAL TESTING AND ASSESSMENT

=PSYCHOLOGICAL TESTS=
Objective and standardized measures of a sample of human behavior (Anastasi & Urbina, 1997).
These are instruments with three defining characteristics:
o It is a sample of human behavior.
o The sample is obtained under standardized conditions.
o There are established rules for scoring, or for obtaining quantitative information from the behavior sample.

*PSYCHOLOGICAL MEASUREMENT- The process of assigning numbers (i.e., test scores) to persons in such a way that some attributes of the persons being measured are reflected in those numbers.

=GENERAL TYPES OF PSYCHOLOGICAL TESTS=
These tests are categorized according to the manner of administration, purpose, and nature.
*Administration- Individual; Group
*Item Format- Objective; Projective
*Response Format- Verbal; Performance
*Domain Measured- Cognitive; Affective

=TYPES OF TESTS=
*STANDARDIZED TESTS- Instruments that have prescribed directions for administration, scoring, and interpretation. Examples: MBTI, MMPI, SB-5, WAIS
*NON-STANDARDIZED TESTS (Informal Tests)- Exemplified by teacher-made tests, either for formative or summative evaluation of student performance. Examples: prelim exams, quizzes
*NORM-REFERENCED TESTS- Instruments whose score interpretation is based on the performance of a particular group. For example, the Raven's Progressive Matrices (RPM) has several norm groups which serve as comparison groups for the interpretation of scores.
*CRITERION-REFERENCED TESTS- Measures whose criteria for passing or failing have been decided beforehand. For example, a passing score of 75%.
*INDIVIDUAL TESTS- Instruments that are administered one-on-one, face-to-face. Examples: WAIS, SB-5, Bender-Gestalt
*GROUP TESTS- Tests that can be administered to a group, usually in paper-and-pencil format; they can also be administered individually. Examples: achievement tests, RPM, MBTI
*SPEED TESTS- Administered under a prescribed time limit, usually a short period that is not enough for an individual to finish answering the entire test. The level of difficulty is the same for all items. Example: the SRA Verbal Test
*POWER TESTS- Measure competencies and abilities. The prescribed time limit is usually enough for one to accomplish the entire test. Example: the Differential Aptitude Test
*VERBAL TESTS- Instruments that involve words to measure a particular domain. Example: admission tests for many educational institutions
*NONVERBAL TESTS- Instruments that do not use words; instead, they use geometrical drawings or patterns. Example: RPM
*COGNITIVE TESTS- Measure thinking skills. Examples: the broad range of intelligence and achievement tests
*AFFECTIVE TESTS- Measure personality, interests, values, etc. Examples: Life Satisfaction Scale, 16PF, MBTI

=TESTING OF HUMAN ABILITY=
*TESTS FOR SPECIAL POPULATIONS- Developed for use with persons who cannot be properly or adequately examined with traditional instruments, such as the individual scales. Follow performance or nonverbal tasks.

=PERSONALITY TESTS=
Instruments used for the measurement of emotional, motivational, interpersonal, and attitudinal characteristics.
*Approaches to the Development of Personality Assessment:
o Empirical criterion keying
o Factor analysis
o Personality theories
*PROJECTIVE TECHNIQUES- Relatively unstructured tasks that permit an almost unlimited variety of possible responses; a disguised procedure. Examples: Rorschach Inkblot Test, Thematic Apperception Test, Sentence Completion Test, Drawing Test

=GENERAL PROCESS OF ASSESSMENT=

=REASONS FOR USING TESTS=

*PSYCHOLOGICAL TESTING- The process of measuring psychology-related variables by means of devices or procedures designed to obtain a sample of behavior.

=THE TOOLS OF PSYCHOLOGICAL ASSESSMENT=
*THE TEST- A test is defined simply as a measuring device or procedure.
*PSYCHOLOGICAL TEST- Refers to a device or procedure designed to measure variables related to psychology: intelligence, personality, aptitude, interests, attitudes, and values.
*PSYCHOLOGICAL ASSESSMENT- The gathering and integration of psychology-related data for the purpose of making a psychological evaluation, accomplished through the use of tools such as tests, interviews, case studies, behavioral observation, and specially designed apparatuses and measurement procedures.

=DIFFERENCES BETWEEN PSYCHOLOGICAL TESTS AND OTHER TOOLS=

=DIFFERENT APPROACHES TO ASSESSMENT=
*COLLABORATIVE PSYCHOLOGICAL ASSESSMENT- The assessor and the assessee work as partners from initial contact through final feedback.
*THERAPEUTIC PSYCHOLOGICAL ASSESSMENT- Therapeutic self-discovery and new understandings are encouraged throughout the assessment process.
*DYNAMIC ASSESSMENT- An interactive approach to psychological assessment that usually follows the model: evaluation > intervention > evaluation. Interactive, changing, and varying in nature.

*THE INTERVIEW- A method of gathering information through direct communication involving reciprocal exchange. Interviews differ in purpose, length, and nature. Uses: diagnosis, treatment, selection decisions.
*THE PORTFOLIO- Contains samples of one's ability and accomplishment which can be used for evaluation.
*CASE HISTORY DATA- The records, transcripts, and other accounts in written, pictorial, or other form that preserve archival information, official and informal accounts, and other data and items relevant to the assessee.
*Case Study (Case History)- A report or illustrative account concerning a person or an event, compiled on the basis of case history data.
*BEHAVIORAL OBSERVATION- Monitoring the actions of others or oneself by visual or electronic means while recording quantitative or qualitative information regarding those actions. Aids the development of therapeutic interventions, which is extremely useful in institutional settings such as schools, hospitals, prisons, and group homes.
*ROLE-PLAY TESTS- Acting an improvised or partially improvised part in a simulated situation. Assessees are directed to act as if they were in a particular situation. Evaluation covers expressed thoughts, behaviors, abilities, and other related variables. Can be used as both a tool of assessment and a measure of outcome.
*COMPUTERS AS TOOLS- Can serve as test administrators (online or offline) and as highly efficient test scorers.
*Interpretive Reports- Distinguished by the inclusion of numerical or narrative interpretive statements in the report.
*Consultative Reports- Written in language appropriate for communication between assessment professionals; may provide expert opinion concerning data analysis.
*Integrative Reports- Employ previously collected data in the test report.

=PARTICIPANTS IN THE TESTING PROCESS AND THEIR ROLES=
*Test authors and developers- Conceive, prepare, and develop tests; also find ways to disseminate their tests.
*Test publishers- Publish, market, and sell tests, thus controlling their distribution.
*Test reviewers- Prepare evaluative critiques of tests based on technical and practical merits.
*Test users- Select or decide which specific test(s) will be used for some purpose. May also act as examiners or scorers.
*Test sponsors- Institutional boards or agencies who contract test developers or publishers for various testing services.
*Test administrators or examiners- Administer the test either to one individual at a time or to groups.
*Test takers- Take the test by choice or necessity.
*Test scorers- Tally the raw scores and transform them into test scores through objective or mechanical scoring or through the application of evaluative judgment.
*Test score interpreters- Interpret test results to consumers such as individual test takers or their relatives, other professionals, or organizations of various kinds.

=SETTINGS WHERE ASSESSMENTS ARE CONDUCTED=
*EDUCATIONAL SETTINGS- Help to identify children who may have special needs. Diagnostic tests and/or achievement tests.
*CLINICAL SETTINGS- For screening and/or diagnosing behavioral problems. May involve intelligence, personality, neuropsychological, or other specialized instruments, depending on the presenting or suspected problem area.
*COUNSELING SETTINGS- Aim to improve the assessee's adjustment, productivity, or some related variable. May involve personality, interest, attitude, and values tests.
*GERIATRIC SETTINGS- Quality-of-life assessment measuring variables related to perceived stress, loneliness, sources of satisfaction, personal values, quality of living conditions, and quality of friendships and social support.
*BUSINESS & MILITARY SETTINGS- Decision making about the careers of personnel.
*GOVERNMENTAL & ORGANIZATIONAL CREDENTIALING- Licensing or certifying exams.
*ACADEMIC RESEARCH SETTINGS- Sound knowledge of measurement principles and assessment tools is required prior to research publication.

=SOME TERMS TO REMEMBER=
*PROTOCOL- Typically refers to the form, sheet, or booklet on which a test taker's responses are entered. May also refer to a description of a set of test- or assessment-related procedures.
*RAPPORT- The working relationship between the examiner and the examinee.

=A BRIEF HISTORY OF PSYCHOLOGICAL TESTING=
*Early 20th-century France- The roots of contemporary psychological testing and assessment.
*1905- Alfred Binet and a colleague published a test to help place Paris schoolchildren in appropriate classes.
*ACCOMMODATION- The adaptation of a test, procedure, or situation, or the substitution of one test for another, to make the assessment more suitable for an assessee with an exceptional need.
*ALTERNATE ASSESSMENT- An evaluative or diagnostic procedure or process that varies from the usual, customary, or standardized way a measurement is derived; alternative methods designed to measure the same variables.

*1917, World War I- The military needed a way to screen large numbers of recruits quickly for intellectual and emotional problems.
*World War II- The military depended even more on psychological tests to screen recruits for the service.
*Post-war- More and more tests purporting to measure an ever-widening array of psychological variables were developed and used.

=TEST USER QUALIFICATION LEVELS=

=SOURCES OF INFORMATION ABOUT TESTS=
*TEST CATALOGUES- Usually contain only a brief description of the test and seldom contain detailed technical information.
*TEST MANUALS- Detailed information concerning the development of a particular test and technical information relating to it.
*REFERENCE VOLUMES- Periodically updated volumes that provide detailed information for each test listed; e.g., the Mental Measurements Yearbook.
*JOURNAL ARTICLES- Contain reviews of the test, updated or independent studies of its psychometric soundness, or examples of how the instrument was used in research or applied contexts.
*ONLINE DATABASES- Maintained by the APA: PsycINFO, ClinPSYC, PsycARTICLES, etc.

=INDIVIDUAL DIFFERENCES=
In spite of our similarities, no two humans are exactly the same.

=PROMINENT FIGURES IN THE HISTORY OF PSYCHOMETRICS=
*CHARLES DARWIN- Believed that some individual differences are more adaptive than others; individual differences, over time, lead to more complex, intelligent organisms.
*FRANCIS GALTON- Cousin of Charles Darwin; an applied Darwinist. Claimed that some people possessed characteristics that made them more fit than others. Wrote Hereditary Genius (1869). Set up an anthropometric laboratory at the International Exposition of 1884. Noted that persons with mental retardation also tend to have diminished ability to discriminate among heat, cold, and pain.
*CHARLES SPEARMAN- Had been trying to prove Galton's hypothesis concerning the link between intelligence and visual acuity. Expanded the use of correlational methods pioneered by Galton and Karl Pearson, and provided the conceptual foundation for factor analysis, a technique for reducing a large number of variables to a smaller set of factors, which would become central to the advancement of testing and trait theory. Devised a theory of intelligence that emphasized a general intelligence factor (g) present in all intellectual activities.
*KARL PEARSON- Famous student of Galton. Continued Galton's early work with statistical regression. Invented the formula for the coefficient of correlation: Pearson's r.
*JAMES MCKEEN CATTELL- The first person to use the term "mental test." Wrote a dissertation on reaction time based upon Galton's work. Tried to link various measures of simple discriminative, perceptive, and associative power to independent estimates of intellectual level, such as school grades.
*GUY MONTROSE WHIPPLE- Influenced by Fechner and a student of Titchener. Pioneered human ability testing. Conducted a seminar that changed the field of psychological testing (Carnegie Institute, 1918). Because of his criticisms, the APA issued its first standards for professional psychological testing. Construction of the Carnegie Interest Inventory, later the Strong Vocational Interest Blank.
*LOUIS LEON THURSTONE- A large contributor to factor analysis; attended Whipple's seminars. His approach to measurement was called the Law of Comparative Judgment.

=INTEREST IN MENTAL DEFICIENCY=
*JEAN ETIENNE ESQUIROL- A French physician; the favorite student of Philippe Pinel, the founder of psychiatry. Responsible for the manuscript on mental retardation which differentiated between insanity and mental retardation.
*EDOUARD SEGUIN- A French physician who pioneered the training of mentally retarded persons. Rejected the notion of incurable mental retardation (MR). In 1837, he opened the first school devoted to teaching children with MR. In 1866, he conducted experiments with physiological training of persons with MR, involving the sense/muscle training that is used until today and that led to nonverbal tests of intelligence (Seguin Form Board Test).
*EMIL KRAEPELIN- Devised a series of examinations for evaluating emotionally impaired individuals.

=EARLY EXPERIMENTAL PSYCHOLOGISTS=
In the early 19th century, scientists were generally interested in identifying common aspects rather than individual differences. Differences between individuals were considered a source of error, which rendered human measurement inexact.
*JOHANN FRIEDRICH HERBART- Proposed mathematical models of the mind; the founder of pedagogy as an academic discipline.
*ERNST HEINRICH WEBER- Proposed the concepts of sensory thresholds and the Just Noticeable Difference (JND).
*GUSTAV THEODOR FECHNER- Worked on the mathematics of sensory thresholds of experience. Founder of psychophysics, and one of the founders of experimental psychology. The Weber-Fechner law was the first to relate sensation and stimulus: it states that the strength of a sensation grows as the logarithm of the stimulus intensity. Considered by some as the founder of psychometrics.

=INTELLIGENCE TESTING=
*ALFRED BINET- Appointed by the French government to develop a test that would place in special classes those Paris schoolchildren who failed to respond to normal schooling. Devised the first intelligence test: the Binet-Simon Scale of 1905. The scale had standardized administration and used a standardization sample.
*LEWIS MADISON TERMAN- Translated the Binet-Simon Scales into English for use in the US; in 1916, the translation was published as the Stanford-Binet Intelligence Scale. The SB scale became more psychometrically sound, and the term IQ was introduced. IQ = Mental Age / Chronological Age x 100
*ROBERT YERKES- President of the APA, commissioned by the US Army to develop structured tests of human abilities. WWI gave rise to the need for large-scale, group-administered ability tests in the army. Army Alpha: verbal; administered to literate soldiers. Army Beta: nonverbal; administered to illiterate soldiers.
*DAVID WECHSLER- Subscales on his tests were adopted from the army scales. His tests produce several scores of intellectual ability rather than Binet's single score. Evolved into the Wechsler series of intelligence tests (WAIS, WISC, etc.).

=PERSONALITY TESTING=
These tests were intended to measure personality traits.
*TRAITS- Relatively enduring dispositions (tendencies to act, think, or feel in a certain manner in any given circumstance).
*1920s- The rise of personality testing
*1930s- The fall of personality testing
*1940s- The slow rise of personality testing

=PERSONALITY TESTING: First Structured Test=
*WOODWORTH PERSONAL DATA SHEET- The first objective personality test, meant to assist in psychiatric interviews. Developed during WWI. Designed to screen out soldiers unfit for duty. Mistakenly assumed that a subject's responses could be taken at face value.

=PERSONALITY TESTING: Second Structured Test=
*MINNESOTA MULTIPHASIC PERSONALITY INVENTORY (MMPI, 1943)- Tests like the Woodworth made too many assumptions; the meaning of a test response could only be determined by empirical research. The MMPI-2 and MMPI-A are the most widely used.
*RAYMOND B. CATTELL: The 16PF- The test was based on factor analysis, a method for finding the minimum number of dimensions or factors needed to explain the largest number of variables. J. P. Guilford was the first to apply the factor analytic approach to test construction.

=PERSONALITY TESTING: Slow Rise – Projective Techniques=
*HERMANN RORSCHACH: The Inkblot Test- Pioneered projective assessment with his inkblot test in 1921. Regarded at first with great suspicion; the first serious study was made in 1932. Symmetric colored and black & white inkblots. Introduced to the US by David Levy.
*THEMATIC APPERCEPTION TEST (TAT)- Developed in 1935 and composed of ambiguous pictures considerably more structured than the Rorschach. Subjects are shown pictures and asked to tell a story including:
o What has led up to the event shown;
o What is happening at the moment;
o What the characters are feeling and thinking; and
o What the outcome of the story was.

=THE RISE OF MODERN PSYCHOLOGICAL TESTING=
*1900s- Everything necessary for the rise of the first truly modern and successful psychological test was in place.
*1904- Alfred Binet was appointed to devise a method of evaluating children who could not profit from regular classes and would require special education.
*1905- Binet and Theodore Simon published the first useful instrument to measure general cognitive abilities, or global intelligence.
*1908- Binet revised, expanded, and refined his first scale.
*1911- The birth of the IQ. William Stern (1911) proposed the computation of IQ based on the Binet-Simon scale (IQ = Mental Age / Chronological Age x 100).
*1916- Lewis Terman translated the Binet-Simon scales into English and published the result as the Stanford-Binet Intelligence Scale.
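The ratio-IQ formula given above (IQ = Mental Age / Chronological Age x 100) is simple arithmetic; a minimal sketch in Python, using made-up ages for illustration:

```python
def ratio_iq(mental_age: float, chronological_age: float) -> float:
    """Stern's ratio IQ: mental age divided by chronological age, times 100."""
    return mental_age / chronological_age * 100

# A 10-year-old performing at the level of a typical 12-year-old:
print(ratio_iq(12, 10))  # 120.0
```

Note that modern tests report a deviation IQ (mean 100, SD 15) rather than this ratio, as the notes discuss later.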
*1917, World War I- Robert Yerkes, APA president, developed a group test of intelligence for the US Army; pioneered the first group testing: Army Alpha and Army Beta.
*1918- Arthur Otis devised multiple-choice items that could be scored objectively and rapidly. Published the Group Intelligence Scale, which served as the model for Army Alpha.
*1919- E. L. Thorndike produced an intelligence test for high school graduates.

=PROMINENT FIGURES IN MODERN PSYCHOLOGICAL TESTING=
Alfred Binet, Theodore Simon, Lewis Terman, Robert Yerkes, Arthur Otis

TOPIC 2: STATISTICS REVIEW

*MEASUREMENT- The act of assigning numbers or symbols to characteristics of things (people, events, etc.) according to rules.
*SCALE- A set of numbers (or other symbols) whose properties model empirical properties of the objects to which the numbers are assigned.
*ERROR- Refers to the collective influence of all factors on a test score or measurement beyond those specifically measured by the test or measurement. It is very much an element of all measurement, and it is an element for which any theory of measurement must account.

=CATEGORIES OF SCALES=
*DISCRETE- Values that are distinct and separate; they can be counted. For example, if subjects were categorized as either female or male, the categorization scale would be discrete because it would not be meaningful to categorize a subject as anything other than female or male. Examples: gender, types of house, color
*CONTINUOUS- Exists when it is theoretically possible to divide any of the values of the scale; the values may take any value within a finite or infinite interval. Measurement error is always present in measurement on a continuous scale. Examples: temperature, height, weight

=SCALES OF MEASUREMENT=
*NOMINAL SCALES- The simplest form of measurement. Involve classification or categorization based on one or more distinguishing characteristics, where all things measured must be placed into mutually exclusive and exhaustive categories. Examples: DSM-5 diagnoses, gender of patients, colors
*ORDINAL SCALES- Also permit classification; in addition, rank ordering on some characteristic is permissible. They imply nothing about how much greater one ranking is than another, and the numbers do not indicate units of measurement. No absolute zero point. Examples: fastest reader, size of waistline, job positions
*INTERVAL SCALES- Permit both categorization and ranking; in addition, they contain equal intervals between numbers, so each unit on the scale is exactly equal to any other unit on the scale. No absolute zero point; however, it is possible to average a set of measurements and obtain a meaningful result. For example, the difference between IQs of 80 and 100 is thought to be similar to that between IQs of 100 and 120; yet if an individual achieved an IQ of 0, it would not indicate zero intelligence or its total absence. Examples: temperature, time, IQ scales, psychological scales
*RATIO SCALES- Contain all the properties of nominal, ordinal, and interval scales, and have a true zero point; negative values are not possible. A score of zero means the complete absence of the attribute being measured. Examples: exam score, neurological exam (e.g., hand grip), heart rate

*DESCRIPTIVE STATISTICS- Used to say something about a set of information that has been collected, and nothing more.

=DESCRIBING DATA=
*DISTRIBUTION- A set of test scores arrayed for recording or study.
*RAW SCORE- A straightforward, unmodified accounting of performance that is usually numerical. May reflect a simple tally, such as the number of items responded to correctly on an achievement test.
*FREQUENCY DISTRIBUTION- All scores are listed alongside the number of times each score occurred. Scores may be listed in tabular or graphical form.

*MEASURES OF CENTRAL TENDENCY- Statistics that indicate the average or midmost score between the extreme scores in a distribution.
*MEAN- The most common measure of central tendency; takes into account the numerical value of every score.
*MEDIAN- The middlemost score in the distribution, determined by arranging the scores in ascending or descending order.
*MODE- The most frequently occurring score in a distribution of scores.

=MEASURES OF VARIABILITY=
*Variability- An indication of how scores in a distribution are scattered or dispersed.
*RANGE- The simplest measure of variability: the difference between the highest and the lowest score.
*Interquartile Range- A measure of variability equal to the difference between Q3 and Q1.
*Semi-Interquartile Range- Equal to the interquartile range divided by two.
*AVERAGE DEVIATION- Another tool for describing the amount of variability in a distribution. Rarely used, perhaps because the deletion of algebraic signs renders it a useless measure for purposes of further operations.
*STANDARD DEVIATION (SD)- A measure of variability equal to the square root of the average squared deviation about the mean; the square root of the variance. A low SD indicates that the values are close to the mean, while a high SD indicates that the values are dispersed over a wider range.

*SKEWNESS- Refers to the absence of symmetry; an indication of how the measurements in a distribution are distributed.
*KURTOSIS- The steepness of the distribution at its center; describes how heavy or light the tails are.
PLATYKURTIC- relatively flat, gently curved
MESOKURTIC- moderately curved, somewhere in the middle
LEPTOKURTIC- relatively peaked

*THE NORMAL CURVE- A bell-shaped, smooth, mathematically defined curve that is highest at its center. It is perfectly symmetrical, with no skewness. The majority of test takers bulk at the middle of the distribution; very few test takers are at the extremes.
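The central-tendency and variability measures above can be computed with Python's standard `statistics` module; a minimal sketch with a made-up list of test scores (not from the notes):

```python
import statistics

scores = [85, 90, 90, 70, 95, 80, 90, 75]  # hypothetical raw scores

mean = statistics.mean(scores)      # sum of scores / number of scores
median = statistics.median(scores)  # middlemost score of the ordered list
mode = statistics.mode(scores)      # most frequently occurring score
sd = statistics.pstdev(scores)      # population SD: sqrt of mean squared deviation

# Range and (semi-)interquartile range
rng = max(scores) - min(scores)
q1, q2, q3 = statistics.quantiles(scores, n=4)  # default "exclusive" method
iqr = q3 - q1
semi_iqr = iqr / 2

print(mean, median, mode, rng)  # 84.375 87.5 90 25
```

Note that `pstdev` treats the list as a whole population; `statistics.stdev` would divide by n - 1 for a sample, and `quantiles` supports an alternative `method="inclusive"` that gives slightly different quartiles.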
Mean = Median = Mode.
Q1 and Q3 are equidistant from Q2 (the median).

=AREAS UNDER THE NORMAL CURVE=
50% of the scores occur above the mean and 50% of the scores occur below the mean.
Approximately 34% of all scores occur between the mean and 1 SD above the mean.
Approximately 34% of all scores occur between the mean and 1 SD below the mean.
Approximately 68% of all scores occur between the mean and +/- 1 SD.
Approximately 95% of all scores occur between the mean and +/- 2 SD.

=THE STANDARD SCORES=
These are raw scores that have been converted from one scale to another, where the latter scale has some arbitrarily set mean and standard deviation. Standard scores also provide a context for comparing scores on two different tests by converting scores from the two tests into z scores.

=TYPES OF STANDARD SCORES=
*z Scores- Known as the golden scores. Result from converting a raw score into a number indicating how many SD units the raw score is below or above the mean of the distribution. Mean = 0; SD = 1 (the "zero plus or minus one" scale). Scores can be positive or negative.
*T Scores- The "fifty plus or minus ten" scale: Mean = 50; SD = 10. Devised by W. A. McCall (1922, 1939) and named the T score in honor of his professor E. L. Thorndike. Composed of a scale that ranges from 5 SD below the mean to 5 SD above the mean; none of the scores are negative.
*Stanine- Takes whole numbers from 1 to 9, without decimals; each stanine represents a range of performance half an SD in width. Mean = 5; SD = 2. Used by US Air Force assessment.
*Deviation IQ- Used for interpreting IQ scores. Mean = 100; SD = 15.
*STEN ("standard ten")- Mean = 5.5; SD = 2.
*Graduate Record Exam (GRE) or Scholastic Aptitude Test (SAT)- Used for admission to graduate school and college. Mean = 500; SD = 100.

=RELATIONSHIP BETWEEN STANDARD SCORES=

=THREE TYPES OF CORRELATIONS=
*SPEARMAN RHO CORRELATION- A method of correlation for finding the association between two sets of ranks; thus, both variables must be on an ordinal scale. Frequently used when the sample size is small (fewer than 30 pairs of measurements).
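The standard-score conversions defined above are all linear rescalings of the z score (new score = new mean + new SD x z); a minimal sketch, using a made-up raw-score distribution:

```python
import statistics

raw_scores = [10, 12, 14, 16, 18, 20, 22]  # hypothetical distribution
mean = statistics.mean(raw_scores)          # 16
sd = statistics.pstdev(raw_scores)          # 4

def z_score(x: float) -> float:
    """How many SD units x lies above (+) or below (-) the mean."""
    return (x - mean) / sd

def t_score(x: float) -> float:
    """McCall's T: z rescaled to mean 50, SD 10 (no negative scores in practice)."""
    return 50 + 10 * z_score(x)

def deviation_iq(x: float) -> float:
    """z rescaled to mean 100, SD 15."""
    return 100 + 15 * z_score(x)

print(z_score(20), t_score(20), deviation_iq(20))  # 1.0 60.0 115.0
```

The same pattern gives stanines (5 + 2z, rounded and clipped to 1-9), STEN (5.5 + 2z), and GRE/SAT-style scores (500 + 100z).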
Also called the rank-order correlation coefficient or rank-difference correlation. Devised by Charles Spearman.

*CORRELATION AND INFERENCE- A correlation coefficient is a number that provides an index of the relationship between two things.
*CORRELATIONAL STATISTICS- Statistical tools for testing the relationships or associations between variables. The tools of choice when the relationship between variables is linear and the variables being correlated are continuous.
*COVARIANCE- How much two scores vary together.
*CORRELATION COEFFICIENT- A mathematical index that describes the direction and magnitude of a relationship; it always ranges from -1.00 to +1.00.
*PEARSON PRODUCT-MOMENT CORRELATION- Determines the degree of variation in one variable that can be estimated from knowledge about variation in the other variable. Correlates two variables in interval or ratio scale format. Devised by Karl Pearson.
*TRUE DICHOTOMY- Only two possible categories, formed naturally. For example: gender (M/F).
*ARTIFICIAL DICHOTOMY- Reflects an underlying continuous scale forced into a dichotomy; there are other possibilities within a category. For example: exam result (pass or fail).
*BISERIAL CORRELATION- Expresses the relationship between a continuous variable and an artificial dichotomous variable. For example, the relationship between passing or failing the bar exam (artificial dichotomy) and general weighted average (GPA) in law school (continuous variable).
*POINT-BISERIAL CORRELATION- Correlates one continuous variable and one true dichotomous variable. For example, score on a test (continuous or interval) and correctness on an item within the test (true dichotomy).
*PHI COEFFICIENT- Correlates two dichotomous variables; at least one should be a true dichotomy. For example, gender and passing or failing the 2018 Physician Licensure Exam.
*TETRACHORIC COEFFICIENT- Correlates two dichotomous variables; both are artificial dichotomies. For example, passing or failing a test and being highly anxious or not.
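The Pearson r and Spearman rho described above can be sketched from first principles: r divides the covariance term by the product of the deviation magnitudes, and rho is simply Pearson's r computed on ranks. A minimal sketch with hypothetical data (ties are given arbitrary consecutive ranks here; a full implementation would average tied ranks):

```python
import math

def pearson_r(x: list, y: list) -> float:
    """Pearson product-moment correlation for interval/ratio data."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))  # unscaled covariance
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def spearman_rho(x: list, y: list) -> float:
    """Rank-order correlation: Pearson's r computed on the ranks."""
    def ranks(v):
        order = sorted(range(len(v)), key=lambda i: v[i])
        r = [0.0] * len(v)
        for rank, i in enumerate(order, start=1):
            r[i] = float(rank)  # NOTE: ties are not averaged in this sketch
        return r
    return pearson_r(ranks(x), ranks(y))

# Hypothetical example: exam scores vs. anxiety ratings
exam = [88, 92, 75, 98, 67]
anxiety = [20, 15, 30, 10, 40]
print(pearson_r(exam, anxiety))  # strongly negative (close to -1)
```

For real work, `scipy.stats.pearsonr`, `spearmanr`, and `pointbiserialr` handle ties and significance testing.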
=ISSUES IN THE USE OF CORRELATION=
*RESIDUAL- The difference between the predicted and the observed values.
*STANDARD ERROR OF ESTIMATE- The standard deviation of the residuals; a measure of the accuracy of prediction.
*SHRINKAGE- The amount of decrease observed when a regression equation created for one population is applied to another.
*COEFFICIENT OF DETERMINATION (r2)- Tells the proportion of the total variation in scores on Y that we know as a function of information about X. It also represents the percentage of variance shared by two variables: the effect of one variable on the other.
*COEFFICIENT OF ALIENATION- Measures the non-association between two variables.
*RESTRICTED RANGE- Significant relationships are difficult to find if variability is restricted.

=Essential Facts About Correlation=
The degree of relationship between two variables is indicated by the number in the coefficient, whereas the direction of the relationship is indicated by the sign.
Correlation, even if high, does not imply causation.
High correlations allow us to make predictions.

*REGRESSION- Defined broadly as the analysis of relationships among variables for the purpose of understanding how one variable may predict another, through the use of linear regression.
Predictor (X)- serves as the IV; causes changes in the other variable.
Predicted (Y)- serves as the DV; the result of the change as the value of the predictor changes.
Represented by the formula: Y = a + bX
INTERCEPT (a)- the point at which the regression line crosses the Y axis.
REGRESSION COEFFICIENT (b)- the slope of the regression line.
REGRESSION LINE- the best-fitting straight line through a set of points in a scatter plot.
STANDARD ERROR OF ESTIMATE- measures the accuracy of prediction.

*MULTIPLE REGRESSION ANALYSIS- A type of multivariate (three or more variables) analysis which finds the linear combination of variables that provides the best prediction. A statistical technique for predicting one variable from a series of predictors, using the intercorrelations among all the variables involved. Applicable only when all data are continuous.
*STANDARDIZED REGRESSION COEFFICIENTS- Also known as beta weights. Tell how much each variable in a given list of variables predicts a single variable.

*FACTOR ANALYSIS- Used to study the interrelationships among a set of variables without reference to a criterion.
Factors- the derived variables; also called principal components.
Factor loading- the correlation between the original items and the factors.

*META-ANALYSIS- A family of techniques used to statistically combine information across studies to produce single estimates of the data under study.
ADVANTAGES:
o Can be replicated
o Conclusions tend to be more reliable and precise than conclusions from single studies
o More focus on effect size than on statistical significance alone
o Promotes evidence-based practice: professional practice based on clinical research findings
Effect size- the estimate of the strength of a relationship or the size of differences; typically expressed as a correlation coefficient.

=PARAMETRIC VS NON-PARAMETRIC TESTS=
*PARAMETRIC- Assumptions are made about the population.
o Homogeneous data; normally distributed samples
o Mean and SD
o Randomly selected samples
*NON-PARAMETRIC- Assumptions are made about the samples only.
o Heterogeneous data; skewed distributions
o Ordinal and categorical data
o Highly purposive sampling

=NON-PARAMETRIC TESTS=
*MANN-WHITNEY U TEST
o Counterpart of the t-test for independent samples
o Ordinal data
o Assumption of heterogeneous groups
*WILCOXON SIGNED-RANK TEST
o Counterpart of the t-test for dependent samples
o Ordinal data
o Assumption of heterogeneous data
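The regression formula Y = a + bX described above can be fitted by least squares: the slope b is the covariance term divided by the variance term of X, and the intercept follows because the line passes through the point of means. A minimal sketch with hypothetical study-hours and exam-score data:

```python
def linear_regression(x: list, y: list) -> tuple:
    """Least-squares fit of Y = a + bX; returns (intercept a, slope b)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    # Slope: sum of cross-deviations over sum of squared X deviations
    b = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
         / sum((xi - mx) ** 2 for xi in x))
    a = my - b * mx  # the regression line passes through (mean of X, mean of Y)
    return a, b

# Hypothetical data: hours studied (predictor X) vs. exam score (predicted Y)
hours = [1, 2, 3, 4, 5]
scores = [52, 55, 61, 64, 68]
a, b = linear_regression(hours, scores)
predicted = a + b * 6  # prediction for 6 hours of study, roughly 72.3
```

Residuals (observed minus predicted values) and the standard error of estimate follow directly from this fit; for multiple regression with several predictors, a library such as `numpy.linalg.lstsq` or `statsmodels` is the practical choice.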
Competent test users understand and appreciate the o Counterpart for One-Way ANOVA limitations of the tests they use o Ordinal data o Assumption of heterogeneous group *Assumption 5- Various Sources of Error Are Part of the Assessment Process *FRIEDMAN TEST *Error- Is trasitionally refereed to as a something that is o Counterpart of t-test for dependent samples more than expected and is a component of the o Ordinal data measurment process. o Assumption of heterogeneous data Is a long-standing about the assumptions that factors TOPIC 3: ESSENTIALS OF TEST SCORE other than what a test attempts tos measure will influence INTERPRETATION (Of Test and Testing) perfromance on the test =ASSUMPTIONS ABOUT PSYCHOLOGICAL TESTING AND *Error Variance- The component of a test score MEASUREMENT= atrributable to sources than the trait or ability being measured *Assumption 1- Psychological Traits and States Exist *Classical Test Theory- assumption is made of that each *TRAIT -Any distinguishable, relatively enduring way in test taker has a score on a test that would be obtained but which one individual varies from one another for the action of measurement error Psychological traits exist only as a construct–an informed, scientific concept developed or constructed to * Assumption 6- Testing and Assessment Can Be Conducted describe or explain behavior. in a fair asn Unbiased Manner *STATES -Distinguish one person from the another but are Sources of fairness-related problems is the best test who relatively less enduring attempts to use a particular test with people whose background and experience of people for whom the test *Assumption 2- Psychological Traits and States was intended. Can Be Quantified and Measured It is more political than psychometric Traits and states shall be clearly defined to be measured accurately. *Assumption 7- Testing and Assessment Benefit Society Test developers and researchers, like other people in Without Test,there will be…. 
general, have many different ways of looking at and defining the same phenomenon. Subjective personnel hiring process Once defined, test developer considers the types of item Children with special needs might be assigned to certain content that would provide insight into it. classes by gut feel of the teachers and school *Cummulative scoring- assumption that higher the tesk administrators taker’ score is, there is the presumption to be on the targeted ability or trait Great needs to diagnose adecational difficulties *Assumption 3- Test-Related Behavior Predicts Non-Test- No instruments todiagnose neuropsychological Related Behavior impairments The tasks in some tests mimic the actual behaviors that No practical way for military to screen thousands of the test user is attempting to understand. recruits Obtained sample of behavior is typically used to make =WHAT IS A GOOD TEST?= predictions about future behavior. Psychometric Soundness *Assumption 4- Tests and Other Measurement Techniques Have Strengths and Weakness *Reliability- consistency in measurement The precision with which the test measures and the extent to Competent test users understand how a test was which error is present in measurement Perfectly reliable developed, the circumstance under which it is appropriate measuring tool consistenly measures in the same way to administer the test, how to administer the test and to 13 *Validity- when a test measures what it purports to =SAMPLING TO DEVELOP NORMS= measure An intelligence test is valid test because it *Sample- The representative of the whole population It measures intelligence; the same way with personality tests; could be a small as one person, though samples that and with other psychological tests Questions on test’s approach the size of the population reduce thepossible valifity may focus on the items that is collectively make up sources of error due to insufficient sample size the test. 
*Sampling- The process of selecting the portion of the universe deemed to be representative of the whole population.

*Norms- The test performance data of a particular group of test takers that are designed for use as a reference when evaluating or interpreting individual test scores.
Obtained by administering the test to a sample of people and obtaining the distribution of scores for that group.

*Normative Sample- The group of people whose performance on a particular test is analyzed for reference in evaluating the performance of an individual test taker.

*Norming- The process of deriving norms.
May be modified to describe a particular type of norm derivation.

*Standardization- the process of administering a test to a representative sample of test takers for the purpose of establishing norms.
A test is said to be standardized when it has clearly specified procedures for administration and scoring, typically including normative data.

=Developing Norms for a Standardized Test=
The test developer administers the test according to the standard set of instructions that will be used with the test
The test developer describes the recommended setting for giving the test
The test developer summarizes the data using descriptive statistics, including measures of central tendency and variability
The test developer provides a precise description of the standardization sample itself (exact sample sizes and demographic breakdowns)

=EVALUATION OF NORMS=

*Norm-referenced Test- a score is interpreted by comparing it with the scores obtained by others on the same test.
A method of evaluation, and a way of deriving meaning from test scores, by evaluating an individual's score against the performance of a standardization group.

*Criterion-referenced Test- uses a specified content domain rather than a specified population of people.
The score is interpreted with reference to a set standard (criterion) rather than the performance of others. Also known as content-referenced or domain-referenced.
Criterion: the cut score predetermined by the test developer.

=TYPES OF STANDARD ERROR=

*Standard Error of Measurement (SEM)- a statistic used to estimate the extent to which an observed score deviates from a true score
*Standard Error of Estimate (SEE)- in regression, an estimate of the degree of error involved in predicting the value of one variable from another
*Standard Error of the Mean- a measure of sampling error
*Standard Error of the Difference (SED)- a statistic used to estimate how large a difference between two scores should be before the difference is considered statistically significant

=TYPES OF NORM-REFERENCED TESTING=

*TRACKING- The tendency to stay at about the same level relative to one's peers; staying at the same level of a characteristic as compared to the norms.
This concept is applied to children when parents want to know if the child is growing normally.

*Developmental Norms- Norms developed on the basis of any trait, ability, skill, or other characteristic that is presumed to develop, deteriorate, or otherwise be affected by chronological age, school grade, or stage of life.
*Age Norms- the average performance of different samples of test takers who were at various ages at the time the test was administered

*Grade Norms- designed to indicate the average test performance of test takers in a given school grade

*Ordinal Scales- designed to identify the stage reached by the child in the development of specific behavior functions

*Within-Group Norms- the individual's performance is evaluated in terms of the performance of the most nearly comparable standardization group

*Percentiles- an expression of the percentage of people whose score on a test or measure falls below a particular score; indicates the individual's relative position in the standardization sample
(Percentile rank: your position in the entire ranking. Example: Kyla placed in the 95th percentile. Interpretation: Kyla is in the 95th percentile rank, which means that she scored higher than 95% of the people who also took the test.)

*Standard Scores- derived scores that use as their unit the SD of the population upon which the test was standardized

*Deviation IQ- a standard score on an intelligence test with a mean of 100 and an SD that approximates the SD of the Stanford-Binet IQ distribution

*National Norms- norms derived from a large-scale, nationally representative sample

*Subgroup Norms- norms segmented by any of the criteria initially used in selecting subjects for the sample

*Local Norms- provide normative information with respect to the local population's performance on some test

TOPIC 4: RELIABILITY

*Reliability- refers to the dependability or consistency in measurement; the proportion of the total variance attributed to true variance.

*Reliability Coefficient- an index of reliability; a proportion that indicates the ratio between the true score variance on a test and the total variance.

If we use X to represent an observed score, T to represent a true score, and E to represent error, then the fact that an observed score equals the true score plus error may be expressed as follows: X = T + E

=Concepts in Reliability=

*Variance (σ²)- useful in describing sources of test score variability; the standard deviation squared. This statistic is useful because it can be broken into components.
*True Variance- variance from true differences
*Error Variance- variance from irrelevant, random sources
*Measurement Error- refers, collectively, to all of the factors associated with the process of measuring some variable, other than the variable being measured

If σ² represents the total variance, σ²ₜ the true variance, and σ²ₑ the error variance, then the relationship of the variances can be expressed as:

σ² = σ²ₜ + σ²ₑ

In this equation, the total variance in an observed distribution of test scores (σ²) equals the sum of the true variance (σ²ₜ) plus the error variance (σ²ₑ).

Types of Measurement Error
*Random Error- a source of error in measuring a targeted variable, caused by unpredictable fluctuations and inconsistencies of other variables in the measurement process.
*Systematic Error- a source of error in measuring a variable that is typically constant or proportionate to what is presumed to be the true value of the variable being measured.
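The decomposition σ² = σ²ₜ + σ²ₑ, and the definition of reliability as the ratio of true variance to total variance, can be checked with a small simulation. This is only a sketch: the true-score SD of 15 and error SD of 5 are arbitrary choices, which give a theoretical reliability of 225 / (225 + 25) = .90:

```python
import random

random.seed(1)

# Classical test theory: every observed score is X = T + E.
n = 10_000
true_scores = [random.gauss(100, 15) for _ in range(n)]  # sigma_true = 15
errors = [random.gauss(0, 5) for _ in range(n)]          # sigma_error = 5
observed = [t + e for t, e in zip(true_scores, errors)]

def variance(scores):
    m = sum(scores) / len(scores)
    return sum((s - m) ** 2 for s in scores) / len(scores)

var_total = variance(observed)
var_true = variance(true_scores)
var_error = variance(errors)

# Total variance is (approximately) true variance plus error variance,
# and reliability is the ratio of true variance to total variance.
reliability = var_true / var_total
print(f"total {var_total:.1f} ~ true {var_true:.1f} + error {var_error:.1f}")
print(f"reliability ~ {reliability:.2f}")
```

The simulated ratio lands near .90, matching the theoretical value up to sampling noise.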
=Theory of Reliability=

*Domain Sampling Model- Puts sampling error and correlation together in the context of measurement.
The greater the number of items, the higher the reliability.
It considers the problems created by using a limited number of items to represent a larger and more complicated construct.
Conceptualizes reliability as the ratio of the variance of the observed score on the test to the variance of the long-run true score.

*Test Score Theory: Classical Test Theory (CTT)- known as the true score model of measurement.
X = T + E, where X is the observed score, T is the true score, and E is the error.
Standard Error of Measurement- the standard deviation of the errors, under the assumption that the distribution of random error is the same for all people.
Widely used in the assessment of reliability, and has been around for over 100 years.
Errors of measurement are random; each person has a true score that would be obtained if there were no error in measurement.
The true score for an individual will not change with repeated applications of the same test.
Disadvantage: requires that exactly the same test items be administered to each person.

*Item Response Theory (IRT)- provides a way to model the probability that a person with X ability will be able to perform at a level of Y.
Latent-Trait Theory- a synonym for IRT in the academic literature, because the psychological or educational construct being measured is often physically unobservable and may be a trait.
The computer is used to focus on the range of item difficulty that helps assess an individual's ability level. For example, if the test taker gets several easy items correct, the computer might quickly move to difficult items; if the person gets several difficult items wrong, the computer moves back to the area of item difficulty where the person gets some items right and some wrong.
Requires a bank of items that have been systematically evaluated for level of difficulty.
The more test takers who answer an item correctly, the easier the item; the fewer who answer it correctly, the more difficult the item.

=Sources of Error Variance=

*Test Construction- item sampling or content sampling is one of the sources of variance during test construction, due to the variation among items within a test as well as the variation among items between tests.

*Test Administration- sources of error variance that occur during test administration may influence the test taker's attention or motivation; the test taker's reactions to those influences are the source of this kind of error variance.
The test environment, for instance; also test taker variables such as emotional problems, physical discomfort, lack of sleep, effects of drugs or medication, etc.
Examiner-related variables- such as physical appearance and demeanor, or the mere presence or absence of an examiner.

*Test Scoring and Interpretation- Not all tests can be scored by computer, such as tests administered individually.
A test may employ objective-type items amenable to computer scoring, yet a technical glitch may contaminate the data.
If subjectivity is involved in scoring, the scorer or rater can be a source of error variance.

=Models/ Estimates of Reliability=

1. Test-Retest Reliability- used to evaluate the error associated with administering a test at two different times; refers to correlating the scores from two administrations of the same test to the same group.
Should be used when measuring traits or characteristics that do not change over time (static attributes).
Also known as the coefficient of stability.
*Carryover Effect- occurs when the first testing session influences scores from the second session.
*Practice Effect- a type of carryover effect wherein skills improve with practice. Scores on the second administration are usually higher than on the first; the changes are not constant across the group.
If a test manual reports a test-retest correlation, always pay attention to the interval between the two testing sessions.
Poor test-retest correlations do not always mean that the test is unreliable; they could mean that the characteristic under study has changed over time.

2. Parallel- and Alternate-Forms Method- the coefficient of equivalence; accounts for item-sampling error. Also uses correlation.
*Parallel-Forms Reliability- compares two equivalent forms of a test that measure the same attribute. The two forms use different items; however, the rules used to select items of a particular difficulty level are the same. Theoretically, scores obtained on parallel forms correlate equally with the true score or with other measures. For each form of the test, the means and variances of observed test scores are equal.
*Alternate-Forms Reliability- different versions of a test that have been constructed to be parallel; designed to be equivalent with respect to variables such as content and level of difficulty. An estimate of the extent to which the different forms of the same test have been affected by item-sampling error or other error. The same idea as parallel forms, but simply a different version.
Disadvantages:
Burdensome to develop two forms of the same test
Practical constraints include retesting of the same group of individuals
(Creating a secondary form with the same questions and the same difficulty but a different presentation, as for board exams.)

3. Internal Consistency- estimates how consistent the test items are with one another; the degree of correlation among all the items on a scale.

=Types of Measures of Internal Consistency=

*Split-Half Reliability- a test is given and divided into halves that are scored separately; the results of one half of the test are then compared with the results of the other. Only one administration is required.
A useful estimate of reliability when it is impractical or undesirable to use two tests or to administer a test twice; the odd-even system equalizes the difficulty of the two halves.
Three steps:
Divide the test into equivalent halves
Calculate the Pearson r between scores on the two halves of the test
Adjust the half-test reliability using the Spearman-Brown formula: corrected r = 2r / (1 + r)

*Spearman-Brown Formula- estimates what the correlation would have been if each half had been the length of the whole test; it increases the estimate of reliability.
Can be used to estimate the effect on reliability of shortening a test (reducing the number of items).
Can also be used to determine the number of items needed to attain a desired level of reliability; the new items must be equivalent in content and difficulty so that the lengthened test still measures what the original test measured.
(It estimates the correlation as if each half were as long as the whole test, and can tell how many items could be removed before the reliability coefficient of a newly developed test drops out of the acceptable range. It addresses the weakness of the split-half method.)

*Kuder-Richardson Formulas:
*KR-20- used for a test in which items are dichotomous, i.e., can be scored right or wrong (there is a correct answer); for homogeneous tests with items of unequal difficulty. The 20th formula in the series developed by G. Frederic Kuder and M.W. Richardson, out of dissatisfaction with the split-half methods.
*KR-21- for items that have equal difficulty, or whose average difficulty level is 50%. Cannot be applied to personality tests.

*Coefficient Alpha (α)- can be thought of as the mean of all possible split-half correlations, corrected by the Spearman-Brown formula; thus, it provides the lowest estimate of reliability that one can expect.
Can be used when the two halves of a test have unequal variances.
Appropriate for tests that have nondichotomous items (no correct or incorrect answer), such as personality tests; best used with personality tests.
The most preferred statistic for obtaining an estimate of internal consistency reliability.
A widely used measure of reliability, in part because it requires only one administration of the test.
Values typically range from 0 to +1 only.

POINTS TO REMEMBER:
Indices of reliability provide an index that is characteristic of a particular group of test scores, not of the test itself (Caruso, 2000; Yin & Fan, 2000). Measures of reliability are estimates, and estimates are subject to error.
The precise amount of error inherent in a reliability estimate will vary with various factors, such as the sample of test takers from which the data were drawn.
A reliability index published in a test manual might be impressive, but remember that the reported reliability was achieved with a particular group of test takers.

HOW RELIABLE IS RELIABLE?
Usually depends on the use of the test.
Reliability estimates in the range of .70 to .80 are good enough for most purposes of basic research.
Estimates of .95 are not very useful, because they may suggest that all of the items are essentially measuring the same thing and that the measure could easily be shortened.

*COEFFICIENT OMEGA- The extent to which all items measure the same underlying trait.
Overcomes a weakness of coefficient alpha: alpha can be greater than zero even when the items are not assessing the same trait.
*Average Proportional Distance (APD)- Evaluates the internal consistency of a test by focusing on the degree of difference that exists between item scores.
General rule: an obtained value of .2 or lower indicates excellent internal consistency; a value of .25 to .2 is in the acceptable range.
A relatively new measure, developed by Sturman et al.

4. INTER-SCORER (RATER) RELIABILITY- The degree of agreement or consistency between two or more scorers (raters or judges) with regard to a particular measure.
Often used to code nonverbal behavior.
It is the reliability of the scoring (the rated performance and scores of a group) that is being measured, not the test itself — that is, how much the raters agree about the performance being rated.
Coefficient of Inter-Scorer Reliability- the degree of consistency among scorers.

*Kappa- indicates the actual agreement as a proportion of the potential agreement, following a correction for chance agreement.
*Kappa Statistics- the best method to assess the level of agreement among several observers.
*Cohen's kappa- for a small number of raters or scorers (2 raters)
*Fleiss' kappa- measures the agreement between a fixed number of raters (more than 2)

=SOLVING LOW RELIABILITY=

*Increase the Number of Test Items
The larger the item sample, the more likely the test will represent the true characteristic.
In the domain-sampling model, the reliability of a test increases as the number of items increases.
Can use the general Spearman-Brown formula to achieve a desired level of reliability.

*Factor Analysis
Unidimensionality makes the test more reliable; thus, one factor should account for considerably more of the variance than any other factor.
Remove items that do not load on the one factor being measured.

*Item Analysis
Correlation between each item and the total score for the test; often called discriminability analysis.
When the correlation between performance on a single item and the total test score is low, the item is probably measuring something different from the other items on the test.

*Correction for Attenuation
Used to determine the correlation between variables had the measures not been affected by error; the estimation of what the correlation between tests would have been if there had been no error in measurement.
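The correction for attenuation is conventionally computed by dividing the observed correlation by the square root of the product of the two measures' reliabilities — a standard formula, stated here as an assumption since the notes do not spell it out:

```python
import math

def correct_for_attenuation(r_xy, reliability_x, reliability_y):
    """Estimated correlation between two measures had both been error-free."""
    return r_xy / math.sqrt(reliability_x * reliability_y)

# An observed correlation of .40 between tests with reliabilities .70 and .80
# corresponds to an estimated error-free correlation of about .53.
print(round(correct_for_attenuation(0.40, 0.70, 0.80), 2))  # 0.53
```

Because unreliability in either measure pulls the observed correlation toward zero, the corrected value is always at least as large as the observed one.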
If two scores each contain error such that in each case STATIC- a characteristic, ability, or trait that is stable; that true score could be higher or lower, then we would hence, test-retest and alternate forms methods would be wont the two scores to be further apart before we appropriate conclude that there is a significant difference between them. *Range of Test Scores is or is not Restricted Important to be used when interpreting a coefficient of *Don't forget: reliability. The relationship between the SEM and the ratability of a If the variance of either variable in o correlational analysis test that is inverse; the higher the reliability of o test (or Is restricted by the sampling procedure used, then the individual subtest within a test), the lower the SEM. resulting correlation coefficient tends to be lower. In accordance with the True Score Model, an obtained If the variance is inflated by the sampling procedure, then test score represented one point in the theoretical the resulting correlation coefficient is higher. distribution of scores the test faker could have obtained *Speed Test or Power Test SPEED TESTS- generally contain items of uniform level of difficulty (typically low) so that when given generous time limits, test taker should be able to complete all the items correctly (with time limit) POWER TESTS- when the time Emit is long enough to attempt all items, and if some test items are so difficult that no lest taker is able to attain o perfect score (no time limit) *Test is or is not Criterion-References Designed to provide on indication of where 0 test taker stands with respect to some variable criterion Contains materials that have been mastered in a hierarchical fashion. For example, tracing a letter pattern before attempting to master writing Scores on this measure tends to be interpreted in puss- foil terms, and any scrutiny of performance on individual items tends to be for diagnostic and remedial purposes. 
19 TOPIC 5: VALIDITY behavior representative of the universe of behavior that the test was designed to sample. *VALIDITY- A judgment of estimate of how well a test Requites subject matter words to ensure all items are valid. measures what it purports to measure in a particular context. (How sufficiently represented the domain that should be *Validation: the process of gathering and evaluating evidence measure. All discussion is covered in the test) about validity. *CONSTRUCT UNDERREPRESENTATION- Describes *LOCAL VALIDATION STUDIES- necessary when the test user the failure to capture important components of a construct. plans to alter II some way the format, instruction, language, For example, an English Proficiency exam that did not cover or content of the test. the Ports of Speech knowledge of the test takers. Also necessary it a test use sought to test with o population al lest token Mot differed in some significant way from the *CONSTRUCT-IRRELEVANT VARIANCE- Occurs when population on which the test was standardized. scores ore influenced by factors irrelevant to the construct. For example, test anxiety, physical condition, etc. (Knows the context of introversion *Quantification of Content Validity (Lawshe, 1975) *Local Validation- no one has the right to translate a psychological test and administer it to a certain group of Essential (Accepted) people whom it was translated to the language that they Useful but not essential (Revise) understood without undergoing local validation) Not necessary (Rejected) =ASPECTS/ TRINITARIAN MODELS OF VALIDITY= * Face Validity- A judgment concerning how relevant test items appear to be. Makes the test taker to seriously take the test. Even if a test lacks face validity, it may still be relevant and useful. 
(It is mathematics test but there are subject verb agreement) *CRITERION-RELATED VALIDITY- A judgment of how adequately a test score can be used to infer an individual's * CONTENT-RELATED VALIDITY- Adequacy of representation most probable standing on some measure on interest - the of the conceptual domain the test is designed to cover. measure of interest being the criterion. It describes a judgment of how adequately a test samples 20 *Criterion- the standard against which the test or test =Considerations in Predictive Validity= score is compared to or evaluated. It can be a test score, *BASE RATE- the extent to which a particular trait, behavior, behavior, amount of time. rating, psychiatric diagnosis, characteristic, or attribute exists in the population (existence training cost, index of absenteeism, etc. For of trait) example, a test might be used to predict which engaged couples will have successful montages and which ones will get *HIT RATE- the proportion of people a test accurately annulled. Marital success is the criterion but it cannot be identifies as possessing or exhibiting o particular trait, known at the time the couple take premarital test. behavior, characteristic, or attribute (presence of the threat, behavior) Characteristics of a Criterion *MISS RATE- the proportion of people the test foils to identify *RELEVANT-must be pertinent or applicable to the motler at as having, or not having a particular trait, characteristic or hand. (The criteria you have is the criterion needed) attribute; inaccurate prediction. *VALID- should be adequate for the purpose for which it is *False Positive- a miss when a test taker was being used. If one test is being used as the criterion to predicted to have the attributes being measured when in fact validate the second test, then evidence should exist that the did not: akala meron pero wala. first test is valid. 
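The base rate, hit rate, and miss rate can be tallied from a 2 × 2 table of predictions against actual status. The counts below are hypothetical, and "hit rate" is read here as the proportion of accurate identifications overall (conventions vary):

```python
# Hypothetical screening outcomes for 100 test takers (counts are invented).
true_positive = 30    # predicted to have the attribute, and did (a hit)
false_positive = 10   # predicted to have it, but did not (a miss)
false_negative = 5    # predicted not to have it, but did (a miss)
true_negative = 55    # predicted not to have it, and did not (a hit)

total = true_positive + false_positive + false_negative + true_negative

# Base rate: how common the attribute actually is in the group tested.
base_rate = (true_positive + false_negative) / total    # 35 / 100

# Hit rate as accurate predictions; miss rate as inaccurate predictions.
hit_rate = (true_positive + true_negative) / total      # 85 / 100
miss_rate = (false_positive + false_negative) / total   # 15 / 100

print(base_rate, hit_rate, miss_rate)  # 0.35 0.85 0.15
```

Holding a test's accuracy fixed, a very low base rate inflates the share of false positives among those flagged — one reason base rates matter when evaluating predictive validity.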
*False Negative- a miss wherein a test taker was predicted not to have the attribute being measured when in fact he or she did (thought to be absent, but actually present).

*UNCONTAMINATED- (a further characteristic of a criterion) the criterion is not affected by other criterion measures; otherwise, criterion contamination occurs.

*Incremental Validity- the value added by the test or by the criterion.
The degree to which an additional predictor explains something about the criterion measure that is not explained by predictors already in use.
The predictive power of the test to see or discover something beyond what it is already intended to measure.
(What more — what extra mile — the test can offer beyond what it is already offering.)

=Types of Criterion Validity=

*CONCURRENT VALIDITY- Comes from assessments of the simultaneous relationship between the test and the criterion.
Indicates the extent to which test scores may be used to estimate an individual's present standing on a criterion.
For example, work samples, or scores on the Beck Depression Inventory (BDI and BDI-II), etc.

*PREDICTIVE VALIDITY- How accurately scores on the test predict some criterion measure.
For example, scores on a College Admission Test (CAT) and the GPAs of freshmen provide evidence of the predictive validity of such an admission test. The NMAT predicts GPA in medical school; the PhiLSAT predicts GPA in law school.

*Validity Coefficient- The relationship between a test score and a score on the criterion measure; it is the computed correlation coefficient.
There are no hard-and-fast rules about how large the validity coefficient must be: .60 or larger is rare, and .30 to .40 is commonly considered high.
(The validity coefficient is what is computed in criterion-related validity.)

=EVALUATING VALIDITY COEFFICIENTS=

Look for Changes in the Cause of Relationships
Identify If the Criterion Is Valid and Reliable
o Criterion validity studies would mean nothing if the criterion itself is not valid or reliable.
Review the Subject Population in the Validity Study
o The validity study might have been done on a population that does not represent the group to which inferences will be made.
Be Sure the Sample Size Was Adequate
o A good validity study will present evidence for cross-validation; hence, the sample size should be large enough. A cross-validation study assesses how well the test actually forecasts performance for an independent group of subjects.
Never Confuse the Criterion with the Predictor
o The criterion is the standard being measured or the desired outcome, while the predictor is a variable that affects the criterion.
Check for Restricted Range on Both the Predictor and the Criterion
o A restricted range happens when all scores fall very close together.
Review Evidence of Validity Generalization
o Whether the findings obtained in one situation may be applied to other situations.
Consider Differential Prediction
o Predictive relationships may not be the same for all demographic groups.

*CONSTRUCT- a scientific idea developed or hypothesized to describe or explain behavior; something built by mental synthesis. Examples are intelligence, motivation, job satisfaction, self-esteem, etc.

*CONSTRUCT VALIDATION- assembling evidence about what the test means; done by showing the relationship between a test and other tests and measures.
Established through a series of activities in which a researcher simultaneously defines some construct and develops the instrumentation to measure it.
A judgment about the appropriateness of inferences drawn from test scores regarding individual standing on a variable.
Viewed as the unifying concept for all validity evidence.

=Main Types of Construct Validity Evidence=

*CONVERGENT VALIDITY- When a measure correlates well with other tests (standardized, published, etc.) that are designed to measure a similar construct.
Yields a moderate to high correlation coefficient, wherein the scores on the newly constructed test correlate with the scores on an established test that measures the same construct.
Can be obtained by:
Showing that the test measures the same thing as other tests used for the same purpose; and
Demonstrating specific relationships that can be expected if the test is really doing its job.

*DISCRIMINANT (Divergent) VALIDITY- Proof that the test measures something unique.
Indicates that the measure does not represent a construct other than the one for which it was devised.
Involves correlating the test with another measure that should have little to no relationship to it; hence, the correlation coefficient should be low to zero.

=Other Evidences of Construct Validity=

*Evidence of Homogeneity- how uniform a test is in measuring a single concept. (Homogeneity: the items share the same characteristic.)
*Evidence of Changes with Age- some constructs are expected to change over time.
*Evidence of Pretest-Posttest Changes- scores change as a result of intervening experiences.
*Evidence from Distinct Groups (contrasted groups)- scores on a test vary in a predictable way as a function of membership in some group.

=Other Concepts=

*EXPECTANCY DATA- provide information that can be used in evaluating the criterion-related validity of a test.
*EXPECTANCY TABLE- shows the percentage of people within specified test-score intervals who subsequently were placed in various categories of the criterion (e.g., passed or failed).

*FACTOR ANALYSIS- A method of finding the minimum number of dimensions or factors needed to explain the largest number of variables.

TOPIC 6: TEST DEVELOPMENT

*Test development- is an umbrella term for all that goes