CH 3 Using Standardized Tests and Assessment Tools_smrev042024.docx

Full Transcript


3 | USING STANDARDIZED ASSESSMENT TOOLS

**INTRODUCTION**

This chapter provides information about the use of standardized assessment tools. The content is applicable to other occupational therapy practice areas, as most occupational therapy evaluations include the administration of standardized tests. Different types of standardized tests, including **norm-referenced** and **criterion-referenced** standardized assessment tools, are discussed. Methods for evaluating the **psychometric properties** of assessment tools are presented, and information to help you select appropriate assessments, learn to administer assessments, and understand, interpret, and report different types of **standardized scores** is covered in detail. Standardized assessment tools allow for objective measurement of body functions, skills, and abilities and are important for advancing the scientific body of knowledge of professions. Objective measurement in occupational therapy clinical practice and research, especially applied research, helps raise the status of our profession and promotes evidence-based practice. Therefore, the use of standardized testing for research purposes is also addressed in this chapter. The chapter concludes with a sample review of a common standardized assessment of motor skills often used by occupational therapists with children, and a series of tables listing available standardized assessment tools organized by the functional skill areas and body functions addressed by each.

**DESCRIPTION OF STANDARDIZED ASSESSMENT TOOLS**

**Standardized assessment tools** are those that have specific procedures for administration and scoring. Standardized assessment tools are also important for advancing the scientific body of knowledge of professions, and for enhancing communication across disciplines, since standardized scores are designed to be universally understood.
Test materials and forms typically are provided in a test kit, along with a test manual. **Test manuals** describe the purpose(s) of the test and the population for which it was designed. The test construction process should also be described, along with the results of research studies examining the test's reliability and validity. Administration and scoring procedures are described in detail so that all individuals trained in them administer the assessment to their clients in precisely the same manner. Standardized assessments might be performance-based, or questionnaires designed as self-report or caregiver-report measures. Standardized assessment tools are important for providing objective data about client performance. These data may be used for (1) diagnostic purposes, (2) determining the nature and extent or severity of difficulties, (3) evaluating and documenting change in performance over time, (4) determining eligibility for an individual to attend a particular program or service, (5) predicting performance in a related task or future function, (6) program planning purposes, and (7) research purposes. There are standardized assessment tools available to examine all domain areas of interest to occupational therapists, including performance in areas of occupation; client factors such as body functions, values, and beliefs; motor, process, and social interaction skills; contextual elements; and performance patterns and roles. Some assessment tools measure very specific skills such as feeding, visual perception, or characteristics of a classroom setting, while others are very broad, such as tests that examine most areas of occupational performance, adaptive behavior, and all main developmental domains.

**Norm-Referenced Standardized Assessments**

Norm-referenced assessment tools are very common in pediatric occupational therapy practice.
They yield scores by comparing a child's performance with the performance of a large sample of children, or normative group. Because a child's score on a norm-referenced test is converted to a standard score derived from the normative data, the child's score depends on the average performance of children in the normative sample with whom the child was compared. Test manuals include detailed information about the normative data, the characteristics of the sample of children used, and how the sample was obtained. The validity of score interpretation is largely dependent upon the relevance of the norms for the child being assessed. For example, a normative sample of children living in an urban area of Colorado may not be relevant for children living in rural Alaska; a test of gross motor skills normed on typical children may not be relevant for detecting changes in the gross motor performance of children with Down syndrome; and norms for children nine years of age may not be relevant for children 10 or 11 years of age. Norm-referenced tests typically evaluate a broad array of skills and are particularly good for diagnostic and research purposes, for evaluating change in performance, and for examining the extent or severity of identified delays or dysfunction. Many norm-referenced tests are also useful for program planning, prioritizing needs and goals, and the development of intervention programs.

**Criterion-Referenced Assessments**

Criterion-referenced, standardized assessment tools typically use a rating system or checklist to indicate a child's level of performance with respect to a set of activities or skills. This type of test helps describe specifically the skills a child can or cannot do by comparing the child's performance with a set of criteria rather than with the performance of others. The percentage of items correctly demonstrated, for example, may be the type of score used.
Criterion-referenced tests may provide information regarding the amount of help needed or the methods used to complete a particular activity. The administration procedures tend to be somewhat less rigid than those of normative tests. Some assessment tools, such as the Peabody Developmental Motor Scales--3 by Folio and Fewell^1^ and the Pediatric Evaluation of Disability Inventory by Haley and colleagues^2^, are both criterion-referenced and norm-referenced.

**MAKING THE DECISION TO USE A STANDARDIZED ASSESSMENT TOOL**

Your decision to use a particular standardized assessment tool should be made after consideration of several factors and is one of the most difficult decisions therapists make as part of the evaluation process. Important considerations include: (1) the specific purpose or areas measured by the test; (2) whether the tool is appropriate for your client's age and abilities; (3) its psychometric properties, including the characteristics and quality of the normative data, reliability, and validity; and (4) pragmatic factors such as the length of time needed, your competency in its administration, space requirements, and cost. Sound **clinical reasoning** will assist you in determining whether the child's performance on the assessment tool you are considering will provide the type of information you are looking for. You also want to select an assessment tool that can be administered efficiently, and one that you believe your client will be able to complete in the standardized way intended. Questions you may want to ask to help you decide whether to administer a specific assessment tool are listed in Box 3-1. Most standardized assessment tools used in pediatrics by occupational therapists focus on specific performance skills and body functions, although there are also some tools designed to address occupational performance areas and other factors such as contextual elements.
Tables 3-1 through 3-6 provide a list of many of the tools available and commonly used by occupational therapists, organized by the following categories: (1) developmental evaluation and screening tools; (2) occupational performance measures, including those for assessing activities of daily living, school-related functions, handwriting, and play; (3) assessments of sensory motor skills, including gross and fine motor skills, postural control, sensory processing, and sensory integration functions; (4) assessments measuring visual-motor and visual-perceptual skills; (5) assessments measuring psychosocial and emotional functioning; and (6) contextual assessments, including ecological inventories.

**Learning to Use Standardized Tests**

Your responsibilities as an examiner or standardized test administrator cannot be overstated because, as noted earlier, critical decisions such as eligibility for certain programs or providing a diagnosis are sometimes made based on your test results. **Competency** in the administration of standardized tests with children requires not only the ability to administer and score a test properly, but also a solid understanding of child development (see Chapter 2), knowledge of the principles of measurement, and the ability to effectively interact and establish rapport with children of various ages, developmental levels, and behavioral challenges. Learning to administer a standardized test is a labor-intensive process, especially for performance-based tests that require the setup and manipulation of various materials and the presentation of test items and tasks in a precise manner. Developing competency involves first reading and thoroughly understanding the content in the test manual. It is important to understand the test development process, the test's purposes, the strengths and characteristics of the normative data, and the results of reliability and validity studies and other psychometric data presented.
This information will help determine for whom the test would be of most benefit, and the level of confidence with which you can make accurate and valid score interpretations. Then time must be spent learning the administration and scoring procedures. Examples of a therapist administering standardized tests to children are depicted in Figures 3-1 to 3-3. Note the specific setup and equipment, like the balance pad (Figure 3-1) and puzzle pieces (Figure 3-3), that are required for administration. Some assessment tools have administration easels that facilitate ease of administration (Figure 3-2). Although some standardized tests require a certification process or require therapists to go through a formal training program, most can be self-taught provided that therapists have the background knowledge in child development and measurement, and are willing to put in the time to read the manual carefully, learn the administration and scoring procedures, and of course practice. Observing experienced therapists administer tests is another learning strategy that is often helpful. Some assessment tools also provide a recommended process for training as well as training materials, fidelity measures, and videos of test administration.

**Competency guidelines** or **standards of practice** related to the use of standardized assessments are provided by professional organizations such as the American Occupational Therapy Association (AOTA) and by legislation such as the Individuals with Disabilities Education Improvement Act, 2004 (Table 3-7). Publishing companies and other organizations that sell testing materials, such as Western Psychological Services (see www.wpspublish.com) and PsychCorp/Pearson (see http://psychcorp.pearsonassessments.com), have competency standards and qualification forms that potential users must complete prior to purchasing certain materials.
These standards are consistent with the Standards for Educational and Psychological Testing, prepared by the American Psychological Association in collaboration with the American Educational Research Association and the National Council on Measurement in Education.

**INTERPRETATION OF STANDARD SCORES FROM NORM-REFERENCED ASSESSMENTS**

The first step in scoring a child's performance on a test is to obtain a **raw score**. The raw score is then converted to a **standard score** so that meaningful interpretations can be made. A standard score takes into consideration how your client did in relation to the scores of children in the normative sample with whom he or she is compared. The scores of children in the **normative sample** typically follow a normal distribution, with the bulk of scores clustered around the mean and relatively few scores falling at the extreme ends of the range. The test manual will include tables (often in the back of the test manual) to convert your raw scores into standard scores. Some assessment tools use web-based scoring platforms or provide scoring software; these involve computer entry of raw scores and other necessary data, and the program then converts the raw scores into standard scores based upon the normative data. The **standard deviation** (SD) of a sample of a given population is a measure of variability, or the spread of scores, which is the extent to which scores deviate from the mean and from one another. The SD can be used to divide the normal distribution (Fig. 3-4) into sections so that, for a normal distribution, approximately 68% of the scores are within 1 SD of the mean, 27% are between 1 and 2 SD of the mean, and 5% fall outside 2 SD of the mean. This is important because it is believed that "typical" or "normal" performance is behavior (or test scores) that falls within 1 to 1.5 SDs of the mean.
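These normal-curve proportions can be checked in a few lines of Python using the error function; this is a sketch of the normal distribution itself, not of any particular test's norms:

```python
from math import erf, sqrt

def proportion_within(k_sd: float) -> float:
    """Proportion of a normal distribution lying within +/- k_sd of the mean."""
    return erf(k_sd / sqrt(2))

within_1 = proportion_within(1)                     # ~0.683
between_1_and_2 = proportion_within(2) - within_1   # ~0.272
beyond_2 = 1 - proportion_within(2)                 # ~0.046

print(f"within 1 SD:    {within_1:.1%}")
print(f"between 1-2 SD: {between_1_and_2:.1%}")
print(f"beyond 2 SD:    {beyond_2:.1%}")
```

The exact values (68.3%, 27.2%, and 4.6%) are usually rounded in test manuals and figures like Fig. 3-4.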
In other words, if you consider a score 1 SD below the mean to be significantly below average, you believe that when a child performs at or below the level of the lowest 16% of the children in the comparative normative group, the child's performance is lower than the range of scores considered to be typical or average.

Common standard scores used to report a child's results on pediatric assessment tools are ***z* scores** (mean = 0, SD = 1), **T-scores** (mean = 50, SD = 10), and scores such as developmental quotients and IQ scores (sometimes just referred to as a Standard Score) that have a mean of 100 and an SD of 15. **Stanine scores** range from 1 to 9 with a mean of 5, and scores of 3 to 7 are interpreted as being within the "normal" range. As long as you know the mean and SD of the sample used to compare your score, you can convert from one type of standard score to another (see the examples in Box 3-2) because they are all based on the normal distribution. These standard scores facilitate communication among professionals because they are all interpreted similarly regardless of what the test measures or the areas being tested.

**Percentile scores** are defined as the percentage of people in the standardization sample who scored at or below a given score. For example, a child's raw score of 15 that falls at the 38th percentile means that 38% of the children with whom he or she is compared scored at or below this score. Percentile scores are also derived based on the normal distribution. The mean, therefore, is at the fiftieth percentile, and percentile scores within 1 SD of the mean are those that range from the sixteenth percentile to the eighty-fourth percentile (see Fig. 3-4). A type of norm-referenced score sometimes reported is an **age-equivalent score**. An age-equivalent score relates the child's score to that of a typical or average child of a particular age group.
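Because the standard scores described above all sit on the same normal curve, converting among them is a matter of passing through the shared *z* score. A minimal sketch, in which the raw score, normative mean, and SD are invented for illustration and not taken from any published test:

```python
from math import erf, sqrt

# Hypothetical normative data, for illustration only
raw_score, norm_mean, norm_sd = 38, 50.0, 8.0

z = (raw_score - norm_mean) / norm_sd   # z score (mean 0, SD 1): -1.5
t_score = 50 + 10 * z                   # T score (mean 50, SD 10): 35.0
standard_score = 100 + 15 * z           # deviation score (mean 100, SD 15): 77.5

# Percentile: percentage of the normative sample scoring at or below this score
percentile = 100 * 0.5 * (1 + erf(z / sqrt(2)))
print(z, t_score, standard_score, round(percentile, 1))  # -1.5 35.0 77.5 6.7
```

Note that a *z* of −1.5 lands below the sixteenth percentile, consistent with the "significantly below average" cutoffs discussed above.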
For example, if a child's raw score of 25 is converted to an age-equivalent score of 6 years, 4 months, this means that the child received the mean score of children aged 6 years, 4 months in the normative sample, or in other words, the child performed like typical children aged 6 years, 4 months. Another type of standardized score is a Rasch score, which is specific to measurement instruments that apply Rasch theory or methodology, a type of item response theory. The Assessment of Motor and Process Skills^18^, the Evaluation of Social Interaction^58^, and the PEDI-CAT^13^ are examples of such tools. The Rasch model ranks items hierarchically on a continuous scale from easiest to most difficult, and logit scores are then generated based on individual ability relative to item difficulty. Logit scores are defined as the natural log of an odds ratio, which is the likelihood or probability of a particular number of items being passed or failed. Logit scores can be converted to standardized scaled scores, which are like criterion-referenced scores in that they indicate a child's performance along a continuum of items ranging in difficulty. Logit scores can also be used to generate norm-referenced, or age-relative, scores such as z-scores and percentile scores using normative data. The age-relative scores indicate the child's standing in relation to age or developmental expectations. More information on applying Rasch scores within child assessment can be found in the review by Ludlow and Haley^81^. Ways of reporting child results of standardized assessments, including test scores, are provided in the sample evaluation report included in the ancillaries and throughout the book.

**STANDARD ERROR OF MEASUREMENT**

All test scores inherently have some error associated with them.
A child's score, sometimes referred to as an **observed score**, really represents a **true score** (a hypothetical construct representing the child's true ability) plus or minus **measurement error**. Measurement error is inevitable with standardized testing and can result from characteristics of the test itself, examiner error, and child errors (Table 3-8). **Standard error of measurement** (SEM) indicates how much variability can be attributed to error, as described by Murphy and Davidshofer^82^, and it depends on (1) the test-retest reliability coefficient, "*r*" (discussed later in this chapter), and (2) the SD of the sample used to determine the reliability coefficient. It is calculated using the following formula: SEM = SD × √(1 − *r*).

The SEM is important in relation to test scores because it represents the confidence you can have in an observed test score. Reporting test scores with the SEM accounts for the test's measurement error and is a more accurate (albeit less precise) way to present a child's scores. A test's SEM is usually given in the test manual, and it can be used to create a confidence interval around a child's observed score. A confidence interval is a range in which it can be stated, with a known degree of confidence, that a specific score would fall. Based on the normal distribution, it is known that approximately 68% of scores fall within 1 SD of the mean and approximately 95% of scores fall within 2 SD of the mean. Therefore, we can use these numbers to construct 68% and 95% confidence intervals. For example, suppose a child 6 years of age had an observed standard score on a test of 70 (mean = 100, SD = 15), and the manual reported that the SEM for this test is 6 for a child of that age.
Instead of reporting the child's score as 70, it is more accurate to express the score in terms of a confidence interval by stating that you are 68% confident that the child's true score falls between 64 and 76. To establish the 95% confidence interval, you add and subtract 2 SEMs instead of 1 and would report the child's true score as falling within the range of 58 to 82.

**EVALUATING PSYCHOMETRIC PROPERTIES OF ASSESSMENT TOOLS**

Evaluating the **psychometric properties** of standardized assessment tools involves examining the results of research studies that were conducted during the development of the test; these are typically described in the test's manual. It is also likely that other studies contributing useful information about the quality of the test's normative data and aspects of reliability and validity have been published in refereed research journals since the test was made available. Being able to evaluate a test's normative data, reliability, and validity requires a solid understanding of a number of concepts and statistical tests. One of the most common statistics used in psychometric analyses of standardized tests is the correlation coefficient. A **correlation coefficient** is a measure of the strength of the relationship between two variables. It is a number that ranges from --1 (a perfect negative correlation) to +1 (a perfect positive correlation), and a correlation of zero means that there is no relationship. A negative correlation means that as one variable increases, the other decreases (or vice versa). A positive correlation means that as one variable increases, the other increases, or as one decreases, the other decreases. In other words, if both variables move in the same direction, it is a positive correlation, and if they move in opposite directions, the correlation or relationship is negative.
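The SEM arithmetic described earlier is simple to script. In the sketch below, the reliability coefficient is a hypothetical value chosen so that an SD of 15 yields the SEM of 6 from the chapter's worked example:

```python
from math import sqrt

def sem(sd: float, r: float) -> float:
    """Standard error of measurement: SEM = SD * sqrt(1 - r)."""
    return sd * sqrt(1 - r)

def confidence_interval(observed: float, sem_value: float, n_sems: int):
    """n_sems = 1 gives a ~68% interval; n_sems = 2 gives a ~95% interval."""
    return (round(observed - n_sems * sem_value, 1),
            round(observed + n_sems * sem_value, 1))

s = sem(sd=15, r=0.84)                 # 15 * sqrt(0.16) = 6.0
print(confidence_interval(70, s, 1))   # (64.0, 76.0) -> "68% confident"
print(confidence_interval(70, s, 2))   # (58.0, 82.0) -> "95% confident"
```

Notice how a more reliable test (larger *r*) shrinks the SEM and therefore tightens the confidence interval around the observed score.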
In general, weak correlation coefficients range from 0.26 to 0.49 (or --0.26 to --0.49), moderate correlation coefficients from 0.50 to 0.69 (or --0.50 to --0.69), and strong correlations from 0.7 to 1 (or --0.7 to --1)^83^. Determining acceptable correlation values when evaluating research studies that examine the reliability and validity of standardized tests is somewhat subjective and is dependent upon the type and purpose of the study. A format for completing reviews of standardized tests is provided in Box 3-3. As an example, a review of the Peabody Developmental Motor Scales--3^1^ is provided in Box 3-4. Guidelines for evaluating normative data, reliability, and validity are described below.

**Evaluation of the Normative Data**

**Norms** provided in a test manual are not the "true norms," but estimates of the true norms based on the performance of the sample of the population used to develop the norms. Therefore, the usefulness of normative data is directly related to how well the sample used to generate the norms reflects the population from which it was drawn. This is also true for criterion-referenced tests. Typically, criteria are developed based on the performance of a sample from the population of interest. Therefore, the sample used must be considered, particularly in relation to age, gender, setting and recruitment, disability, and sociocultural factors, when determining how relevant the test items, scoring, and criteria are for your clients. When evaluating how strong normative data are, the first question you need to ask is, "How representative is the sample used to generate the normative data of the population that the test was designed for?"
For example, if a test was designed for use with children from the United States, then the sample should be selected using a stratified, randomized procedure to ensure that demographic variables such as race, gender, socioeconomic status, and urban and rural populations are all represented, and in approximately the same proportions as in the U.S. population. It is also important that the sample against which you are comparing your client's performance is relevant for your purposes. A relevant sample is one that has characteristics similar to the clients with whom you intend to use the test. It is important that you take the time to review the methods used to obtain the sample (i.e., random selection, recruitment of volunteers), which should be covered in detail in the test manual. Generally, the more recent the norms the better (ideally within 10--15 years), and the larger the normative sample the better; samples obtained through a randomized process are more representative of the populations being targeted than convenience samples.

**Reliability**

A reliable assessment tool is one that is designed in such a way that all those who administer the test to the same individual under the same set of circumstances would obtain the same results. In addition, if the test is reliable, then children who are asked to do the test items should perform in a relatively consistent manner if given the test more than once within a reasonably short period (days or a couple of weeks). The SEM discussed earlier is one measure of **reliability**. There are three other types of reliability that you should look for when evaluating the overall reliability of an assessment tool: inter-rater, test-retest, and internal consistency. To evaluate how reliable an assessment tool is, review the research studies reported in the test manual designed to evaluate all three of these types of reliability.
Studies examining the reliability of an assessment tool may also be published in refereed journals after the test's release and are important sources of information about a test's reliability.

Inter-Rater Reliability

**Inter-rater reliability**, also referred to as inter-rater agreement, examines the extent to which test results vary because of factors introduced into the testing situation by the test administrators. Typically, inter-rater reliability studies determine the correlation between the scores of two independent raters scoring the same child simultaneously. Inter-rater reliability is especially important when scoring involves some degree of subjectivity. If a test has strong inter-rater reliability, then two different raters (with adequate training in the test administration and scoring procedures) scoring the same child should get the same results. There is no universal agreement as to how high the minimum acceptable inter-rater reliability coefficient should be, and factors such as the type of behaviors being measured and the range of possible scores should be taken into consideration. Anastasi^83^ suggested a standard of .80; however, for the types of tests we use in pediatric occupational therapy specifically, minimum values of acceptability for inter-rater reliability should be closer to .90.

Test-Retest Reliability

**Test-retest reliability** refers to how consistently a group of children will perform on the same test when it is given on more than one occasion (usually just twice) in a relatively short period of time. Scores from the two testing situations are correlated with one another and represent a measure of the stability of the test results over the time interval. Essentially, a test with good test-retest reliability assures that if you gave a test to a child and then gave it to the child again (within a couple of days or weeks), the child would perform similarly, generating similar scores.
When evaluating an assessment tool's research related to test-retest reliability, look for the number of subjects used in the study (should be >20, and the more the better) and the length of time between testing (should be 1--2 weeks). Test-retest correlation coefficients tend to be stronger when there is a shorter interval between testing. Again, there are no hard and fast, definitive rules about what the minimal acceptable correlation coefficients should be. It is important to note that the stability of the behaviors being measured greatly impacts test-retest reliability. For example, fine motor skills would not be expected to vary naturally nearly as much as social behavior, mood, or the ability to attend to a task. It is important that studies examining test-retest reliability for pediatric assessments use a time interval that is not too long, because developmental changes could naturally occur, or too short, which would allow the children to remember the items, potentially causing a practice effect. A practice effect occurs when children tend to score slightly higher the second time because of their previous experience. In general, acceptable coefficients for test-retest reliability are *r*-values equal to or greater than .8.^83-84^

Internal Consistency

A test is believed to have **internal consistency** when the individual test items positively correlate with one another. For example, if a test is designed to measure fine motor ability, then, in theory, all of the test items should be related to or be able to measure some aspect of fine motor performance. Therefore, a child, regardless of having strong or weak fine motor skills, should perform somewhat similarly on all of the test items that measure this area. Studies of internal consistency often will correlate a group of scores from one half of the test with those from the other half (first half with second half; odd- versus even-numbered items), also referred to as split-half reliability.
Various more sophisticated statistical techniques are commonly used to calculate reliability coefficients for measuring internal consistency (e.g., Kuder-Richardson and Cronbach's alpha)^85^ that specifically account for the ways of grouping the test items and allow the total length of the test to be considered in the analyses. The split-half method reduces the test length by half, resulting in a slightly lower coefficient than if the whole test was accounted for in the calculation. High correlations indicate that the test is measuring a homogeneous construct, and tests with acceptable internal consistency have correlation coefficients or Cronbach's alpha values greater than .6 according to Anastasi.^83^

**Validity**

**Validity** may simply be referred to as the extent to which a test actually measures what it intends to measure.^87^ However, it is a much more complex idea than it seems. What type of evidence is needed in order to interpret test scores as a sufficient indication of a specific function like social competency or sensory processing? How many test items, and what type of test items, are needed to comprehensively measure a certain skill or function? How well does a test designed to predict kindergarten readiness actually predict success or failure in early elementary school? Is it ethically justifiable to allow or disallow a child to enroll in a specific program based on his or her test scores? What evidence do we have to support the kinds of test score interpretations we make, and how we use test scores? Messick^86^ defined validity as an integrated, evaluative judgment of the degree to which empirical evidence and theoretical rationales support the adequacy and appropriateness of our inferences and actions based on test scores or other modes of assessment.
He further emphasized the importance of the meaning, relevance, clinical utility, and value implications of scores as a basis for action, and of concerning ourselves with the social consequences of their use. Test validation, then, is scientific inquiry into score meaning. Although Messick^86^ emphasized that validity is a unitary idea, different types of validity (content, criterion-related, and construct) have been described in the literature, and these different types are often addressed separately in the test manuals of pediatric standardized assessments. Therefore, each of these facets of validity is discussed separately in the following subsections; however, be mindful that the various forms of validity overlap greatly, and that they all address the issue of score meaning and examine the extent to which the test is measuring what it purports to measure.

Content Validity

**Content validity** is the extent to which the test items of a particular test adequately and accurately sample the skill areas or behaviors it is designed to measure while not being contaminated by items measuring other types of behaviors or skills. For example, if a test is designed to measure fine motor performance, there should be enough items to adequately measure all aspects of fine motor skill (such as eye-hand coordination, speed and dexterity, grasp patterns, visual-motor control, and functional fine motor tasks). There should be evidence in the manual that items addressing all facets of fine motor skill were systematically analyzed and that the best items were then selected. It is never feasible to include all possible test items, as that would make tests too long. Often, evidence for selecting adequate test content is derived from a panel of experts who determine how each test item relates to each domain being tested. The experts then comment on the thoroughness of the items for measuring each of the domains/constructs the test is designed to measure.
Then, statistical techniques are used to analyze the items and to help select the "best," or most meaningful, items. A table of specifications may be included in the test manual to summarize how test items relate to each of the domains, and there should be a description in the manual of the selection process, with a clear rationale for the test items that were ultimately selected.

\Construct Validity

**Construct validity** addresses the extent to which the test measures the construct or domain it purports to measure. Common constructs measured in pediatric occupational therapy include fine motor performance, gross motor function, adaptive behavior, sensory processing and integration, play, self-help skills, occupational performance, self-esteem, and social skills. Many constructs are largely abstract entities that cannot be measured directly. For example, a performance-based test of sensory processing and integration directly measures behaviors that are theoretically believed to represent sensory processing and integration, since we cannot measure this brain function directly with this type of assessment tool. When a test is developed, hypotheses are generated about the relationships among the variables believed to be measured by the test. Several sources of evidence are necessary to evaluate construct validity, including an examination of the theory or conceptual framework supporting the domains or constructs being measured. One source of evidence comes from studies correlating the scores of a sample of children on the test of interest with their scores on a similar test. For example, the Test of Visual Motor Skills--3^48^ and the Beery-Buktenica Developmental Test of Visual-Motor Integration^47^ both state that they measure visual-motor skills. Therefore, children should perform similarly (their scores should correlate positively) on these tests.
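As a concrete illustration of this kind of correlational evidence, the sketch below computes a Pearson correlation between one sample's scores on two visual-motor tests. The score values and test names are invented for illustration only; an actual validity study would, of course, use data collected with the published instruments.

```python
# Hypothetical sketch: correlating one sample of children's standard
# scores on two tests that both claim to measure visual-motor skill.
# A strong positive coefficient is one piece of convergent evidence.
from statistics import mean, stdev

def pearson_r(x, y):
    """Pearson correlation coefficient between two paired score lists."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)
    return cov / (stdev(x) * stdev(y))

# Invented standard scores for the same 8 children on two visual-motor tests
test_a = [85, 92, 78, 101, 110, 95, 88, 99]
test_b = [82, 95, 80, 104, 107, 91, 90, 97]

print(f"r = {pearson_r(test_a, test_b):.2f}")
```

In practice, test manuals report such coefficients (often with far larger samples) as part of their validity evidence, and a reviewer would also consider the sample size and composition behind each reported value.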
This type of construct validity evidence may also be considered a form of criterion-related, concurrent validity. Another source of evidence of construct validity comes from studies that compare the scores of different groups of children. For example, on tests measuring areas of development or functions that are known to be acquired with age, it would be expected that the scores of older children would reflect a higher level of performance than the scores of younger children. Similarly, the scores of a group of neurotypical children would be expected to reflect a higher level of performance than the scores of age-matched peers with known developmental delays. Research applying multivariate statistical techniques, particularly grouping techniques such as factor and cluster analyses, also provides evidence for construct validity. An example of a standardized test that has been studied extensively using these techniques is the Sensory Integration and Praxis Tests.^87^ The purpose of factor analysis is to group test items that are alike or that all relate to some underlying construct. Factor analysis assists in identifying sub-scores or subscales, such as different patterns of sensory processing or different areas of gross motor functioning. Assessment tools that use these techniques to support construct validity are multi-dimensional in nature, or measure multiple functions that are believed to be related in some way. Another grouping technique is cluster analysis. This technique aims to cluster like individuals together meaningfully so that patterns of dysfunction, specific problem areas, or diagnostic groupings can be identified based on the pattern of scores obtained.

\Criterion-Related Validity

**Criterion-related validity** is the ability of a test to predict performance on other measures or activities.
A test may have predictive validity, which is the ability to predict future performance on some criterion, or concurrent validity, which is the ability to predict performance on some criterion within the same timeframe. To evaluate concurrent validity, the scores of a sample of children are correlated with their performance on the criterion measure. For example, if a test of visual-motor skills is believed to predict handwriting performance, then scores on the visual-motor test would be expected to be positively correlated with scores on a test that measures handwriting ability. Predictive validity studies examine the relationship between a test score and performance on some criterion measured in the future. For example, it is of interest to occupational therapists and other child development experts to identify infants and preschool children who are at risk for developmental problems or for school problems once they reach school age. Using a test of development that can predict reasonably well whether a child is at risk for developing problems later in life may detect concerns early, so that appropriate early intervention services can be initiated.

**\USING STANDARDIZED ASSESSMENT TOOLS FOR OUTCOMES-BASED RESEARCH**

Today it is important for all health and human service professionals to deliver services using evidence-based practices. This includes selecting assessment tools that have strong validity and reliability data, and choosing intervention approaches with research evidence demonstrating effectiveness for achieving desired therapy outcomes. Occupational therapists must be knowledgeable and confident that the therapy they provide will yield the outcomes expected based on previous research. As with many other professions, the body of evidence upon which occupational therapists can draw for clinical decision making is still developing.
Therefore, as a profession, we need to direct resources toward the generation of more efficacy research, and OT practitioners need to be good consumers of research to effectively apply the existing research for clinical decision making. Standardized, occupation-based assessment tools like those listed in Table 3-2 are ideal for use in research studies aiming to evaluate the effectiveness of occupational therapy interventions. In addition, as noted earlier, these assessments are also clinically useful for determining client priorities for intervention planning, identifying important, relevant therapy outcomes, and for measuring and documenting progress. In order to conduct rigorous clinical research, sound measures are essential in occupational areas such as play, school activities, self-care skills, and social and community participation. Assessment tools that measure levels of participation in meaningful activities associated with these occupational areas, and levels of satisfaction with performance and participation, are also useful outcome measures. In clinical practice and research, the collection of objective data for demonstrating and documenting progress for individual clients can also be achieved through a technique called goal attainment scaling (GAS). GAS is particularly useful when desired outcomes are so individualized that they cannot be measured adequately using a standardized assessment tool. GAS requires therapists to craft individual therapy goals using a special method. Each goal is written using a 5-point scale (--2 to +2) where the midpoint (0) represents the expected or probable level of performance of the desired outcome skill or behavior (see Mailloux and colleagues).^88^ Measurable behaviors are then identified at each level of the scale. To illustrate, an example of a goal related to the achievement of wheelchair mobility skills for a child is provided in Box 3-5. 
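The 5-point GAS scale described above lends itself to a simple computational sketch. The wheelchair-mobility goal labels below are hypothetical stand-ins (the chapter's actual example is in Box 3-5), and the T-score summary is not detailed in this chapter; it follows the commonly used Kiresuk-Sherman formula, assuming equal goal weights and the conventional rho of 0.3.

```python
# Hedged sketch of goal attainment scaling (GAS): each goal is a 5-point
# scale from -2 to +2 where 0 represents the expected outcome.
# Goal text is invented; the T-score uses the common Kiresuk-Sherman
# formula with equal weights and rho = 0.3 (an assumption, not from this chapter).
from math import sqrt

wheelchair_goal = {  # hypothetical descriptors for each attainment level
    -2: "much less than expected outcome",
    -1: "less than expected outcome",
     0: "expected outcome after intervention",
    +1: "better than expected outcome",
    +2: "much better than expected outcome",
}

def gas_t_score(attained_levels, rho=0.3):
    """Summarize attained GAS levels (each -2..+2, equally weighted) as a T-score."""
    n = len(attained_levels)
    total = sum(attained_levels)
    return 50 + (10 * total) / sqrt((1 - rho) * n + rho * n ** 2)

# A child who exactly meets every expected outcome summarizes to T = 50
print(gas_t_score([0, 0, 0]))            # 50.0
print(round(gas_t_score([2, 1, 0]), 1))  # above 50: outcomes exceeded expectations
```

Under this convention, a summary T-score above 50 indicates that, on balance, the child exceeded the expected outcomes, and a score below 50 indicates the expected outcomes were not fully met.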
An advantage of GAS is that only those functional outcomes that are relevant for the child with whom you are working are measured. In a research context, GAS allows group studies examining the effectiveness of occupational therapy services to be conducted using a sample of children who have different therapy goals. More information on applying GAS in OT practice and research is provided by Mailloux and colleagues^88^ and Ottenbacher and Cusick.^89^

In selecting evaluation measures for conducting efficacy research, and for building the body of evidence upon which therapists can draw for clinical decision making, it is also important to consider a test's psychometric properties as discussed in this chapter, the specific purposes for which the test was designed, and the sensitivity of the measure for detecting change in performance. Most importantly, be reminded of the occupational lens through which occupational therapists view the success of their interventions; ultimately, the occupation-based measures listed in Table 3-2 yield the most important information about your client's desired outcomes and the potential effectiveness of occupational therapy.

**\CULTURAL AND ETHICAL RESPONSIBILITIES WHEN USING STANDARDIZED TESTS**

As an occupational therapist, you must abide by AOTA's Code of Ethics throughout all aspects of your service delivery, and several ethical principles apply directly to the use of standardized testing. It is important for you to understand the purposes and limitations of the tests that you use. For test scores to be valid, tests must be administered in the standardized manner detailed in the manual. If you must deviate from the standardized procedures to accommodate a child's needs or limitations, you must disclose this in your evaluation report and interpret scores cautiously. You may decide to use descriptive interpretations only instead of reporting the scores.
Assessment information obtained through administration of the test can still be valuable as clinical observation data and can be used in the interpretation process. It is also important, as always, to respect client rights, including matters related to confidentiality and the client's right to refuse to participate in or agree to procedures. The development and use of standardized tests in pediatric occupational therapy is quite extensive and has allowed for a more scientific, objective approach to evaluation. Although the use of standardized tests enhances the credibility of our profession and provides us with an objective means to identify problem areas and to detect change or progress, these tests are not without drawbacks. First, many tests are expensive, and it may be labor intensive to become competent in their use. Second, young children are often difficult to test due to behaviors such as inattention or an inability to comprehend directions. Third, most tests were normed on typical children rather than on the clinical populations often seen by occupational therapists. While normative data are useful, without clinical populations represented in the test development process, the usefulness of a test for evaluating client progress is limited. Fourth, some standardized tests are limited in their usefulness for developing intervention programs, and, for some populations, such as children with severe and profound disabilities, relatively few tests are available. Finally, when reporting scores, it is important to explain standardized test scores in language that can easily be understood by all interested parties, including parents and other caregivers.

**\Summary**

This chapter provided information about the use of standardized assessment tools, including factors to consider when selecting specific tests and how to evaluate the psychometric properties of a test. The interpretation of standardized test scores was covered in detail.
Ethical and professional considerations were discussed, and lists of available pediatric standardized assessment tools organized by skill/domain area were provided. Although standardized assessment tools provide useful, objective assessment information, test scores should never be a substitute for sound clinical judgment, and test scores should be considered only one of many sources of data that you will use in your evaluation process.

**\Chapter Review Questions**

1\. What are the main characteristics and advantages of using criterion-referenced and norm-referenced standardized tests?

2\. Why might all types of validity evidence be considered as contributing to a test's construct validity?

3\. What are some of the most psychometrically sound measures of occupational performance available to us in our work with children? What types of standardized tests are we lacking, representing areas for research and development?

4\. In what ways do standardized tests contribute to research examining the effectiveness of OT interventions and the promotion of evidence-based practices?

5\. What would be effective means for helping you to learn to administer a standardized assessment tool?

6\. Why is it important for occupational therapists to understand the characteristics of the normative sample of a norm-referenced test they would like to use?

**\References**

1\. Folio R, Fewell R. *The Peabody Developmental Motor Scales.* 3rd ed. Austin, TX: Pro-Ed/Pearson Assessments; 2023.

2\. Haley S, Coster W, Ludlow L, et al. *Pediatric Evaluation of Disability Inventory*. San Antonio, TX: Psychological Corp; 1992.

3\. Frankenburg WK, Dodds J, Archer P, et al. *Denver II Technical Manual*. Denver, CO: Denver Developmental Materials, Inc; 1992.

4\. Provence S, Erikson J, Vater S, et al. *Infant-Toddler Developmental Assessment-2 (IDA-2).* Austin, TX: PRO-ED; 2016. Available from ida2.org.

5\. Furuno S, O'Reilly K, Hosaka CM, et al.
*The Hawaii Early Learning Profile.* Palo Alto, CA: VORT; 2004.

6\. Newborg J. *Battelle Developmental Inventory.* 3rd ed. Itasca, IL: Riverside; 2020.

7\. Miller LJ. *The First Step Screening Tool*. San Antonio, TX: Psychological Corporation; 1993.

8\. Harrison P, Oakland T. *Adaptive Behavior Assessment System-3.* Torrance, CA: Western Psychological Services; 2015.

9\. Bayley N, Aylward GP. *Bayley Scales of Infant and Toddler Development Screening Test.* San Antonio, TX: Pearson Assessments; 2019.

10\. Bayley N, Aylward GP. *Bayley Scales of Infant Development.* 4th ed. San Antonio, TX: Pearson Assessments; 2019.

11\. Mullen EM. *Mullen Scales of Early Learning.* AGS ed. Circle Pines, MN: American Guidance Service, Inc; 1995.

12\. Squires J, Bricker D. *Ages and Stages Questionnaires, 3rd Edition (ASQ-3).* Paul H. Brookes Publishing; 2009.

13\. Haley S, Coster W, Dumas H, et al. *The PEDI-CAT: Pediatric Evaluation of Disability Inventory---Computer Adaptive Test.* CRE Care. Boston, MA: Boston University Health and Disabilities Research Institute; 2018. http://pedicat.com/category/home/.

14\. Amundson SJ. *Evaluation Tool of Children's Handwriting (ETCH)*. Homer, AK: O.T. Kids, Inc; 1995.

15\. Reisman JE. *Minnesota Handwriting Assessment*. San Antonio, TX: Psychological Corporation; 1999.

16\. Feifer SG. *Feifer Assessment of Writing (FAW).* PAR Inc; 2020.

17\. Fisher AG, Bryze K, Hume V, et al. *School AMPS: School Version of the Assessment of Motor and Process Skills*. Fort Collins, CO: Three Star Press; 2005.

18\. Fisher AG, Bray K. *Assessment of Motor and Process Skills: Development, Standardization and Administration Manual.* 7th ed. Fort Collins, CO: Three Star Press; 1999.

19\. Sparrow SS, Cicchetti DV, Saulnier CA. *Vineland-3 Adaptive Behavior Scales.* San Antonio, TX; 2016.

20\. Law M, Baptiste S, Carswell A, et al. *Canadian Occupational Performance Measure.* 5th ed.
Toronto, ON: CAOT Publications, Canadian Association of Occupational Therapists; 2014.

21\. Keller J, Kafkes A, Basu S, et al. *A User's Guide to the Child Occupational Self-Assessment (COSA), Version 2.2.* Chicago, IL: MOHO Clearinghouse; 2014.

22\. Hamilton BB, Granger CU. *Functional Independence Measure for Children (WeeFIM II System), Version 6.0.* Buffalo, NY: Research Foundation of the State University of New York; 2006.

23\. Skard G, Bundy A. Test of playfulness. In: Parham LD, Fazio LS, eds. *Play in Occupational Therapy for Children.* 2nd ed. St. Louis, MO: Mosby--Year Book; 2008: 71--94.

24\. Knox S. Development and current use of the revised Knox preschool scale. In: Parham LD, Fazio L, eds. *Play in Occupational Therapy for Children.* 2nd ed. St. Louis, MO: Elsevier; 2008: 55--70.

25\. Stagnitti K. *Child-Initiated Pretend Play Assessment-2 (ChIPPA-2).* 2020. Available from learntoplaytherapy.com.

26\. King G, Law M, King S, et al. *Children's Assessment of Participation and Enjoyment/Preference for Activities of Children (CAPE/PAC)*. San Antonio, TX: Harcourt Assessment/Psychological Corporation; 2004.

27\. Piper M, Darrah J. *Alberta Infant Motor Scale (AIMS)*. Philadelphia, PA: W. B. Saunders; 1994.

28\. Campbell SK. *The Test of Infant Motor Performance User's Manual, Version 3 for the TIMP Version 5.* Chicago, IL: Infant Motor Performance Scales LLC; 2012.

29\. Ellison P. *The INFANIB: A Reliable Method for the Neuromotor Assessment of Infants*. Tucson, AZ: Therapy Skill Builders; 1994.

30\. Miller LJ, Roid GH. *The T.I.M.E. Toddler and Infant Motor Evaluation: A Standardized Assessment*. Tucson, AZ: Therapy Skill Builders; 1994.

31\. Ayres AJ. *Sensory Integration and Praxis Tests.* Rev. manual. Los Angeles, CA: Western Psychological Services; 2004.

32\. Dunn W. *Sensory Profile-2: User's Manual*. San Antonio, TX: Pearson Assessments; 2014.

33\. Brown C, Dunn W.
*Adolescent/Adult Sensory Profile: User's Manual.* San Antonio, TX: The Psychological Corporation; 2002.

34\. DeGangi GA, Greenspan SI. *Test of Sensory Function in Infants*. Los Angeles, CA: Western Psychological Services; 1989.

35\. Parham D, Ecker C, Kuhaneck HM, et al. *Sensory Processing Measure-2 (SPM-2).* Torrance, CA: Western Psychological Services; 2021.

36\. Mulligan S, Schoen S, Miller LJ, Valdez A, Magalhaes D. The Sensory Processing 3-Dimensions Scale: initial studies of reliability and item analyses. *Open Journal of Occupational Therapy.* 2019;7(1):1--12. doi:10.15453/2168-6408.1505.

37\. Blanche E, Reinoso G, Blanche Keifer D. *Structured Observations of Sensory Integration-Motor (SOSI-M) and Comprehensive Observations of Proprioception (COP-R).* Framingham, MA: Therapro; 2021.

38\. Mailloux Z, Grady-Dominguez P, Petersen J, et al. Evaluation in Ayres Sensory Integration® (EASI) vestibular and proprioceptive tests: construct validity and internal reliability. *Am J Occup Ther.* 2021;75(6):7506205070. doi:10.5014/ajot.2021.043166.

39\. Mutti M, Martin N, Sterling H, et al. *Quick Neurological Screening Test (QNST).* 3rd ed., rev. Los Angeles, CA: Western Psychological Services; 2017.

40\. Bruininks RH, Bruininks B. *Bruininks-Oseretsky Test of Motor Proficiency.* 2nd ed. Circle Pines, MN: American Guidance Service; 2005.

41\. Russell D, Rosenbaum P, Avery L, et al. *Gross Motor Function Measure (GMFM-66 & GMFM-88).* Clinics in Developmental Medicine, vol 159. London, UK: Mac Keith Press; 2002.

42\. Ulrich D. *Test of Gross Motor Development.* 3rd ed. Austin, TX: ProEd; 2019.

43\. Henderson SE, Barnett A. *Movement Assessment Battery for Children-3 (M-ABC-3).* Austin, TX: Pearson Assessments; 2023.

44\. Dematteo C, Law M, Russell D, et al. *The Quality of Upper Extremity Skills Test (QUEST).* Hamilton, ON: CanChild Center for Disability Research. www.
canchild.ca/en/measures.quest.asp.

45\. Miller LJ. *Miller Function and Participation Scales*. San Antonio, TX: Pearson; 2006.

46\. Hammill D, Pearson NA, Voress JK. *Developmental Test of Visual Perception (DTVP-3).* Austin, TX: Pearson Assessments; 2013.

47\. Beery K, Buktenica N, Beery N. *Beery-Buktenica Developmental Test of Visual-Motor Integration (VMI).* 6th ed. San Antonio, TX: Pearson, PsychCorp; 2010.

48\. Martin N. *Test of Visual-Motor Skills.* 3rd ed. Ann Arbor, MI: Academic Therapy Publications; 2006.

49\. Martin N. *Test of Visual-Perceptual Skills (Non-Motor).* 4th ed. Ann Arbor, MI: Academic Therapy Publications; 2017.

50\. Colarusso R, Hammill D. *The Motor-Free Visual Perception Test.* 4th ed. Ann Arbor, MI: Academic Therapy Publications; 2015.

51\. Adams W, Sheslow D. *Wide Range Assessment of Visual Motor Ability (WRAVMA)*. Lutz, FL: Psychological Association Services; 1995.

52\. Zeitlin S, Williamson GG, Szczepanski M. *Early Coping Inventory.* Bensenville, IL: Scholastic Testing Service; 1988.

53\. Brazelton TB, Nugent JK. *The Neonatal Behavioral Assessment Scale.* 3rd ed. Mac Keith Press, Cambridge University Press; 1995.

54\. Greenspan S, DeGangi G, Wieder S. *Functional Emotional Assessment Scale (FEAS) for Infancy and Early Childhood: Clinical and Research Applications.* Bethesda, MD: The Interdisciplinary Council on Developmental and Learning Disorders; 2001.

55\. Bagnato S, Neisworth JT, Salvia J, et al. *Temperament and Atypical Behavior Scales*. Baltimore, MD: Brookes Publishing; 1999.

56\. Carter AS, Briggs-Gowan M. *The Infant-Toddler Social Emotional Assessment (ITSEA).* San Antonio, TX: PsychCorp; 2005.

57\. Briggs-Gowan M, et al. *Brief Infant Toddler Social Emotional Assessment (BITSEA)*. PsychCorp; 2006.

58\. Gioia G, Isquith P, Guy S, Kenworthy L. *Behavior Rating Inventory of Executive Function-2 (BRIEF-2).* PAR Inc; 2015. Available from parinc.com.

59\. John K, Gammon GD, Prusoff BA, et al.
The Social Adjustment Inventory for Children and Adolescents (SAICA): testing of a new semi-structured interview. *J Am Acad Child Adolesc Psychiatry*. 1987;26:898--911.

60\. Elliot S. *Social Skills Improvement System Rating Scales*. Circle Pines, MN: American Guidance Service, Inc; 2008.

61\. Fisher AG, Griswold LA. *Evaluation of Social Interaction.* 4th ed. Ft. Collins, CO: Three Star Press; 2018.

62\. Beck AT, Steer RA, Brown GK. *Beck Depression Inventory-II*. San Antonio, TX: Pearson Education, Inc; 1996.

63\. Reynolds CR, Kamphaus RW. *Behavior Assessment System for Children (BASC)*. 2nd ed. San Antonio, TX: Pearson Education, Inc; 2004.

64\. Achenbach TM. *Child Behavior Checklist for Ages 6-18 (CBCL/6-18)*. Burlington, VT: ASEBA; 2001.

65\. Zeitlin S. *Coping Inventory*. Bensenville, IL: Scholastic Testing Services, Inc; 2004.

66\. Basu S, Kafkes A, Geist R, et al. *Pediatric Volitional Questionnaire (PVQ), Version 2.1*. Chicago, IL: MOHO Clearinghouse; 2008.

67\. Piers EV, Harris DB, Herzberg DS. *Piers-Harris Self-Concept Scale.* 2nd ed. Torrance, CA: Western Psychological Services; 2002.

68\. Yarrow LJ, Rubenstein JL, Pederson FA. *Infant and Environment: Early Cognitive and Motivational Development*. New York, NY: Wiley; 1975.

69\. Metro Toronto Community Services Incorporated. *Child Care Centre Accessibility Checklist*. Toronto, ON: Metro Toronto Community Services Incorporated; 1991.

70\. Stern GG, Walker WJ. *Classroom Environment Index*. Syracuse, NY: Evaluation Research Associates; 1971.

71\. Poresky RH. Environmental Assessment Index: reliability, stability and validity of the long and short forms. *Educational and Psychological Measurement.* 1987;47:969--975.

72\. Caldwell BM, Bradley RH. *Home Observation for Measurement of the Environment: Administration Manual.* 3rd ed. Tempe, AZ: Family & Human Dynamics Research Institute, Arizona State University; 2003.

73\. Harms T, Cryer D, Clifford RM.
*Infant/Toddler Environment Rating Scale (ITERS).* Rev. ed. New York, NY: Teachers College Press; 2006.

74\. Coulton CJ. Developing an instrument to measure person-environment fit. *J Soc Serv Res.* 1979;3:159--173.

75\. La Paro KM, Pianta RC, Stuhlman M. The Classroom Assessment Scoring System: findings from the prekindergarten year. *Elementary School J*. 2004;104:409--426.

76\. Hamre BK, Mashburn AJ, et al. *Classroom Assessment Scoring System*. Charlottesville, VA: University of Virginia Press; 2005.

77\. Hemmingsson H, Egilson S, Hoffman OR, et al. *School Setting Interview (SSI), Version 3.0*. Chicago, IL: MOHO Clearinghouse; 2005.

78\. American Occupational Therapy Association. Occupational therapy code of ethics and ethics standards. *Am J Occup Ther*. 2020;74(suppl 3):1--13.

79\. *2018 Standards of Practice for Occupational Therapy.* Available from www.aota.org.

80\. Individuals with Disabilities Education Improvement Act of 2004 (IDEA) (PL 108--446), 20 U.S.C. §1400.

81\. Ludlow L, Haley S. Rasch model logits: interpretation, use and transformation. *Educational and Psychological Measurement.* 1995;55(6):967--975.

82\. Murphy KR, Davidshofer CO. *Psychological Testing: Principles and Applications.* 2nd ed. Englewood Cliffs, NJ: Prentice Hall; 2001.

83\. Anastasi A. *Psychological Testing.* 5th ed. New York, NY: Macmillan; 1998.

84\. Rosenkoetter U, Tate R. Assessing features of psychometric assessment instruments: a comparison of the COSMIN Checklist with other critical appraisal tools. *Brain Impairment.* 2018;19(1):103--118. doi:10.1017/BrImp.2017.29.

85\. Urbina S. *Essentials of Psychological Testing*. Hoboken, NJ: Wiley; 2004.

86\. Messick S. Validity of psychological assessment: validation of inferences from persons' responses and performances as scientific inquiry into score meaning. *Am Psychologist*. 1995;50(9):741--749.

87\. Mailloux Z, Mulligan S, Smith Roley S, et al.
Verification and clarification of patterns of sensory integrative dysfunction. *Am J Occup Ther.* 2011;65(2):143--151.

88\. Mailloux Z, May-Benson T, Summers C, et al. The issue is: goal attainment scaling as a measure of meaningful outcomes for children with sensory integration disorders. *Am J Occup Ther.* 2007;61(2):254--259.

89\. Ottenbacher KJ, Cusick A. Goal attainment scaling as a method of clinical service evaluation. *Am J Occup Ther.* 1990;44:519--525.

**[Figure Legends]**

**\Figure 3-1** A child performing a balance item from the Sensory Processing 3 Dimensions Assessment.^36^

**\Figure 3-2** Therapist administering a visual discrimination test item from the Sensory Processing 3 Dimensions Assessment.^36^

**\Figure 3-3** Therapist administering a fine motor/visual motor item from the Peabody Developmental Motor Scales-3.^1^

**\Figure 3-4** The normal distribution and associated standard scores. (Adapted from: Chang MC, Richardson P. Use of standardized tests in pediatric practice. In: O'Brien J, Kuhaneck H, eds. *Case-Smith's Occupational Therapy for Children*. 8th ed. Maryland Heights, MO: Mosby Elsevier; 2020.)

**[TABLES]**

**\Table 3-1** \STANDARDIZED DEVELOPMENTAL SCREENINGS AND EVALUATIONS

| Test Name (Authors) | Age Range | Purpose/Domains/Description and General Comments |
|---|---|---|
| Denver 2 Developmental Screening Test (DDST-2) (Frankenburg and Dodds)^3^ | 0--6 years | A screening tool designed for children at risk for developmental problems. Domains include personal-social, fine motor adaptive, language, and gross motor. Takes 20--30 min to administer; easy to learn; psychometrics are adequate, although the number of test items is relatively small |
| Infant-Toddler Developmental Assessment-2 (Provence et al.)^4^ | 0--36 months | A comprehensive, multi-disciplinary, standardized assessment system using naturalistic observation and parent report regarding eight domains, including gross and fine motor, cognition, language/communication, self-help, psychosocial, and emotional. Reliability and validity are adequate; assists with early intervention programming |
| Hawaii Early Learning Profile (Furuno et al.)^5^ | 0--36 months | A criterion-referenced assessment tool measuring personal/social, cognition, communication, self-help, gross motor, fine motor, and visual-motor domains; a relatively quick and easy screening tool often used by interdisciplinary early intervention teams |
| Battelle Developmental Inventory, 3rd Edition (Newborg)^6^ | 0--8 years | Norm-referenced test including five domains: personal/social, adaptive (including self-help skills), motor, expressive and receptive communication, and cognition. Information is gathered through structured observations, administration of test items, and interviews; administration and scoring can take 45 min--1.5 hours. Normative data are strong, and reliability and validity are adequate |
| The First Step (Miller)^7^ | Preschool children | A norm-referenced screening tool designed to identify preschool children at risk for developmental problems. Domains include cognition, communication, motor, social-emotional, and adaptive behavior. Has strong psychometrics; quick to administer; training is recommended |
| Adaptive Behavior Assessment System-3 (Harrison and Oakland)^8^ | Birth--21 years | Parent survey (also has teacher forms) measuring adaptive behavior using a rating scale in 10 domains, including communication, motor, school skills, self-care, safety, social skills, community use, home living, work, and leisure; 30 minutes to administer; psychometrics are strong |
| Bayley Scales of Infant and Toddler Development Screening Test (Bayley-4 Screening Test) (Bayley & Aylward)^9^ | 1--42 months | Norm-referenced; includes cognitive, language, and motor scales, and a behavior scale (social, interests, activity level). Takes 15--25 minutes to administer; well researched |
| Bayley Scales of Infant Development-4 (Bayley)^10^ | 1--42 months | Norm-referenced; includes five developmental domains: physical (fine and gross motor, vision, hearing), cognitive, communication, social/emotional, and adaptive behavior. Takes 45--60 minutes to administer; strong psychometrics and well researched; training is recommended |
| Mullen Scales of Early Learning (Mullen)^11^ | 0--6 years | Norm-referenced; measures gross motor, fine motor, expressive and receptive language, and visual reception; provides an Early Learning Composite Score; strong psychometrics |
| Ages and Stages Questionnaires, 3rd Edition (ASQ-3; Squires & Bricker)^12^ | 1--66 months | Standardized, norm-referenced; measures communication, gross motor, fine motor, problem solving, and personal-social; caregiver questionnaire; takes 15--20 minutes to administer and score; well researched with strong psychometrics |

**\Table 3-2** \STANDARDIZED ASSESSMENTS MEASURING OCCUPATIONAL PERFORMANCE AREAS (ADL, IADL, PLAY/LEISURE, AND SCHOOL PERFORMANCE)
**Pediatric Evaluation of Disability Inventory (PEDI; Haley et al.)^2^** (6 months--7.5 years). Assesses self-care, functional mobility, and social functioning through structured interview, observation, or both. Considers the level of caregiver assistance and the use of adapted devices; psychometric properties are strong. Use of the updated computer-adaptive version listed below is suggested.

**Pediatric Evaluation of Disability Inventory--Computer Adaptive Test (PEDI-CAT; Haley et al.)^13^** (birth through age 20). A revised edition of the Pediatric Evaluation of Disability Inventory^2^ (listed above). Measures abilities in the three functional domains of daily activities, mobility, and social/cognitive skills. Also has a responsibility domain measuring the extent to which the caregiver or child takes responsibility for managing complex, multistep life tasks. Completed by caregivers or others who are familiar with the child. Psychometrics are strong, and research on this tool is ongoing (www.bu.edu/bostonroc/instruments/pedicat/).

**Evaluation Tool of Children's Handwriting (ETCH; Amundson)^14^** (grades 1--6). Criterion-referenced tool measuring cursive and manuscript handwriting, including alphabet and number writing, copying, dictation, and sentence generation. Assesses legibility, pencil grasp and pressure, hand preference, manipulative skills with the writing tool, and classroom performance; easy to learn; psychometrics are adequate.

**Minnesota Handwriting Assessment (Reisman)^15^** (grades 1--3). Criterion-referenced test measuring cursive and manuscript handwriting, including speed, legibility, form, alignment with baseline, size, and letter and word spacing. Research is limited, although research to date supports reliability and validity; easy to administer.

**Feifer Assessment of Writing (FAW; Feifer)^16^** (pre-K through college age). Norm-referenced achievement test measuring graphomotor skills, dyslexia indicators, and the executive functions believed to represent the motor, cognitive, and linguistic processes that support written language. Takes 15--65 minutes to administer, with shorter times for younger ages.

**School Function Assessment (Coster, Deeney, Haltiwanger, et al.)^93^** (kindergarten through 6th grade). Criterion-referenced tool for evaluating child performance, level of participation, and need for assistance in school-related activities, including physical and cognitive tasks and behavior; psychometric properties are strong.

**School Assessment of Motor and Process Skills (Fisher et al.)^17^** (school-aged children). The child is observed during 3--5 school-related tasks in context. Process (cognitive) and motor skills are measured as they relate to school performance. Training is required; psychometrics are strong.

**Assessment of Motor and Process Skills (Fisher et al.)^18^** (3 years through adulthood). The child performs five to six tasks from a list of calibrated ADL tasks. Process (cognitive) and motor skills are measured as they relate to task performance in a given context; used to predict performance in ADL areas. Training is required; psychometrics are strong.

**Vineland Adaptive Behavior Scales-3 (VABS-3; Sparrow, Cicchetti, & Saulnier)^19^** (birth through 18 years). Measures communication, daily living skills, socialization, and motor skills; uses a behavior rating scale completed through a structured parent interview; easy to administer; psychometrics are adequate.

**Canadian Occupational Performance Measure (Law et al.)^20^** (children of all ages; parents may complete on the child's behalf). Measures the child's performance and satisfaction in areas of self-care, leisure, and productivity through a structured interview. Helpful in prioritizing intervention goals and measuring functional outcomes; well researched, and psychometrics are adequate.

**Child Occupational Self-Assessment (Keller et al.)^21^** (children and adolescents). Self-report measure completed during a structured interview, based on the Model of Human Occupation; includes a card-sort version and a checklist version examining the child's perceived competencies in self-care, school-related activities, and community participation.

**Functional Independence Measure for Children (WeeFIM; Hamilton and Granger)^22^** (child version, 6 months--6 years). Universal tool designed to measure rehabilitation outcomes related to functional skills, including self-care, mobility, sphincter control, communication, and social cognition; well researched.

**Test of Playfulness (Skard and Bundy)^23^** (children of all ages). A 60-item observational tool examining playfulness, including intrinsic motivation, suspension of reality, and internal locus of control, in the context of free play; training is required; research is ongoing, and preliminary studies support it as a valid and reliable tool.

**Knox Preschool Play Scale--Revised (Knox)^24^** (birth--3 years). A rating scale for evaluating play behaviors, including space management, gross motor play behaviors, materials management, pretense-symbolic play, and social participation. Research is limited for this revised edition.

**Child-Initiated Pretend Play Assessment-2 (ChIPPA-2; Stagnitti)^25^** (3--7 years). Norm-referenced test designed to assess the quality of a child's play, play styles and themes, and imaginative play skills. Well researched, with adequate psychometrics.

**Adaptive Behavior Assessment System-3 (Harrison and Oakland)^8^** (0--18 years). Parent survey (also has teacher forms) measuring adaptive behavior using a rating scale in 10 domains, including communication, motor, school skills, self-care, safety, social skills, community use, home living, work, and leisure; 30 minutes to administer; psychometrics are strong.

**Children's Assessment of Participation and Enjoyment and Preferences for Activities of Children (King et al.)^26^** (6--21 years). Questionnaire examining the level of participation in recreation/leisure activities and children's activity preferences in recreational, active-physical, social, skill-based, and self-improvement domains.

\ADL, Activities of daily living.

**\Table 3-3** \STANDARDIZED TESTS OF SENSORY PROCESSING AND INTEGRATION, GROSS AND FINE MOTOR SKILLS, POSTURAL CONTROL AND MOTOR PLANNING 
**Alberta Infant Motor Scale (Piper and Darrah)^27^** (0--18 months). Norm-referenced; measures gross motor movements and quality of movement. Consists of 58 items, with the child observed in prone, supine, sitting, and standing; takes about 30 minutes to administer; second-edition scoring sheets are available; psychometrics are strong; well researched.

**Test of Infant Motor Performance (TIMP, Version 3; Campbell)^28^** (34 weeks gestation--4 months). Norm-referenced test measuring reflex integration, postural control, and the development of antigravity positions and movements. Quick and easy to administer and score; helpful with premature infants; psychometrics are fair; research is limited.

**Infant Neurological International Battery (Ellison)^29^** (1--15 months). Norm-referenced, 20-item test measuring neuromotor behavior and competency, including primitive reflexes, hand and head positions, and movement. Quick and easy to administer; normed on 305 infants; reliability and validity measures are fair; research is limited.

**Toddler and Infant Motor Evaluation (Miller and Roid)^30^** (4 months--3.5 years). Norm-referenced test with eight subtests measuring mobility, stability, motor organization, functional performance, social/emotional abilities, movement component analysis, movement quality, and atypical positions. Complex to administer, but a thorough test of quality of movement and postural control; psychometrics are strong.

**Sensory Integration and Praxis Tests (Ayres)^31^** (4 years--8 years, 11 months). Norm-referenced battery consisting of 17 tests measuring nonmotor visual perception, praxis, somatosensory and vestibular processing, and sensorimotor skills. Psychometrics are strong, except for the test-retest reliability of four tests; extensive training is required; well researched; computer scored.

**Sensory Profile-2 (Dunn)^32^ and Adolescent/Adult Sensory Profile (Dunn and Brown)^33^** (infant/toddler form, birth--3 years; child form, 3--15 years; adolescent/adult version, 12 years and older). Questionnaire requiring caregivers to rate child behaviors believed to measure aspects of sensory processing, modulation, and emotional/behavioral responses to sensory input. Also has a school form for teachers to complete; psychometrics are strong; easy to administer and score. The adolescent/adult version is a self-report measure.

**Test of Sensory Function in Infants (DeGangi and Greenspan)^34^** (4--18 months). Provides an overall measure of sensory processing and reactivity; includes 24 items measuring reactivity to tactile deep pressure and vestibular input, adaptive motor functions, visual-tactile integration, and ocular-motor control. Reliability and validity are adequate.

**Sensory Processing Measure-2 (Parham, Ecker, et al.)^35^** (4 months--87 years). Provides a measure of sensory processing with norm-referenced scores for eight areas: social participation, praxis, visual and auditory processing, touch/tactile processing, body awareness/proprioception, vestibular processing, and olfactory/gustatory processing. Has separate forms for infants/toddlers, children, and adults, as well as school forms; psychometrics are adequate; well researched.

**Sensory Processing 3 Dimensions Assessment (SP3D; Miller, Schoen, & Mulligan)^36^** (3 years through adulthood). Norm-referenced, comprehensive assessment of sensory processing and integration. The performance measure includes visual, tactile, vestibular, proprioceptive, and auditory domains measuring sensory modulation behaviors and perception/discrimination functions, as well as praxis and postural domains. Also includes a supplemental caregiver- or self-report sensory inventory and an occupational performance measure. Preliminary research on its psychometrics is positive; anticipated publication by Western Psychological Services, 2024.

**Structured Observations of Sensory Integration--Motor (SOSI-M) and Comprehensive Observations of Proprioception (COP-R; Blanche, Reinoso, & Blanche Keifer)^37^** (5 through 14 years). Norm-referenced; assesses proprioceptive and vestibular processing, motor planning, and postural control based on Ayres' clinical observations. The SOSI-M contains 14 items, while the COP-R is a behavioral observation tool focusing on the processing of proprioceptive sensory input; takes 30--40 minutes to administer; initial psychometrics are adequate; research is minimal.

**Evaluation of Ayres Sensory Integration (EASI; Mailloux, Roley, et al.)^38^** (3--12 years). Norm-referenced; measures sensory perception; postural, ocular, and bilateral motor integration; praxis; and sensory reactivity. Training is required; information, research, and normative data collection are ongoing, and training and materials are available online.

**Quick Neurological Screening Test, 3rd Edition, Revised (Mutti et al.)^39^** (5 years through adulthood). Screening tool consisting of 15 test items measuring neurologic functions, including fine and gross motor control, motor planning, spatial organization, visual and auditory perception, and balance. Quick and easy to learn and administer; studies reported in the manual demonstrate adequate reliability and validity.

**Peabody Developmental Motor Scales--3 (Folio and Fewell)^1^** (1 month through 6 years). Norm-referenced and criterion-referenced test measuring fine and gross motor skills. Takes about 60 minutes to administer and is easy to learn; psychometrics are adequate, with strong normative data; the new version is not well researched to date (see the review at the end of this chapter).

**Bruininks-Oseretsky Test of Motor Proficiency-2 (Bruininks and Bruininks)^40^** (4.5--14.5 years). Norm-referenced test including nine subtests measuring fine and gross motor skills. Validity is strong; reliability measures for composite scores are strong; well researched; takes about 45 minutes to administer; fairly easy to learn.

**Gross Motor Function Measure--Revised (Russell et al.)^41^** (5 months--16 years). Criterion-referenced observational tool designed to measure gross motor functions in children with cerebral palsy and Down syndrome. Consists of 88 items; administration time is 30--45 minutes; easy to learn; psychometrics are strong.

**Test of Gross Motor Development-3 (Ulrich)^42^** (3--11 years). Norm-referenced assessment measuring basic gross motor skills; includes two subtests: locomotor and ball skills. Quick (about 20 minutes) and easy to administer; psychometrics are adequate; not well researched.

**Movement Assessment Battery for Children-3 (M-ABC-3; Henderson & Barnett)^43^** (3--25 years). Norm-referenced; measures fine and gross motor skills, including manual dexterity, ball skills, and static and dynamic balance; takes 30--40 minutes to administer and score; psychometrics are fair; not well researched; 3rd edition available 2023.

**Quality of Upper Extremity Skills Test (DeMatteo et al.)^44^** (18 months--8 years). Criterion-referenced tool evaluating quality of upper extremity movement and hand function in children with cerebral palsy, including dissociated movements, grasp, protective extension, and weight bearing. Easy to administer; psychometrics are strong.

**Miller Function and Participation Scales (Miller)^45^** (2 years, 6 months--7 years, 11 months). Norm-referenced; measures fine motor, gross motor, and visual-motor skills in the context of functional, play, and school-related tasks; psychometrics are strong.

**\Table 3-4** \STANDARDIZED ASSESSMENTS OF VISUAL-MOTOR AND VISUAL-PERCEPTUAL SKILLS

**Developmental Test of Visual Perception, 3rd Edition (Hammill et al.)^46^** (4--13 years). Norm-referenced tool measuring eye-hand coordination, copying, spatial relations, position in space, figure-ground, visual closure, visual-motor speed, and form constancy. Administration time is 30--40 minutes; strong normative data, reliability, and validity.

**Beery-Buktenica Developmental Test of Visual-Motor Integration, 6th Edition (Beery et al.)^47^** (2 years through adulthood). Norm-referenced design-copy test. Quick and easy to administer; includes a nonmotor visual-perceptual screening and a motor coordination screening test; psychometric properties are strong.

**Test of Visual-Motor Skills-3 (Martin)^48^** (3--14 years). Norm-referenced design-copy test; quick and easy to administer. Unique in that the types of errors are classified and scored to give qualitative information; psychometric properties are adequate.

**Test of Visual-Perceptual Skills (nonmotor), 4th Edition (TVPS-4; Martin)^49^** (5--21 years). Norm-referenced test of visual perception measuring figure-ground, spatial relations, visual memory and discrimination, visual sequential memory, visual form constancy, and visual closure. Quick and easy to administer; psychometrics are adequate.

**Motor-Free Visual Perception Test, 4th Edition (Colarusso and Hammill)^50^** (4--80+ years). Norm-referenced test of visual perception measuring figure-ground, spatial relations, visual memory, visual closure, and visual discrimination. Quick and easy to administer; psychometrics are adequate; well researched.

**Wide Range Assessment of Visual Motor Ability (Adams and Sheslow)^51^** (3--18 years). Norm-referenced; assesses visual-spatial, fine motor, and integrated visual-motor skills through design-copy, visual-spatial matching, and pegboard tasks; psychometrics are strong. 
------------------------------------------------------------------------------------------------- ---------------------- ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ **\Table 3-5** \ASSESSMENT TOOLS FOR EVALUATING PSYCHOLOGICAL, SOCIAL, BEHAVIORAL, AND EMOTIONAL AREAS +-----------------------+-----------------------+-----------------------+ | \Test Name | \Age Range | \Description and | | (Author\[s\]) | | Purpose | +-----------------------+-----------------------+-----------------------+ | Early Coping | Infants and toddlers | Observational | | Inventory (Zeitlin et | | instrument that | | al.)^52^ | | evaluates | | | | sensorimotor | | | | organization, | | | | reactive behaviors, | | | | and self-initiated | | | | behaviors, all | | | | believed to be | | | | important for | | | | effective coping | +-----------------------+-----------------------+-----------------------+ | Neonatal Behavioral | Infants | Rating scale that | | Assessment Scale | | assesses reflex | | (Brazelton)^53^ | | behavior and motor | | | | maturity, responses | | | | to sensory stimuli, | | | | temperament, and | | | | adapting and coping | | | | strategies | +-----------------------+-----------------------+-----------------------+ | Functional Emotional | 3--48 months | Evaluates emotional | | Assessment Scale | | and social capacities | | (Greenspan et | | throughout different | | al.)^54^ | | stages of sensory | | | | motor and cognitive | | | | development; a rating | | | | scale to assist in | | | | organizing and | | | | interpreting | | | | unstructured | | | | observations of the | | | | child and the child | | | | with his or her | | | | caregiver(s) | +-----------------------+-----------------------+-----------------------+ | Temperament and | Birth--3 years | 
Rating scale | | Atypical Behavior | | | +-----------------------+-----------------------+-----------------------+
