SS 104 - Test & Measurement in Human Movement Lecture Notes (PDF)

SS 104 – TEST & MEASUREMENT IN HUMAN MOVEMENT (PART 1)

Sports and physical education professionals can effectively use tests and measurements to assess students and athletes and help them achieve their goals and maximize their potential. Tests and measurements form the objective core of the evaluation process.

Reasons for Test, Measurement and Evaluation in Sports and PE
1. Motivation – To encourage people to become better
2. Diagnosis – To assess strengths and weaknesses. Ex: determining areas for improvement in fitness
3. Classification – To classify groups according to an attribute. Ex: according to age, skill level, or fitness level
4. Evaluation of Instruction & Programs – To determine if exercise, health, or sports programs were successful
5. Prediction – To predict future success based on skill level, athleticism, etc. Ex: NFL and NBA combines
6. Research – To study and answer questions; to add new knowledge or support existing knowledge

While some physical abilities are innate and not amenable to change, other physical abilities can be improved through physical training. Tests can be used by teachers and coaches to determine which deficits can be addressed by participating in prescribed group or individual programs.

Testing Terminologies
Variable – A trait or characteristic that can assume any given value.
Ex: name, age, height
Test – An instrument, tool, or process used to make a particular measurement
Measurement – The collection of numerical data
Evaluation – The interpretation or judgment about a particular measurement
Field Test – A test used to assess ability that is performed away from the laboratory and does not require extensive training or expensive equipment
Statistics – The collection, organization, analysis and presentation of data
Pretest – A test administered at the beginning to determine initial characteristics or ability level
Posttest – A test administered after a period of time, usually after an intervention like a training program, to determine changes from the pretest
Test Battery – A series of tests that are designed to take specific measurements of performance or capacity

More Statistical Terms
Data – The numerical result of measurement
Population – All members of a defined group
Sample – A subgroup of the population
Parameter – A value or characteristic of a population
Statistic – A value or characteristic of a sample
Descriptive Statistics – Statistics that describe or summarize a given data set
Inferential Statistics – Statistics that aim to draw conclusions beyond the immediate data

Levels of Measurement of Variables
Nominal Scale – Describes the identity of a variable but has no numerical value. Used simply for labeling. Ex: name, nationality, marital status, gender
Ordinal Scale – Describes the order of the values of a variable in relation to each other. Ex: 1st-2nd-3rd, Gold-Silver-Bronze, Excellent-Good-Average-Poor
Interval Scale – Describes values with equal, meaningful differences between them but no “true zero” point. May have negative values. Ex: Temperature in F or C, Score in Golf, Likert scales in surveys
Ratio Scale – Variables that have specific values and have a “true zero”. Cannot have negative values.
Ex: Height, Weight, Population, Correct answers in an exam

DESCRIPTIVE AND INFERENTIAL STATISTICS

When analyzing data, such as the exam scores of 100 students, it is possible to use both descriptive and inferential statistics in your analysis. Typically, in most research conducted on groups of people, you will use both descriptive and inferential statistics to analyze your results and draw conclusions.

Descriptive Statistics

Descriptive statistics is the term given to the analysis of data that helps describe, show or summarize data in a meaningful way such that patterns might emerge from the data. Descriptive statistics do not, however, allow us to make conclusions beyond the data we have analyzed or reach conclusions regarding any hypotheses we might have made.

Descriptive statistics are very important because if we simply presented our raw data it would be hard to visualize what the data was showing, especially if there was a lot of it. Descriptive statistics therefore enable us to present the data in a more meaningful way, which allows simpler interpretation of the data. Typically, there are two general types of statistics that are used to describe data – Measures of Central Tendency and Measures of Variability:

Measures of Central Tendency - describe the central position of a frequency distribution for a group of data. We can describe this central position using the mean, median and mode.

Mean
The mean (or average) is the most popular and well-known measure of central tendency. It can be used with both discrete and continuous data, although it is most often used with continuous data. The mean is equal to the sum of all the values in the data set divided by the number of values in the data set. There can only be one mean. The formula for the mean is:

$$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}$$

where $X_i$ is each value in the data set and $n$ is the number of values.

Median
The median is the middle score for a set of data that has been arranged in order of magnitude.
In order to calculate the median, suppose we have the data below:

65 55 89 56 35 14 56 55 87 45 92

We first need to rearrange that data into order of magnitude (smallest first):

14 35 45 55 55 56 56 65 87 89 92

Our median mark is the middle mark, in this case 56. It is the middle mark because there are 5 scores before it and 5 scores after it. This works fine when you have an odd number of scores, but what happens when you have an even number of scores? What if you had only 10 scores? In that case, you simply take the average of the middle two scores. For example, dropping the 92 leaves 10 scores whose middle two are 55 and 56, giving a median of 55.5.

Mode
The mode is simply the most frequent score in our data set. There can be more than one mode, or no mode at all. In the above set of scores, the modes are 55 and 56.

When to use mean, median or mode to describe data
There can often be a "best" measure of central tendency for the data you are analyzing, but there is no one "best" measure of central tendency for all data. This is because whether you use the median, mean or mode will depend on the type of data you have.

The mean is usually the best measure of central tendency to use when your data distribution is continuous and symmetrical, such as when your data is normally distributed.

The median is usually preferred to other measures of central tendency when your data set is skewed. The median is also preferred when the data has outliers because the value of the mean can be distorted by the outliers.

The mode is the least used of the measures of central tendency. It can be computed for any type of data, but it is the only measure of central tendency that is appropriate for nominal data. For this reason, the mode is the best measure of central tendency when dealing with nominal data.

Measures of Variability or Spread - these are ways of summarizing a group of data by describing how spread out the scores are. For example, the mean score of 10 students may be 81 out of 100. However, not all students will have scored 81. Rather, their scores will be spread out.
Some will be lower and others will be higher. To describe this spread, a number of statistics are available to us, most commonly the Range and Standard Deviation:

Range – The difference between the highest score and the lowest score.

$$R = X_{\text{Highest}} - X_{\text{Lowest}}$$

Standard Deviation – Describes the scatter of scores around the mean. It is the most useful and sophisticated measure of variability.

Example: For the example above, the data can be described as: “The mean exam score of the 10 students is 81.3 ± 25.4 points with a range of 88 points.”

Inferential Statistics

We have seen that descriptive statistics provide information about our immediate group of data. For example, we could calculate the mean and standard deviation of the exam marks for the 100 students and this could provide valuable information about this group of 100 students.

Often, however, you are interested in investigating the whole population but only a limited amount of data exists. For example, you might be interested in the GWA of all UP Diliman students for a particular semester. It is not feasible to measure the GWAs of all students in UP Diliman, so you have to measure a smaller sample of students (e.g., 1,000 students) who are used to represent the larger population.

Inferential statistics are techniques that allow us to use these samples to make generalizations about the populations from which the samples were drawn. It is, therefore, important that the sample accurately represents the population. Here, an appropriate sample size and appropriate sampling methods allow us to make more accurate conclusions beyond the available data.

EVALUATION OF TEST QUALITY

Test results are useful only if the test actually measures what it is supposed to measure (validity) and if the measurement is repeatable (reliability). These two characteristics are the key factors in evaluating test quality.
Validity

Validity refers to the degree to which a test or test item measures what it is supposed to measure, and is the most important characteristic of testing. For tests of physical properties such as height and weight, validity is easy to establish. The validity of tests of some abilities and characteristics is more difficult to establish. There are several types of validity, including construct validity, face validity, content validity, and criterion-referenced validity.

Construct validity is the ability of a test to represent the underlying construct. The construct represents the theory developed to organize and explain some aspects of existing knowledge and observations.

Face validity is the appearance to the test subject and other casual observers that the test measures what it is purported to measure.

Content validity is the assessment by experts that the testing covers all relevant subtopics or component abilities in appropriate proportions. It is sometimes referred to as expert validity. While the terms face validity and content validity are sometimes used interchangeably, content validity relates to actual validity as approved by experts while face validity relates to the appearance of validity to non-experts.

Criterion-referenced validity is the extent to which test scores are associated with some other measure of the same ability. There are four types of criterion-referenced validity: concurrent, convergent, predictive, and discriminant.
o Concurrent validity is the extent to which test scores are associated with those of other accepted tests that measure the same ability.
o Convergent validity is evidenced by high positive correlation between results of the test being assessed and those of the “gold standard”. A test may be preferable over the gold standard if it exhibits convergent validity but is less demanding in terms of time, equipment, expense, or expertise.
o Predictive validity is the extent to which the test score corresponds with future behavior or performance. This can be measured through comparison of a test score with some measure of success in sport. For example, one could calculate the correlation between the overall score on a battery of tests used to assess potential for basketball and a measurement of actual basketball performance as indicated by such quantities as points scored, rebounds, assists, blocked shots, forced turnovers and steals (much like the NBA combine).
o Discriminant validity is the ability of a test to distinguish between two different constructs. Discriminant validity of tests in a battery avoids unnecessary expenditures of time, energy, and resources in administering tests that may be measuring the same component.

Reliability

Reliability is a measure of the degree of consistency or repeatability of a test. If an individual whose ability does not change is measured twice, very similar scores should be obtained both times. On an unreliable test, an individual could obtain a high score on one occasion and a low score on another. A test must be reliable to be valid because highly variable results have little meaning.

There are several ways to determine the reliability of a test; the most obvious one is to administer the same test twice to the same group of individuals. Statistical correlation of the scores from the two administrations provides a measure of test-retest reliability. A significant difference between the two sets of scores indicates variability, which may be due to any of the following:

Intrasubject Variability is a lack of consistent performance by the person tested.

Lack of Interrater Reliability is a lack of consistency in scoring between different testers conducting the same test on the same individual on separate instances.

Intrarater Variability is the lack of consistent scoring by a given tester.
For example, a coach eager to see improvement may unintentionally be more lenient on a posttest than on a pretest. Other causes of intrarater variability include inadequate training, inattentiveness, and failure to follow standardized procedures.

Failure of the test itself to provide consistent results. Sometimes the test itself is the problem, for reasons that include being in the trial stages, lack of equipment calibration, or dysfunctional equipment.

TEST ADMINISTRATION

To achieve accurate test results, tests must be administered safely, correctly, and in an organized manner. Staff should ensure the health and safety of participants, testers should be carefully selected and trained, tests should be well organized and administered efficiently, and participants should be properly prepared and instructed.

Health and Safety Considerations
Testers must be aware of testing conditions that can threaten the health of athletes and be observant of signs and symptoms of health problems that warrant exclusion from testing. Strenuous exercise, such as maximal runs or 1-repetition maximum (1RM) tests, can uncover or worsen existing heart problems, such as impaired blood flow to the heart muscle and irregular heartbeats. When aerobic endurance exercise tests are being administered in a hot environment, caution must be observed to protect both the health and safety of the participant and the validity of the test.

Selection and Training of Testers
Test administrators should be well trained and should have a thorough understanding of all testing procedures and protocols. The testing supervisor should make sure that all novice personnel perform and score all tests correctly. It is essential that all testers have sufficient practice so that the scores they obtain correlate closely with those produced by experienced and reliable personnel. The testers should be trained to explain and administer the tests as consistently as possible.
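One way to check whether a novice tester's scores "correlate closely" with those of an experienced tester is to compute the Pearson correlation coefficient between the two sets of scores for the same participants. The sketch below is only an illustration: the function name `pearson_r` and the vertical jump scores are hypothetical and not taken from these notes, and the notes do not prescribe a particular correlation statistic or cutoff value.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of the same length (at least 2 scores)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of cross-products and sums of squares of deviations from the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when all scores are identical")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical vertical jump scores (cm) for the same five athletes,
# measured by an experienced tester and by a novice tester.
experienced = [45.0, 52.5, 38.0, 60.0, 47.5]
novice = [44.0, 53.0, 39.5, 58.5, 48.0]

r = pearson_r(experienced, novice)
print(f"r = {r:.3f}")  # values close to 1 suggest the two testers score consistently
```

The same function could be applied to scores from two administrations of one test to the same group, which is the test-retest comparison described under Reliability.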
Recording Forms (Score sheets)
Scoring forms should be developed before the testing session and should have space for all test results and comments. This allows test time to be used more efficiently and reduces the incidence of recording errors. At least two sets of recording forms should be provided: one copy for the test administrators and one copy that the participant can keep.

Test Format
A testing session wherein the participants are aware of the testing purpose and procedures usually enhances the reliability of test measures. Test planning must address such issues as whether athletes will be tested all at once or in groups and whether the same person will administer a given test to all participants. Having the same tester assigned to a specific test eliminates the possibility of interrater variability. As a rule, each tester should administer only one test at a time, especially when the test requires complex judgments.

Sequence of Tests
Testers must carefully design the order of tests and the duration of rest periods between tests to ensure test reliability. Tests requiring high-skill, non-fatiguing movements should be administered before tests that are likely to produce fatigue and confound the results of subsequent tests. A logical sequence, although there are some variations, is to administer tests in this order:
1. Anthropometric tests (e.g., height, weight, skinfold and girth measurements)
2. Non-fatiguing tests (e.g., ruler drop, flexibility, vertical jump, broad jump)
3. Agility tests (e.g., T-test, pro agility test)
4. Maximum power and strength tests (e.g., 1RM power clean, 1RM bench press)
5. Sprint tests (e.g., 40-yard sprint, 100-m sprint)
6. Local muscular endurance tests (e.g., partial curl-up test, 1-minute pushup test)
7. Anaerobic capacity tests (e.g., 400 m run, 300-yard shuttle)
8. Aerobic capacity tests (e.g., 1.5-mile run, 12-minute run, 3-minute step test)

An effort should be made to administer aerobic tests on a different day than the other tests if possible.
If performed on the same day, aerobic tests should be performed last, after an adequate rest period.

Preparing Participants for Testing
The date, time, and purpose of a test battery should be announced in advance to allow athletes to prepare physically and mentally. Instructions should cover the purpose of the test, how it is to be performed, the number of practice attempts allowed, the number of trials, test scoring, criteria for disallowing attempts, and recommendations for maximizing performance. The participants should be given opportunities to ask questions before and after the demonstration.

After the anthropometric tests, an adequate warm-up should be given before the administration of the other tests, as this improves reliability. An appropriately organized warm-up consists of a general warm-up followed by a specific warm-up. Both types of warm-ups include body movements similar to those involved in the test. An organized, instructor-led general warm-up ensures uniformity. It is acceptable to allow two to three activity-specific warm-up trials, depending on the test protocol. Depending on the test protocol, the score can be the best or the average of the trials.

ANTHROPOMETRY AND THE COMPONENTS OF PHYSICAL FITNESS

In test and measurement for sports and PE, a test will typically measure anthropometric scores and any of the following components of fitness, categorized as health-related or skill-related.

Anthropometry
Anthropometry is the science of measurement applied to the human body and generally includes measurements of height, weight, selected body girths as well as skinfolds and bone breadths to determine somatotypes.

Health-Related components - The most important factors related to one’s health.
o Cardiovascular Endurance
The ability of the circulatory system (heart and blood vessels) to supply oxygen to working muscles during prolonged exercise. Also known as cardiorespiratory endurance, aerobic capacity, or aerobic power.
o Body Composition
The relative percentage of fat and lean tissues (muscle, bone) to overall body weight. Examples are body fat percentage, bone mass, and muscle mass.
o Flexibility
The maximum range of motion possible at various joints.
o Muscular Strength
The maximum amount of force that can be produced by a single contraction of a muscle. Tests involve relatively low movement speeds (lasting up to about 4 seconds) against a maximal resistance. Also known as low-speed muscular strength or maximum muscular strength.
o Muscular Endurance
The ability of a muscle group to continue muscle movement over a length of time against a submaximal resistance. Also known as local muscular endurance. Examples are pushup tests and situp tests.

Skill-Related components - Aspects of fitness which form the basis for successful sport or activity participation.
o Speed
The ability to quickly cover a fixed distance in a straight line. Tests of speed are not usually conducted over distances greater than 200 m because longer distances reflect anaerobic or aerobic capacity more than the absolute ability to move the body at maximal speed.
o Agility
The ability of the body to stop, start and change direction. Recently, the definition of agility has been extended to include a response to a stimulus rather than simply a change of direction.
o Balance
The ability to maintain a desired posture while still or in motion.
o Coordination
Integration of hand and/or foot movements with the input of the senses to perform a desired task.
o Reaction Time
The time it takes to respond to a stimulus, such as pressing a button that lights up or catching a ruler.
o Power
The ability to exert maximal muscular force at a high velocity. Tests involve maximal movement speeds lasting 1 second or less. Examples are vertical jump, standing long jump, and power clean. Also known as high-speed muscular strength, explosive strength, and anaerobic power.
o Anaerobic Capacity
The maximal rate of energy production for moderate-duration activities.
It is typically quantified as the maximal power output during muscular activity lasting between 30 and 90 seconds, using a variety of tests for the upper and lower body. It reflects the combined phosphagen and lactic acid energy systems.

Stamina, although commonly (and mistakenly) used to refer to cardiovascular endurance, is endurance specific to a task, sport or activity, which may involve a combination of one or more of the fitness components. As such, there is no universal definition for stamina.
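
Worked example (descriptive statistics). As a recap of the measures of central tendency and variability covered earlier in these notes, the sketch below computes them for the 11 scores used in the Median section, using Python's standard `statistics` module. One assumption is made that the notes do not state: the scores are treated as a sample, so the standard deviation uses the n - 1 formula (`statistics.stdev`); `statistics.pstdev` would be the choice if the 11 scores were the whole population.

```python
import statistics

# The 11 exam scores from the Median section of these notes.
scores = [65, 55, 89, 56, 35, 14, 56, 55, 87, 45, 92]

mean = statistics.mean(scores)            # sum of the scores / number of scores
median = statistics.median(scores)        # middle score after sorting
modes = statistics.multimode(scores)      # every score tied for most frequent
score_range = max(scores) - min(scores)   # highest score minus lowest score

# Assumption: the scores are a sample, so the n - 1 (sample) formula is used.
sd = statistics.stdev(scores)

print(f"mean={mean}, median={median}, modes={modes}")
print(f"range={score_range}, sd={sd:.2f}")
```

For this data set the mean is 59, the median is 56, the modes are 55 and 56, the range is 78, and the sample standard deviation is about 23.78 points.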
