Lecture #6 - Measurement in Kinesiology (PDF)
Document Details
University of Windsor
2023
Andrew S. Perrotta
Summary
This lecture discusses the importance of measurement in kinesiology research. It defines validity and reliability, examines the validity and reliability of exercise equipment, and introduces statistical methods for assessing these concepts. The lecture also includes examples of different types of validity and reliability, as well as case studies on specific technologies.
Full Transcript
Measuring Research Variables
Chapter 11
Dr. Andrew S. Perrotta
University of Windsor, Faculty of Human Kinetics | Department of Kinesiology

Learning Objectives
I. Define various forms of validity and reliability
II. Examine the validity and reliability of exercise equipment and fitness assessments
III. Introduction to the statistics used to examine validity and reliability
IV. Analyze raw data for assessing validity and reliability

Why Measure?
"Don't assume, measure!"
Dr. Benno Nigg, University of Calgary, Founder of the Human Performance Laboratory

Developing an exercise program for a patient must be based on scientific evidence using quantitative assessments. The difference between winning and losing a medal is a fine line, so performance must be examined using accurate equipment.

Accuracy
Validity: the degree to which a test or instrument measures what it purports to measure.
Types of validity:
I. Criterion validity
II. Logical (face) validity
III. Content validity
IV. Construct validity

Criterion Validity (Gold Standard)
The degree to which scores on a test are related to some recognized standard or criterion.
I. Concurrent validity: correlating a "new" instrument with a criterion instrument.
II. Predictive validity: the degree to which scores of a predictor variable can accurately predict criterion scores.

Concurrent Validity
ECG = criterion measure ("gold standard"); Polar® H10 monitor = practical and applied usage.
Pereira, R. D. A., Alves, J. L. D. B., Silva, J. H. D. C., Costa, M. D. S., & Silva, A. S. (2020). Validity of a smartphone application and chest strap for recording RR intervals at rest in athletes. International Journal of Sports Physiology and Performance, 15(6), 896-899.

Concurrent Validity (Specific to Activity)
ECG = criterion measure ("gold standard"); Polar® H10 monitor = practical and applied usage.
Gilgen-Ammann, R., Schweizer, T., & Wyss, T. (2019). RR interval signal quality of a heart rate monitor and an ECG Holter at rest and during exercise. European Journal of Applied Physiology, 119(7), 1525-1532.

Concurrent Validity (Specific to Environment)
Apple Watch® = user friendly; Polar® T31 monitor = "gold standard".
Held, N. J., Perrotta, A. S., Mueller, T., & Pfoh-MacDonald, S. J. (2022). Agreement of the Apple Watch® and Fitbit Charge® for recording step count and heart rate when exercising in water. Medical & Biological Engineering & Computing, 60(5), 1323-1331.
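The logic behind these concurrent-validity studies can be sketched in a few lines of code: record simultaneous readings from the criterion device and the practical device, then quantify how closely they agree. The heart-rate values below are invented purely for illustration; they are not data from the studies cited above.

```python
# A minimal concurrent-validity sketch: correlate a "new" device against a
# criterion measure and report the mean bias. All heart-rate values are
# hypothetical and chosen only to illustrate the calculation.
import numpy as np
from scipy import stats

ecg_hr   = np.array([62, 75, 88, 101, 115, 128, 140, 152, 163, 171])  # criterion (ECG), bpm
strap_hr = np.array([63, 74, 89, 100, 116, 127, 141, 151, 165, 170])  # practical device, bpm

r, p = stats.pearsonr(ecg_hr, strap_hr)   # strength of the association
bias = np.mean(strap_hr - ecg_hr)         # mean bias (device minus criterion)

print(f"Pearson r = {r:.3f} (p = {p:.4f}), mean bias = {bias:.2f} bpm")
```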
Predictive Validity
DEXA = criterion measure ("new" gold standard); hydrostatic weighing = criterion measure ("old" gold standard).

Accuracy
Logical (face) validity: the degree to which the intended measure is actually being measured. Is the assessment valid by definition?

Logical (Face) Validity
Vertical jump = force plates (cm | W)
Running speed = photo timing gates (5 m | 10 m | 30 m)

Accuracy
Content validity: the degree to which a test adequately samples what was collected/covered in the assessment. Is the assessment representative of ALL aspects of the construct?
Definition of construct: "to make or form by combining or arranging parts or elements."
Staff, M. W. (2004). Merriam-Webster's Collegiate Dictionary (Vol. 2). Merriam-Webster.

Construct Validity
Construct validity is the degree to which scores from a test measure a hypothetical construct, and is usually established by relating the test results to some behaviour.
Example: certain behaviours are expected from someone who has poor sleep quality:
I. Snoring
II. Problems falling asleep
III. Problems staying asleep
Samuels, C., James, L., Lawson, D., & Meeuwisse, W. (2016). The athlete sleep screening questionnaire: a new tool for assessing and managing sleep in elite athletes. British Journal of Sports Medicine, 50(7), 418-422.

Reliability
Reliability pertains to the consistency or repeatability of a measure. Test scores can be reliable yet NOT valid.

Components of Reliability
I. Observed score: the obtained score, which comprises a person's true score and error score.
II. True score: the part of the observed score that represents the person's real score and does not contain measurement error.
III. Error score: the part of an observed score that is attributed to measurement error.

Professional Certification (A Known Error)
The International Society for the Advancement of Kinanthropometry (ISAK) has developed international standards for anthropometric assessment and an international anthropometry accreditation scheme (IAAS).

Applied Perspectives Around Reliability
A test may be valid, but NOT reliable. Too much noise (i.e., variation in performance) prevents any confidence in assessing change in athletic performance. Test-to-test variation can reach > 6% in performance! The testing environment must be standardized to reduce noise/variation, e.g., surface, ambient temperature, shoes, fatigue level.
Morral-Yepes, M., Moras, G., Bishop, C., & Gonzalo-Skok, O. (2020). Assessing the reliability and validity of agility testing in team sports: a systematic review. Journal of Strength and Conditioning Research.

Applied Perspectives Around Reliability
Most of the error in the reliability of an assessment is NOT technical when using quality, un-assisted equipment. For example, Brower speed gates are accurate (< ±0.05 s).
Shalfawi, S., Ingebrigtsen, J., Rodahl, S., Enoksen, E., & Tonnessen, E. (2010, June). Validity and reliability of the Brower timing system Speed Trap II. In 15th Annual ECSS Congress, Antalya, Turkey (pp. 23-26).

Applied Research Questions
1. Can we examine the reliability of equipment that measures physiological function and human performance? E.g., timing gates, force plates, heart rate monitors.

Statistics for Examining Validity & Reliability
Tests for validity:
I. t test & ANOVA
II. Effect size (i.e., Cohen's d)
III. Bland-Altman analysis (limits of agreement)
Tests for reliability:
IV. Test-retest method (correlation = R²)
V. SEM (standard error of measurement)

Statistics for Assessing Validity
Case Study #1: Examine the validity and reliability of the EXSURGO Technologies® G-Flight to determine vertical jump.
Reason for study: G-Flights are easy to transport and can be used on-field or on-court.

Step #1 – Establish the criterion (force plates)
Step #2 – Use the equipment simultaneously
Step #3 – Choose the methods: 20 participants, 3 jumps each (CMJ & VJ), 30-60 s rest between jumps
Watkins, C. M., Maunder, E., Tillaar, R., & Oranchuk, D. J. (2020). Concurrent validity and reliability of three ultra-portable vertical jump assessment technologies. Sensors (Basel, Switzerland), 20(24), 7240.

Statistics for Assessing Validity
Dependent (paired) t test: a test of the significance of differences between the means of two sets of scores that are related.

t = (MD − μD) / (SMD / √n)

MD = mean difference between the criterion and the G-Flight
μD = hypothesized difference under the null hypothesis = 0 (i.e., there is no difference)
SMD = standard deviation of the differences
n = number of paired scores
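As a rough illustration of this paired comparison, the sketch below runs a dependent t test with scipy. The jump heights reuse the Test #1 values from the test-retest table later in this lecture; they are meant only as an example, not as the data from Watkins et al. (2020).

```python
# A minimal sketch of the dependent (paired) t test described above.
# The jump heights reuse the Test #1 values from the test-retest table
# later in the lecture; they are illustrative, not the study's data.
import numpy as np
from scipy import stats

force_plate = np.array([23.0, 34.0, 54.0, 46.0, 32.0, 31.0])  # criterion (cm)
g_flight    = np.array([25.0, 37.0, 58.0, 48.0, 33.0, 31.0])  # practical device (cm)

t_stat, p_value = stats.ttest_rel(force_plate, g_flight)  # paired (dependent) t test
mean_diff = np.mean(force_plate - g_flight)                # MD from the formula above

print(f"t = {t_stat:.2f}, p = {p_value:.4f}, mean difference = {mean_diff:.1f} cm")
```

By default scipy runs the two-tailed version of this test, which matches the recommendation below when the direction of the difference is unknown.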
Student's t Test
Table for calculating significance, developed by William Sealy Gosset ("Student").

Statistics for Assessing Validity
Two-tailed t test: a test that assumes the difference between the two means could go in either direction (+ or -).
One-tailed t test: a test that assumes the difference between the two means will go in one direction only.
* Unless you know the direction of the difference when comparing equipment, choose a two-tailed test. *

[Figure: comparison of force plate and G-Flight jump height, p < 0.05]

Statistics for Assessing Validity
Meaningfulness: the importance or practical significance of an effect or relationship.
Effect size (ES): the standardized value that is the difference between the means divided by the standard deviation. Effect size gives us an indication of the magnitude of the effect/difference.

ES = (M1 − M2) / s

M1 = criterion mean of scores
M2 = G-Flight mean of scores
s = pooled standard deviation of the combined scores
Cohen, J. (1988). The effect size. Statistical Power Analysis for the Behavioral Sciences, 77-83.

Effect Size (Meaningfulness)
* An effect size (ES) quantifies the difference between groups in units of standard deviations. *
ES interpretation:
Trivial: < 0.20
Small: 0.20 - 0.49
Moderate: 0.50 - 0.79
Large: > 0.80
Adapted from Hopkins et al. 2009. Cohen, J. (1988). The effect size. Statistical Power Analysis for the Behavioral Sciences, 77-83.

Statistics for Assessing Validity
Bland and Altman (BA) plots: a graph/plot to describe agreement between two quantitative measurements. The difference (force plate − G-Flight, in cm) is plotted against the force plate jump height (cm). An ideal situation is exactly the same result from the G-Flight and the force plates. The plot allows us to examine whether the G-Flight under- or over-estimates jump height using the mean bias, and 95% of the differences must fall inside the limits of agreement (i.e., ±2 SD of the mean difference).
Giavarina, D. (2015). Understanding Bland Altman analysis. Biochemia Medica, 25(2), 141-151.

Statistics for Assessing Reliability
Test-retest method: a correlation coefficient (e.g., the Pearson product-moment coefficient) is the most commonly used method for calculating the correlation between two variables. This coefficient is a bivariate statistic, meaning that it is used to correlate two variables (force plate vs G-Flight). When a test is given twice, the scores on the first test are correlated with the scores on the second test to determine their degree of consistency.

Test-Retest Method
Two correlation tests must be conducted to examine the difference between each coefficient.

                 Test #1                           Test #2
      Force Plate  G-Flight  Difference  Force Plate  G-Flight  Difference
         (cm)        (cm)       (cm)        (cm)        (cm)       (cm)
          23          25         -2          20          22         -2
          34          37         -3          30          32         -2
          54          58         -4          52          55         -3
          46          48         -2          44          47         -3
          32          33         -1          30          31         -1
          31          31          0          28          28          0
Mean      36.7        38.7       -2.0        34.0        35.8       -1.8
SD        11.3        12.2        1.4        11.7        12.5        1.2

Test-Retest Method
Perform TWO tests and find the correlation between the scores.
[Scatter plot: G-Flight vertical jump (cm) vs force plate vertical jump (cm)]
r = 0.99 (* the r value must be > 0.80 *), SE = 0.50 cm

R² = coefficient of determination (the percentage of variation in one variable that can be explained by the other)
r = correlation coefficient (the strength of the relationship): 0.20 - 0.49 = small | 0.50 - 0.79 = medium | > 0.80 = large
Coolican, H. (1994). Research Methods and Statistics in Psychology. London: Hodder and Stoughton.
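A short sketch of how the statistics above might be computed for the Test #1 and Test #2 scores in the table: Cohen's d for meaningfulness, Bland-Altman limits of agreement for validity, and the test-retest correlation for reliability. The numbers are taken from the table; the code itself is only an illustrative outline, not the analysis used in the cited papers.

```python
# Illustrative calculations using the Test #1 / Test #2 scores from the
# table above: effect size (Cohen's d), Bland-Altman limits of agreement,
# and the test-retest correlation.
import numpy as np
from scipy import stats

force_plate_t1 = np.array([23.0, 34.0, 54.0, 46.0, 32.0, 31.0])  # criterion, Test #1 (cm)
g_flight_t1    = np.array([25.0, 37.0, 58.0, 48.0, 33.0, 31.0])  # G-Flight, Test #1 (cm)
g_flight_t2    = np.array([22.0, 32.0, 55.0, 47.0, 31.0, 28.0])  # G-Flight, Test #2 (cm)

# Effect size: mean difference divided by the pooled standard deviation
pooled_sd = np.sqrt((force_plate_t1.var(ddof=1) + g_flight_t1.var(ddof=1)) / 2)
es = (force_plate_t1.mean() - g_flight_t1.mean()) / pooled_sd
print(f"Effect size (Cohen's d) = {es:.2f}")

# Bland-Altman: mean bias and limits of agreement (bias +/- 1.96 SD, i.e. ~2 SD, of the differences)
diff = g_flight_t1 - force_plate_t1
bias = diff.mean()
loa = 1.96 * diff.std(ddof=1)
print(f"Mean bias = {bias:.2f} cm, limits of agreement = {bias - loa:.2f} to {bias + loa:.2f} cm")

# Test-retest reliability: correlate first-test scores with second-test scores
r, p = stats.pearsonr(g_flight_t1, g_flight_t2)
print(f"Test-retest r = {r:.3f}, R^2 = {r**2:.3f}")
```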
Standard Error of Measurement (SEM)
Although a test never yields a person's true score, test scores should be considered as lying within a range that contains the true score. There is no guideline on an acceptable SEM value; you must develop your own!

SEM = S × √(1 − r)

S = standard deviation of the scores
r = correlation coefficient

Standard Error of Measurement (SEM)
With S = 5.5 cm and r = 0.87:
SEM = 5.5 × √(1 − 0.87) = 5.5 × √0.13 = 5.5 × 0.36 ≈ 1.98 cm (G-Flight)

Make it Applicable
We can monitor an athlete's vertical jump with 95% confidence by multiplying our SEM by a z-statistic (i.e., 1.96 standard deviations):
SEM × 1.96 = 1.98 × 1.96 ≈ 3.9 cm
* When an athlete's vertical jump increases or decreases by more than about 3.9 cm, we can be 95% certain the change is not measurement error. *
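The SEM arithmetic above and the 95% threshold can be reproduced in a few lines; the sketch below simply plugs in the lecture's example values (S = 5.5 cm, r = 0.87).

```python
# Reproduces the SEM worked example above and the 95% confidence threshold.
import math

s = 5.5    # standard deviation of the jump scores (cm)
r = 0.87   # test-retest correlation coefficient

sem = s * math.sqrt(1 - r)   # standard error of measurement
threshold_95 = sem * 1.96    # change needed to be 95% sure it is not measurement error

print(f"SEM = {sem:.2f} cm, 95% threshold = {threshold_95:.1f} cm")
```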