Mathematics in the Modern World - Module 5 PDF
Uploaded by UnmatchedRhodochrosite7699
Batangas State University
Summary
This document is a module on data management in Mathematics, covering descriptive and inferential statistics, and measurement. It includes data types, scales of measurement, and key statistical concepts.
Full Transcript
Module 5: Data Management

Lesson 5.1: The Data

Descriptive Statistics
✓ Involves summarizing and organizing data so it can be easily understood.
✓ Purpose: to describe the main features of a dataset in quantitative terms.
✓ Tools:
  o Mean, median, mode (central tendency)
  o Range, variance, standard deviation (variability)
✓ Example: Reporting the average monthly income of a group of workers as P20,000.

Inferential Statistics
✓ Draws conclusions and makes predictions about a population based on a sample.
✓ Purpose: to infer properties of an entire population without surveying each member.
✓ Tools:
  o Hypothesis testing
  o Confidence intervals
✓ Example: Estimating the average income of all workers in a region by sampling a subset.

Measurement
- Quantifying observations according to a rule.
Types:
▪ Variable
  An attribute that can vary (e.g., height, income).
▪ Constant
  An attribute that does not change (e.g., a fixed number).

Scale of Measurement
▪ Nominal Scale
  Data categorized without a numerical value; used for labeling variables without any quantitative value. Categories are mutually exclusive.
  Example: Marital status (1 for single, 2 for married).
▪ Ordinal Scale
  Data that represents order or rank but does not show the magnitude of difference between values. Ordered but not equidistant.
  Example: Finishing positions in a race (1st, 2nd, 3rd).
▪ Interval Scale
  Data with meaningful intervals between measurements but no true zero point.
  Example: Temperature in Celsius.
▪ Ratio Scale
  Data with a true zero point, allowing for the computation of ratios.
  Example: Height, weight.

Key Terms in Statistics
Population
▪ The complete set of individuals or items being studied.
▪ Example: All students enrolled in a university.
Parameter
▪ A numerical value that describes a characteristic of a population.
▪ Example: The average age of all university students.
Sample
▪ A smaller subset selected from the population.
▪ Example: 100 students randomly chosen from a university.
Statistic
▪ A numerical value that describes a characteristic of a sample.
▪ Example: The average age of the 100 sampled students.

Graphical Representation
▪ Graphs
  To visually summarize data to make interpretation easier.
  Types:
  o Bar Graphs: Used for categorical data.
  o Histograms: Used for continuous data to show frequency distribution.
▪ Score Distribution
  Organizing data from highest to lowest or into frequency distributions to show how the data behave.

Lesson 5.2: Measures of Central Tendency

The Mean
The mean is the arithmetic average of all data points. It is calculated by summing all the values and dividing by the number of observations. The mean is widely used and is suitable when the distribution is balanced (not skewed). However, it can be heavily influenced by outliers or extreme values, leading to a distorted picture. For example, if one income in a data set is significantly higher than the others, the mean might suggest a higher overall income than is typical for the majority of the group.

$$\text{Mean} = \frac{\sum x}{N}$$

where $\sum x$ is the sum of all scores and $N$ is the number of scores.

Example: For annual incomes: [Figure: annual incomes example]

The Median
The median represents the middle value when data is ordered from lowest to highest. If there is an even number of observations, the median is the average of the two middle values. Unlike the mean, the median is resistant to outliers and skewed data, making it a better choice when the distribution is not balanced or when extreme values are present.
- How to Find:
✓ Arrange the data in order.
✓ If odd number of observations: middle value.
✓ If even number: average of the two middle values.

The Mode
The mode is the most frequent value in a data set. It is particularly useful for categorical data, where other measures of central tendency (like the mean and median) are not applicable. The mode is the quickest way to identify the most common value in a distribution.
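The three measures from Lesson 5.2 can be sketched in Python with the standard `statistics` module. This is only an illustration: the income values, the marital-status labels, and the helper name `summarize` are hypothetical and are not the module's own data. The sketch shows how one extreme value pulls the mean upward while the median barely moves.

```python
from statistics import mean, median, multimode

# Hypothetical annual incomes (illustrative values, not the module's data).
incomes = [180_000, 185_000, 190_000, 195_000, 200_000]
incomes_with_outlier = incomes + [1_500_000]  # one extreme value added


def summarize(values):
    """Return the mean, median, and list of modes for a list of numbers."""
    return {
        "mean": mean(values),          # sum of values divided by N
        "median": median(values),      # middle value (or average of two middles)
        "modes": multimode(values),    # every value tied for most frequent
    }


without = summarize(incomes)
with_outlier = summarize(incomes_with_outlier)

print("Without outlier:", without["mean"], without["median"])
print("With outlier:   ", round(with_outlier["mean"], 2), with_outlier["median"])

# The mode also works for nominal (categorical) data, where mean and
# median are not meaningful. Labels below are hypothetical.
statuses = ["single", "married", "married", "single", "married"]
status_mode = multimode(statuses)
print("Most common status:", status_mode)
```

`multimode` is used instead of `mode` so that ties are reported rather than hidden; when every value is unique it simply returns all of them.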
Annual incomes example (without outliers): Mean = Php 190,083.00

Mean of Skewed Distribution
▪ In a positively skewed distribution (where a few extremely high values pull the tail to the right), the mean will be greater than the median.
▪ In a negatively skewed distribution (where a few very low values pull the tail to the left), the mean will be less than the median.
▪ The median remains a more accurate reflection of the central value in skewed distributions because it is not influenced by extreme values.

Appropriate use of Mean, Median, and Mode
✓ If the data is symmetrically distributed, the mean is a reliable measure of central tendency.
✓ If the data is skewed (either positively or negatively), the median is often preferred since it is less affected by extreme values.
✓ The mode is useful for identifying the most common value in categorical or frequency-based data.

Effects of the Scale of Measurement Used
Interval (and ratio) data can accommodate all three measures of central tendency (mean, median, mode).
Ordinal data can use the median and mode, but the mean is not appropriate because the distances between ranks are not equal.
Nominal data can only be analyzed using the mode, as the values are categorical and do not have inherent numeric meaning.

Lesson 5.3: Measures of Dispersion

Measure of Variability

Range
The range is the simplest measure of variability, calculated by subtracting the lowest score from the highest score in a distribution.

$$R = \text{Highest} - \text{Lowest}$$

▪ While the range provides a simple overview of the spread, it only considers the two extreme values and can be heavily affected by outliers. Adding a new extreme value could greatly change the range even though the spread of the rest of the distribution is unchanged.

The Standard Deviation
The standard deviation measures how much the individual scores deviate, on average, from the mean of the distribution. It takes into account every score in the distribution and provides a more accurate picture of variability than the range.

$$SD = \sqrt{\frac{\sum (x - \bar{x})^2}{N}}$$

Where:
$x$ is the individual score
$\bar{x}$ is the mean
$N$ is the number of scores

Concepts
▪ A smaller standard deviation indicates that the data points are closer to the mean (homogeneous).
▪ A larger standard deviation indicates that the data points are more spread out from the mean (heterogeneous).
▪ A standard deviation of 0 means that all scores are identical.

The Variance
Variance is the square of the standard deviation. It is the average of the squared deviations from the mean, that is, the standard deviation formula without the square root, which makes it useful in specific situations like statistical analysis and F-ratio tests.

$$V = \frac{\sum (x - \bar{x})^2}{N}$$

Variance and standard deviation are closely related: larger values for both indicate more dispersion, and smaller values indicate less dispersion.

Comparing Standard Deviation and Variance
The standard deviation is generally more interpretable because it is in the same units as the original data, while the variance is in squared units.
Although the standard deviation is more intuitive, variance is often used in statistical testing because of its mathematical properties, especially when comparing multiple datasets.

Lesson 5.4: Measures of Relative Position

Central Tendency and Variability
Central tendency refers to the measure that describes the center or average of a distribution, with the most common measure being the mean.
Variability refers to how spread out the data is, with the most common measure being the standard deviation.

Comparing Distributions
The four cases below (A, B, C, D) present different relationships between the means and standard deviations of two distributions. They illustrate how comparing distributions can become complex depending on the data.
Case Scenarios
o Case A: Similar means and standard deviations.
o Case B: Different means, similar standard deviations.
o Case C: Same means, different standard deviations.
o Case D: Different means and standard deviations.

The Z-Score
A z-score standardizes a score by expressing it in terms of how many standard deviations it is away from the mean. The formula for a z-score is:

$$z = \frac{X - \mu}{\sigma}$$

▪ $X$ refers to the raw score from the population.
▪ $\mu$ pertains to the mean of the population.
▪ $\sigma$ is the population standard deviation.

For a sample, the formula is slightly adjusted:

$$z = \frac{X - \bar{x}}{s}$$

▪ $X$ refers to the raw score from the sample.
▪ $\bar{x}$ pertains to the mean of the sample.
▪ $s$ is the estimated (sample) standard deviation.

Example:
Scores:
o Physics: Score = 95, Mean = 85, SD = 10
o Biology: Score = 85, Mean = 75, SD = 5
Calculate Z-scores:
o Physics: $Z_P = \frac{95 - 85}{10} = 1.0$
o Biology: $Z_B = \frac{85 - 75}{5} = 2.0$
Interpretation: This shows that, even though your Physics score is numerically higher, you performed better relative to your peers in Biology, as your z-score in Biology is higher.

Percentile
Percentiles are points in a distribution that divide it into 100 equal parts. The p-th percentile is the score below which p% of the data falls. The percentile rank of a score $X$ is:

$$P = \left(\frac{\text{Number of scores below } X}{\text{Total number of scores}}\right) \times 100$$

Quartile
Quartiles divide a dataset into four equal parts. The 1st quartile (Q1) is the 25th percentile, the 2nd quartile (Q2) is the median (50th percentile), and the 3rd quartile (Q3) is the 75th percentile. For $n$ ordered observations, the formulas below give the position of each quartile in the ordered list, not the quartile value itself:

$$Q_1 = 0.25(n + 1)$$
$$Q_2 = 0.50(n + 1)$$
$$Q_3 = 0.75(n + 1)$$

Box-and-Whisker Plots
A box-and-whisker plot visually represents the five-number summary of a dataset: the minimum, Q1, median (Q2), Q3, and maximum. This plot helps to visualize the distribution of data, especially in terms of spread and central tendency.
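The formulas of Lessons 5.3 and 5.4 can be checked with a short Python sketch. The `scores` list and the helper names `z_score`, `percentile_rank`, and `quartile_positions` are assumptions made for illustration; only the Physics and Biology figures come from the module. The population functions `pvariance` and `pstdev` are used because the module's formulas divide by N.

```python
from statistics import mean, pstdev, pvariance

# Hypothetical scores (illustrative values, not the module's data).
scores = [2, 4, 4, 4, 5, 5, 7, 9]

score_range = max(scores) - min(scores)  # R = Highest - Lowest
variance = pvariance(scores)             # sum((x - mean)^2) / N
std_dev = pstdev(scores)                 # square root of the variance

print("mean:", mean(scores), "range:", score_range)
print("variance:", variance, "standard deviation:", std_dev)


def z_score(raw, center, spread):
    """Number of standard deviations a raw score lies from the mean."""
    return (raw - center) / spread


# The module's example: Physics 95 (mean 85, SD 10), Biology 85 (mean 75, SD 5).
z_physics = z_score(95, 85, 10)
z_biology = z_score(85, 75, 5)
print("z (Physics):", z_physics, "z (Biology):", z_biology)


def percentile_rank(x, values):
    """Percentage of scores strictly below x, as in the module's formula."""
    below = sum(1 for v in values if v < x)
    return below / len(values) * 100


def quartile_positions(n):
    """Positions of Q1, Q2, Q3 in an ordered list of n values: q * (n + 1)."""
    return tuple(q * (n + 1) for q in (0.25, 0.50, 0.75))


print("percentile rank of 7:", percentile_rank(7, scores))
print("quartile positions for n = 7:", quartile_positions(7))
```

A fractional quartile position (for example 2.25) means the quartile lies between two ordered values; how to interpolate is a convention the module does not state, so the sketch stops at the position.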
Lesson 5.5: Normal Distribution

Normal Curve
The normal curve, also known as the Gaussian distribution, is a theoretical model that depicts the distribution of data points. It is unimodal and symmetrical, meaning most of the data clusters around the center of the curve, with fewer points appearing as you move toward the tails. The X-axis represents the scores, and the Y-axis represents the frequency of those scores. The key characteristics include:
▪ The mean, median, and mode are all equal and located at the center.
▪ It is perfectly symmetrical.
▪ The curve is asymptotic, never touching the X-axis, and the total area under the curve sums to 1 or 100%.
▪ It is defined using the mean (μ) and standard deviation (σ).

Empirical Rule for a Normal Distribution
The empirical rule states that in a normal distribution:
o Approximately 68% of data lies within one standard deviation (±1σ) of the mean.
o Around 95% falls within two standard deviations (±2σ).
o About 99.7% lies within three standard deviations (±3σ).

Z-scores
A z-score represents how many standard deviations a data point is from the mean. It provides insight into the relative position of a score within the distribution. In the cases below, the z-table value is the area between the mean and the given z-score.

Case 1: Finding the Percentage Between the Z-score and the Mean
✓ For a z-score, find the area between the score and the mean using the z-table.

Case 2: Finding the Percentage Above the Given Z-score
▪ Case 2(a)
  For a positive z-score, find the area by subtracting the z-table value from 50%.
▪ Case 2(b)
  For a negative z-score, add the z-table value to 50%.

Case 3: Finding the Percentage Below the Given Z-score
▪ Case 3(a)
  For a negative z-score, subtract the z-table value from 50%.
▪ Case 3(b)
  For a positive z-score, add the z-table value to 50%.

Case 4: Finding the Percentage Between Two Z-scores
✓ Add the z-table values of both z-scores to find the area between them when the z-scores lie on opposite sides of the mean. When both lie on the same side of the mean, subtract the smaller z-table value from the larger one.
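The four z-score cases can be sketched in Python using `statistics.NormalDist` in place of a printed z-table. This is a sketch under one assumption: that the module's z-table lists the area between the mean and z, which is what the add-to-50% and subtract-from-50% rules imply. The function names are hypothetical.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def table_value(z):
    """Area between the mean and z, as a proportion (assumed z-table layout)."""
    return STANDARD_NORMAL.cdf(abs(z)) - 0.5


def percent_between_mean_and(z):
    """Case 1: percentage between the z-score and the mean."""
    return table_value(z) * 100


def percent_above(z):
    """Case 2: subtract from 50% for positive z, add to 50% for negative z."""
    area = table_value(z)
    return (0.5 - area if z >= 0 else 0.5 + area) * 100


def percent_below(z):
    """Case 3: add to 50% for positive z, subtract from 50% for negative z."""
    area = table_value(z)
    return (0.5 + area if z >= 0 else 0.5 - area) * 100


def percent_between(z1, z2):
    """Case 4: add the areas on opposite sides of the mean, else subtract."""
    a1, a2 = table_value(z1), table_value(z2)
    if z1 * z2 < 0:
        return (a1 + a2) * 100
    return abs(a1 - a2) * 100


# Empirical rule check: roughly 68%, 95%, and 99.7%.
for k in (1, 2, 3):
    print(f"within ±{k}σ: {percent_between(-k, k):.2f}%")

print(f"above z = 1.0: {percent_above(1.0):.2f}%")
print(f"below z = -1.0: {percent_below(-1.0):.2f}%")
```

Computing the areas from the cumulative distribution function avoids table rounding, so results may differ from a printed z-table in the last decimal place.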
Translating the raw score into the z-score
The z-score can be calculated using the formula:

$$z = \frac{X - \mu}{\sigma}$$

where $X$ is the raw score, $\mu$ is the mean, and $\sigma$ is the standard deviation.
✓ Once the raw score is converted, find the area between the z-score and the mean using the z-table; the cases below build on that value.

Case A: Percentage Between the Raw Score and the Mean
✓ Convert the raw score to a z-score and look up the value in the z-table.
Case B: Percentage Below the Raw Score
✓ Convert to a z-score and add 50% to the z-table value (for a raw score above the mean; for a score below the mean, subtract the z-table value from 50%).
Case C: Percentage Above the Raw Score
✓ Convert to a z-score and subtract the z-table value from 50% (for a raw score above the mean; for a score below the mean, add the z-table value to 50%).
Case D: Percentage Between Two Raw Scores
✓ Calculate the z-scores for both raw scores and add their respective z-table values (when the scores lie on opposite sides of the mean; when both lie on the same side, subtract the smaller value from the larger).

Lesson 5.6: The Linear Correlation: Pearson r

The Pearson "r" Linear Correlation
✓ A statistical tool used to measure the linear relationship between two variables.
✓ Also known as the "Product-Moment Correlation Coefficient".
✓ Helps determine the strength and direction of the correlation, but does not imply causation.

Formula for Pearson r

$$r = \frac{\dfrac{\sum XY}{N} - \bar{x}\,\bar{y}}{SD_x \, SD_y}$$

Variables in the formula:
o $X$: One variable
o $Y$: Another variable
o $N$: Number of data points
o $\bar{x}$: Mean of $X$
o $\bar{y}$: Mean of $Y$
o $SD_x$ and $SD_y$: Standard deviations of $X$ and $Y$

Steps to calculate:
1. Compute the means of $X$ and $Y$.
2. Calculate the deviations of each $X$ and $Y$ from their means.
3. Compute the sum of products of these deviations. Dividing this sum by $N$ gives the same value as the numerator $\frac{\sum XY}{N} - \bar{x}\,\bar{y}$.
4. Plug values into the formula.

Range of values:
$r$ ranges from -1 to +1.
o +1 : Perfect positive correlation.
o -1 : Perfect negative correlation.
o 0 : No linear correlation.

Guilford's Interpretation for the values of r (applied to the absolute value of r)
- < 0.20: Almost negligible relationship.
- 0.20 - 0.40: Small but definite relationship.
- 0.40 - 0.70: Substantial relationship.
- 0.70 - 0.90: Marked relationship.
- 0.90 - 1.00: Very dependable relationship.

Lesson 5.7: The Least-Squares Regression Line

Bivariate Scatter Plot
✓ A bivariate relationship involves two variables: x (independent variable) and y (dependent variable).
✓ Scatter Plot: A graphical representation where each point represents a pair of values (x, y).
✓ Helps visualize the relationship between two variables and determine the strength of their correlation.

Constructing a Scatter Plot
o Plot each pair of data points (x, y) on a graph.
Example: The relationship between hours of study (x) and grade (y) is plotted.

Regression Line
✓ The regression line (also known as the least-squares line) is the straight line that best fits the data points.
✓ It minimizes the sum of the squares of the vertical deviations (distances) from each data point to the line.
✓ The goal is to find the line that best represents the data.

Least-Squares Regression Line Formula
The equation of the line is written as:

$$y = mx + b$$

o Where:
  ▪ $m$ is the slope (rate of change).
  ▪ $b$ is the y-intercept (value of y when x = 0).

The formulas to calculate $m$ (slope) and $b$ (y-intercept):
o Slope $m$:

$$m = \frac{n\left(\sum xy\right) - \left(\sum x\right)\left(\sum y\right)}{n\left(\sum x^2\right) - \left(\sum x\right)^2}$$

o Y-intercept $b$:

$$b = \frac{\sum y - m\left(\sum x\right)}{n}$$
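The Pearson r formula of Lesson 5.6 and the slope and intercept formulas of Lesson 5.7 can be applied to the same data in a short Python sketch. The hours and grades below are hypothetical values chosen for easy arithmetic, and the function names `pearson_r` and `least_squares_line` are illustrative. The population standard deviation (`pstdev`) is used, since the module's SD formula divides by N.

```python
from statistics import mean, pstdev

# Hypothetical data: hours of study (x) and grade (y).
hours = [1, 2, 3, 4, 5]
grades = [70, 75, 85, 80, 90]


def pearson_r(xs, ys):
    """r = (sum(XY)/N - mean_x * mean_y) / (SD_x * SD_y)."""
    n = len(xs)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    return (sum_xy / n - mean(xs) * mean(ys)) / (pstdev(xs) * pstdev(ys))


def least_squares_line(xs, ys):
    """Return slope m and intercept b of the least-squares line y = m*x + b."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    # Denominator is n * sum(x^2) minus the SQUARE of sum(x).
    m = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    b = (sum_y - m * sum_x) / n
    return m, b


r = pearson_r(hours, grades)
m, b = least_squares_line(hours, grades)

print(f"r = {r:.3f}")
print(f"regression line: y = {m}x + {b}")
print("predicted grade for 6 hours:", m * 6 + b)
```

For this data set the sums are Σx = 15, Σy = 400, Σxy = 1245, and Σx² = 55, so the slope works out by hand to (5·1245 − 15·400) / (5·55 − 15²) = 225 / 50 = 4.5 and the intercept to (400 − 4.5·15) / 5 = 66.5.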