ACM Lecture 1 2025 PDF

2025

Tags

levels of measurement, descriptive statistics, statistical tests, data analysis

Summary

This lecture provides an overview of levels of measurement, including nominal, ordinal, interval, and ratio scales. It also covers descriptive statistics like mean, median, mode, range, variance, and standard deviation. Finally, it introduces various statistical tests commonly employed in data analysis.

Full Transcript

Advanced Computer Module Lecture - 1

Class code 6aqohy2

Levels of measurement, also called scales of measurement, tell you how precisely variables are recorded. In scientific research, a variable is anything that can take on different values across your data set (e.g., height or test scores).

There are 4 levels of measurement:
Nominal: the data can only be categorized
Ordinal: the data can be categorized and ranked
Interval: the data can be categorized, ranked, and evenly spaced
Ratio: the data can be categorized, ranked, evenly spaced, and has a natural zero

Depending on the level of measurement of the variable, what you can do to analyze your data may be limited. There is a hierarchy in the complexity and precision of the level of measurement, from low (nominal) to high (ratio).

Nominal data is the least precise and complex level. The word nominal means "in name," so this kind of data can only be labelled. It does not have a rank order, equal spacing between values, or a true zero value.

Examples of nominal data

At a nominal level, each response or observation fits only into one category. Nominal data can be expressed in words or in numbers. But even if there are numerical labels for your data, you can't order the labels in a meaningful way or perform arithmetic operations with them. In social scientific research, nominal variables often include gender, ethnicity, political preferences or student identity number.

Nominal, ordinal, interval, and ratio data

Going from lowest to highest, the 4 levels of measurement are cumulative. This means that they each take on the properties of lower levels and add new properties.

Nominal level
You can categorize your data by labelling them in mutually exclusive groups, but there is no order between the categories.
Examples of nominal scales: city of birth, gender, ethnicity, car brands, marital status.

Ordinal level
You can categorize and rank your data in an order, but you cannot say anything about the intervals between the rankings. Although you can rank the top 5 Olympic medallists, this scale does not tell you how close or far apart they are in number of wins.
Examples of ordinal scales: disease stages, language ability (e.g., beginner, intermediate, fluent), Likert-type questions (e.g., very dissatisfied to very satisfied).

Interval level
You can categorize, rank, and infer equal intervals between neighboring data points, but there is no true zero point. The difference between any two adjacent temperatures is the same: one degree. But zero degrees is defined differently depending on the scale, so it doesn't mean an absolute absence of temperature. The same is true for test scores and personality inventories. A zero on a test is arbitrary; it does not mean that the test-taker has an absolute lack of the trait being measured.
Examples of interval scales: test scores (e.g., IQ or exams), personality inventories, temperature in Fahrenheit or Celsius.

Ratio level
You can categorize, rank, and infer equal intervals between neighboring data points, and there is a true zero point. A true zero means there is an absence of the variable of interest. In ratio scales, zero does mean an absolute lack of the variable. For example, in the Kelvin temperature scale, there are no negative degrees of temperature; zero means an absolute lack of thermal energy.
Examples of ratio scales: height, age, weight, temperature in Kelvin.

Descriptive statistics

Descriptive statistics help you get an idea of the "middle" and "spread" of your data through measures of central tendency and variability.

1. Measures of Central Tendency:
a. Mean: The arithmetic average of a set of values. It is calculated by adding up all the values and dividing by the number of values.
b. Median: The middle value when the data is arranged in order.
If there is an even number of values, the median is the average of the two middle values.
c. Mode: The value that appears most frequently in a dataset.

2. Measures of Dispersion:
a. Range: The difference between the maximum and minimum values in a dataset.
b. Variance: A measure of how far data points vary from the mean. It is the average of the squared differences from the mean.
c. Standard Deviation: The square root of the variance. It measures the typical distance between data points and the mean, in the same units as the data.

3. Measures of Shape:
a. Skewness: A measure of the asymmetry of the data distribution. Positive skewness indicates a longer tail on the right, while negative skewness indicates a longer tail on the left.
b. Kurtosis: A measure of the "tailedness" of the data distribution. It indicates whether the data has heavy or light tails compared to a normal distribution.

4. Percentiles: Percentiles divide the data into 100 equal parts. For example, the 25th percentile (also known as the first quartile) is the value below which 25% of the data falls.

5. Frequency Distributions: These display how often each value or range of values occurs in a dataset. Histograms and frequency tables are common ways to represent frequency distributions.

6. Measures of Association:
a. Correlation: Measures the strength and direction of a linear relationship between two variables. The Pearson correlation coefficient is a commonly used measure.
b. Covariance: Measures how two variables change together. It is related to correlation but not normalized.
c. Cross-Tabulation (Contingency Tables): Used to summarize and analyze the relationship between two categorical variables.

Correlation Coefficient

The correlation coefficient is the term used to refer to the resulting correlation measurement. It will always maintain a value between one and negative one. When the correlation coefficient is one, the variables under examination have a perfect positive correlation.
In other words, when one moves, so does the other in the same direction, proportionally. If the correlation coefficient is less than one, but still greater than zero, it indicates a less than perfect positive correlation. The closer the correlation coefficient gets to one, the stronger the correlation between the two variables.

When the correlation coefficient is zero, it means that there is no linear relationship between the variables. If one variable moves, a linear model cannot predict the movement of the other variable.

If the correlation coefficient is negative one, this means that the variables are perfectly negatively or inversely correlated. If one variable increases, the other will decrease at the same proportion. The variables will move in opposite directions from each other. If the correlation coefficient is greater than negative one but less than zero, it indicates that there is an imperfect negative correlation. As the coefficient approaches negative one, the negative correlation grows stronger.

When two variables move in the same direction, this is referred to as a positive, or proportional, relationship. This means that as one variable increases, the second variable also increases in value. The covariance value for a proportional relationship is always a positive number greater than 0.

When two variables move in the opposite direction, this is known as a negative, or inverse, relationship. In these cases, one variable increases in value, while the second variable decreases. The covariance value for an inverse relationship is a negative number less than 0.

7. Summary Statistics: These are concise summaries of the main characteristics of a dataset and may include the minimum and maximum values, mean, median, quartiles, and other key statistics.

8. Box Plots (Box-and-Whisker Plots): Graphical representations that display the median, quartiles, and potential outliers in a dataset.

9. Frequency Polygons and Cumulative Frequency Curves: Graphical representations that show the distribution of data values.

10. Measures of Position:
a. Z-Score: Indicates how many standard deviations a data point is from the mean. It is used to assess the relative position of a data point in a distribution.
b. Percentile Rank: Shows the percentage of data points that are below a particular value.

Types of variables

In statistics and research, independent and dependent variables are terms used to describe the relationship between two or more variables in an experiment or study. These variables play distinct roles in understanding and analyzing the relationships and effects under investigation:

1. Independent Variable:
a. Definition: The independent variable is the variable that is intentionally manipulated or changed by the researcher in an experiment or study. It is the presumed cause or factor that is being tested to see how it affects the dependent variable.
b. Characteristics: It is often plotted on the x-axis of a graph. It is under the control of the researcher.
c. Example: In a study examining the effect of different doses of a drug (independent variable) on blood pressure (dependent variable), the researcher administers varying doses of the drug to different groups of participants and

2. Dependent Variable:
a. Definition: The dependent variable is the variable that is observed, measured, or recorded as an outcome or response in an experiment or study. It is the variable whose response to changes in the independent variable the researchers want to understand.
b. Characteristics: It is often plotted on the y-axis of a graph. It is not directly controlled by the researcher; its values are the result of changes in the independent variable.
c. Example: In the same drug study mentioned earlier, blood pressure (dependent variable) is measured after administering

Confidence interval

A confidence interval provides a way to quantify the uncertainty or variability in your estimate based on a sample of data. The key components of a confidence interval:

1. Point Estimate: The point estimate is the single value that you calculate from your sample data and that you believe represents the population parameter you're interested in estimating. For example, if you're estimating the population mean based on sample data, your point estimate would be the sample mean.

2. Margin of Error: The margin of error is the amount added to and subtracted from the point estimate to form the interval. It quantifies the uncertainty in your estimate. The margin of error depends on the level of confidence you choose and the variability in your sample data.

3. Level of Confidence: The level of confidence describes how often the interval-building procedure captures the true population parameter over repeated sampling. Commonly used levels of confidence are 90%, 95%, and 99%. A 95% confidence interval, for example, means that if you were to take many random samples and construct confidence intervals from each, you would expect about 95% of those intervals to contain the true population parameter.

The formula for calculating a confidence interval depends on the type of parameter you're estimating (e.g., mean, proportion) and the underlying probability distribution (e.g., normal distribution, t-distribution) that applies to your data.

a. Confidence Interval for Population Mean (With Known Population Standard Deviation):
Confidence Interval = Sample Mean ± (Z * (Population Standard Deviation / √Sample Size))
where Z is the critical value from the standard normal distribution corresponding to the desired level of confidence.

b. Confidence Interval for Population Mean (With Unknown Population Standard Deviation):
Confidence Interval = Sample Mean ± (t * (Sample Standard Deviation / √Sample Size))
where t is the critical value from the t-distribution with (n-1) degrees of freedom, where n is the sample size.

c. Confidence Interval for Population Proportion:
Confidence Interval = Sample Proportion ± (Z * √((Sample Proportion * (1 - Sample Proportion)) / Sample Size))
where Z is the critical value from the standard normal distribution corresponding to the desired level of confidence.

Statistical tests

1. T-Test: The t-test is used to compare means between two groups. It's often used in clinical trials to compare the effectiveness of a treatment versus a control group.

2. ANOVA (Analysis of Variance): ANOVA is used to compare means among more than two groups. It's used when there are multiple treatment groups in a study.

3. Chi-Square Test: The chi-square test is used to analyze categorical data, such as the frequency of disease occurrence in different groups.

4. Regression Analysis: Regression helps examine relationships between variables. Linear regression can be used to predict a continuous outcome variable based on one or more predictor variables.

5. Logistic Regression: This is used when the dependent variable is categorical, such as presence or absence of a disease.

6. Survival Analysis: Survival analysis is used to analyze time-to-event data,

7. Wilcoxon Rank-Sum Test (Mann-Whitney U Test): This non-parametric test is used when assumptions of normality aren't met. It's used to compare two independent samples.

8. Kaplan-Meier Analysis: This is used for estimating survival probabilities over time.

9. Fisher's Exact Test: Similar to the chi-square test, Fisher's exact test is used for analyzing categorical data, particularly when sample sizes are small.

10. McNemar's Test: This is used to analyze paired categorical data, such as before-and-after treatment data.

11. Receiver Operating Characteristic (ROC) Curve Analysis: This is used to assess the accuracy of a diagnostic test by plotting sensitivity against 1-specificity.

12. Bayesian Analysis: This involves updating probabilities based on prior knowledge and new evidence. It's becoming more popular in medical
