Summary

This document provides an introduction to statistics and statistical thinking. It covers topics from descriptive and inferential statistics to observational study designs, offering a foundational overview of the subject.

Full Transcript


# STATISTICS DA 1-7

## Define Statistics and Statistical Thinking

**Statistics** is the science of collecting, organizing, summarizing, and analyzing information to draw conclusions or answer questions. In addition, statistics is about providing a measure of confidence in any conclusion: results should be reported using some measure that represents how convinced we are that our conclusions reflect reality.

Information is data, and data vary. **Variability** means that the same data can lead to different results, with both being correct.

* **Population:** The entire group to be studied.
* **Individual:** A person or object that is a member of the population being studied.
* **Sample:** A subset of the population that is being studied.

**Descriptive statistics** consists of organizing and summarizing data. Descriptive statistics describe data through numerical summaries, tables, and graphs.

**Inferential statistics** uses methods that take a result from a sample, extend it to the population, and measure the reliability of the result. One goal of inferential statistics is to use statistics to estimate parameters. A **parameter** is a numerical summary of a population; a statistic is a numerical summary of a sample.

The **process of statistics** is the approach of identifying a research objective, collecting and describing data, and performing inference.

### Distinguish between Qualitative and Quantitative Variables

* **Qualitative (or categorical) variables:** Allow for classification of individuals based on some attribute or characteristic.
* **Quantitative variables:** Provide numerical measures of individuals. The values of a quantitative variable can be added or subtracted and provide meaningful results.

1. **Discrete variable:** A quantitative variable that has either a finite number of possible values or a countable number of possible values. The term *countable* means that the values result from counting, such as 0, 1, 2, 3, and so on. A discrete variable cannot take on every possible value between any two possible values.
2. **Continuous variable:** A quantitative variable that has an infinite number of possible values that are not countable. A continuous variable may take on every possible value between any two values.

### Determine the Level of Measurement of a Variable

* **Nominal:** The values of the variable name, label, or categorize. The values cannot be arranged in a ranked or specific order.
* **Ordinal:** Has the properties of the nominal level of measurement, and the naming scheme allows the values of the variable to be arranged in a ranked or specific order.
* **Interval:** Has the properties of the ordinal level of measurement, and the differences in the values of the variable have meaning. A value of zero does not mean the absence of the quantity. Arithmetic operations such as addition and subtraction can be performed on values of the variable. Example: 70°F vs. 10°F (the difference is meaningful, but 0°F does not mean the absence of temperature).
* **Ratio:** Has the properties of the interval level of measurement, and the ratios of the values of the variable have meaning. A value of zero means the absence of the quantity. Arithmetic operations such as multiplication and division can be performed on the values of the variable.

## 1.2 Observational Studies versus Designed Experiments

### Distinguish between an observational study and an experiment

* **Observational study:** Measures the value of the response variable without attempting to influence the value of either the response or explanatory variables. That is, in an observational study, the researcher observes the behavior of the individuals without trying to influence the outcome of the study.
* **Designed experiment:** If a researcher assigns the individuals in a study to a certain group, intentionally changes the value of an explanatory variable, and then records the value of the response variable for each group, the study is a designed experiment.
* **Confounding:** Occurs when the effects of two or more explanatory variables are not separated. Therefore, any relation that may exist between an explanatory variable and the response variable may be due to some other variable or variables not accounted for in the study.
* **Lurking variable:** An explanatory variable that was not considered in a study but that affects the value of the response variable in the study. Lurking variables are typically related to explanatory variables considered in the study.
* **Confounding variable:** An explanatory variable that was considered in a study whose effect cannot be distinguished from a second explanatory variable in the study.

### Explain the various types of observational studies

1. **Cross-sectional studies:** Collect information about individuals at a specific point in time or over a very short period of time. Cheap and quick to do.
2. **Case-control studies:** Retrospective, meaning that they require individuals to look back in time or require the researcher to look at existing records. In case-control studies, individuals who have a certain characteristic may be matched with those who do not.
3. **Cohort studies:** A cohort study first identifies a group of individuals to participate in the study (the cohort). The cohort is then observed over a long period of time, during which characteristics about the individuals are recorded. Some individuals will be exposed to certain factors (not intentionally) and others will not. At the end of the study, the value of the response variable is recorded for the individuals.

#### Existing Sources of Data and Census Data

A **census** is a list of all individuals in a population along with certain characteristics of each individual.

## 1.3 Simple Random Sampling

### Obtain a simple random sample

* **Random sampling** is the process of using chance to select individuals from a population to be included in the sample.
If convenience is used to obtain a sample, the results of the survey are meaningless.

* A sample of size *n* from a population of size *N* is obtained through **simple random sampling** if every possible sample of size *n* has an equally likely chance of occurring. The sample is then called a **simple random sample**.

## 1.4 Other Effective Sampling Methods

* **Stratified sample:** Obtained by separating the population into non-overlapping groups called *strata* and then obtaining a simple random sample from each *stratum*. The individuals within each stratum should be homogeneous (or similar) in some way.
* **Systematic sample:** Obtained by selecting every *k*th individual from the population. The first individual selected corresponds to a random number between 1 and *k*.
* **Cluster sample:** Obtained by selecting all individuals within a randomly selected collection or group of individuals.

### Steps in Systematic Sampling

1. If possible, approximate the population size, $N$.
2. Determine the sample size desired, $n$.
3. Compute $\frac{N}{n}$ and round down to the nearest integer. This value is $k$.
4. Randomly select a number between 1 and $k$. Call this number $p$.
5. The sample will consist of the following individuals: $p, p + k, p + 2k, \ldots, p + (n-1)k$

**CAUTION!** Stratified and cluster samples are different. In a stratified sample, we divide the population into two or more homogeneous groups and then obtain a simple random sample from each group. In a cluster sample, we divide the population into groups, obtain a simple random sample of some of the groups, and survey all individuals in the selected groups.

**Convenience sampling:** A convenience sample is a sample in which the individuals are easily obtained and not based on randomness. A common form is the self-selected sample (the individuals themselves decide to participate in a survey). Studies that use convenience sampling generally have results that are suspect; the results should be looked on with extreme skepticism.
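The two sampling schemes above can be sketched in Python. This is a minimal illustration, not from the source text; the population is just a list of hypothetical ID numbers, and the function names are our own:

```python
import random

def simple_random_sample(population, n, seed=None):
    """Simple random sampling: every possible sample of size n is equally likely."""
    return random.Random(seed).sample(population, n)

def systematic_sample(N, n, seed=None):
    """Steps 1-5 above: k = floor(N/n), random start p in 1..k, then every kth."""
    k = N // n                              # step 3: round N/n down to get k
    p = random.Random(seed).randint(1, k)   # step 4: random start between 1 and k
    return [p + i * k for i in range(n)]    # step 5: p, p+k, ..., p+(n-1)k

ids = list(range(1, 1001))                  # hypothetical population, N = 1000
print(simple_random_sample(ids, 5, seed=0))
print(systematic_sample(N=1000, n=5, seed=0))
```

Note that in the systematic scheme the largest selected index is $p + (n-1)k \le nk \le N$, so every index stays inside the population.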
## 1.5 Bias in Sampling

### Explain the Sources of Bias in Sampling

**Bias:** If the results of the sample are not representative of the population, then the sample has bias. There are three sources of bias in sampling:

1. **Sampling bias:** The sampling technique favors one part of the population over another (for example, a convenience sample). It can be due to *undercoverage*: the proportion of one segment of the population is lower in the sample than it is in the population, so the sample is incomplete or not representative of the population.
2. **Nonresponse bias:** Exists when individuals selected to be in the sample who do not respond to the survey have different opinions from those who do. Nonresponse can occur because individuals selected for the sample do not wish to respond or because the interviewer was unable to contact them.
3. **Response bias:** Exists when the answers on a survey do not reflect the true feelings of the respondent. Response bias can occur in a number of ways:
   * Interviewer error
   * Misrepresented answers
   * Wording of questions
   * Ordering of questions or words
   * Type of question
   * Data-entry error

**Can a census have bias?** Nonsampling errors result from undercoverage, nonresponse bias, response bias, or data-entry error. Such errors could also be present in a complete census of the population. Sampling error results from using a sample to estimate information about a population; this type of error occurs because a sample gives incomplete information about a population.

## 1.6 The Design of Experiments

### Describe the characteristics of an experiment

An **experiment** is a controlled study conducted to determine the effect that varying one or more explanatory variables or factors has on a response variable. Any combination of the values of the factors is called a *treatment*. In an experiment, the **experimental unit** is a person, object, or some other well-defined item upon which a treatment is applied. A **control group** serves as a baseline treatment against which other treatments can be compared.
A **placebo** is an innocuous medication, such as a sugar tablet, that looks, tastes, and smells like the experimental medication. In **single-blind** experiments, the experimental unit (or subject) does not know which treatment he or she is receiving. In **double-blind** experiments, neither the experimental unit nor the researcher in contact with the experimental unit knows which treatment the experimental unit is receiving.

### Explain the steps in designing an experiment

1. Identify the problem to be solved.
2. Determine the factors that affect the response variable.
3. Determine the number of experimental units.
4. Determine the level of each factor (control and randomize).
5. Conduct the experiment (replicate; collect and process data).
6. Test the claims (this is the subject of inferential statistics).

* **Completely randomized design:** One in which each experimental unit is randomly assigned to a treatment.
* **Matched-pairs design:** An experimental design in which the experimental units are paired up. The pairs are selected so that they are related in some way (that is, the same person before and after a treatment, twins, husband and wife, same geographical location, and so on). There are only two levels of treatment in a matched-pairs design.

# PART 2: DESCRIPTIVE STATISTICS

## 2.1 Organizing Qualitative Data

Qualitative data provide measures that categorize or classify an individual; such data can be organized in tables.

* **Frequency distribution:** Lists each category of data and the number of occurrences for each category of data.
* The **relative frequency** is the proportion (or percent) of observations within a category and is found using the formula:

$$\text{Relative frequency} = \frac{\text{frequency}}{\text{sum of all frequencies}}$$

A **relative frequency distribution** lists each category of data together with the relative frequency.
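The frequency and relative frequency formulas above can be sketched directly; the blood-type data here are hypothetical, used only to show the computation:

```python
from collections import Counter

data = ["A", "B", "A", "O", "AB", "B", "A", "O", "A", "B"]  # hypothetical blood types

freq = Counter(data)                      # frequency distribution
total = sum(freq.values())
rel_freq = {cat: count / total for cat, count in freq.items()}  # frequency / sum of frequencies

for cat in sorted(freq):
    print(f"{cat}: frequency={freq[cat]}, relative frequency={rel_freq[cat]:.2f}")
```

By construction, the relative frequencies always sum to 1 (or 100%).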
### Construct bar graphs

A **bar graph** is constructed by labeling each category of data on either the horizontal or vertical axis and the frequency or relative frequency of the category on the other axis. Rectangles of equal width are drawn for each category, and the height of each rectangle represents the category's frequency or relative frequency.

* *Pareto chart:* A bar graph whose bars are drawn in decreasing order of frequency or relative frequency.
* *Side-by-side graph:* Used to compare two data sets category by category.

### Construct pie charts

A **pie chart** is a circle divided into sectors. Each sector represents a category of data, and the area of each sector is proportional to the frequency of the category.

## 2.2 Organizing Quantitative Data

1. **Organize discrete data in tables.**
2. **Construct histograms of discrete data.** A histogram is constructed by drawing rectangles for each class of data. The height of each rectangle is the frequency or relative frequency of the class. The width of each rectangle is the same, and the rectangles touch each other.
3. **Organize continuous data in tables.** *Classes* are categories into which data are grouped. An interval represents a class and has a lower and an upper limit. A class is *open ended* if either its upper or lower end has no limit.
4. **Construct histograms of continuous data.**
5. **Draw stem-and-leaf plots.** In a stem-and-leaf plot (or stem plot), use the digits to the left of the rightmost digit to form the stem. Each rightmost digit forms a leaf.

### Construction of a Stem-and-Leaf Plot

1. The stem of a data value will consist of the digits to the left of the rightmost digit. The leaf of a data value will be the rightmost digit.
2. Write the stems in a vertical column in increasing order. Draw a vertical line to the right of the stems.
3. Write each leaf corresponding to the stems to the right of the vertical line.
4. Within each stem, rearrange the leaves in ascending order, title the plot, and include a legend to indicate what the values represent.
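The four construction steps above can be sketched for integer data; this is our own minimal illustration, not the textbook's code:

```python
from collections import defaultdict

def stem_and_leaf(data):
    """Stem = all digits but the last, leaf = last digit (integer data assumed)."""
    stems = defaultdict(list)
    for value in data:
        stems[value // 10].append(value % 10)   # steps 1 and 3
    lines = []
    for stem in sorted(stems):                  # step 2: stems in increasing order
        leaves = "".join(str(leaf) for leaf in sorted(stems[stem]))  # step 4
        lines.append(f"{stem} | {leaves}")
    return "\n".join(lines)

print(stem_and_leaf([62, 75, 68, 81, 75, 90, 79, 63]))
# Legend: 6 | 2 represents 62
```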
One advantage of the stem-and-leaf plot over frequency distributions and histograms is that the raw data can be retrieved from the plot. When the data set is large, this advantage is lost.

**Split stems:** Rather than using one row per stem, each stem can be split in half, with the lower half of the leaf digits (e.g., 0–4) on the first row and the upper half (e.g., 5–9) on the second.

* **Draw dot plots.** We draw a dot plot by placing each observation horizontally in increasing order and placing a dot above the observation each time it is observed. *(The original figure showed a dot plot on a horizontal number line from 1 to 11, with the tallest stack of dots over 6.)*
* **Identify the shape of a distribution.** A distribution is classified as *symmetric* if, when we split the histogram down the middle, the right and left sides are mirror images; otherwise it may be *skewed left* or *skewed right*. The original figure illustrated a variety of distribution shapes:
  * **Uniform (symmetric):** A histogram where all bars have roughly the same height (frequency).
  * **Bell-shaped (symmetric):** A histogram resembling a bell curve, with the highest frequency in the center and tapering off symmetrically on both sides.
  * **Skewed right:** A histogram with a long tail extending to the right.
  * **Skewed left:** A histogram with a long tail extending to the left.

### Draw time-series graphs

A time series arises when a variable is measured at different points in time. A **time-series plot** is obtained by plotting the time at which a variable is measured on the horizontal axis and the corresponding value of the variable on the vertical axis. Line segments are then drawn connecting the points.
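A dot plot like the one described above can be sketched in text mode, rotated so each value's dots run horizontally instead of stacking vertically (our own illustration, with made-up data):

```python
from collections import Counter

def dot_plot(data):
    """Text dot plot: one row per value, one dot per occurrence of that value."""
    counts = Counter(data)
    lines = []
    for value in range(min(data), max(data) + 1):
        lines.append(f"{value:>3} | " + "." * counts[value])
    return "\n".join(lines)

print(dot_plot([2, 3, 3, 4, 4, 5, 4, 6, 3]))
```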
## 2.3 Graphical Misrepresentations of Data

* Describe what can make a graph misleading or deceptive (for example, an incorrect size or scale of the graph).
* READ PAGES 123-12

# 3: Numerically Summarizing Data

## 3.1 Measures of Central Tendency

### Determine the arithmetic mean of a variable from raw data

AVERAGE = MEAN. The **arithmetic mean** of a variable is computed by adding all the values of the variable in the data set and dividing by the number of observations.

* The **population arithmetic mean**, $\mu$ (pronounced "mew"), is computed using all the individuals in a population. The population mean is a parameter.
* The **sample arithmetic mean**, $\bar{x}$ (pronounced "x-bar"), is computed using sample data. The sample mean is a statistic.

If $x_1, x_2, \ldots, x_N$ are the $N$ observations of a variable from a population, then the population mean, $\mu$, is

$$\mu = \frac{x_1 + x_2 + \ldots + x_N}{N} = \frac{\Sigma x_i}{N} \quad (1)$$

### Determine the median of a variable from raw data

The **median** of a variable is the value that lies in the middle of the data when arranged in ascending order. We use $M$ to represent the median.

* If the number of observations is odd, then the median is the data value exactly in the middle of the data set. That is, the median is the observation that lies in the $\frac{n+1}{2}$ position.
* If the number of observations is even, then the median is the mean of the two middle observations in the data set. That is, the median is the mean of the observations that lie in the $\frac{n}{2}$ position and the $\frac{n}{2} + 1$ position.

### Explain what it means for a statistic to be resistant

The mean is sensitive to extreme values, while the median is not. A numerical summary of data is said to be **resistant** if extreme values (very large or small) relative to the data do not affect its value substantially.

### Determine the mode of a variable from raw data

The **mode** of a variable is the most frequent observation of the variable that occurs in the data set.
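The three measures of central tendency can be sketched from their definitions above. This is our own minimal implementation (the mode helper simply returns every value tied for the top count, so for data with no repeats it does not detect the "no mode" case; see the caveat that follows in the notes):

```python
from collections import Counter

def mean(values):
    """Arithmetic mean: sum of the values divided by the number of observations."""
    return sum(values) / len(values)

def median(values):
    """Middle value of the sorted data; average the two middle values when n is even."""
    s, n = sorted(values), len(values)
    if n % 2 == 1:
        return s[n // 2]                     # the (n+1)/2-th observation
    return (s[n // 2 - 1] + s[n // 2]) / 2   # mean of the n/2-th and (n/2+1)-th

def modes(values):
    """All values tied for the highest frequency (bimodal data return two values)."""
    counts = Counter(values)
    top = max(counts.values())
    return sorted(v for v, c in counts.items() if c == top)

data = [4, 1, 7, 3, 10]
print(mean(data), median(data), modes([2, 2, 5, 5, 9]))
```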
If no observation occurs more than once, the data have no mode. A data set can have more than one mode (two modes = *bimodal*; more than two = *multimodal*). If the data are nominal, the mean and median cannot be determined; only the mode can. See p. 148 for a summary.

## 3.2 Measures of Dispersion

**Dispersion** is the degree to which the data are spread out.

### Determine the range of a variable from raw data

The **range**, $R$, of a variable is the difference between the largest and the smallest data value:

$$\text{Range} = R = \text{largest data value} - \text{smallest data value}$$

The range is not resistant.

### Determine the standard deviation of a variable from raw data

A *deviation about the mean* is a value minus the mean of the set of values. The sum of all deviations about the mean must be zero.

The **population standard deviation** of a variable is the square root of the sum of squared deviations about the population mean divided by the number of observations in the population, $N$. That is, it is the square root of the mean of the squared deviations about the population mean. The population standard deviation is symbolically represented by $\sigma$ (lowercase Greek sigma).

$$\sigma = \sqrt{\frac{(x_1 - \mu)^2 + (x_2 - \mu)^2 + \ldots + (x_N - \mu)^2}{N}} = \sqrt{\frac{\Sigma (x_i - \mu)^2}{N}} \quad (1)$$

where $x_1, x_2, \ldots, x_N$ are the $N$ observations in the population and $\mu$ is the population mean.

The **sample standard deviation**, $s$, of a variable is the square root of the sum of squared deviations about the sample mean divided by $n - 1$, where $n$ is the sample size:

$$s = \sqrt{\frac{(x_1 - \bar{x})^2 + (x_2 - \bar{x})^2 + \ldots + (x_n - \bar{x})^2}{n-1}} = \sqrt{\frac{\Sigma (x_i - \bar{x})^2}{n-1}} \quad (3)$$

where $x_1, x_2, \ldots, x_n$ are the $n$ observations in the sample and $\bar{x}$ is the sample mean.

### Determine the variance of a variable from raw data

The **variance** of a variable is the square of the standard deviation.
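Formulas (1) and (3) differ only in the divisor ($N$ versus $n - 1$), which the following sketch makes explicit (our own illustration with made-up data):

```python
import math

def pop_std(values):
    """Population standard deviation: squared deviations about mu, divided by N."""
    mu = sum(values) / len(values)
    return math.sqrt(sum((x - mu) ** 2 for x in values) / len(values))

def sample_std(values):
    """Sample standard deviation: squared deviations about x-bar, divided by n - 1."""
    x_bar = sum(values) / len(values)
    return math.sqrt(sum((x - x_bar) ** 2 for x in values) / (len(values) - 1))

data = [2, 4, 4, 4, 5, 5, 7, 9]     # mean 5, sum of squared deviations 32
print(pop_std(data))                 # sqrt(32/8) = 2.0
print(sample_std(data))              # sqrt(32/7), slightly larger
```

Squaring either result gives the corresponding variance, $\sigma^2$ or $s^2$.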
The population variance is $\sigma^2$, and the sample variance is $s^2$. If you look carefully at Formulas (1) and (3), you will notice that the value under the radical is the variance.

### Use the Empirical Rule to describe data that are bell-shaped

#### The Empirical Rule

If a distribution is roughly bell-shaped, then:

* Approximately 68% of the data will lie within 1 standard deviation of the mean, that is, between $\mu - 1\sigma$ and $\mu + 1\sigma$.
* Approximately 95% of the data will lie within 2 standard deviations of the mean, that is, between $\mu - 2\sigma$ and $\mu + 2\sigma$.
* Approximately 99.7% of the data will lie within 3 standard deviations of the mean, that is, between $\mu - 3\sigma$ and $\mu + 3\sigma$.

Note: We can also use the Empirical Rule on sample data, with $\bar{x}$ used in place of $\mu$ and $s$ used in place of $\sigma$.

### Use Chebyshev's Inequality to describe any set of data

#### Chebyshev's Inequality

For any data set or distribution, at least $(1 - \frac{1}{k^2}) \cdot 100\%$ of the observations lie within $k$ standard deviations of the mean, where $k$ is any number greater than 1. That is, at least $(1 - \frac{1}{k^2}) \cdot 100\%$ of the data lie between $\mu - k\sigma$ and $\mu + k\sigma$ for $k > 1$.

Note: We can also use Chebyshev's Inequality on sample data.

## 3.3 Measures of Central Tendency and Dispersion from Grouped Data

### Approximate the mean of a variable from grouped data

A **class midpoint** is the sum of consecutive lower class limits divided by 2.

#### Approximate Mean of a Variable from a Frequency Distribution

| Population Mean | Sample Mean |
| :--- | :--- |
| $\mu = \frac{\Sigma x_i f_i}{\Sigma f_i} = \frac{x_1 f_1 + x_2 f_2 + \ldots + x_n f_n}{f_1 + f_2 + \ldots + f_n}$ | $\bar{x} = \frac{\Sigma x_i f_i}{\Sigma f_i} = \frac{x_1 f_1 + x_2 f_2 + \ldots + x_n f_n}{f_1 + f_2 + \ldots + f_n}$ |

where $x_i$ is the midpoint or value of the $i$th class, $f_i$ is the frequency of the $i$th class, and $n$ is the number of classes.

### Compute the weighted mean

Data values may carry different weights. The **weighted mean**, $\bar{x}_w$, of a variable is found by multiplying each value of the variable by its corresponding weight, adding these products, and dividing this sum by the sum of the weights:

$$\bar{x}_w = \frac{\Sigma w_i x_i}{\Sigma w_i} = \frac{w_1 x_1 + w_2 x_2 + \ldots + w_n x_n}{w_1 + w_2 + \ldots + w_n} \quad (2)$$

where $w_i$ is the weight of the $i$th observation and $x_i$ is the value of the $i$th observation.

### Approximate the standard deviation of a variable from grouped data (p. 143)

#### Approximate Standard Deviation of a Variable from a Frequency Distribution

| Population Standard Deviation | Sample Standard Deviation |
| :--- | :--- |
| $\sigma = \sqrt{\frac{\Sigma (x_i - \mu)^2 f_i}{\Sigma f_i}}$ | $s = \sqrt{\frac{\Sigma (x_i - \bar{x})^2 f_i}{(\Sigma f_i) - 1}} \quad (3)$ |

where $x_i$ is the midpoint or value of the $i$th class and $f_i$ is the frequency of the $i$th class.

* * *

## 3.4 Measures of Position and Outliers

### Determine and interpret $z$-scores

The **$z$-score** represents the distance that a data value is from the mean in terms of the number of standard deviations. We find it by subtracting the mean from the data value and dividing the result by the standard deviation. There is both a population $z$-score and a sample $z$-score:

$$z = \frac{x - \mu}{\sigma} \ \text{(population)}, \qquad z = \frac{x - \bar{x}}{s} \ \text{(sample)} \quad (1)$$

The $z$-score is unitless, with a mean of 0 and a standard deviation of 1.

### Interpret percentiles

* The median is a special case of a general concept called the percentile.
* The $k$th percentile, denoted $P_k$, of a set of data is a value such that $k$ percent of the observations are less than or equal to the value.
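The $z$-score formula and Chebyshev's bound from this chapter can be sketched together; the exam figures below are hypothetical, chosen only to make the arithmetic easy to check:

```python
def z_score(x, center, spread):
    """Distance of x from the mean, in standard deviations (population or sample)."""
    return (x - center) / spread

def chebyshev_minimum(k):
    """At least (1 - 1/k^2) * 100% of observations lie within k standard deviations."""
    return (1 - 1 / k ** 2) * 100

# Hypothetical exam: mean 70, standard deviation 10.
print(z_score(85, 70, 10))        # 1.5 standard deviations above the mean
print(chebyshev_minimum(2))       # at least 75.0% within 2 standard deviations
```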
For example, a percentile rank of 74% means that 74% of SAT Mathematics scores are less than or equal to 600 and 26% of the scores are greater; so 26% of the students who took the exam scored better than Jennifer.

### Determine and interpret quartiles

**Quartiles** divide data sets into fourths, or four equal parts. The first quartile, $Q_1$, is equivalent to the 25th percentile, $P_{25}$.

#### Finding Quartiles

1. Arrange the data in ascending order.
2. Determine the median, $M$, or second quartile, $Q_2$.
3. Divide the data set into halves: the observations below (to the left of) $M$ and the observations above $M$. The first quartile, $Q_1$, is the median of the bottom half of the data, and the third quartile, $Q_3$, is the median of the top half of the data.

### Determine and interpret the interquartile range

The **interquartile range**, IQR, is the range of the middle 50% of the observations in a data set. That is, the IQR is the difference between the third and first quartiles:

$$IQR = Q_3 - Q_1$$

### Summary: Which Measures to Report

| Shape of Distribution | Measure of Central Tendency | Measure of Dispersion |
| :--- | :--- | :--- |
| Symmetric | Mean | Standard deviation |
| Skewed left or skewed right | Median | Interquartile range |

### Check a set of data for outliers

#### Checking for Outliers by Using Quartiles

1. Determine the first and third quartiles of the data.
2. Compute the interquartile range.
3. Determine the fences. Fences serve as cutoff points for determining outliers.
   * Lower fence $= Q_1 - 1.5(IQR)$
   * Upper fence $= Q_3 + 1.5(IQR)$
4. If a data value is less than the lower fence or greater than the upper fence, it is considered an outlier.

## 3.5 The Five-Number Summary and Boxplots

### Compute the five-number summary

Five-number summary: MINIMUM, $Q_1$, $M$, $Q_3$, MAXIMUM.

### Draw and interpret boxplots

#### Drawing a Boxplot
1. Determine the lower and upper fences:
   * Lower fence $= Q_1 - 1.5(IQR)$
   * Upper fence $= Q_3 + 1.5(IQR)$
   where $IQR = Q_3 - Q_1$.
2. Draw a number line long enough to include the maximum and minimum values. Insert vertical lines at $Q_1$, $M$, and $Q_3$. Enclose these vertical lines in a box.
3. Label the lower and upper fences.
4. Draw a line from $Q_1$ to the smallest data value that is larger than the lower fence. Draw a line from $Q_3$ to the largest data value that is smaller than the upper fence. These lines are called whiskers.
5. Any data values less than the lower fence or greater than the upper fence are outliers and are marked with an asterisk (\*).

* * *

# Chapter 4

## 4.1 Scatter Diagrams and Correlation

The **response variable** is the variable whose value can be explained by the value of the **explanatory** (or predictor) **variable**. In the golf example, the distance traveled is the response variable and the club-head speed at which the ball is hit is the explanatory (predictor) variable.

### Draw and interpret scatter diagrams

A **scatter diagram** is a graph that shows the relationship between two quantitative variables measured on the same individual. Each individual in the data set is represented by a point in the scatter diagram. The explanatory variable is plotted on the horizontal axis, and the response variable is plotted on the vertical axis. *(The original figure was a scatterplot titled "Driving Distance vs. Club-head Speed," with club-head speed (mph) from 99 to 105 on the x-axis, distance (yards) from 255 to 280 on the y-axis, and six data points.)*

Distinguish scatter diagrams that show a linear relation, a nonlinear relation, or no relation. If two variables are **positively associated**, then as one goes up the other also tends to go up. *(A second figure illustrated scatterplots of linear and nonlinear relationships between explanatory and response variables, as well as the absence of any relation.)*
### Describe the properties of the linear correlation coefficient

The **linear correlation coefficient** (or Pearson product moment correlation coefficient) is a measure of the strength and direction of the linear relation between two quantitative variables. The Greek letter $\rho$ (rho) represents the population correlation coefficient, and $r$ represents the sample correlation coefficient. We present only the formula for the sample correlation coefficient.

#### Sample Linear Correlation Coefficient

$$r = \frac{\Sigma \left(\frac{x_i - \bar{x}}{s_x}\right)\left(\frac{y_i - \bar{y}}{s_y}\right)}{n-1} \quad (1)$$

where $x_i$ is the $i$th observation of the explanatory variable, $\bar{x}$ is the sample mean of the explanatory variable, $s_x$ is the sample standard deviation of the explanatory variable, $y_i$ is the $i$th observation of the response variable, $\bar{y}$ is the sample mean of the response variable, $s_y$ is the sample standard deviation of the response variable, and $n$ is the number of individuals in the sample.

#### Properties of the Linear Correlation Coefficient

1. The linear correlation coefficient is always between $-1$ and $1$, inclusive: $-1 \le r \le 1$.
2. If $r = +1$, a perfect positive linear relation exists between the two variables.
3. If $r = -1$, a perfect negative linear relation exists between the two variables.
4. The closer $r$ is to $+1$, the stronger is the evidence of positive association between the two variables.
5. The closer $r$ is to $-1$, the stronger is the evidence of negative association between the two variables.
6. If $r$ is close to 0, little or no evidence exists of a *linear* relation between the two variables. So $r$ close to 0 does not imply no relation, just no linear relation.
7. The linear correlation coefficient is a unitless measure of association, so the units of measure for $x$ and $y$ play no role in the interpretation of $r$.
8. The correlation coefficient is not resistant.
Therefore, an observation that does not follow the overall pattern of the data could affect the value of the linear correlation coefficient. *(The original figure showed eight scatterplots depicting linear relationships of varying direction and strength, the first being a perfect positive linear relation with $r = 1$.)*

### Compute and interpret the linear correlation coefficient

See Example 2, p. 209.

### Determine whether a linear relation exists between two variables

#### Testing for a Linear Relation

1. Determine the absolute value of the correlation coefficient.
2. Find the critical value in Table II from Appendix A for the given sample size.
3. If the absolute value of the correlation coefficient is greater than the critical value, a linear relation exists between the two variables. Otherwise, no linear relation exists.

### Explain the difference between correlation and causation

A lurking variable can make two variables appear correlated even when neither causes the other; correlation does not imply causation.
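Formula (1) for the sample correlation coefficient can be sketched directly from its definition; this is our own minimal implementation with made-up data:

```python
import math

def correlation(xs, ys):
    """Sample correlation coefficient: r = sum(z_x * z_y) / (n - 1)."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_x = math.sqrt(sum((x - x_bar) ** 2 for x in xs) / (n - 1))
    s_y = math.sqrt(sum((y - y_bar) ** 2 for y in ys) / (n - 1))
    return sum((x - x_bar) / s_x * ((y - y_bar) / s_y)
               for x, y in zip(xs, ys)) / (n - 1)

# Hypothetical data lying exactly on an increasing line: r should be 1 (property 2).
print(correlation([1, 2, 3, 4], [10, 20, 30, 40]))
```

Because $r$ is built from $z$-scores, rescaling either variable (say, yards to meters) leaves $r$ unchanged, which is property 7 above.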