Chapter 1 Total.pptx
Document Details
Uploaded by DelectablePoplar
Howard University
Full Transcript
Because learning changes everything. ®

Chapter One
The Nature of Probability and Statistics
Section 1 Descriptive and Inferential Statistics

© McGraw Hill LLC. All rights reserved. No reproduction or distribution without the prior written consent of McGraw Hill LLC.

Learning Objectives
Demonstrate knowledge of statistical terms.
Differentiate between the two branches of statistics.

What is Statistics?
Statistics is the science of conducting studies to collect, organize, summarize, analyze, and draw conclusions from data. In this section we will learn the different branches of statistics and what data are.

Variables and Data
A variable is a characteristic or attribute that can assume different values. The values that a variable can assume are called data. A collection of data is a data set, and each individual value is called a data value or datum. Variables whose values are determined by chance are called random variables.
An insurance company studies its records over the past several years (data) and determines that, on average, 3 out of every 100 automobiles insured were involved in accidents in a 1-year period. There is no way to predict the specific automobiles that will have accidents. The number of accidents in one year is a random variable.

Populations
A population consists of all subjects (human or otherwise) that are being studied. When data are collected from every subject in the population, it is called a census. The United States conducts a census every ten years as mandated in the Constitution, but it is a time-consuming and expensive process. Most of the time, it is not possible to use the entire population for a statistical study, so researchers use samples.

Samples
A sample is a group of subjects selected from a population.
If the subjects in the sample are properly selected, they will be representative of the population as a whole. This way, studying the sample helps us learn about the population of interest. If the subjects are not well selected, the sample will be biased because the subjects in the sample are not representative of the population as a whole. We will study how to properly select a sample in Section 1-3.

Descriptive Statistics
Descriptive statistics consists of the collection, organization, summarization, and presentation of data. The census is an example of descriptive statistics. Data are collected from everyone in the United States, from which average ages, household sizes, and other demographic information are determined. This information is presented in tables of values, but also in charts and graphs, in order to effectively summarize and present the information.

Inferential Statistics
Inferential statistics consists of generalizing from samples to populations, performing estimations and hypothesis tests, determining relationships among variables, and making predictions. After taking a sample, descriptive statistics is done to summarize and present the data collected about the sample. However, the goal of taking a sample is to draw conclusions about the population as a whole. In order to effectively make these inferences, we must understand probability, the chance of an event occurring. For example, suppose we are testing light bulbs for defects, and 3 out of the 10 we sample are defective. How likely is that to happen if the light bulbs are generally non-defective?

Hypothesis Testing
Hypothesis testing is a process for evaluating claims about a population, based on information obtained from a sample. When we tested the light bulbs before, we may have assumed that 1% of all light bulbs are defective, due to chance. However, 30% of the light bulbs in our sample were defective!
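To get a feel for how surprising that sample is, the question can be turned into a small calculation. The sketch below assumes each bulb is defective independently with probability 1% (the figure from the slide) and uses the binomial distribution to find the chance of seeing 3 or more defective bulbs in a sample of 10. The function name `binomial_tail` is only illustrative.

```python
from math import comb

def binomial_tail(n, k, p):
    """P(X >= k) when X counts successes in n independent trials with success chance p."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

# Assumption: bulbs fail independently, each with the 1% defect rate from the slide.
p_value = binomial_tail(n=10, k=3, p=0.01)
print(f"P(3 or more defective out of 10) = {p_value:.6f}")
```

Under these assumptions the probability is roughly 1 in 10,000, which is why such a sample would make us doubt the 1% claim.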
Hypothesis testing gives us a rigorous way to determine if we can conclude that the manufacturing process needs adjusting, based on this information.
Statistics can also be used to find relationships between variables. For example, there is a relationship between the heights of parents and the heights of children: taller parents tend to have taller children.

Descriptive or Inferential?
A pharmaceutical company wants to test a new drug to prevent heart attacks. In cooperation with a local hospital, they find 300 people with heart disease, give half of them the new drug, and give the other half a placebo (a substance with no medical benefit or harm). 50 of the 150 people who received the new drug had a heart attack in a 6-month period, versus 90 of the 150 people who received the placebo.
“The new drug prevented almost 50% of heart attacks in the sample” is descriptive statistics. The rate of heart attacks among people who got the new drug was 5/9 the rate among people who didn’t, a reduction of about 44%.
“The new drug reduces the risk of heart attacks by 50%” is inferential statistics. It is generalizing the result from the sample to make a conclusion about the population.

Chapter One
The Nature of Probability and Statistics
Section 2 Variables and Types of Data

Qualitative and Quantitative Variables
A qualitative variable is a variable that has distinct categories according to some characteristic or attribute. These are sometimes called categorical variables. Qualitative variables generally have non-numerical values.
For example, the birth month (January, February, etc.), hometown, and favorite color of a person are qualitative.
A quantitative variable is a variable that can be counted or measured. The values of quantitative variables are always numerical. For example, a person’s age, height, and weight are quantitative. Because the values are numerical, you can sort them from smallest to largest.

Discrete and Continuous Variables
Quantitative variables can be further classified as discrete or continuous.
A discrete variable assumes values that can be counted, or assigned values such as 1, 2, 3 and so on. The number of children in a family and the number of cars in a parking lot are discrete variables.
A continuous variable assumes values within an interval, and can have infinitely many values between any two specific values. Continuous values are obtained by measuring, and they often include fractions or decimals. Height is a continuous variable. Given any two people of different heights, you could find a person taller than one, but shorter than the other.

Variables and Types of Data: Overview
(Slide image.)

Discrete or Continuous?
Which of these variables are discrete and which are continuous?
Hours of YouTube watched per day.
Number of books read in a year.
Number of apples on a tree.
Weight of a delivery truck.
Hours of YouTube watched per day is continuous. Hours are measured, and the possible values could be fractional (4.3 hours, for example).
Number of books is discrete, because it is counted (19 books, not 19.3 books).
Number of apples on a tree is discrete, because it is counted.
Weight of a delivery truck is continuous, because it is measured.

Continuous Variables and Class Boundaries
Continuous variables must be measured, and so the values must be rounded due to the limits of the measuring device. For example, weight could be rounded to the nearest pound.
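The effect of rounding can be made concrete: every recorded value stands for a range of true values that would all round to it, half a unit of the last recorded digit on either side. The sketch below is one way to compute that range. The function name `class_boundaries` is hypothetical, and the value is passed as a string so that the number of recorded decimal places is preserved.

```python
from decimal import Decimal

def class_boundaries(recorded):
    """Return (lower, upper) for a recorded value such as "112.3".

    True values v with lower <= v < upper would be recorded as this value.
    """
    value = Decimal(recorded)
    # Exponent of the last recorded digit: "112.3" -> -1, "73" -> 0.
    last_digit_exponent = value.as_tuple().exponent
    half_unit = Decimal(1).scaleb(last_digit_exponent) / 2
    return value - half_unit, value + half_unit

print(class_boundaries("112.3"))  # weight in pounds, recorded to one decimal place
print(class_boundaries("73"))     # temperature recorded to the nearest degree
```

Using `Decimal` rather than floating-point numbers is a design choice here: it keeps boundaries such as 112.25 exact instead of introducing binary rounding noise.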
The boundary of a number is the class of values in which the data value would fall before being rounded. For example, the boundary of 112.3 pounds is 112.25–112.35 pounds because any weight greater than or equal to 112.25 pounds and less than 112.35 pounds would get recorded as 112.3. Note that 112.35 would be rounded to 112.4 and so is in the class 112.35–112.45.

Examples of Class Boundaries
If the temperature outside is recorded as 73° Fahrenheit, the boundaries are 72.5°–73.5° Fahrenheit.
If the length of a frog is recorded as 17.9 cm, the boundaries are 17.85 cm–17.95 cm.
If a runner finishes a race with a time of 13:01.6, the boundaries are 13:01.55–13:01.65.

Measurement Scales
In addition to being classified as quantitative or qualitative, variables can be classified by how they are categorized, counted, or measured. There are four common measurement scales used to classify variables.
Qualitative variables can have the nominal level of measurement or the ordinal level of measurement.
Quantitative variables can have the interval level of measurement or the ratio level of measurement.

Nominal and Ordinal
The nominal level of measurement classifies data into mutually exclusive (non-overlapping) categories in which there is no natural order or ranking of the categories. A person’s favorite color is a nominal-level measurement. You could order colors (alphabetically, for example), but there is no significance to that ordering.
The ordinal level of measurement classifies data into categories that can be ranked; however, precise differences between the ranks do not exist. A person could win first, second or third place in a race, but the difference between first and second place is not the same as the difference between second and third place.
Interval and Ratio
The interval level of measurement ranks data, and precise differences between units of measure do exist, but there is no meaningful zero. Ratios of values are not meaningful. Temperature is an interval-level measurement. 80° F is 20° warmer than 60° F, but 60° F is not twice as warm as 30° F, and 0° F does not mean zero temperature.
The ratio level of measurement possesses all the characteristics of interval measurement, and there is a true zero. As a result, ratios of data values are meaningful. Weight is a ratio-level measure. 100 grams is twice as much as 50 grams, and something that weighs 0 grams weighs 0 using any other unit of measuring weight.

Which Measurement Scale?
Zip code is nominal-level data. Even though a Zip code is represented as a number, it doesn’t make sense to say 63021 is a greater Zip code than 11050.
Pizza size is ordinal-level data. A small pizza is smaller than a medium pizza, which is smaller than a large pizza, but it doesn’t make sense to talk about the difference of a large and a medium pizza.
SAT scores are interval-level data. It makes sense to say 1450 is 200 points more than 1250, but not to say that 1300 is twice as good as 650.
Age is ratio-level data. There is an obvious age 0, regardless of units, and it makes sense to say a 30-year-old is twice as old as a 15-year-old.

Measurement Scale Hierarchy
Notice that the measurement scales improve on each other. Ratio-level data is interval-level data, interval-level data is ordinal-level data, and ordinal-level data is nominal-level data.

Variable          Nominal  Ordinal  Interval  Ratio  Level
Hair Color        Yes      No       -         -      Nominal
Zip Code          Yes      No       -         -      Nominal
Letter Grade      Yes      Yes      No        -      Ordinal
ACT Score         Yes      Yes      Yes       No     Interval
Height            Yes      Yes      Yes       Yes    Ratio
Age               Yes      Yes      Yes       Yes    Ratio
Temperature (F)   Yes      Yes      Yes       No     Interval

Collecting Data
Sometimes data can be collected by directly observing a situation, or by surveying existing records, or by conducting surveys.
Surveys are a common method for collecting data. In a survey, subjects are asked a series of questions, and their answers are the data that are collected.
Surveys are used in many different scenarios. Manufacturers use customer surveys to find out how to effectively market their products. Political surveys are used by news organizations and political campaigns to determine people’s preferred policies and politicians.

Telephone Surveys
Telephone surveys are less costly than in-person surveys, and people tend to be more candid during a telephone survey than an in-person survey.
However, doing a telephone survey requires that your population of interest has telephones, and that they are willing to answer a phone call from an unknown number, which is increasingly unlikely.

Mailed Questionnaire Surveys
Mailed questionnaire surveys can be used to cover a wider area and contact more people than a telephone survey or personal interview. Respondents can remain anonymous, and mailed surveys are even less costly than telephone surveys.
However, response rates for mailed surveys are extremely low, and respondents could give responses that are inappropriate, due to misunderstanding the question or filling out the survey incorrectly.

Personal Interview Surveys
Personal interview surveys allow researchers to get in-depth answers to their questions, due to the face-to-face nature of the interview, and the ability to ask follow-up questions.
Additionally, the response rate is much better than for mail or telephone surveys because the subjects are there, in person.
Conducting a personal interview survey is time-consuming and expensive. Interviewers must be carefully trained to ask the questions in the same way, every time, to prevent getting biased responses based on treating different subjects differently.

Sampling Methods
Researchers use samples to collect data and information about a large population. In order for the sample to be representative of the population, so that the sample can be used to make valid inferences, the sample must be unbiased.
A good way to obtain an unbiased sample is to use chance to pick the subjects in the sample. Four common sampling methods are random sampling, systematic sampling, stratified sampling, and cluster sampling.

Random and Systematic Samples
A random sample is a sample in which all members of the population have an equal chance of being selected. For example, assign a number to each member of your population, randomly choose 100 numbers, and make your sample from the 100 subjects corresponding to those numbers.
A systematic sample is a sample obtained by selecting every kth member of the population, where k is a counting number. For example, giving a customer survey to every tenth person to enter a grocery store.

Stratified and Cluster Samples
A stratified sample is a sample obtained by dividing the population into subgroups or strata according to some characteristic relevant to the study. Then subjects are selected at random from each subgroup.
A population is broken down into three age groups (18 to 29, 30 to 64, 65 and older) and then each age group is further broken down by gender. Then, a random sample is taken from each group: Men 18 to 29, Women 18 to 29, etc. This is a stratified sample.
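The three methods described so far can be sketched in a few lines. This is a minimal illustration, not a survey-grade implementation: the population is a hypothetical list of 1,000 numbered subjects, the `age_group` labels are made up, and the systematic sample starts at a random position, which is an added assumption (the slide only says "every kth member").

```python
import random

def simple_random_sample(population, n, rng):
    """Random sample: every member has an equal chance of being selected."""
    return rng.sample(population, n)

def systematic_sample(population, k, rng):
    """Systematic sample: every kth member, from a random start (assumption)."""
    start = rng.randrange(k)
    return population[start::k]

def stratified_sample(population, stratum_of, n_per_stratum, rng):
    """Stratified sample: split into strata, then sample at random within each."""
    strata = {}
    for subject in population:
        strata.setdefault(stratum_of(subject), []).append(subject)
    return {name: rng.sample(members, n_per_stratum)
            for name, members in strata.items()}

def age_group(subject_id):
    # Hypothetical rule that places each numbered subject in one of three strata.
    return ("youngest", "middle", "oldest")[subject_id % 3]

rng = random.Random(0)             # seeded so the sketch is repeatable
population = list(range(1, 1001))  # hypothetical subjects numbered 1 to 1000

random_sample = simple_random_sample(population, 100, rng)
systematic = systematic_sample(population, 10, rng)
stratified = stratified_sample(population, age_group, 20, rng)
```

Each function takes the random number generator as a parameter so that the same seed reproduces the same sample, which is useful when checking or replicating a study.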
A cluster sample is obtained by dividing the population into sections or clusters and then selecting one or more clusters at random, and using every member of the selected cluster or clusters as the sample.
A population is broken down by state, and then by county within each state. 30 counties are selected at random, and every person in those counties is in the sample. This is a cluster sample.

Other Sampling Methods
A convenience sample is a sample chosen out of convenience, with no randomization. For example, a restaurant manager asks everyone at the restaurant one evening to rate the service on a scale from 1 to 5. There is no randomization, and all the members of the sample were at the restaurant at a specific time, so the sample is probably not representative of the customer base of the restaurant as a whole.
A volunteer or self-selected sample is one in which the subjects decide if they want to participate in the study or not, rather than being selected by the researcher. People who volunteer for studies often differ from people who don’t, so the sample is unlikely to be representative.

Sampling and Non-sampling Error
Sampling error is the difference between the results obtained from a sample and the results obtained from the population from which the sample was selected. For example, if a population is 50% men and 50% women, and a random sample is 46% women and 54% men, this is sampling error. Nothing was done incorrectly; it’s just due to chance.
Non-sampling error occurs when the data are obtained erroneously or the sample is biased. For example, a scale could be miscalibrated, so all the measurements are off by one pound. Or, the data are collected correctly, but recorded incorrectly, due to human error.

Observational Studies
In an observational study, the researcher merely observes what is happening or what has happened in the past and tries to draw conclusions based on these observations.
Examples of observational studies are watching the flow of traffic at a busy intersection over the course of a month, or doing historical research into the relationship between the introduction of indoor plumbing and improvements in public health in communities in the 1800s.

Observational Studies: Advantages
Observational studies occur in a natural setting. The artificial setting of an experimental study can make people act differently than they normally would.
It is unethical or impossible to study some things using experiments. Researchers cannot ethically study murder or suicide in an experimental setting. If researchers wanted to determine the effect of left- or right-handedness, they cannot randomly assign subjects to a left-handed group or a right-handed group.

Observational Studies: Disadvantages
Researchers do not have direct control over the variables, so it is not possible to definitively establish a cause-and-effect relationship.
Having to travel to a distant location to perform the observations can be more expensive and time-consuming than having subjects come to a research lab.
Observational studies that depend on data collected by third parties are subject to the inaccuracies of that data. The researchers cannot ensure the accuracy of the results because they are not collecting the data themselves.

Experimental Studies
In an experimental study, the researcher manipulates one of the variables and tries to determine how the manipulation influences other variables.
In a true experimental study, subjects should be randomly assigned to groups, and then treatments should be randomly assigned as well.
In some settings, random assignment to groups is not possible. In this case, a quasi-experimental study is done instead. Treatments are still randomly assigned to the pre-existing groups.
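Random assignment in a true experimental study can be sketched as a shuffle followed by a split. The example below is only an illustration under simple assumptions: 300 hypothetical subject IDs, two equally sized groups, and a seeded generator so the result is repeatable. The function name `randomly_assign` is made up for this sketch.

```python
import random

def randomly_assign(subjects, rng):
    """Shuffle the subjects, then split them into two equally sized groups."""
    shuffled = list(subjects)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {"treatment": shuffled[:half], "control": shuffled[half:]}

rng = random.Random(1)    # seeded so the assignment can be reproduced
subjects = range(1, 301)  # hypothetical subject IDs 1 to 300
groups = randomly_assign(subjects, rng)
print(len(groups["treatment"]), len(groups["control"]))
```

Because chance alone decides who lands in which group, characteristics the researchers did not measure should be spread roughly evenly across both groups.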
Independent and Dependent Variables
The independent variable (or explanatory variable) in an experimental study is the one that is being manipulated by the researcher.
The resultant variable is called the dependent variable (or outcome variable).
In an experimental study, researchers are trying to determine how changing the independent variable affects the dependent variable, and how powerful that effect is, if it exists.

Example: Medical Research
Recall our experimental study of the efficacy of a heart attack preventative. Half of the sample received the new drug and half of the sample received a placebo. Then, the number of heart attacks in each group was recorded.
The independent variable in this study is which treatment a subject got: the new drug or the placebo.
The dependent variable in this study is whether the subject had a heart attack or not in the next six months.

Treatment and Control
The heart disease study has two groups of subjects: the subjects who got the drug, and the subjects who got the placebo.
The treatment group of an experimental study is the group that receives the new drug or treatment that the study is evaluating.
The control group of an experimental study does not receive any special treatment.
By comparing the differences between the treatment group and the control group, researchers can determine how effective the treatment is relative to doing nothing.

Experimental Studies: Advantages
In an experimental study, researchers have a great deal of control over the subjects, how they are assigned to groups, and how the groups differ.
Researchers can also precisely change the independent variable, and then measure the effects of this change. This makes it easier to precisely understand how the independent variable affects the dependent variable.

Experimental Studies: Disadvantages
Experiments are done in an unnatural setting.
Results found in an experimental study may not be apparent in the real world.
An example of this is the Hawthorne effect. Subjects who know they are participating in an experiment will change their behavior in ways that affect the results of the study.
Another problem is confounding variables: variables that affect the dependent variable and are not sufficiently separated from the independent variable. For example, subjects put on an exercise program might also change their diet, or engage in other healthier habits, so that improvements to their health could be due to the exercise, or to the other changes.

The Placebo Effect
Remember that a placebo is a medical treatment that has no benefit or harm.
The placebo effect is a favorable response or improvement by subjects in a study due to the fact that they are participating in a study.
Example: A study of the efficacy of antidepressants divided the sample into three groups. The treatment group gets the antidepressant, a control group gets nothing, and the placebo group gets a sugar pill (a placebo).
Researchers find that the treatment group improves relative to the control group. However, the placebo group also improves relative to the control group, and the placebo group and the treatment group both improve roughly the same amount. The improvement in the placebo group is an example of the placebo effect.

Blinding
The placebo group was given a sugar pill instead of nothing. This way, a subject cannot tell if they are in the treatment group or the placebo group (subjects in the control group, who received no treatment, know they are not being treated).
A blinded study is one where the subjects do not know if they are in the treatment or control groups. Using a placebo blinds a subject to which group they’re in.
A double-blinded study is one where neither the subjects nor the researchers know which group the subjects are in.
This prevents the researchers from being biased in their data collection.

Blocking and Completely Randomized Design
Researchers have reason to believe men and women will react differently to the heart attack prevention drug they are testing.
To control this confounding variable, they use blocking: They separate the sample into two blocks (men and women) and then split each block into treatment and control groups.
This way, they can separate the effects of gender from the effects of the drug on preventing heart attacks.
If subjects are assigned to groups randomly, and the treatments are assigned randomly, the experiment has a completely randomized design.

Matched-Pair Design
Researchers have reason to believe that a person’s weight affects their risk of heart attack. Instead of using blocking to ensure that the treatment and control groups have similar distributions of weight, the researchers use a matched-pair design.
The subjects in the sample are sorted by weight, and then assigned to the treatment and control groups in pairs, so that each pair has roughly the same weight. This ensures that the treatment and control groups have almost identical distributions of weight.
Matched-pair designs are a useful alternative to blocking when the confounding variable has too many possible values to separate the subjects into blocks.

Replication
An important way to validate studies is to try to replicate them. Here a study is repeated by different researchers using the same procedure, and the results of the new study are compared to the results of the old study.
If a study can be replicated, then researchers can be confident that its results are real and are not due to statistical errors.
If a study cannot be replicated, this suggests that there was a flaw in the original experiment.
There may have been a confounding variable that the researchers didn’t control, or the researchers may have simply made errors in their data collection or analysis.

Guidelines for Experimental Design
1. Formulate the purpose of the study.
2. Identify the variables for the study.
3. Define the population.
4. Decide what sampling method you will use to collect the data.
5. Collect the data.
6. Summarize the data and perform any statistical calculations needed.
7. Interpret the results.

Experimental Design: Example
Researchers randomly assign 20 people to each of two different groups. Group 1 watches a motivational video about succeeding despite adversity. Group 2 sits in a quiet room for 15 minutes. Both groups then solve a Sudoku puzzle, and the time required for them to complete it is recorded.
Was this an experimental or observational study?
What is the independent variable?
What is the dependent variable?
What are possible confounding variables?

Experimental Design: Solutions
This is an experimental study. The researchers randomly assign the subjects to groups and then randomly give them the treatment (the video) or the control (the quiet room).
The independent variable is whether or not the subjects watched the video.
The dependent variable is the time required to complete the puzzle.
Confounding variables could include education level, familiarity with Sudoku-solving techniques, or skill at mathematics. Random assignment should eliminate the effects of these factors.

Uses and Misuses of Statistics 1
Suspect samples: Very small samples, or convenience or volunteer samples, can lead to deceptive results. “Three out of four doctors recommend this product” is less impressive if only four doctors were asked!
Ambiguous averages: “Average” can refer to the mean, median, mode, or midrange. Calculating all these averages and picking the most impressive one is deceptive.
Changing the subject: Using different values to represent the same information can change how it is perceived. 0.1% of GDP sounds much smaller than $20 billion.

Uses and Misuses of Statistics 2
Detached Statistics: Statistics without a comparison are deceptive. If a product has “20% less fat”, what product is it being compared to?
Implied Connections: A breakfast cereal says it “may help prevent heart disease”, but there’s no reason to believe it actually does!
Misleading Graphs: Graphs with misleading or mislabeled axes can cause readers to misinterpret the results being summarized.
Faulty Survey Questions: Questions can be vague, or phrased in a biased way that encourages a specific response.
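The “ambiguous averages” misuse is easy to see with numbers. The sketch below uses a made-up list of seven salaries (in thousands of dollars, purely hypothetical) that includes one very large value, and computes the four kinds of average named above. The midrange is taken to be the midpoint of the smallest and largest values.

```python
import statistics

# Hypothetical salaries in thousands of dollars; one value is far above the rest.
salaries = [30, 30, 35, 40, 45, 50, 400]

mean = statistics.mean(salaries)
median = statistics.median(salaries)
mode = statistics.mode(salaries)
midrange = (min(salaries) + max(salaries)) / 2

print(f"mean={mean}, median={median}, mode={mode}, midrange={midrange}")
```

With this data the four averages are 90, 40, 30, and 215, so a report could describe the “average salary” as anything from 30 to 215 thousand dollars depending on which one it quotes.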