PSY 112 Introduction to Statistics PDF
Romeo Padilla School of Education and Arts
This document provides an introduction to probability and statistics, including descriptive and inferential statistics, variables, data collection, and experimental design. It also discusses computers and calculators in statistics.
Romeo Padilla – School of Education and Arts
Bachelor of Arts in Psychology 1
PSY 112 (Introduction to Statistics)
Module for ABPSYCH 1
Module Title: The Nature of Probability and Statistics

I. OUTLINE
A. Descriptive and Inferential Statistics
B. Variables and Types of Data
C. Data Collection and Sampling Techniques
D. Experimental Design
E. Computers and Calculators

II. OBJECTIVES
After completing this module, you should be able to:
1. Demonstrate knowledge of statistical terms.
2. Differentiate between the two branches of statistics.
3. Identify types of data.
4. Identify the measurement level for each variable.
5. Identify the four basic sampling techniques.
6. Explain the difference between an observational and an experimental study.
7. Explain how statistics can be used and misused.
8. Explain the importance of computers and calculators in statistics.

III. INTRODUCTION
You may be familiar with probability and statistics through radio, television, newspapers, magazines, and the Internet. For example, you may have read statements like the following found on social media.
If you work indoors, you need to work out 30 minutes longer to get the same benefits of working outdoors.
A study by Wayne State University found that older drivers are much worse than younger drivers when texting while driving.
A bipolar disorder results in, on average, 9.2 years’ reduction in the expected life span of those who have the disorder as compared to those who do not suffer from this disorder.
A recent study found that people who are in a close relationship may have a lower risk of heart disease than those who are in a negative relationship.
A survey by Cengage found that 43% of college students say that they have skipped meals in order to afford the cost of college course materials.
Forty-nine percent of U.S. adults think that they could become a victim of identity theft.
Statistics is the science of conducting studies to collect, organize, summarize, analyze, and draw conclusions from data. Statistics is used in almost all fields of human endeavor. In sports, for example, a statistician may keep records of the number of yards a running back gains during a football game, or the number of hits a baseball player gets in a season. In other areas, such as public health, an administrator might be concerned with the number of residents who contract a new strain of flu virus during a certain year. In education, a researcher might want to know if new methods of teaching are better than old ones. These are only a few examples of how statistics can be used in various occupations. Furthermore, statistics is used to analyze the results of surveys and as a tool in scientific research to make decisions based on controlled experiments. Other uses of statistics include operations research, quality control, estimation, and prediction. There are several reasons why you should study statistics. Like professional people, you must be able to read and understand the various statistical studies performed in your fields. To have this understanding, you must be knowledgeable about the vocabulary, symbols, concepts, and statistical procedures used in these studies. You may be called on to conduct research in your field, since statistical procedures are basic to research. To accomplish this, you must be able to design experiments; collect, organize, analyze, and summarize data; and possibly make reliable predictions or forecasts for future use. You must also be able to communicate the results of the study in your own words. You can also use the knowledge gained from studying statistics to become better consumers and citizens. For example, you can make intelligent decisions about what products to purchase based on consumer studies, about government spending based on utilization studies, and so on. IV. 
DISCUSSION

Preliminary Terminologies
A variable is a characteristic or attribute that can assume different values.
A population consists of all subjects (human or otherwise) that are being studied.
A sample is a group of subjects selected from a population.

A. Descriptive and Inferential Statistics
Descriptive statistics consists of the collection, organization, summarization, and presentation of data. In descriptive statistics the statistician tries to describe a situation. Consider the national census conducted by the Philippine Statistics Authority. The results of this census give the average age, income, and other characteristics of the Philippine population. To obtain this information, the agency must have some means to collect relevant data. Once the data are collected, the agency must organize and summarize them. Finally, it needs a means of presenting the data in some meaningful form, such as charts, graphs, or tables.

Inferential statistics consists of generalizing from samples to populations, performing estimations and hypothesis tests, determining relationships among variables, and making predictions. In inferential statistics, the statistician tries to make inferences from samples to populations. Inferential statistics uses probability, i.e., the chance of an event occurring. You may be familiar with the concepts of probability through various forms of gambling. If you play cards, dice, bingo, or lotteries, you win or lose according to the laws of probability. Probability theory is also used in the insurance industry and other areas.

Hypothesis testing is a decision-making process for evaluating claims about a population, based on information obtained from samples. For example, a researcher may wish to know if a new drug will reduce the number of heart attacks in men over 70 years of age. For this study, two groups of men over age 70 would be selected.
One group would be given the drug, and the other would be given a placebo (a substance with no medical benefits or harm). Later, the number of heart attacks occurring in each group of men would be counted, a statistical test would be run, and a decision would be made about the effectiveness of the drug. Statisticians also use statistics to determine relationships among variables. For example, relationships were the focus of the most noted study in the 20th century, “Smoking and Health,” published by the Surgeon General of the United States in 1964. He stated that after reviewing and evaluating the data, his group found a definite relationship between smoking and lung cancer. He did not say that cigarette smoking actually causes lung cancer, but that there is a relationship between smoking and lung cancer. This conclusion was based on a study done in 1958 by Hammond and Horn. In this study, 187,783 men were observed over a period of 45 months. The death rate from lung cancer in this group of volunteers was 10 times as great for smokers as for nonsmokers. Finally, by studying past and present data and conditions, statisticians try to make predictions based on this information. For example, a car dealer may look at past sales records for a specific month to decide what types of automobiles and how many of each type to order for that month next year. EXAMPLE: Determine whether descriptive or inferential statistics were used. 1. A study of 5000 diners found that people who used a menu printed with calorie counts for meals reduced their caloric consumption by 45 calories per meal. 2. One person in seven has diabetes and one-third of them don’t know that they have the disorder. 3. Twenty million Americans are living with chronic pain that interferes with their daily lives. 4. When 1350 children got booster shots or vaccines with unpleasant side effects, only 215 got the same reactions as they did when they got the first injections. SOLUTION 1. 
This is a descriptive statistic since it describes the results of a study using 5000 diners.
2. This is an inferential statistic since it is a generalization about a population.
3. This is an inferential statistic since it is a generalization about a population.
4. This is a descriptive statistic since it is based on the results of a sample of 1350 children.

B. Variables and Types of Data
Qualitative variables are variables that have distinct categories according to some characteristic or attribute. For example, if subjects are classified according to gender (male or female), then the variable gender is qualitative. Other examples of qualitative variables are religious preference and geographic location.

Quantitative variables are variables that can be counted or measured. For example, the variable age is numerical, and people can be ranked in order according to the value of their ages. Other examples of quantitative variables are heights, weights, and body temperatures.

Quantitative variables can be further classified into two groups: discrete and continuous. Discrete variables assume values that can be counted. They can be assigned values such as 0, 1, 2, 3 and are said to be countable. Examples of discrete variables are the number of children in a family, the number of students in a classroom, and the number of calls received by a call center each day for a month.

Continuous variables, by comparison, can assume an infinite number of values in an interval between any two specific values. They are obtained by measuring. They often include fractions and decimals. Temperature, for example, is a continuous variable, since the variable can assume an infinite number of values between any two given temperatures.

EXAMPLE: Classify each variable as a discrete or continuous variable.
1. The number of hours per day that children 6 to 12 years old reported that they played video games
2.
The number of home runs a Major League player made each year of his career
3. The amount of money drivers spend on gasoline each week
4. The weights of the players on a hockey team

SOLUTION
1. Continuous, since the variable time is measured
2. Discrete, since the number of home runs is counted
3. Discrete, since the smallest value that money can assume is in cents
4. Continuous, since the variable weight is measured

Since continuous data must be measured, answers must be rounded because of the limits of the measuring device. Usually, answers are rounded to the nearest given unit. For example, heights might be rounded to the nearest inch, weights to the nearest ounce, etc. Hence, a recorded height of 73 inches could mean any measure from 72.5 inches up to but not including 73.5 inches. Thus, the boundary of this measure is given as 72.5–73.5 inches. The boundary of a number, then, is defined as a class in which a data value would be placed before the data value was rounded. Boundaries are written for convenience as 72.5–73.5 but are understood to mean all values from 72.5 up to but not including 73.5. Actual data values of 73.5 would be rounded to 74 and would be included in a class with boundaries of 73.5 up to but not including 74.5, written as 73.5–74.5. As another example, if a recorded weight is 86 pounds, the exact boundaries are 85.5 up to but not including 86.5, written as 85.5–86.5 pounds. The table below helps to clarify this concept. The boundaries of a continuous variable are given to one additional decimal place and always end with the digit 5.

EXAMPLE: Find the boundaries for each measurement.
1. 32.4 feet
2. 86° Fahrenheit
3. 27.54 mg/dl

SOLUTION
1. 32.35–32.45 feet
2. 85.5°–86.5° Fahrenheit
3. 27.535–27.545 mg/dl

How variables are categorized, counted, or measured depends on the measurement scale used. Four common types of scales are used: nominal, ordinal, interval, and ratio.
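Returning to the boundaries of a rounded measurement discussed above, the rule (go one decimal place further and step half a unit down and half a unit up) can be checked with a short Python sketch. This is only an illustration: the function name `boundaries` is hypothetical, and the recorded value is passed as text so that the number of decimal places it was rounded to is preserved.

```python
from decimal import Decimal

def boundaries(recorded):
    """Return (lower, upper) boundaries of a recorded measurement.

    `recorded` is a string such as "32.4" so the number of decimal
    places in the recorded value is known exactly. The upper boundary
    is understood as "up to but not including".
    """
    value = Decimal(recorded)
    # exponent is 0 for "86", -1 for "32.4", -2 for "27.54"
    exponent = value.as_tuple().exponent
    # half of the rounding unit: 0.5, 0.05, 0.005, ...
    half_unit = Decimal(5).scaleb(exponent - 1)
    return value - half_unit, value + half_unit

for measurement in ["73", "32.4", "86", "27.54"]:
    lower, upper = boundaries(measurement)
    print(f"{measurement}: {lower} up to but not including {upper}")
```

Using `Decimal` instead of ordinary floating-point numbers is a design choice made here so that results such as 32.35 are exact rather than approximate.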
The nominal level of measurement classifies data into mutually exclusive (nonoverlapping) categories in which no order or ranking can be imposed on the data.
The first level of measurement is called the nominal level of measurement. A sample of college instructors classified according to subject taught (e.g., English, history, psychology, or mathematics) is an example of nominal-level measurement. Classifying survey subjects as male or female is another example of nominal-level measurement. No ranking or order can be placed on the data. Classifying residents according to zip codes is also an example of the nominal level of measurement. Even though numbers are assigned as zip codes, there is no meaningful order or ranking. Other examples of nominal-level data are political party (Democratic, Republican, independent, etc.), religion (Christianity, Judaism, Islam, etc.), and marital status (single, married, divorced, widowed, separated).

The ordinal level of measurement classifies data into categories that can be ranked; however, precise differences between the ranks do not exist.
The next level of measurement is called the ordinal level. Data measured at this level can be placed into categories, and these categories can be ordered, or ranked. For example, from student evaluations, guest speakers might be ranked as superior, average, or poor. Floats in a homecoming parade might be ranked as first place, second place, etc. Note that precise measurement of differences in the ordinal level of measurement does not exist. For instance, when people are classified according to their build (small, medium, or large), a large variation exists among the individuals in each class. Another example of ordinal data is letter grades (A, B, C, D, F).

The interval level of measurement ranks data, and precise differences between units of measure do exist; however, there is no meaningful zero.
The third level of measurement is called the interval level.
This level differs from the ordinal level in that precise differences do exist between units. For example, many standardized psychological tests yield values measured on an interval scale. IQ is an example of such a variable. There is a meaningful difference of 1 point between an IQ of 109 and an IQ of 110. Temperature is another example of interval measurement, since there is a meaningful difference of 1°F between each unit, such as 72 and 73°F. One property is lacking in the interval scale: There is no true zero. For example, IQ tests do not measure people who have no intelligence. For temperature, 0°F does not mean no heat at all. The ratio level of measurement possesses all the characteristics of interval measurement, and there exists a true zero. In addition, true ratios exist when the same variable is measured on two different members of the population. The final level of measurement is called the ratio level. Examples of ratio scales are those used to measure height, weight, area, and number of phone calls received. Ratio scales have differences between units (1 inch, 1 pound, etc.) and a true zero. In addition, the ratio scale contains a true ratio between values. For example, if one person can lift 200 pounds and another can lift 100 pounds, then the ratio between them is 2 to 1. Put another way, the first person can lift twice as much as the second person. Note: There is not complete agreement among statisticians about the classification of data into one of the four categories. For example, some researchers classify IQ data as ratio data rather than interval. Also, data can be altered so that they fit into a different category. For instance, if the incomes of all professors of a college are classified into the three categories of low, average, and high, then a ratio variable becomes an ordinal variable. EXAMPLE: What level of measurement would be used to measure each variable? 1. The ages of the instructors at your college 2. 
The occupations of students who work part time after school 3. The lowest night-time temperatures in December in a large city 4. The ratings of medical doctors at a hospital SOLUTION 1. Ratio 2. Nominal 3. Interval 4. Ordinal C. Data Collection and Sampling Techniques Data can be collected in a variety of ways. One of the most common methods is through the use of surveys. Surveys can be done by using a variety of methods. Three of the most common methods are the telephone survey, the mailed questionnaire, and the personal interview. Telephone surveys have an advantage over personal interview surveys in that they are less costly. Also, people may be more candid in their opinions since there is no face-to-face contact. A major drawback to the telephone survey is that some people in the population will not have phones or will not answer when the calls are made; hence, not all people have a chance of being surveyed. Also, many people now have unlisted numbers and cell phones, so they cannot be surveyed unless the way survey participants are chosen would include unlisted numbers and cell phone numbers. Finally, even the tone of voice of the interviewer might influence the response of the person who is being interviewed. Mailed questionnaire surveys can be used to cover a wider geographic area than telephone surveys or personal interviews since mailed questionnaire surveys are less expensive to conduct. Also, respondents can remain anonymous if they desire. Disadvantages of mailed questionnaire surveys include a low number of responses and inappropriate answers to questions. Another drawback is that some people may have difficulty reading or understanding the questions. Personal interview surveys have the advantage of obtaining in-depth responses to questions from the person being interviewed. 
One disadvantage is that interviewers must be trained in asking questions and recording responses, which makes the personal interview survey more costly than the other two survey methods. Another disadvantage is that the interviewer may be biased in his or her selection of respondents.

Data can also be collected in other ways, such as surveying records or direct observation of situations.

Researchers use samples to collect data and information about a particular variable from a large population. Using samples saves time and money and in some cases enables the researcher to get more detailed information about a particular subject. Remember, samples cannot be selected in haphazard ways because the information obtained might be biased. For example, interviewing people on a street corner during the day would not include responses from people working in offices at that time or from people attending school; hence, not all subjects in a particular population would have a chance of being selected.

To obtain samples that are unbiased (i.e., that give each subject in the population an equally likely chance of being selected), statisticians use four basic methods of sampling: random, systematic, stratified, and cluster sampling.

Random Sampling
A random sample is a sample in which all members of the population have an equal chance of being selected. Random samples are selected by using chance methods or random numbers. One such method is to number each subject in the population. Then place numbered cards in a bowl, mix them thoroughly, and select as many cards as needed. The subjects whose numbers are selected constitute the sample. Since it is difficult to mix the cards thoroughly, there is a chance of obtaining a biased sample. For this reason, statisticians use another method of obtaining numbers. They generate random numbers with a computer or calculator. Before the invention of computers, random numbers were obtained from tables.
Below is an example of Random Numbers. Systematic Sampling A systematic sample is a sample obtained by selecting every kth member of the population where k is a counting number. Researchers obtain systematic samples by numbering each subject of the population and then selecting every kth subject. For example, suppose there were 2000 subjects in the population and a sample of 50 subjects was needed. Since 2000 ÷ 50 = 40, then k = 40, and every 40th subject would be selected; however, the first subject (numbered between 1 and 40) would be selected at random. Suppose subject 12 were the first subject selected; then the sample would consist of the subjects whose numbers were 12, 52, 92, etc., until 50 subjects were obtained. When using systematic sampling, you must be careful about how the subjects in the population are numbered. If subjects were arranged in a manner such as wife, husband, wife, husband, and every 40th subject were selected, the sample would consist of all husbands. Numbering is not always necessary. For example, a researcher may select every 10th item from an assembly line to test for defects. Systematic sampling has the advantage of selecting subjects throughout an ordered population. This sampling method is fast and convenient if the population can be easily numbered. Stratified Sampling A stratified sample is a sample obtained by dividing the population into subgroups or strata according to some characteristic relevant to the study. (There can be several subgroups.) Then subjects are selected at random from each subgroup. Samples within the strata should be randomly selected. For example, suppose the president of a two-year college wants to learn how students feel about a certain issue. Furthermore, the president wishes to see if the opinions of first-year students differ from those of second-year students. 
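The three sampling methods described so far (random, systematic, and stratified) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a prescribed procedure: the function names, the seed, and the stratum sizes (600 first-year and 400 second-year students) are hypothetical, and subjects are represented only by their numbers.

```python
import random

def simple_random_sample(population_size, sample_size, rng):
    """Draw distinct subject numbers from 1..population_size by chance."""
    return rng.sample(range(1, population_size + 1), sample_size)

def systematic_sample(population_size, sample_size, rng, start=None):
    """Select every kth subject, with k = population_size // sample_size.

    The first subject is chosen at random between 1 and k unless a
    starting number is supplied.
    """
    k = population_size // sample_size
    if start is None:
        start = rng.randint(1, k)
    return [start + i * k for i in range(sample_size)]

def stratified_sample(strata, per_stratum, rng):
    """Draw a random sample of per_stratum subjects from each stratum."""
    return {name: rng.sample(members, per_stratum)
            for name, members in strata.items()}

rng = random.Random(2024)  # fixed seed (an assumption) so runs repeat

# Random sample: 50 subjects out of 2000
print(sorted(simple_random_sample(2000, 50, rng))[:5])

# Systematic sample: 2000 / 50 gives k = 40; starting at subject 12
print(systematic_sample(2000, 50, rng, start=12)[:3])  # 12, 52, 92

# Stratified sample: hypothetical strata of first- and second-year students
strata = {
    "first-year": list(range(1, 601)),
    "second-year": list(range(601, 1001)),
}
chosen = stratified_sample(strata, 25, rng)
print({name: len(members) for name, members in chosen.items()})
```

Passing `start=12` reproduces the systematic example in the text; leaving it out lets the program pick the first subject at random between 1 and k.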
Cluster Sampling
A cluster sample is obtained by dividing the population into sections or clusters and then selecting one or more clusters at random and using all members in the cluster(s) as the members of the sample. Here, the population is divided into groups or clusters by some means such as geographic area or schools in a large school district. Then the researcher randomly selects some of these clusters and uses all members of the selected clusters as the subjects of the sample. Suppose a researcher wishes to survey apartment dwellers in a large city. If there are 10 apartment buildings in the city, the researcher can select at random 2 buildings from the 10 and interview all the residents of these buildings. Cluster sampling is used when the population is large or when it involves subjects residing in a large geographic area. For example, if one wanted to do a study involving the patients in the hospitals in Urdaneta City, it would be very costly and time-consuming to try to obtain a random sample of patients since they would be spread over a large area. Instead, a few hospitals could be selected at random, and the patients in these hospitals would be interviewed in a cluster.

Note: The main difference between stratified sampling and cluster sampling is that, although in both types of sampling the population is divided into groups, the subjects in the groups for stratified sampling are more or less homogeneous, that is, they have similar characteristics, while the subjects in the clusters form “miniature populations.” That is, they vary in characteristics as does the larger population. For example, if a researcher wanted to use the class of first-year students at a university as the population, the researcher might use a class of students in a first-year orientation class as a cluster sample.
If using a stratified sample, the researcher would need to divide the first-year students into groups according to their major field, sex, age, etc., and then select subjects at random from each group. Cluster samples save the researcher time and money, but the researcher must be aware that sometimes a cluster does not represent the population.

Below is the summary of the four basic sampling methods.

Other Sampling Methods
In a convenience sample, the researcher uses subjects who are convenient. For example, the researcher may interview subjects entering a local mall to determine the nature of their visit or perhaps what stores they will be patronizing. This sample is probably not representative of the general customers for several reasons. For one thing, it was probably taken at a specific time of day, so not all customers entering the mall have an equal chance of being selected since they were not there when the survey was being conducted. But convenience samples can be representative of the population. If the researcher investigates the characteristics of the population and determines that the sample is representative, then it can be used.

Another type of sample that is used in statistics is a volunteer sample or self-selected sample. Here respondents decide for themselves if they wish to be included in the sample. For example, a radio station in Urdaneta City asks a question about a situation and then asks people to call one number if they agree with the action taken or call another number if they disagree with the action. The results are then announced at the end of the day. Note that most often, only people with strong opinions will call. The station does explain that this is not a “scientific poll.”

Since samples are not perfect representatives of the populations from which they are selected, there is always some error in the results. This error is called a sampling error.
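Sampling error can be seen in a small simulation. The sketch below is only an illustration: the population of 1000 students with 54% female, the sample size of 100, the seed, and the function names are all assumptions. Sampling error is measured here as the sample result minus the population result.

```python
import random

def proportion_female(group):
    """Fraction of subjects in the group coded as "F"."""
    return sum(1 for subject in group if subject == "F") / len(group)

def sampling_error(sample, population):
    """Sample proportion minus population proportion."""
    return proportion_female(sample) - proportion_female(population)

# Hypothetical population: 1000 full-time students, 540 of them female
population = ["F"] * 540 + ["M"] * 460

rng = random.Random(7)  # fixed seed (an assumption) so runs repeat
sample = rng.sample(population, 100)

print(f"population proportion: {proportion_female(population):.2f}")
print(f"sample proportion:     {proportion_female(sample):.2f}")
print(f"sampling error:        {sampling_error(sample, population):+.2f}")
```

Changing the seed draws a different sample and usually gives a different error, which is the point: the error comes from which subjects happened to be selected, not from a mistake in measurement.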
Sampling error is the difference between the results obtained from a sample and the results obtained from the population from which the sample was selected. For example, suppose you select a sample of full-time students at your college and find 56% are female. Then you go to the admissions office and get the genders of all full-time students that semester and find that 54% are female. The difference of 2% is said to be due to sampling error.

There is another error that occurs in statistics called nonsampling error. A nonsampling error occurs when the data are obtained erroneously or the sample is biased, i.e., nonrepresentative. For example, data could be collected by using a defective scale. Each weight might be off by, say, 2 pounds. Also, recording errors can be made. Perhaps the researcher wrote an incorrect data value. Caution and vigilance should be used when collecting data.

EXAMPLE: State which sampling method was used.
1. Out of 14 banks in a city, a researcher selects one bank and records the number of savings deposits made in a one-day period.
2. A researcher divides a group of students who are majoring in criminal justice as male or female, and then divides the group further as first-years, sophomores, juniors, or seniors. Then 8 students from each group are given a survey to answer questions about the program.
3. A researcher numbers the subscribers to a movie rental business, selects 1000 subscribers using random numbers, and records the number of movies rented by each for the month of January.
4. On an assembly line, every 20th automobile is selected and checked for paint defects. The purpose is to ensure that the painting apparatus is working properly.

SOLUTION
1. Cluster
2. Stratified
3. Random
4. Systematic

D. Experimental Design
There are several different ways to classify statistical studies.
This section explains two types of studies: observational studies and experimental studies.

In an observational study, the researcher merely observes what is happening or what has happened in the past and tries to draw conclusions based on these observations. There are three main types of observational studies. When all the data are collected at one time, the study is called a cross-sectional study. When the data are collected using records obtained from the past, the study is called a retrospective study. Finally, if the data are collected over a period of time, say, past and present, the study is called a longitudinal study. Observational studies have advantages and disadvantages.

Advantages:
An observational study usually occurs in a natural setting. For example, researchers can observe people’s driving patterns on streets and highways in large cities.
It can be done in situations where it would be unethical or downright dangerous to conduct an experiment. Using observational studies, researchers can study suicides, rapes, murders, etc.
It can be done using variables that cannot be manipulated by the researcher, such as drug users versus nondrug users and right-handedness versus left-handedness.

Disadvantages:
Since the variables are not controlled by the researcher, a definite cause-and-effect situation cannot be shown since other factors may have had an effect on the results.
Observational studies can be expensive and time-consuming. For example, if one wanted to study the habitat of lions in Africa, one would need a lot of time and money, and there would be a certain amount of danger involved.
Since the researcher may not be using his or her own measurements, the results could be subject to the inaccuracies of those who collected the data. For example, if the researchers were doing a study of events that occurred in the 1800s, they would have to rely on information and records obtained by others from a previous era. There is no way to ensure the accuracy of these records.
In an experimental study, the researcher manipulates one of the variables and tries to determine how the manipulation influences other variables. For example, a study conducted at Virginia Polytechnic Institute and presented in Psychology Today divided female undergraduate students into two groups and had the students perform as many sit-ups as possible in 90 seconds. The first group was told only to “Do your best,” while the second group was told to try to increase the actual number of sit-ups done each day by 10%. After four days, the subjects in the group who were given the vague instructions to “Do your best” averaged 43 sit-ups, while the group that was given the more specific instructions to increase the number of sit-ups by 10% averaged 56 sit-ups by the last day’s session. The conclusion then was that athletes who were given specific goals performed better than those who were not given specific goals. This study is an example of a statistical experiment since the researchers intervened in the study by manipulating one of the variables, namely, the type of instructions given to each group. In a true experimental study, the subjects should be assigned to groups randomly. Also, the treatments should be assigned to the groups at random. In the sit-up study, the article did not mention whether the subjects were randomly assigned to the groups. Sometimes when random assignment is not possible, researchers use intact groups. These types of studies are done quite often in education where already intact groups are available in the form of existing classrooms. When these groups are used, the study is said to be a quasi-experimental study. The treatments, though, should be assigned at random. Most articles do not state whether random assignment of subjects was used. Statistical studies usually include one or more independent variables and one dependent variable. The independent variable in an experimental study is the one that is being manipulated by the researcher. 
The independent variable is also called the explanatory variable. The resultant variable is called the dependent variable or the outcome variable. The outcome variable is the variable that is studied to see if it has changed significantly because of the manipulation of the independent variable. For example, in the sit-up study, the researchers gave the groups two different types of instructions, general and specific. Hence, the independent variable is the type of instruction. The dependent variable, then, is the resultant variable, that is, the number of sit-ups each group was able to perform after four days of exercise. If the differences in the dependent or outcome variable are large and other factors are equal, these differences can be attributed to the manipulation of the independent variable. In this case, specific instructions were shown to increase athletic performance. In the sit-up study, there were two groups. The group that received the special instruction is called the treatment group while the other is called the control group. The treatment group receives a specific treatment (in this case, instructions for improvement) while the control group does not. Experimental studies have the advantage that the researcher can decide how to select subjects and how to assign them to specific groups. The researcher can also control or manipulate the independent variable. For example, in studies that require the subjects to consume a certain amount of medicine each day, the researcher can determine the precise dosages and, if necessary, vary the dosage for the groups. There are several disadvantages to experimental studies. First, they may occur in unnatural settings, such as laboratories and special classrooms. This can lead to several problems. One such problem is that the results might not apply to the natural setting. 
The age-old question then is, “This mouthwash may kill 10,000 germs in a test tube, but how many germs will it kill in my mouth?”

Another disadvantage of an experimental study is the Hawthorne effect. This effect was discovered in 1924 in a study of workers at the Hawthorne plant of the Western Electric Company. In this study, researchers found that subjects who knew they were participating in an experiment actually changed their behavior in ways that affected the results of the study.

Another problem when conducting statistical studies is called confounding of variables, or lurking variables. A confounding variable is one that influences the dependent or outcome variable but was not separated from the independent variable. Researchers try to control most variables in a study, but this is not possible in some studies. For example, subjects who are put on an exercise program might also improve their diet, unbeknownst to the researcher, and perhaps improve their health in other ways not due to exercise alone. Diet then becomes a confounding variable.

Another factor that can influence statistical experiments is called the placebo effect. Here, the subjects in the study respond favorably or show improvement simply because they have been selected for the study. They could also be reacting to clues given unintentionally by the researchers. For example, in a study on knee pain done at the Sacred Heart Hospital, researchers divided 180 patients into three groups. Two groups had surgery to remove damaged cartilage, while those in the third group had simulated surgery. After two years, an equal number of patients in each group reported that they felt better after the surgery. Those patients who had simulated surgery were said to be responding to what is called the placebo effect.

To minimize the placebo effect, researchers use blinding. In blinding, the subjects do not know whether they are receiving an actual treatment or a placebo.
Many times researchers use a sugar pill that looks like a real medical pill. Often double blinding is used. Here neither the subjects nor the researchers are told which groups are given the placebos.

Researchers use blocking to minimize variability when they suspect that there might be a difference between two or more blocks. For example, in the sit-up study mentioned earlier, if we think that men and women would respond differently to “Do your best” versus “Increase by 10% every day,” we would divide the subjects into two blocks (men, women) and then randomize which subjects in each block get the treatment.

When subjects are assigned to groups randomly, and the treatments are assigned randomly, the experiment is said to be a completely randomized design.

Some experiments use what is called a matched-pair design. Here one subject is assigned to a treatment group, and another subject is assigned to a control group. Before the assignment, however, subjects are paired according to certain characteristics. In earlier years, studies used identical twins, assigning one twin to one group and the other twin to another group. Subjects can be paired on any characteristics, such as age, height, and weight.

Another way to validate studies is to use replication. Here the same experiment is done in another part of the country or in another laboratory. The same study could also be done using adults who are not going to college instead of using college students. The results of the second study are then compared to those of the original study to see if they are the same.

The purpose of a statistical study is to gain and process information obtained from the study in order to answer specific questions about the subject being investigated. Statistical researchers use a specific procedure to do statistical studies to obtain valid results.

Research Report Writing
1. Formulate the purpose of the study.
2. Identify the variables for the study.
3. Define the population.
4. Decide what sampling method you will use to collect the data.
5. Collect the data.
6. Summarize the data and perform any statistical calculations needed.
7. Interpret the results.

EXAMPLE: Experimental Design
Researchers randomly assigned 10 people to each of three different groups. Group 1 was instructed to write an essay about the hassles in their lives. Group 2 was instructed to write an essay about circumstances that made them feel thankful. Group 3 was asked to write an essay about events that they felt neutral about. After the exercise, they were given a questionnaire on their outlook on life. The researchers found that those who wrote about circumstances that made them feel thankful had a more optimistic outlook on life. The conclusion is that focusing on the positive makes you more optimistic about life in general. Based on this study, answer the following questions.
1. Was this an observational or experimental study?
2. What is the independent variable?
3. What is the dependent variable?
4. What may be a confounding variable in this study?
5. What can you say about the sample size?

SOLUTION
1. This is an experimental study since the variable (the type of essay written) was manipulated.
2. The independent variable was the type of essay the participants wrote.
3. The dependent variable was the score on the life outlook questionnaire.
4. Other factors, such as age, upbringing, and income, can affect the results; however, the random assignment of subjects is helpful in eliminating these factors.
5. In this study, the sample consists of 30 participants in total.

Uses and Misuses of Statistics
There is another aspect of statistics, and that is the misuse of statistical techniques to sell products that don’t work properly, to attempt to prove something true that is really not true, or to get our attention by using statistics to evoke fear, shock, and outrage.
Two sayings that have been around for a long time illustrate this point:
“There are three types of lies: lies, damn lies, and statistics.”
“Figures don’t lie, but liars figure.”

Just because we read or hear the results of a research study or an opinion poll in the media, this does not mean that these results are reliable or that they can be applied to any and all situations. For example, reporters sometimes leave out critical details such as the size of the sample used or how the research subjects were selected. Without this information, you cannot properly evaluate the research or properly interpret the conclusions of the study or survey.

Here are some ways that statistics can be misrepresented.

1. Suspect Samples
Sometimes researchers use very small samples to obtain information. How the subjects in the sample were selected also matters. A sample may not be representative of the population, as with a convenience sample or a volunteer sample.
Note: When results are interpreted from studies using small samples, convenience samples, or volunteer samples, care should be used in generalizing the results to the entire population.

2. Ambiguous Averages
There are four commonly used measures that are loosely called averages. They are the mean, median, mode, and midrange. For the same data set, these averages can differ markedly. People who know this can, without lying, select the one measure of average that lends the most evidence to support their position.

3. Changing the Subject
Another type of statistical distortion can occur when different values are used to represent the same data. For example, one political candidate who is running for reelection might say, “During my administration, expenditures increased a mere 3%.” His opponent, who is trying to unseat him, might say, “During my opponent’s administration, expenditures have increased a whopping $6,000,000.” Here both figures are correct (they agree if expenditures started at $200,000,000, since 3% of $200,000,000 is $6,000,000); however, expressing a 3% increase as $6,000,000 makes it sound like a very large increase.

4. Detached Statistics
A claim that uses a detached statistic is one in which no comparison is made. For example, you may hear a claim such as “Our brand of crackers has one-third fewer calories.” One-third fewer calories than what? Another example is “Brand A aspirin works four times faster.” Four times faster than what? When you see statements such as these, always ask yourself, Compared to what?

5. Implied Connections
Many claims attempt to imply connections between variables that may not actually exist.
“Eating fish may help to reduce your cholesterol.” Notice the words may help. There is no guarantee that eating fish will definitely help you reduce your cholesterol.
“Studies suggest that using our exercise machine will reduce your weight.” The word suggest is used; again, there is no guarantee that you will lose weight by using the exercise machine advertised.
“Taking calcium will lower blood pressure in some people.” Note that the word some is used. You may not be included in the group of “some” people.
Be careful when you draw conclusions from claims that use words such as may, in some people, and might help.

6. Misleading Graphs
Statistical graphs give a visual representation of data that enables viewers to analyze and interpret data more easily than by simply looking at numbers. However, if graphs are drawn inappropriately, they can misrepresent the data and lead the reader to draw false conclusions.

7. Faulty Survey Questions
When analyzing the results of a survey using questionnaires, you should be sure that the questions are properly written, since the way questions are phrased can often influence the way people answer them.
For example, a question such as “Do you feel that the North Huntingdon School District should build a new football stadium?” might be answered differently than a question such as “Do you favor increasing school taxes so that the North Huntingdon School District can build a new football stadium?” Each question asks something a little different, and the responses could be radically different. When you read and interpret the results obtained from questionnaire surveys, watch out for common mistakes made in the writing of the survey questions.

In summary then, statistics, when used properly, can be beneficial in obtaining much information, but when used improperly, can lead to much misinformation. It is like your automobile. If you use your automobile to get to school or work or to go on a vacation, that’s good. But if you use it to run over your neighbor’s dog because it barks all night long and tears up your flower garden, that’s not so good!

E. Computers and Calculators
To demo installation of Excel and Jamovi…

V. ASSESSMENT

1. Read the following on attendance and grades, and answer the questions.

A study conducted at Manatee Community College revealed that students who attended class 95 to 100% of the time usually received an A in the class. Students who attended class 80 to 90% of the time usually received a B or C in the class. Students who attended class less than 80% of the time usually received a D or an F or eventually withdrew from the class. Based on this information, attendance and grades are related. The more you attend class, the more likely it is you will receive a higher grade. If you improve your attendance, your grades will probably improve. Many factors affect your grade in a course. One factor that you have considerable control over is attendance. You can increase your opportunities for learning by attending class more often.

a. What are the variables under study?
b. What are the data in the study?
c. Are descriptive, inferential, or both types of statistics used?
d. What is the population under study?
e. Was a sample collected? If so, from where?
f. From the information given, comment on the relationship between the variables.

2. Read the following information about the number of fatal accidents for the transportation industry in a specific year, and answer each question.

a. Name the variables under study.
b. Categorize each variable as quantitative or qualitative.
c. Categorize each quantitative variable as discrete or continuous.
d. Identify the level of measurement for each variable.
e. The railroad had the fewest fatalities for the specific year. Does that mean railroads have fewer accidents than the other industries?
f. What factors other than safety influence a person’s choice of transportation?
g. From the information given, comment on the relationship between the variables.

VI. REFERENCE/S
Bluman, A. G. (2013). Elementary Statistics: A Step by Step Approach (11th ed.). McGraw-Hill International.
Triola, M. (2010). Elementary Statistics (11th ed.). Addison-Wesley/Pearson Education, Boston.
Spiegel, M. R., & Stephens, L. J. (2007). Schaum's Outline of Statistics. McGraw-Hill.